Missing value imputation methods for parameter estimates and psychometric properties of Likert measures
Abstract
Problem. Missing items are a common problem in Likert-type measures consisting of multiple questions. Despite frequent use of imputation methods for missing values, data about the performance of different methods on outcome measures are lacking.
Purpose. To assess the performance of four imputation methods on item- and scale-level statistics and psychometric properties under different data conditions, with values missing completely at random (MCAR).
Methods. This is a secondary data analysis using a dataset consisting of responses to the SF-36. The imputation methods under study are item mean substitution (IMS), person mean substitution (PMS), the expectation-maximization algorithm (EM), and stochastic regression imputation (SRI). Missing data conditions include percentage of subjects with missing values (10%, 25% and 40%), percentage of missing items (10%, 20-33%, and 33-50%), sample size (200 and 500), and scale length (2, 4 and 10 items). After each missing data condition was created under MCAR, the imputation methods were applied and statistics were computed from the imputed datasets. Accuracy and bias of estimates were compared across methods for item-level statistics (mean, SD and correlation), scale-level statistics (mean, SD and correlation), and psychometric properties (coefficient alpha, goodness-of-fit statistics of confirmatory factor analysis, factor loadings and factor correlations). One-way ANOVA and GLM were used to analyze the data. The significance level was set at alpha ≤ .05.
Results. The imputation methods differed in performance regardless of data conditions. IMS was consistently the worst in terms of accuracy and bias for all but two parameters. PMS was the second worst in parameter estimates. In contrast, EM and SRI produced more accurate estimates for most parameters considered. EM ranked best for estimation of item means and of intercorrelations among items and scales, while SRI ranked best for item SD, alpha and goodness-of-fit statistics. The two methods were equivalent for other parameters. Further, their performance was less influenced by missing data conditions. In terms of reducing bias, SRI was better than EM for most of the parameters.
Conclusion. The model-based approaches, SRI and EM, are preferred over IMS and PMS for imputing missing items in Likert-type measures.
Description
University of Maryland, Baltimore. Nursing. Ph.D. 2001
Keyword
Biology, Biostatistics
Health Sciences, Nursing
missing data imputation methods
Statistics as Topic--methods
Likert Scale
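
Illustrative sketch. The abstract's Methods describe creating missing items under MCAR and then applying each imputation method. The Python sketch below shows, under stated assumptions, what that step could look like for the two simpler methods, IMS and PMS. It is not the dissertation's code: the function names (make_mcar, item_mean_substitution, person_mean_substitution) are hypothetical, the responses are simulated 1-5 ratings rather than SF-36 data, and EM and SRI are left out because they require a fitted model.

```python
import numpy as np


def make_mcar(data, prop_subjects, n_missing_items, rng):
    """Blank out n_missing_items random items for a random share of subjects.

    Rows and columns are chosen without regard to the responses, which is
    what makes the missingness completely at random (MCAR).
    """
    out = data.astype(float).copy()
    n_subjects, n_items = out.shape
    n_hit = int(round(prop_subjects * n_subjects))
    hit_rows = rng.choice(n_subjects, size=n_hit, replace=False)
    for row in hit_rows:
        cols = rng.choice(n_items, size=n_missing_items, replace=False)
        out[row, cols] = np.nan
    return out


def item_mean_substitution(data):
    """IMS: replace a missing item with that item's mean over observed cases."""
    item_means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), item_means, data)


def person_mean_substitution(data):
    """PMS: replace a missing item with the person's mean over answered items."""
    person_means = np.nanmean(data, axis=1, keepdims=True)
    return np.where(np.isnan(data), person_means, data)


# Simulated example (assumption): 200 subjects, a 4-item scale, ratings 1-5,
# 25% of subjects missing one item each.
rng = np.random.default_rng(0)
complete = rng.integers(1, 6, size=(200, 4))
incomplete = make_mcar(complete, prop_subjects=0.25, n_missing_items=1, rng=rng)

ims = item_mean_substitution(incomplete)
pms = person_mean_substitution(incomplete)

# Compare item SDs from the imputed data with those from the complete data.
print("complete SD:", complete.std(axis=0, ddof=1).round(3))
print("IMS SD:     ", ims.std(axis=0, ddof=1).round(3))
print("PMS SD:     ", pms.std(axis=0, ddof=1).round(3))
```

Because IMS fills every gap in an item with the same constant, it tends to shrink that item's SD. Comparing the printed SDs against the complete-data values is one simple way to see the kind of bias in item-level statistics that the study evaluates.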