Multiple imputation of discrete and continuous data by fully conditional specification. (English) Zbl 1122.62382

Summary: The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory, and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack the flexibility needed to represent typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in the cases studied. The present paper reviews and compares the approaches. JM and FCS were applied to pubertal development data of 3801 Dutch girls who had missing data on menarche (two categories), breast development (five categories) and pubic hair development (six stages). Imputations for these data were created under two models: a multivariate normal model with rounding and a conditionally specified discrete model. The JM approach introduced biases in the reference curves, whereas FCS did not. The paper concludes that FCS is a useful and easily applied flexible alternative to JM when no convenient and realistic joint distribution can be specified.
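
To illustrate what "a series of conditional models, one for each incomplete variable" can look like in practice, the following is a minimal sketch of the FCS idea for two continuous variables, using only the Python standard library. It is not the procedure used in the paper: the function names (`fit_simple_regression`, `fcs_impute`, `multiply_impute`), the choice of a normal linear regression for each conditional model, and the fixed number of iterations are all assumptions made for illustration. The sketch also keeps the fitted regression parameters fixed within each step instead of drawing them from their posterior, so it understates the uncertainty that a proper multiple-imputation procedure is meant to preserve.

```python
import random


def fit_simple_regression(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    dof = max(len(xs) - 2, 1)
    sd = (sum(r * r for r in resid) / dof) ** 0.5
    return a, b, sd


def fcs_impute(rows, n_iter=10, rng=None):
    """Return one completed copy of two-column `rows` (None marks missing).

    Hypothetical helper: each incomplete column is imputed in turn from a
    regression on the other column, and the cycle is repeated n_iter times.
    """
    rng = rng or random.Random()
    missing = [[value is None for value in row] for row in rows]
    data = [list(row) for row in rows]

    # Starting values: fill each missing cell with a random observed value.
    for j in range(2):
        observed = [row[j] for row in rows if row[j] is not None]
        for i, row in enumerate(data):
            if missing[i][j]:
                row[j] = rng.choice(observed)

    for _ in range(n_iter):
        for j in range(2):          # target column
            k = 1 - j               # predictor column (current completed values)
            idx = [i for i in range(len(data)) if not missing[i][j]]
            a, b, sd = fit_simple_regression(
                [data[i][k] for i in idx], [data[i][j] for i in idx]
            )
            # Replace only the originally missing cells, adding residual noise.
            for i in range(len(data)):
                if missing[i][j]:
                    data[i][j] = a + b * data[i][k] + rng.gauss(0.0, sd)
    return data


def multiply_impute(rows, m=5, seed=0):
    """Create m completed data sets from the same incomplete rows."""
    rng = random.Random(seed)
    return [fcs_impute(rows, rng=rng) for _ in range(m)]
```

Calling `multiply_impute(rows, m=5)` on a list of `(x, y)` pairs with `None` for missing entries returns five completed data sets whose observed cells are unchanged and whose imputed cells differ between copies. For discrete variables such as those in the pubertal development data, each conditional model would instead be a suitable categorical regression; that substitution is where the flexibility of FCS described above comes from.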

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62N02 Estimation in survival analysis and censored data
65C60 Computational problems in statistics (MSC2010)

Software:

R; Stata; rms
