
Combining multiple imputation and inverse-probability weighting. (English) Zbl 1241.62009

Summary: Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribution of the missing data (a multivariate outcome) given the observed data. Inadequacies in either model may lead to important bias if large amounts of data are missing. A third approach combines MI and IPW to give a doubly robust estimator. A fourth approach (IPW/MI) combines MI and IPW but, unlike doubly robust methods, imputes only isolated missing values and uses weights to account for remaining larger blocks of unimputed missing data, such as would arise, e.g., in a cohort study subject to sample attrition, and/or unequal sampling fractions.
We examine the performance, in terms of bias and efficiency, of IPW/MI relative to MI and IPW alone and investigate whether the variance estimator from Rubin's rules [D. B. Rubin, Multiple imputation for nonresponse in surveys. Reprint of the 1987 original. Hoboken, NJ: Wiley (2004; Zbl 1070.62007)] is valid for IPW/MI. We prove that this variance estimator is valid for IPW/MI for linear regression with an imputed outcome, present simulations supporting its use in more general settings, and demonstrate that IPW/MI can have advantages over the alternatives. IPW/MI is applied to data from the National Child Development Study.
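
For readers unfamiliar with the Rubin's rules variance estimator named in the summary, the following is a minimal sketch of the standard pooling step, not the authors' implementation. The function name `rubin_pool` and the example numbers are hypothetical. Under IPW/MI, each per-imputation estimate would presumably come from an inverse-probability-weighted analysis of one imputed data set, which is an assumption about how the pieces fit together rather than something the summary spells out.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool M per-imputation results with Rubin's rules.

    estimates: M point estimates (one per imputed data set).
    variances: M estimated variances of those point estimates.
    Returns (pooled_estimate, total_variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.shape[0]
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputations")
    if u.shape != q.shape:
        raise ValueError("estimates and variances must have the same shape")

    q_bar = q.mean(axis=0)                 # pooled point estimate
    within = u.mean(axis=0)                # average within-imputation variance
    between = q.var(axis=0, ddof=1)        # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical numbers: three imputed data sets, one regression coefficient.
pooled, total_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance adds the between-imputation variance, inflated by the factor 1 + 1/M, to the average within-imputation variance. Whether this estimator remains valid when the per-imputation analyses are weighted is the question the paper addresses.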

MSC:

62D05 Sampling theory, sample surveys
62J05 Linear regression; mixed models
62P10 Applications of statistics to biology and medical sciences; meta analysis
65C60 Computational problems in statistics (MSC2010)

Citations:

Zbl 1070.62007

Software:

ice

References:

[1] Atherton, Loss and representativeness in a biomedical survey at age 45 years: 1958 British Birth Cohort, Journal of Epidemiology and Community Health 62 pp 216– (2008) · doi:10.1136/jech.2006.058966
[2] Caldwell, Lifecourse socioeconomic predictors of midlife drinking patterns, problems and abstention: Findings from the 1958 British Birth Cohort study, Drug and Alcohol Dependence 95 pp 269– (2008) · doi:10.1016/j.drugalcdep.2008.01.014
[3] Bayesian Data Analysis (2004) · Zbl 1039.62018
[4] Goldstein, Handling attrition and non-response in longitudinal data, Longitudinal and Life Course Studies 1 pp 63– (2009)
[5] Höfler, The use of weights to account for non-response and drop-out, Social Psychiatry and Psychiatric Epidemiology 40 pp 291– (2005) · doi:10.1007/s00127-005-0882-5
[6] Jones, Indicator and stratification methods for missing explanatory variables in multiple linear regression, Journal of the American Statistical Association 91 pp 222– (1996) · Zbl 0870.62053 · doi:10.1080/01621459.1996.10476680
[7] Kim, On the bias of the multiple-imputation variance estimator in survey sampling, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 pp 509– (2006) · Zbl 1110.62008 · doi:10.1111/j.1467-9868.2006.00546.x
[8] Statistical Analysis with Missing Data (2002) · Zbl 1011.62004
[9] Meng, Multiple-imputation inferences with uncongenial sources of input, Statistical Science 9 pp 538– (1994) · doi:10.1214/ss/1177010269
[10] Nielsen, Proper and improper multiple imputation, International Statistical Review 71 pp 593– (2003) · Zbl 1114.62323 · doi:10.1111/j.1751-5823.2003.tb00214.x
[11] Power, Cohort profile: 1958 British Birth Cohort (National Child Development Study), International Journal of Epidemiology 35 pp 34– (2006) · doi:10.1093/ije/dyi183
[12] Priebe, Characteristics of teams, staff and patients: Associations with outcomes of patients in assertive outreach, British Journal of Psychiatry 185 pp 306– (2004) · doi:10.1192/bjp.185.4.306
[13] Robins, Non-response models for the analysis of non-monotone ignorable missing data, Statistics in Medicine 16 pp 39– (1997) · doi:10.1002/(SICI)1097-0258(19970115)16:1<39::AID-SIM535>3.0.CO;2-D
[14] Robins, Inference for imputation estimators, Biometrika 87 pp 113– (2000) · Zbl 0974.62016 · doi:10.1093/biomet/87.1.113
[15] Robins, Estimation of regression coefficients when some regressors are not always observed, Journal of the American Statistical Association 89 pp 846– (1994) · Zbl 0815.62043 · doi:10.1080/01621459.1994.10476818
[16] Royston, Multiple imputation of missing values: Update of ice, Stata Journal 5 pp 527– (2005)
[17] Multiple Imputation for Nonresponse in Surveys (1987)
[18] Schafer, Multiple imputation in multivariate problems when the imputation and analysis models differ, Statistica Neerlandica 57 pp 19– (2003) · Zbl 04575109 · doi:10.1111/1467-9574.00218
[19] Schenker, Asymptotic results for multiple imputation, Annals of Statistics 16 pp 1550– (1988) · Zbl 0668.62004 · doi:10.1214/aos/1176351053
[20] Stansfeld, Psychosocial work characteristics and anxiety and depressive disorders in midlife: The effects of prior psychological distress, Occupational and Environmental Medicine 65 pp 634– (2008a) · doi:10.1136/oem.2007.036640
[21] Stansfeld, Childhood and adulthood socio-economic position and midlife depressive and anxiety disorders, Drug and Alcohol Dependence 95 pp 269– (2008b)
[22] Thomas, Prenatal exposures and glucose metabolism in adulthood, Diabetes Care 30 pp 918– (2007) · doi:10.2337/dc06-1881
[23] Van Buuren, Multiple imputation of discrete and continuous data by fully conditional specification, Statistical Methods in Medical Research 16 pp 219– (2007) · Zbl 1122.62382 · doi:10.1177/0962280206074463
[24] Vansteelandt, Analysis of incomplete data using inverse probability weighting and doubly robust estimators, Methodology 6 pp 37– (2010) · doi:10.1027/1614-2241/a000005
[25] Wang, Large-sample theory for parametric multiple imputation procedures, Biometrika 85 pp 935– (1998) · Zbl 1054.62524 · doi:10.1093/biomet/85.4.935
[26] White, Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values, Statistics in Medicine 29 pp 2920– (2010) · doi:10.1002/sim.3944
[27] White, Multiple imputation using chained equations: Issues and guidance for practice, Statistics in Medicine 30 pp 377– (2010) · doi:10.1002/sim.4067
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.