Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. (English) Zbl 1246.62073

Summary: When outcomes are missing for reasons beyond an investigator’s control, there are two different ways to adjust a parameter estimate for covariates that may be related both to the outcome and to missingness. One approach is to model the relationships between the covariates and the outcomes and use those relationships to predict the missing values. Another is to model the probabilities of missingness given the covariates and incorporate them into a weighted or stratified estimate. Doubly robust (DR) procedures apply both types of model simultaneously and produce a consistent estimate of the parameter if either of the two models has been correctly specified. We show that DR estimates can be constructed in many ways. We compare the performance of various DR and non-DR estimates of a population mean in a simulated example where both models are incorrect but neither is grossly misspecified. Methods that use inverse-probabilities as weights, whether they are DR or not, are sensitive to misspecification of the propensity model when some estimated propensities are small. Many DR methods perform better than simple inverse-probability weighting. None of the DR methods we tried, however, improved upon the performance of simple regression-based prediction of the missing values. This study does not represent every missing-data problem that will arise in practice. But it does demonstrate that, in at least some settings, two wrong models are not better than one.
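The three strategies named in the summary (regression-based prediction, inverse-probability weighting, and a doubly robust combination) can be illustrated with a minimal sketch. This is not the simulation design of the paper: the data-generating model, the variable names, and the particular DR form used (regression prediction plus an inverse-probability-weighted residual correction, one of the many constructions the summary alludes to) are assumptions chosen for illustration. Here both working models happen to be correctly specified, so all three estimates should land near the true mean, while the unadjusted complete-case mean is biased.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones to the covariate matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, r, n_iter=25):
    """Logistic regression of the response indicator r on X by Newton-Raphson."""
    Xd = add_intercept(X)
    alpha = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = expit(Xd @ alpha)
        gradient = Xd.T @ (r - p)
        hessian = (Xd * (p * (1.0 - p))[:, None]).T @ Xd
        alpha = alpha + np.linalg.solve(hessian, gradient)
    return alpha


def estimate_mean(X, y, r):
    """Estimate the population mean of y when y is observed only where r == 1."""
    Xd = add_intercept(X)
    observed = r == 1

    # Outcome model: least-squares fit on respondents, predictions for everyone.
    beta, *_ = np.linalg.lstsq(Xd[observed], y[observed], rcond=None)
    m_hat = Xd @ beta

    # Propensity model: estimated probability that y is observed.
    pi_hat = expit(Xd @ fit_logistic(X, r))

    y_filled = np.where(observed, y, 0.0)  # missing values never enter the sums
    weights = r / pi_hat                   # zero for nonrespondents

    return {
        # Regression-based prediction of every unit's outcome.
        "regression": m_hat.mean(),
        # Inverse-probability weighting with normalized weights.
        "ipw": np.sum(weights * y_filled) / np.sum(weights),
        # Doubly robust: prediction plus weighted residual correction.
        "dr": np.mean(m_hat + weights * (y_filled - m_hat)),
    }


# Hypothetical data: linear outcome model and logistic response model.
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 2))
y_full = 10.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)
r = rng.binomial(1, expit(0.5 + X[:, 0]))
y = np.where(r == 1, y_full, np.nan)  # outcomes are missing where r == 0

true_mean = 10.0
cc_mean = y[r == 1].mean()  # unadjusted complete-case mean
est = estimate_mean(X, y, r)

print("complete cases:", round(cc_mean, 3))
for name, value in est.items():
    print(name + ":", round(value, 3))
```

To probe the situation the summary describes, one could deliberately misspecify both working models (for example by fitting them to transformed covariates) and inspect the distribution of `1 / pi_hat`; the summary's warning concerns cases where some estimated propensities are small and the corresponding weights become very large.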

MSC:

62F35 Robustness and adaptive procedures (parametric inference)
62F10 Point estimation
62A99 Foundational topics in statistics
65C60 Computational problems in statistics (MSC2010)

Software:

BayesDA

References:

[1] Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669-679. · Zbl 0774.62031 · doi:10.2307/2290350
[2] Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61 962-972. · Zbl 1087.62121 · doi:10.1111/j.1541-0420.2005.00377.x
[3] Binder, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. Internat. Statist. Rev. 51 279-292. · Zbl 0535.62014 · doi:10.2307/1402588
[4] Carpenter, J., Kenward, M. and Vansteelandt, S. (2006). A comparison of multiple imputation and inverse probability weighting for analyses with missing data. J. Roy. Statist. Soc. Ser. A 169 571-584. · doi:10.1111/j.1467-985X.2006.00407.x
[5] Cassel, C. M., Särndal, C. E. and Wretman, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63 615-620. · Zbl 0344.62011 · doi:10.1093/biomet/63.3.615
[6] Cassel, C. M., Särndal, C. E. and Wretman, J. H. (1977). Foundations of Inference in Survey Sampling. Wiley, New York. · Zbl 0391.62007
[7] Cassel, C. M., Särndal, C. E. and Wretman, J. H. (1983). Some uses of statistical models in connection with the nonresponse problem. In Incomplete Data in Sample Surveys III. Symposium on Incomplete Data, Proceedings (W. G. Madow and I. Olkin, eds.). Academic Press, New York.
[8] Cochran, W. G. (1968). The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 24 295-313. · doi:10.2307/2528036
[9] D’Agostino, R. B. Jr. (1998). Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine 17 2265-2281.
[10] Davidian, M., Tsiatis, A. A. and Leon, S. (2005). Semiparametric estimation of treatment effect in a pretest-posttest study with missing data. Statist. Sci. 20 261-301. · Zbl 1100.62554 · doi:10.1214/088342305000000151
[11] Gelman, A. and Meng, X. L., eds. (2004). Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives. Wiley, New York. · Zbl 1066.62515 · doi:10.1002/0470090456
[12] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis. Chapman and Hall, London. · Zbl 1039.62018
[13] Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57 1317-1339. · Zbl 0683.62068 · doi:10.2307/1913710
[14] Hammersley, J. M. and Handscomb, D. C. (1964). Monte Carlo Methods. Methuen, London. · Zbl 0121.35503
[15] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London. · Zbl 0747.62061
[16] Hinkley, D. (1985). Transformation diagnostics for linear models. Biometrika 72 487-496. · Zbl 0586.62111 · doi:10.1093/biomet/72.3.487
[17] Hirano, K. and Imbens, G. (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Services and Outcomes Research Methodology 2 259-278.
[18] Holland, P. W. (1986). Statistics and causal inference. J. Amer. Statist. Assoc. 81 945-970. · Zbl 0607.62001 · doi:10.2307/2289064
[19] Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47 663-685. · Zbl 0047.38301 · doi:10.2307/2280784
[20] Little, R. J. A. and An, H. (2004). Robust likelihood-based analysis of multivariate data with missing values. Statist. Sinica 14 949-968. · Zbl 1073.62050
[21] Little, R. J. A. (1986). Survey nonresponse adjustments for estimates of means. Internat. Statist. Rev. 54 139-157. · Zbl 0596.62009 · doi:10.2307/1403140
[22] Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. Wiley, New York. · Zbl 0665.62004
[23] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd ed. Wiley, New York. · Zbl 1011.62004
[24] Liu, C. (2004). Robit regression: A simple robust alternative to logistic and probit regression. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives (A. Gelman and X. L. Meng, eds.) 227-238. Wiley, New York. · Zbl 05274820 · doi:10.1002/0470090456.ch21
[25] Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine 23 2937-2960.
[26] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London. · Zbl 0588.62104
[27] Neyman, J. (1923). On the application of probability theory to agricultural experiments: Essays on principles, Section 9. Translated from the Polish and edited by D. M. Dabrowska and T. P. Speed. Statist. Sci. 5 (1990) 465-480. · Zbl 0955.01560
[28] Oh, H. L. and Scheuren, F. S. (1983). Weighting adjustments for unit nonresponse. In Incomplete Data in Sample Surveys II. Theory and Annotated Bibliography (W. G. Madow, I. Olkin and D. B. Rubin, eds.) 143-184. Academic Press, New York.
[29] Pregibon, D. (1982). Resistant fits for some commonly used logistic models with medical applications. Biometrics 38 485-498.
[30] Robins, J. M. and Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. J. Amer. Statist. Assoc. 90 122-129. · Zbl 0818.62043 · doi:10.2307/2291135
[31] Robins, J. M. and Rotnitzky, A. (2001). Comment on “Inference for semiparametric models: some questions and an answer,” by P. J. Bickel and J. Kwon. Statist. Sinica 11 920-936. · Zbl 0997.62028
[32] Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89 846-866. · Zbl 0815.62043 · doi:10.2307/2290910
[33] Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Amer. Statist. Assoc. 90 106-121. · Zbl 0818.62042 · doi:10.2307/2291134
[34] Rotnitzky, A., Robins, J. M. and Scharfstein, D. O. (1998). Semiparametric regression for repeated outcomes with ignorable nonresponse. J. Amer. Statist. Assoc. 93 1321-1339. · Zbl 1064.62520 · doi:10.2307/2670049
[35] Rosenbaum, P. R. (2002). Observational Studies, 2nd ed. Springer, New York. · Zbl 0985.62091
[36] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41-55. · Zbl 0522.62091 · doi:10.1093/biomet/70.1.41
[37] Rosenbaum, P. R. and Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. American Statistician 39 33-38.
[38] Rubin, D. B. (1974a). Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educational Psychology 66 688-701.
[39] Rubin, D. B. (1974b). Characterizing the estimation of parameters in incomplete data problems. J. Amer. Statist. Assoc. 69 467-474. · Zbl 0291.62036 · doi:10.2307/2285680
[40] Rubin, D. B. (1976). Inference and missing data. Biometrika 63 581-592. · Zbl 0344.62034 · doi:10.1093/biomet/63.3.581
[41] Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist. 6 34-58. · Zbl 0383.62021 · doi:10.1214/aos/1176344064
[42] Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley, New York. · Zbl 1070.62007
[43] Rubin, D. B. (2005). Causal inference using potential outcomes: design, modeling, decisions. J. Amer. Statist. Assoc. 100 322-331. · Zbl 1117.62418 · doi:10.1198/016214504000001880
[44] Särndal, C.-E., Swensson, B. and Wretman, J. (1989). The weighted residual technique for estimating the variance of the general regression estimator of a finite population total. Biometrika 76 527-537. · Zbl 0677.62004 · doi:10.1093/biomet/76.3.527
[45] Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer, New York. · Zbl 0742.62008
[46] Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. Chapman and Hall, London. · Zbl 0997.62510
[47] Schafer, J. L. and Kang, J. D. Y. (2005). Discussion of “Semiparametric estimation of treatment effect in a pretest-posttest study with missing data” by M. Davidian et al. Statist. Sci. 20 292-295.
[48] Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. J. Amer. Statist. Assoc. 94 1096-1120 (with rejoinder 1135-1146). · Zbl 1072.62644 · doi:10.2307/2669923
[49] van der Laan, M. J. and Robins, J. M. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer, New York. · Zbl 1013.62034
[50] Vartivarian, S. and Little, R. J. A. (2002). On the formation of weighting adjustment cells for unit nonresponse. Proceedings of the Survey Research Methods Section, American Statistical Association. Amer. Statist. Assoc., Alexandria, VA.
[51] Winship, C. and Sobel, M. E. (2004). Causal inference in sociological studies. In Handbook of Data Analysis (M. Hardy, ed.) 481-503. Sage, Thousand Oaks, CA.