zbMATH — the first resource for mathematics

Introduction to double robust methods for incomplete data. (English) Zbl 1397.62176
Summary: Most methods for handling incomplete data can be broadly classified as inverse probability weighting (IPW) strategies or imputation strategies. The former model the occurrence of incomplete data; the latter, the distribution of the missing variables given observed variables in each missingness pattern. Imputation strategies are typically more efficient, but they can involve extrapolation, which is difficult to diagnose and can lead to large bias. Double robust (DR) methods combine the two approaches. They are typically more efficient than IPW and more robust to model misspecification than imputation. We give a formal introduction to DR estimation of the mean of a partially observed variable, before moving to more general incomplete-data scenarios. We review strategies to improve the performance of DR estimators under model misspecification, reveal connections between DR estimators for incomplete data and “design-consistent” estimators used in sample surveys, and explain the value of double robustness when using flexible data-adaptive methods for IPW or imputation.
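As an illustration of the simplest setting named in the summary (the mean of a partially observed variable), the following is a minimal sketch of one common DR construction, the augmented IPW estimator. It is not taken from the paper: the simulated data, the function names `fit_logistic` and `dr_mean`, the logistic missingness model, and the linear imputation model are all assumptions chosen for illustration.

```python
import numpy as np

def fit_logistic(X, r, n_iter=50):
    """Fit P(R=1 | X) by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def dr_mean(y, r, X_ps, X_out):
    """Augmented IPW estimate of E[Y]; y is only used where r == 1."""
    obs = r == 1
    # Missingness (IPW) model: estimated probability of being observed.
    pi = 1.0 / (1.0 + np.exp(-X_ps @ fit_logistic(X_ps, r)))
    # Imputation model: least squares fitted on the complete cases only.
    gamma, *_ = np.linalg.lstsq(X_out[obs], y[obs], rcond=None)
    m = X_out @ gamma
    # Imputed mean plus inverse-probability-weighted residual correction.
    y_filled = np.where(obs, y, 0.0)
    return np.mean(m + r * (y_filled - m) / pi)

# Hypothetical simulated data: the true mean of Y is 1.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
r = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + x))))
y = np.where(r == 1, y, np.nan)          # Y is missing whenever R = 0

X = np.column_stack([np.ones(n), x])     # intercept and covariate
X0 = np.ones((n, 1))                     # intercept only (misspecified)

est_cc = np.nanmean(y)                   # complete-case mean, biased here
est_dr = dr_mean(y, r, X, X)             # both working models correct
est_dr_bad_out = dr_mean(y, r, X, X0)    # imputation model misspecified
print(est_cc, est_dr, est_dr_bad_out)
```

In this sketch the third estimate deliberately uses an intercept-only imputation model; because the missingness model is still correctly specified, the estimate should remain close to the true mean, which is the double robustness property described in the summary.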

MSC:
62G35 Nonparametric robustness
62E20 Asymptotic distribution theory in statistics
References:
[1] Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61 962–972. · Zbl 1087.62121
[2] Belloni, A. and Chernozhukov, V. (2011). \(l_{1}\)-Penalized quantile regression in high-dimensional sparse models. Ann. Statist. 39 82–130. · Zbl 1209.62064
[3] Belloni, A., Chernozhukov, V. and Hansen, C. (2016). Lasso methods for Gaussian instrumental variables models. Preprint. Available at arXiv:1012.1297.
[4] Brookhart, M. A. and Van der Laan, M. J. (2006). A semiparametric model selection criterion with applications to the marginal structural model. Comput. Statist. Data Anal. 50 475–498. · Zbl 1431.62107
[5] Cao, W., Tsiatis, A. A. and Davidian, M. (2009). Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika 96 723–734. · Zbl 1170.62007
[6] Cassel, C. M., Särndal, C. E. and Wretman, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63 615–620. · Zbl 0344.62011
[7] Cheng, G., Yu, Z. and Huang, J. Z. (2013). The cluster bootstrap consistency in generalized estimating equations. J. Multivariate Anal. 115 33–47. · Zbl 1258.62057
[8] Chernozhukov, V., Escanciano, J. C., Ichimura, H. and Newey, W. K. (2016). Locally robust semiparametric estimation. Preprint. Available at arXiv:1608.00033.
[9] Farrell, M. H. (2015). Robust inference on average treatment effects with possibly more covariates than observations. J. Econometrics 189 1–23. · Zbl 1337.62113
[10] Gruber, S. and van der Laan, M. J. (2010). A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int. J. Biostat. 6 Article 26.
[11] Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47 663–685. · Zbl 0047.38301
[12] Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist. Sci. 22 523–539. · Zbl 1246.62073
[13] Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory 21 21–59. · Zbl 1085.62004
[14] Leeb, H. and Pötscher, B. M. (2006). Performance limits for estimators of the risk or distribution of shrinkage-type estimators, and some general lower risk-bound results. Econometric Theory 22 69–97. · Zbl 1083.62060
[15] Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalised linear models. Biometrika 73 13–22. · Zbl 0595.62110
[16] Little, R. and An, H. (2004). Robust likelihood-based analysis of multivariate data with missing values. Statist. Sinica 14 949–968. · Zbl 1073.62050
[17] Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. Wiley, New York. · Zbl 1011.62004
[18] Long, Q., Zhang, X. and Johnson, B. A. (2011). Robust estimation of area under ROC curve using auxiliary variables in the presence of missing biomarker values. Biometrics 67 559–567. · Zbl 1217.62182
[19] Meng, X.-L. (1994). Multiple-imputation inferences with uncongenial sources of input. Statist. Sci. 9 538–573.
[20] Newey, W. K., Hsieh, F. and Robins, J. M. (2004). Twicing kernels and a small bias property of semiparametric estimators. Econometrica 72 947–962. · Zbl 1091.62024
[21] Paik, M. C. (1997). The generalized estimating equations approach when data are not missing completely at random. J. Amer. Statist. Assoc. 92 1320–1329. · Zbl 0913.62052
[22] Porter, K. E., Gruber, S., van der Laan, M. J. and Sekhon, J. S. (2011). The relative performance of targeted maximum likelihood estimators. Int. J. Biostat. 7 Article 31.
[23] Qi, L., Wang, C. Y. and Prentice, R. L. (2005). Weighted estimators for proportional hazards regression with missing covariates. J. Amer. Statist. Assoc. 100 1250–1263. · Zbl 1117.62413
[24] Robins, J. and Rotnitzky, A. (1998). Discussion on the paper by Firth and Bennett. J. Roy. Statist. Soc. Ser. B 60 51–52.
[25] Robins, J., Sued, M., Lei-Gomez, Q. and Rotnitzky, A. (2007). Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable [MR2420458]. Statist. Sci. 22 544–559. · Zbl 1246.62076
[26] Robins, J. M. (2000). Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American Statistical Association Section on Bayesian Statistical Science 1999 6–10. Amer. Statist. Assoc., Alexandria, VA.
[27] Robins, J. M. and Gill, R. D. (1997). Non-response models for the analysis of non-monotone ignorable missing data. Stat. Med. 16 39–56.
[28] Robins, J. M., Rotnitzky, A. and Zhao, L. P. (1994). Estimation of regression coefficients when some regressors are not always observed. J. Amer. Statist. Assoc. 89 846–866. · Zbl 0815.62043
[29] Rotnitzky, A., Faraggi, D. and Schisterman, E. (2006). Doubly robust estimation of the area under the receiver-operating characteristic curve in the presence of verification bias. J. Amer. Statist. Assoc. 101 1276–1288. · Zbl 1120.62336
[30] Rotnitzky, A., Lei, Q. H., Sued, M. and Robins, J. M. (2012). Improved double-robust estimation in missing data and causal inference models. Biometrika 99 439–456. · Zbl 1239.62071
[31] Rotnitzky, A. and Vansteelandt, S. (2014). Double-robust methods. In Handbook of Missing Data Methodology (G. Molenberghs, G. Fitzmaurice, M. G. Kenward, A. Tsiatis and G. Verbeke, eds.) 185–212. CRC Press, Boca Raton, FL.
[32] Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models: Rejoinder. J. Amer. Statist. Assoc. 94 1135–1146. · Zbl 1072.62644
[33] Schnitzer, M. E., Lok, J. J. and Bosch, R. J. (2016). Double robust and efficient estimation of a prognostic model for events in the presence of dependent censoring. Biostatistics 17 165–177.
[34] Seaman, S. and Copas, A. (2009). Doubly robust generalized estimating equations for longitudinal data. Stat. Med. 28 937–955.
[35] Seaman, S. R., Galati, J., Jackson, D. and Carlin, J. (2013). What is meant by “missing at random”? Statist. Sci. 28 257–268. · Zbl 1331.62036
[36] Seaman, S. R. and Vansteelandt, S. (2018). Supplement to “Introduction to double robust methods for incomplete data.” DOI:10.1214/18-STS647SUPP.
[37] Tan, Z. (2006). A distributional approach for causal inference using propensity scores. J. Amer. Statist. Assoc. 101 1619–1637. · Zbl 1171.62320
[38] Tan, Z. (2008). Comment: Improved local efficiency and double robustness. Int. J. Biostat. 4 Article 10.
[39] Tan, Z. (2010). Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika 97 661–682. · Zbl 1195.62037
[40] Tsiatis, A. A. (2006). Semiparametric Theory and Missing Data. Springer, New York. · Zbl 1105.62002
[41] Tsiatis, A. A. and Davidian, M. (2014). Missing data methods: A semi-parametric perspective. In Handbook of Missing Data Methodology (G. Molenberghs, G. Fitzmaurice, M. G. Kenward, A. Tsiatis and G. Verbeke, eds.) Chapter 8. CRC Press, Boca Raton, FL.
[42] Tsiatis, A. A., Davidian, M. and Cao, W. (2011). Improved doubly robust estimation when data are monotonely coarsened, with application to longitudinal studies with dropout. Biometrics 67 536–545. · Zbl 1217.62146
[43] van der Laan, M. J. and Rubin, D. B. (2006). Targeted maximum likelihood learning. Int. J. Biostat. 2 Article 11.
[44] Vansteelandt, S., Carpenter, J. and Kenward, M. G. (2015). Analysis of incomplete data using inverse probability weighting and doubly robust estimators. Methodology 6 37–48.
[45] van der Laan, M. J. and Gruber, S. (2010). Collaborative double robust targeted maximum likelihood estimation. Int. J. Biostat. 6 Article 17.
[46] Vermeulen, K. and Vansteelandt, S. (2015). Bias-reduced doubly robust estimation. J. Amer. Statist. Assoc. 110 1024–1036. · Zbl 1373.62218
[47] Wilson, A. and Reich, B. J. (2014). Confounder selection via penalized credible regions. Biometrics 70 852–861. · Zbl 1393.62107
[48] Wirth, K. E., Tchetgen Tchetgen, E. J. and Murray, M. (2010). Adjustment for missing data in complex surveys using doubly robust estimation. Epidemiology 21 863–871.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.