
Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. (English) Zbl 1420.62345
Summary: Statisticians have made great progress in creating methods that reduce our reliance on parametric assumptions. However, this explosion in research has resulted in a breadth of inferential strategies that both create opportunities for more reliable inference and complicate the choices that an applied researcher has to make and defend. Relatedly, researchers advocating for new methods typically compare their method to at best 2 or 3 other causal inference strategies and test using simulations that may or may not be designed to equally tease out flaws in all the competing methods. The causal inference data analysis challenge, “Is Your SATT Where It’s At?”, launched as part of the 2016 Atlantic Causal Inference Conference, sought to make progress with respect to both of these issues. The researchers creating the data testing grounds were distinct from the researchers submitting methods whose efficacy would be evaluated. Results from 30 competitors across the two versions of the competition (black-box algorithms and do-it-yourself analyses) are presented along with post-hoc analyses that reveal information about the characteristics of causal inference strategies and settings that affect performance. The most consistent conclusion was that methods that flexibly model the response surface perform better overall than methods that fail to do so. Finally, new methods are proposed that combine features of several of the top-performing submitted methods.

MSC:
62K20 Response surface designs
68T05 Learning and adaptive systems in artificial intelligence
62B15 Theory of statistical experiments
Software:
BayesTree; Matching
[1] Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica 74 235-267. · Zbl 1112.62042
[2] Athanasopoulos, G. and Hyndman, R. J. (2011). The value of feedback in forecasting competitions. Int. J. Forecast. 27 845-849.
[3] Athey, S. and Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proc. Natl. Acad. Sci. USA 113 7353-7360. · Zbl 1357.62190
[4] Austin, P. C., Grootendorst, P. and Anderson, G. M. (2007). A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: A Monte Carlo study. Stat. Med. 26 734-753.
[5] Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61 962-972. · Zbl 1087.62121
[6] Barnow, B. S., Cain, G. G. and Goldberger, A. S. (1980). Issues in the analysis of selectivity bias. In Evaluation Studies (E. Stromsdorfer and G. Farkas, eds.) 5 42-59. Sage, San Francisco, CA.
[7] Breiman, L. (2001). Random forests. Mach. Learn. 45 5-32. · Zbl 1007.68152
[8] Carpenter, J. (2011). May the best analyst win. Science 331 698-699.
[9] Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 266-298. · Zbl 1189.62066
[10] Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. J. Abnorm. Soc. Psychol. 65 145-163.
[11] Crump, R. K., Hotz, V. J., Imbens, G. W. and Mitnik, O. A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika 96 187-199. · Zbl 1163.62083
[12] Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems 2292-2300.
[13] Dietterich, T. G. (2000). Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems 1-15. Springer, Berlin.
[14] Ding, P. and Miratrix, L. (2014). To adjust or not to adjust? Sensitivity analysis of M-bias and butterfly-bias. J. Causal Inference 3 41-57.
[15] Dorie, V., Harada, M., Carnegie, N. B. and Hill, J. (2016). A flexible, interpretable framework for assessing sensitivity to unmeasured confounding. Stat. Med. 35 3453-3470.
[16] Dorie, V., Hill, J., Shalit, U., Scott, M. and Cervone, D. (2019). Supplement to “Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition.” DOI:10.1214/18-STS667SUPP.
[17] Enders, C. K. and Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychol. Methods 12 121-138.
[18] Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge Univ. Press, New York.
[19] Greenland, S. and Robins, J. M. (1986). Identifiability, exchangeability, and epidemiological confounding. Int. J. Epidemiol. 15 413-419.
[20] Greenland, S., Robins, J. M. and Pearl, J. (1999). Confounding and collapsibility in causal inference. Statist. Sci. 14 29-46. · Zbl 1059.62506
[21] Guyon, I., Aliferis, C. F., Cooper, G. F., Elisseeff, A., Pellet, J.-P., Spirtes, P. and Statnikov, A. R. (2008). Design and analysis of the causation and prediction challenge. In WCCI Causation and Prediction Challenge 1-33.
[22] Haberman, S. J. (1984). Adjustment by minimum discriminant information. Ann. Statist. 12 971-988. · Zbl 0583.62020
[23] Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66 315-331. · Zbl 1055.62572
[24] Hahn, P. R., Murray, J. S. and Carvalho, C. (2017). Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects. Preprint. Available at arXiv:1706.09523.
[25] Hartman, E., Grieve, R., Ramsahai, R. and Sekhon, J. S. (2015). From sample average treatment effect to population average treatment effect on the treated: Combining experimental with observational studies to estimate population treatment effects. J. Roy. Statist. Soc. Ser. A 178 757-778.
[26] Hill, J. (2008). Discussion of research using propensity-score matching: Comments on ‘A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003’ by Peter Austin. Stat. Med. 27 2055-2061.
[27] Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. J. Comput. Graph. Statist. 20 217-240.
[28] Hill, J. L., Reiter, J. P. and Zanutto, E. L. (2004). A comparison of experimental and observational data analyses. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives (X.-L. Meng and A. Gelman, eds.) 49-60. Wiley, Chichester. · Zbl 05274804
[29] Hill, J. and Su, Y.-S. (2013). Assessing lack of common support in causal inference using Bayesian nonparametrics: Implications for evaluating the effect of breastfeeding on children’s cognitive outcomes. Ann. Appl. Stat. 7 1386-1420. · Zbl 1283.62220
[30] Hirano, K. and Imbens, G. W. (2001). Estimation of causal effects using propensity score weighting: An application to data on right heart catheterization. Health Serv. Outcomes Res. Methodol. 1 259-278.
[31] Hirano, K., Imbens, G. W. and Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71 1161-1189. · Zbl 1152.62328
[32] Imai, K. and Ratkovic, M. (2014). Covariate balancing propensity score. J. Roy. Statist. Soc. Ser. B 76 243-263. · Zbl 1411.62025
[33] Imbens, G. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. Rev. Econ. Stat. 86 4-29.
[34] Kern, H. L., Stuart, E. A., Hill, J. L. and Green, D. P. (2016). Assessing methods for generalizing experimental impact estimates to target samples. J. Res. Educ. Eff. 9 103-127.
[35] Kurth, T., Walker, A. M., Glynn, R. J., Chan, K. A., Gaziano, J. M., Berger, K. and Robins, J. M. (2006). Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of non-uniform effect. Am. J. Epidemiol. 163 262-270.
[36] LaLonde, R. and Maynard, R. (1987). How precise are evaluations of employment and training programs: Evidence from a field experiment. Eval. Rev. 11 428-451.
[37] Lechner, M. (2001). Identification and estimation of causal effects of multiple treatments under the conditional independence assumption. In Econometric Evaluation of Labour Market Policies (M. Lechner and F. Pfeiffer, eds.). ZEW Economic Studies 13 43-58. Physica-Verlag, Heidelberg.
[38] Lee, B. K., Lessler, J. and Stuart, E. A. (2010). Improving propensity score weighting using machine learning. Stat. Med. 29 337-346.
[39] Little, R. J. (1988). Missing-data adjustments in large surveys. J. Bus. Econom. Statist. 6 287-296.
[40] Middleton, J., Scott, M., Diakow, R. and Hill, J. (2016). Bias amplification and bias unmasking. Polit. Anal. 24 307-323.
[41] Niswander, K. R. and Gordon, M. (1972). The Collaborative Perinatal Study of the National Institute of Neurological Diseases and Stroke: The Women and Their Pregnancies. W.B. Saunders, Philadelphia, PA.
[42] Paulhamus, B., Ebaugh, A., Boylls, C., Bos, N., Hider, S. and Giguere, S. (2012). Crowdsourced cyber defense: Lessons from a large-scale, game-based approach to threat identification on a live network. In International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction 35-42. Springer, Berlin.
[43] Pearl, J. (2009a). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge. · Zbl 1188.68291
[44] Pearl, J. (2009b). Causal inference in statistics: An overview. Stat. Surv. 3 96-146. · Zbl 1300.62013
[45] Pearl, J. (2010). On a class of bias-amplifying variables that endanger effect estimates. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence 425-432. Accessed 02/02/2016.
[46] Ranard, B. L., Ha, Y. P., Meisel, Z. F., Asch, D. A., Hill, S. S., Becker, L. B., Seymour, A. K. and Merchant, R. M. (2014). Crowdsourcing—Harnessing the masses to advance health and medicine, a systematic review. J. Gen. Intern. Med. 29 187-203.
[47] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA. · Zbl 1177.68165
[48] Robins, J. M. (1999). Association, causation, and marginal structural models. Synthese 121 151-179. · Zbl 1078.62523
[49] Robins, J. M. and Rotnitzky, A. (2001). Comment on ‘Inference for semiparametric models: Some questions and an answer,’ by P. J. Bickel and J. Kwon. Statist. Sinica 11 920-936.
[50] Rokach, L. (2009). Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Comput. Statist. Data Anal. 53 4046-4072. · Zbl 1453.62185
[51] Rosenbaum, P. R. (1987). Model-based direct adjustment. J. Amer. Statist. Assoc. 82 387-394. · Zbl 0622.62010
[52] Rosenbaum, P. R. (2002). Observational Studies, 2nd ed. Springer, New York. · Zbl 0985.62091
[53] Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41-55. · Zbl 0522.62091
[54] Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. J. Amer. Statist. Assoc. 79 516-524.
[55] Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychol. Bull. 86 638-641.
[56] Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist. 6 34-58. · Zbl 0383.62021
[57] Rubin, D. B. (2006). Matched Sampling for Causal Effects. Cambridge Univ. Press, Cambridge. · Zbl 1118.62113
[58] Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. J. Amer. Statist. Assoc. 94 1096-1146. · Zbl 1072.62644
[59] Sekhon, J. S. (2007). Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. J. Stat. Softw.
[60] Shadish, W. R., Clark, M. H. and Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. J. Amer. Statist. Assoc. 103 1334-1343. · Zbl 1286.62013
[61] Steiner, P. and Kim, Y. (2016). The mechanics of omitted variable bias: Bias amplification and cancellation of offsetting biases. J. Causal Inference 4.
[62] Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statist. Sci. 25 1-21. · Zbl 1328.62007
[63] Taddy, M., Gardner, M., Chen, L. and Draper, D. (2016). A nonparametric Bayesian analysis of heterogenous treatment effects in digital experimentation. J. Bus. Econom. Statist. 34 661-672.
[64] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[65] Vanschoren, J., Van Rijn, J. N., Bischl, B. and Torgo, L. (2014). OpenML: Networked science in machine learning. ACM SIGKDD Explor. Newsl. 15 49-60.
[66] van der Laan, M. J. and Robins, J. M. (2003). Unified Methods for Censored Longitudinal Data and Causality. Springer, New York. · Zbl 1013.62034
[67] van der Laan, M. J. and Rubin, D. (2006). Targeted maximum likelihood learning. Int. J. Biostat. 2 Art. 11, 40.
[68] Wager, S. and Athey, S. (2015). Estimation and inference of heterogeneous treatment effects using random forests. Preprint. Available at arXiv:1510.04342. · Zbl 1402.62056
[69] Westreich, D., Lessler, J. and Funk, M. J. (2010). Propensity score estimation: Neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J. Clin. Epidemiol. 63 826-833.
[70] Wind, D. K. and Winther, O. (2014). Model selection in data analysis competitions. In MetaSel@ ECAI 55-60.
[71] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301-320. · Zbl 1069.62054
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.