Estimating treatment effect heterogeneity in randomized program evaluation. (English) Zbl 1376.62036

Summary: When evaluating the efficacy of social programs and medical treatments using randomized experiments, the estimated overall average causal effect alone is often of limited value, and researchers must investigate when the treatments do and do not work. Indeed, the estimation of treatment effect heterogeneity plays an essential role in (1) selecting the most effective treatment from a large number of available treatments, (2) ascertaining subpopulations for which a treatment is effective or harmful, (3) designing individualized optimal treatment regimes, (4) testing for the existence or lack of heterogeneous treatment effects, and (5) generalizing causal effect estimates obtained from an experimental sample to a target population. In this paper, we formulate the estimation of heterogeneous treatment effects as a variable selection problem. We propose a method that adapts the Support Vector Machine classifier by placing separate sparsity constraints over the pre-treatment parameters and causal heterogeneity parameters of interest. The proposed method is motivated by and applied to two well-known randomized evaluation studies in the social sciences. Our method selects the most effective voter mobilization strategies from a large number of alternative strategies, and it also identifies the characteristics of workers who greatly benefit from (or are negatively affected by) a job training program. In our simulation studies, we find that the proposed method often outperforms some commonly used alternatives.
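
As a hedged illustration of the "separate sparsity constraints" idea in the summary, one way such an objective could be written is sketched below. The notation is assumed for illustration and is not taken from this record: $Y_i^{*} \in \{-1, 1\}$ is a recoded binary outcome, $Z_i$ collects pre-treatment covariates with coefficients $\beta$, $V_i$ collects treatment and treatment-covariate interaction terms with coefficients $\gamma$, and the squared hinge loss stands in for an SVM-type loss. The paper's exact loss, weights, and tuning procedure may differ.

```latex
% Sketch only: symbols and loss are illustrative assumptions, not quoted from the paper.
\[
(\hat\mu, \hat\beta, \hat\gamma)
  = \operatorname*{arg\,min}_{\mu,\,\beta,\,\gamma}
    \sum_{i=1}^{n}
      \bigl[\, 1 - Y_i^{*}\,(\mu + \beta^{\top} Z_i + \gamma^{\top} V_i) \,\bigr]_{+}^{2}
    \;+\; \lambda_Z \sum_{j} |\beta_j|
    \;+\; \lambda_V \sum_{k} |\gamma_k| ,
\qquad [x]_{+} = \max(x, 0).
\]
```

In this sketch, the two penalty levels $\lambda_Z$ and $\lambda_V$ allow the heterogeneity terms $\gamma$ to be shrunk more or less aggressively than the pre-treatment terms $\beta$; any term whose coefficient is set exactly to zero drops out of the selected model, which is what makes this a variable selection formulation.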


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62J07 Ridge regression; shrinkage estimators (Lasso)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62P25 Applications of statistics to social sciences

