Covariate balancing propensity score by tailored loss functions. (English) Zbl 1420.62464

In observational studies, propensity scores are commonly estimated by maximum likelihood, but the resulting estimates may fail to balance high-dimensional pretreatment covariates even after a specification search.
The paper introduces a general framework that unifies and generalizes several recent proposals for improving the covariate balance of a propensity score model in the design of an observational study. The solution is conceptually simple: to estimate the propensity score, the author proposes to optimize, instead of the widely used negative Bernoulli likelihood, special loss functions, called covariate balancing scoring rules (CBSR), which are uniquely determined by the link function in the generalized linear model (GLM) and by the estimand. The author shows that, compared with the Bernoulli likelihood, the CBSR is much more robust in finite samples and does not lose asymptotic efficiency in estimating the weighted average treatment effect.
Furthermore, the author proposes practical strategies for balancing covariate functions in much richer function classes, with the aim of estimating the maximum bias of the inverse probability weighting estimators and constructing honest confidence intervals in finite samples.
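
To illustrate how a loss function can be tailored to the link function and the estimand, the following minimal sketch treats one special case: the logistic link with the average treatment effect as estimand. The loss used here is not quoted from the paper; it is derived under the assumption that the first-order condition should equate the inverse-probability-weighted covariate sums of the treated and control groups. The simulated data, the function name `fit_balancing_logistic`, and the damped Newton solver are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_balancing_logistic(X, t, n_iter=50, tol=1e-8):
    """Minimize sum_i [exp(-s_i f_i) - s_i f_i], with f = X @ theta, s = 2t - 1.

    Illustrative loss (an assumption of this sketch): its gradient is
    -X^T [s * (1 + exp(-s f))], so a stationary point equates the
    inverse-probability-weighted covariate sums of the two groups.
    """
    s = 2.0 * t - 1.0                      # +1 for treated, -1 for control
    loss = lambda th: np.sum(np.exp(-s * (X @ th)) - s * (X @ th))
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        f = X @ theta
        grad = -X.T @ (s * (1.0 + np.exp(-s * f)))
        if np.max(np.abs(grad)) < tol:
            break
        hess = X.T @ (X * np.exp(-s * f)[:, None])   # positive definite: loss is convex
        step = np.linalg.solve(hess, grad)
        # Halve the Newton step until the loss does not increase.
        size, current = 1.0, loss(theta)
        while not loss(theta - size * step) <= current and size > 1e-8:
            size *= 0.5
        theta = theta - size * step
    return theta

# Simulated observational data (hypothetical example).
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), Z])       # intercept plus two covariates
p_true = 1.0 / (1.0 + np.exp(-(0.5 * Z[:, 0] - 0.25 * Z[:, 1])))
t = (rng.uniform(size=n) < p_true).astype(float)

theta_hat = fit_balancing_logistic(X, t)
s = 2.0 * t - 1.0
ipw = 1.0 + np.exp(-s * (X @ theta_hat))   # 1/p for treated, 1/(1-p) for control
# Weighted covariate sum of the treated minus that of the controls.
imbalance = X.T @ (s * ipw)
print(imbalance)
```

In this sketch the balance property follows directly from the stationarity condition of the loss, whereas a maximum likelihood fit of the same logistic model would generally leave a nonzero weighted imbalance.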


62P10 Applications of statistics to biology and medical sciences; meta analysis
62J12 Generalized linear models (logistic models)
62F10 Point estimation
62F25 Parametric tolerance and confidence regions