
Adversarial balancing-based representation learning for causal effect inference with observational data. (English) Zbl 1473.68140

Summary: Learning causal effects from observational data greatly benefits a variety of domains such as health care, education, and sociology. For instance, one could estimate the impact of a new drug on specific individuals to assist clinical planning and improve the survival rate. In this paper, we study the problem of estimating the Conditional Average Treatment Effect (CATE) from observational data. The challenges are two-fold: on the one hand, we have to derive a causal estimator that recovers the causal quantity from observational data in the presence of confounding bias; on the other hand, we have to deal with the identification of the CATE when the covariate distributions of the treatment group and the control group are imbalanced. To overcome these challenges, we propose a neural network framework called Adversarial Balancing-based representation learning for Causal Effect Inference (ABCEI), based on recent advances in representation learning. To ensure the identification of the CATE, ABCEI uses adversarial learning to balance the distributions of covariates in the treatment and the control group in the latent representation space, without any assumptions on the form of the treatment selection/assignment function. In addition, highly predictive information from the original covariate space might be lost during representation learning and balancing. ABCEI tackles this information loss by preserving information that is useful for predicting causal effects, under the regularization of a mutual information estimator. The experimental results show that ABCEI is robust against treatment selection bias and matches or outperforms state-of-the-art approaches on several datasets, encompassing health care (and other) domains.
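To make the estimand and the confounding problem concrete, here is a minimal sketch that is not ABCEI. The CATE at covariate value x is tau(x) = E[Y(1) - Y(0) | X = x]. The sketch simulates hypothetical confounded data (the data-generating process, with true tau(x) = 1 + 2x and treatment probability increasing in x, is invented for illustration) and compares a naive difference in means with a simple two-model baseline (a "T-learner": one outcome regression per arm). The helper names `simulate`, `fit_line`, `t_learner`, and `naive_ate` are hypothetical. ABCEI instead learns a shared representation that is balanced adversarially and regularized by a mutual information estimator; none of that is reproduced here.

```python
import math
import random


def simulate(n, seed=0):
    """Hypothetical confounded data: covariate x drives both treatment and outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_treat = 1.0 / (1.0 + math.exp(-2.0 * x))  # treated units tend to have larger x
        t = 1 if rng.random() < p_treat else 0
        tau = 1.0 + 2.0 * x                          # true CATE at this x (assumed)
        y = x + t * tau + rng.gauss(0.0, 0.1)        # observed outcome
        data.append((x, t, y))
    return data


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def t_learner(data):
    """Fit one outcome model per arm; estimated CATE(x) = mu1(x) - mu0(x)."""
    treated = [(x, y) for x, t, y in data if t == 1]
    control = [(x, y) for x, t, y in data if t == 0]
    a1, b1 = fit_line(*zip(*treated))
    a0, b0 = fit_line(*zip(*control))
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


def naive_ate(data):
    """Difference in mean outcomes, ignoring covariates (biased under confounding)."""
    y1 = [y for _, t, y in data if t == 1]
    y0 = [y for _, t, y in data if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


if __name__ == "__main__":
    data = simulate(5000)
    cate = t_learner(data)
    print("estimated CATE at x=0:", round(cate(0.0), 3))  # true value is 1.0
    print("estimated CATE at x=1:", round(cate(1.0), 3))  # true value is 3.0
    print("naive difference in means:", round(naive_ate(data), 3))  # true average effect is 1.0
```

In this setup the naive difference in means lands well above the true average effect of 1.0, because treated units have systematically larger x. The per-arm regressions recover tau(x) only because the linear models match the simulated outcomes exactly and both arms cover the same range of x. When the treated and control covariate distributions overlap poorly, per-arm models must extrapolate; this is the imbalance that the summary says ABCEI addresses by balancing the groups in a latent representation space.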

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62D20 Causal inference from observational studies
62P10 Applications of statistics to biology and medical sciences; meta analysis
