
Causal rule sets for identifying subgroups with enhanced treatment effects. (English) Zbl 07552226

Summary: A key question in causal inference analyses is how to find subgroups with elevated treatment effects. This paper takes a machine learning approach and introduces a generative model, causal rule sets (CRS), for interpretable subgroup discovery. A CRS model uses a small set of short decision rules to capture a subgroup in which the average treatment effect is elevated. We present a Bayesian framework for learning a causal rule set. The Bayesian model consists of a prior that favors simple models, both for interpretability and to avoid overfitting, and a Bayesian logistic regression that captures the likelihood of the data, characterizing the relation between outcomes, attributes, and subgroup membership. The Bayesian model has tunable parameters that control subgroup size, providing users with flexible choices of models along the treatment-efficient frontier. We find maximum a posteriori models using iterative discrete Monte Carlo steps in the joint solution space of rule sets and parameters. To improve search efficiency, we provide theoretically grounded heuristics and bounding strategies to prune and confine the search space. Experiments show that the search algorithm can efficiently recover true underlying subgroups. We apply CRS to public and real-world data sets from domains in which interpretability is indispensable. We compare CRS with state-of-the-art rule-based subgroup discovery models. Results show that CRS achieves consistently competitive performance on data sets from various domains, represented by high treatment-efficient frontiers.
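To make the model structure described above concrete, the following is a minimal Python sketch, not the paper's implementation: subgroup membership is the disjunction of conjunctive rules, the likelihood comes from a logistic outcome model with a treatment-by-membership interaction, and a simplicity prior penalizes the number of rule conditions. The coefficient names and the exact prior form are illustrative assumptions.

```python
import numpy as np

def covers(x, rule):
    # A rule is a list of (feature_index, value) conditions; an example is
    # covered when all conditions hold (a conjunction of literals).
    return all(x[f] == v for f, v in rule)

def subgroup_membership(X, rule_set):
    # An example belongs to the subgroup if ANY rule in the set covers it.
    return np.array([any(covers(x, r) for r in rule_set) for x in X], dtype=float)

def log_posterior(rule_set, X, t, y, beta, alpha=1.0):
    """Unnormalized log posterior of a candidate rule set (illustrative).

    beta = (b0, b_t, b_z, b_tz): logistic-regression coefficients for the
    intercept, treatment, subgroup membership, and their interaction.
    alpha scales a simplicity prior that penalizes the total number of
    conditions. These names are assumptions, not the paper's notation.
    """
    z = subgroup_membership(X, rule_set)
    b0, bt, bz, btz = beta
    logits = b0 + bt * t + bz * z + btz * t * z
    p = 1.0 / (1.0 + np.exp(-logits))
    log_lik = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    complexity = sum(len(r) for r in rule_set)   # total number of literals
    log_prior = -alpha * complexity              # favors few, short rules
    return log_lik + log_prior
```

A maximum a posteriori search of the kind the summary describes would propose local edits to the rule set (adding, removing, or swapping rules), accept proposals that improve this score, and use the bounding strategies to skip candidates that cannot improve it.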
Summary of contribution: This paper is motivated by the large heterogeneity of treatment effects in many applications and the need to accurately locate subgroups with enhanced treatment effects. Existing methods either rely on prior hypotheses to discover subgroups or use greedy methods, such as tree-based recursive partitioning. Our method adopts a machine learning approach to find an optimal subgroup by optimizing a carefully designed global objective. Compared with tree-based baselines, our model captures subgroups more flexibly by using a set of short decision rules. We evaluate our model using a novel metric, the treatment-efficient frontier, which characterizes the trade-off between subgroup size and achievable treatment effect, and our model demonstrates better performance than baseline models.
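As an illustration of the treatment-efficient frontier, the helper below is a sketch of the metric's ingredients, not the paper's code: for one candidate subgroup in a randomized experiment it returns the subgroup's support and a within-subgroup difference-in-means effect estimate. Sweeping the model's size-controlling hyperparameter and plotting these (support, effect) pairs traces the frontier; a model whose curve lies higher for a given subgroup size is preferred.

```python
import numpy as np

def subgroup_point(z, t, y):
    """Return (support, estimated effect within the subgroup).

    z: 0/1 subgroup membership, t: 0/1 treatment indicator, y: outcome.
    The treated-minus-control mean difference is a valid effect estimate
    only under randomized treatment assignment, and assumes both arms are
    present in the subgroup; this is an illustrative helper only.
    """
    in_grp = z == 1
    support = in_grp.mean()                      # fraction of population covered
    treated = y[in_grp & (t == 1)]
    control = y[in_grp & (t == 0)]
    effect = treated.mean() - control.mean()     # within-subgroup difference in means
    return support, effect
```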

MSC:

90-XX Operations research, mathematical programming
