Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. (English) Zbl 1454.62348

Summary: We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if…then… statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the \(\mathrm{CHADS}_{2}\) score, actively used in clinical practice for estimating the risk of stroke in patients with atrial fibrillation. Our model is as interpretable as \(\mathrm{CHADS}_{2}\), but more accurate.
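The decision-list structure described in the summary can be sketched in a few lines of code: an ordered series of if…then rules over patient features, where the first matching rule fires and a default prediction closes the list. The rules, feature names, and risk labels below are illustrative assumptions for exposition only, not the fitted model from the paper.

```python
# Minimal sketch of a decision list: an ordered sequence of
# (condition, prediction) rules checked top to bottom, ending in a
# default. The specific rules and labels here are hypothetical.

def predict_stroke_risk(patient):
    """Return an illustrative risk label from an ordered rule list."""
    rules = [
        # Each rule mirrors one "if ... then ..." statement; the first
        # rule whose condition holds determines the prediction.
        (lambda p: p["hemiplegia"] and p["age"] > 60, "high risk"),
        (lambda p: p["prior_stroke"], "high risk"),
        (lambda p: p["high_blood_pressure"], "medium risk"),
    ]
    for condition, label in rules:
        if condition(patient):
            return label
    return "low risk"  # default rule at the end of the list

patient = {"hemiplegia": False, "age": 72,
           "prior_stroke": False, "high_blood_pressure": True}
print(predict_stroke_risk(patient))  # medium risk
```

In the paper's Bayesian Rule Lists approach, the contents and ordering of such a list are not hand-written but drawn from a posterior distribution over decision lists, with a prior that favors short lists of short rules.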


62P10 Applications of statistics to biology and medical sciences; meta analysis
62C10 Bayesian problems; characterization of Bayes procedures
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Full Text: DOI arXiv Euclid


[1] Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. In VLDB ’94 Proceedings of the 20th International Conference on Very Large Data Bases 487-499. Morgan Kaufmann, San Francisco, CA.
[2] Antman, E. M., Cohen, M., Bernink, P. J. L. M., McCabe, C. H., Horacek, T., Papuchis, G., Mautner, B., Corbalan, R., Radley, D. and Braunwald, E. (2000). The TIMI risk score for unstable angina/non-ST elevation MI: A method for prognostication and therapeutic decision making. JAMA 284 835-842.
[3] Bache, K. and Lichman, M. (2013). UCI machine learning repository.
[4] Borgelt, C. (2005). An implementation of the FP-growth algorithm. In OSDM ’05 Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations 1-5. ACM, New York.
[5] Bratko, I. (1997). Machine learning: Between accuracy and interpretability. In Learning, Networks and Statistics (G. Della Riccia, H.-J. Lenz and R. Kruse, eds.). International Centre for Mechanical Sciences 382 163-177. Springer, Vienna. · Zbl 0932.68088
[6] Breiman, L. (1996a). Bagging predictors. Mach. Learn. 24 123-140. · Zbl 0858.68080
[7] Breiman, L. (1996b). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350-2383. · Zbl 0867.62055
[8] Breiman, L. (2001a). Random forests. Mach. Learn. 45 5-32. · Zbl 1007.68152
[9] Breiman, L. (2001b). Statistical modeling: The two cultures. Statist. Sci. 16 199-231. · Zbl 1059.62505
[10] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042
[11] Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2 27:1-27:27.
[12] Chipman, H. A., George, E. I. and McCulloch, R. E. (1998). Bayesian CART model search. J. Amer. Statist. Assoc. 93 935-948. · Zbl 1072.62650
[13] Chipman, H. A., George, E. I. and McCulloch, R. E. (2002). Bayesian treed models. Mach. Learn. 48 299-320. · Zbl 0998.68072
[14] Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 266-298. · Zbl 1189.62066
[15] Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist 34 571-582.
[16] Denison, D. G. T., Mallick, B. K. and Smith, A. F. M. (1998). A Bayesian CART algorithm. Biometrika 85 363-377. · Zbl 1048.62502
[17] Dougherty, J., Kohavi, R. and Sahami, M. (1995). Supervised and unsupervised discretization of continuous features. In ICML ’95 Proceedings of the 12th International Conference on Machine Learning 194-202. Morgan Kaufmann, San Francisco, CA.
[18] Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R. and Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 9 1871-1874. · Zbl 1225.68175
[19] Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI ’93 Proceedings of the 1993 International Joint Conference on Artificial Intelligence 1022-1027. Morgan Kaufmann, San Francisco, CA.
[20] Freitas, A. A. (2014). Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter 15 1-10.
[21] Friedman, J. H. and Popescu, B. E. (2008). Predictive learning via rule ensembles. Ann. Appl. Stat. 2 916-954. · Zbl 1149.62051
[22] Gage, B. F., Waterman, A. D., Shannon, W., Boechler, M., Rich, M. W. and Radford, M. J. (2001). Validation of clinical classification schemes for predicting stroke. JAMA 285 2864-2870.
[23] Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statist. Sci. 7 457-472. · Zbl 1386.65060
[24] Giraud-Carrier, C. (1998). Beyond predictive accuracy: What? Technical report, Univ. Bristol, Bristol, UK.
[25] Goh, S. T. and Rudin, C. (2014). Box drawings for learning with imbalanced data. In KDD ’14 Proceedings of the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 333-342. ACM, New York.
[26] Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets. Mach. Learn. 11 63-91. · Zbl 0850.68278
[27] Huysmans, J., Dejaeger, K., Mues, C., Vanthienen, J. and Baesens, B. (2011). An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models. Decision Support Systems 51 141-154.
[28] Jennings, D. L., Amabile, T. M. and Ross, L. (1982). Informal covariation assessments: Data-based versus theory-based judgements. In Judgment Under Uncertainty: Heuristics and Biases (D. Kahneman, P. Slovic and A. Tversky, eds.) 211-230. Cambridge Univ. Press, Cambridge.
[29] King, G., Lam, P. and Roberts, M. (2014). Computer-assisted keyword and document set discovery from unstructured text. Technical report, Harvard.
[30] Knaus, W. A., Draper, E. A., Wagner, D. P. and Zimmerman, J. E. (1985). APACHE II: A severity of disease classification system. Critical Care Medicine 13 818-829.
[31] Leondes, C. T. (2002). Expert Systems: The Technology of Knowledge Management and Decision Making for the 21st Century. Academic Press, San Diego, CA.
[32] Letham, B., Rudin, C., McCormick, T. H. and Madigan, D. (2013). An interpretable stroke prediction model using rules and Bayesian analysis. In Proceedings of AAAI Late Breaking Track. MIT, Cambridge, MA. · Zbl 1454.62348
[33] Letham, B., Rudin, C., McCormick, T. H. and Madigan, D. (2014). An interpretable model for stroke prediction using rules and Bayesian analysis. In Proceedings of 2014 KDD Workshop on Data Science for Social Good. MIT, Cambridge, MA.
[34] Letham, B., Rudin, C., McCormick, T. H. and Madigan, D. (2015). Supplement to “Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model.” DOI:10.1214/15-AOAS848SUPPB. · Zbl 1454.62348
[35] Levenshtein, V. I. (1965). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Dokl. 10 707-710. · Zbl 0149.15905
[36] Li, W., Han, J. and Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In Proceedings of the IEEE International Conference on Data Mining 369-376. IEEE, New York.
[37] Lim, W. S., van der Eerden, M. M., Laing, R., Boersma, W. G., Karalus, N., Town, G. I., Lewis, S. A. and Macfarlane, J. T. (2003). Defining community acquired pneumonia severity on presentation to hospital: An international derivation and validation study. Thorax 58 377-382.
[38] Lip, G. Y. H., Frison, L., Halperin, J. L. and Lane, D. A. (2010a). Identifying patients at high risk for stroke despite anticoagulation: A comparison of contemporary stroke risk stratification schemes in an anticoagulated atrial fibrillation cohort. Stroke 41 2731-2738.
[39] Lip, G. Y. H., Nieuwlaat, R., Pisters, R., Lane, D. A. and Crijns, H. J. G. M. (2010b). Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: The euro heart survey on atrial fibrillation. Chest 137 263-272.
[40] Liu, B., Hsu, W. and Ma, Y. (1998). Integrating classification and association rule mining. In KDD ’98 Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining 80-96. AAAI Press, Palo Alto, CA.
[41] Madigan, D., Mittal, S. and Roberts, F. (2011). Efficient sequential decision-making algorithms for container inspection operations. Naval Res. Logist. 58 637-654. · Zbl 1245.90021
[42] Madigan, D., Mosurski, K. and Almond, R. G. (1997). Explanation in belief networks. J. Comput. Graph. Statist. 6 160-181.
[43] Marchand, M. and Sokolova, M. (2005). Learning with decision lists of data-dependent features. J. Mach. Learn. Res. 6 427-451. · Zbl 1222.68257
[44] McCormick, T. H., Rudin, C. and Madigan, D. (2012). Bayesian hierarchical rule modeling for predicting medical conditions. Ann. Appl. Stat. 6 622-668. · Zbl 1243.62036
[45] Meinshausen, N. (2010). Node harvest. Ann. Appl. Stat. 4 2049-2072. · Zbl 1220.62084
[46] Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits to our capacity for processing information. The Psychological Review 63 81-97.
[47] Muggleton, S. and De Raedt, L. (1994). Inductive logic programming: Theory and methods. J. Logic Programming 19 629-679. · Zbl 0816.68043
[48] Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA. · Zbl 1037.68938
[49] Rivest, R. L. (1987). Learning decision lists. Mach. Learn. 2 229-246.
[50] Rudin, C. and Ertekin, Ş. (2015). Learning optimized lists of classification rules. Technical report, MIT, Cambridge, MA.
[51] Rudin, C., Letham, B. and Madigan, D. (2013). Learning theory analysis for association rules and sequential event prediction. J. Mach. Learn. Res. 14 3441-3492. · Zbl 1317.68184
[52] Rüping, S. (2006). Learning interpretable models. Ph.D. thesis, Univ. Dortmund.
[53] Shmueli, G. (2010). To explain or to predict? Statist. Sci. 25 289-310. · Zbl 1329.62045
[54] Souillard-Mandar, W., Davis, R., Rudin, C., Au, R., Libon, D. J., Swenson, R., Price, C. C., Lamar, M. and Penney, D. L. (2015). Learning classification models of cognitive conditions from subtle behaviors in the digital clock drawing test. Machine Learning. · Zbl 06679347
[55] Srikant, R. and Agrawal, R. (1996). Mining quantitative association rules in large relational tables. In SIGMOD ’96 Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data 1-12. ACM, New York.
[56] Stang, P. E., Ryan, P. B., Racoosin, J. A., Overhage, J. M., Hartzema, A. G., Reich, C., Welebob, E., Scarnecchia, T. and Woodcock, J. (2010). Advancing the science for active surveillance: Rationale and design for the observational medical outcomes partnership. Ann. Intern. Med. 153 600-606.
[57] Taddy, M. A., Gramacy, R. B. and Polson, N. G. (2011). Dynamic trees for learning and design. J. Amer. Statist. Assoc. 106 109-123. · Zbl 1396.62158
[58] Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York. · Zbl 0833.62008
[59] Vellido, A., Martín-Guerrero, J. D. and Lisboa, P. J. G. (2012). Making machine learning models interpretable. In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. ESANN, Bruges.
[60] Wang, F. and Rudin, C. (2015). Falling rule lists. In JMLR Workshop and Conference Proceedings 38 1013-1022. San Diego, CA.
[61] Wang, T., Rudin, C., Doshi, F., Liu, Y., Klampfl, E. and MacNeille, P. (2015). Bayesian or’s of and’s for interpretable classification with application to context-aware recommender systems. Available at arXiv:1504.0761.
[62] Wu, Y., Tjelmeland, H. and West, M. (2007). Bayesian CART: Prior specification and posterior simulation. J. Comput. Graph. Statist. 16 44-66.
[63] Wu, X., Zhang, C. and Zhang, S. (2004). Efficient mining of both positive and negative association rules. ACM Transactions on Information Systems 22 381-405. · Zbl 1317.68184
[64] Yin, X. and Han, J. (2003). CPAR: Classification based on predictive association rules. In SDM ’03 Proceedings of the 2003 SIAM International Conference on Data Mining 331-335. SIAM, Philadelphia, PA.
[65] Zaki, M. J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering 12 372-390.
[66] Zhang, Y., Laber, E. B., Tsiatis, A. and Davidian, M. (2015). Using decision lists to construct interpretable and parsimonious treatment regimes. Available at arXiv:1504.0771. · Zbl 1419.62490
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.