All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. (English) Zbl 1436.62019

Summary: Variable importance (VI) tools describe how much covariates contribute to a prediction model’s accuracy. However, important variables for one well-performing model (for example, a linear model \(f(\mathbf{x})=\mathbf{x}^T\beta\) with a fixed coefficient vector \(\beta\)) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
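The idea in the summary can be illustrated with a minimal sketch. It is not the paper's estimator, optimization procedure, or probabilistic bounds. It assumes squared-error loss, a linear model class, a permutation-based reliance defined as the ratio of permuted-column loss to original loss, and a crude random search over coefficient vectors whose loss is within an additive tolerance `eps` of the least-squares fit. All function names, the tolerance, and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np


def permutation_reliance(predict, X, y, col, rng, n_repeats=20):
    """Ratio of mean squared error after permuting column `col` to the
    original mean squared error (an assumed, ratio-style VI measure)."""
    base = np.mean((y - predict(X)) ** 2)
    permuted_losses = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, col] = rng.permutation(Xp[:, col])  # break the link to y
        permuted_losses.append(np.mean((y - predict(Xp)) ** 2))
    return float(np.mean(permuted_losses) / base)


def empirical_reliance_range(X, y, col, eps=0.05, n_candidates=500, seed=0):
    """Crude random-search approximation of the range of reliance values
    over linear models whose loss is within `eps` of the best fit."""
    rng = np.random.default_rng(seed)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference model
    best = np.mean((y - X @ beta_hat) ** 2)
    values = [permutation_reliance(lambda Z: Z @ beta_hat, X, y, col, rng)]
    for _ in range(n_candidates):
        # Hypothetical proposal step: perturb the reference coefficients.
        beta = beta_hat + rng.normal(scale=0.2, size=beta_hat.shape)
        loss = np.mean((y - X @ beta) ** 2)
        if loss <= best + eps:  # keep only well-performing models
            values.append(
                permutation_reliance(lambda Z, b=beta: Z @ b, X, y, col, rng)
            )
    return min(values), max(values)


# Simulated data (an assumption): two strongly correlated covariates, so
# different well-performing models can lean on either one.
data_rng = np.random.default_rng(1)
x1 = data_rng.normal(size=400)
x2 = x1 + 0.3 * data_rng.normal(size=400)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * data_rng.normal(size=400)

lo, hi = empirical_reliance_range(X, y, col=0)
print(f"reliance on column 0 ranges over roughly [{lo:.2f}, {hi:.2f}]")
```

Because the two simulated covariates are correlated, models with nearly the same loss can place different weight on each, which is why a range is reported rather than a single value. A random search only explores part of the set of well-performing models, so the reported interval can be narrower than the true range.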

MSC:

62A01 Foundations and philosophical topics in statistics
62G30 Order statistics; empirical distribution functions
62P25 Applications of statistics to social sciences
62M20 Inference from stochastic processes and prediction
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

QCQP

References:

[1] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340-1347, 2010.
[2] Kellie J Archer and Ryan V Kimes. Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis, 52(4):2249-2260, 2008. · Zbl 1452.62027
[3] Razia Azen, David V Budescu, and Benjamin Reiser. Criticality of predictors in multiple regression. British Journal of Mathematical and Statistical Psychology, 54(2):201-225, 2001.
[4] Katherine Beckett, Kris Nyrop, and Lori Pfingst. Race, drugs, and policing: understanding disparities in drug delivery arrests. Criminology, 44(1):105-137, 2006.
[5] Irene V Blair, Charles M Judd, and Kristine M Chapleau. The influence of Afrocentric facial features in criminal sentencing. Psychological science, 15(10):674-679, 2004.
[6] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge university press, 2004. · Zbl 1058.90049
[7] Leo Breiman. Random forests. Machine learning, 45(1):5-32, 2001. · Zbl 1007.68152
[8] Leo Breiman et al. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Statistical science, 16(3):199-231, 2001. · Zbl 1059.62505
[9] M Luz Calle and Víctor Urrea. Letter to the editor: stability of random forest importance measures. Briefings in bioinformatics, 12(1):86-89, 2010.
[10] Hugh A Chipman, Edward I George, Robert E McCulloch, et al. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266-298, 2010. · Zbl 1189.62066
[11] Alexandra Chouldechova. Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big data, 5(2):153-163, 2017.
[12] Beau Coker, Cynthia Rudin, and Gary King. A theory of statistical inference for ensuring the robustness of scientific results. arXiv preprint arXiv:1804.08646, 2018.
[13] Sam Corbett-Davies, Emma Pierson, Avi Feller, and Sharad Goel. A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. The Washington Post, October 2016. URL https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas/?utm_term=.e896ff1e4107.
[14] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797-806. ACM, 2017.
[15] Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 598-617. IEEE, 2016.
[16] Elizabeth R DeLong, David M DeLong, and Daniel L Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3):837-845, 1988. · Zbl 0715.62207
[17] Olga V Demler, Michael J Pencina, and Ralph B D’Agostino Sr. Misuse of DeLong test to compare AUCs for nested models. Statistics in medicine, 31(23):2577-2587, 2012.
[18] Iván Díaz, Alan Hubbard, Anna Decker, and Mitchell Cohen. Variable importance and prediction methods for longitudinal problems with missing variables. PloS one, 10(3):e0120031, 2015.
[19] Werner Dinkelbach. On nonlinear fractional programming. Management science, 13(7):492-498, 1967. · Zbl 0152.18402
[20] Jiayun Dong and Cynthia Rudin. Variable importance clouds: A way to explore variable importance for the set of good models. arXiv preprint arXiv:1901.03209, 2019.
[21] R Dorfman. A note on the delta-method for finding variance formulae. The Biometric Bulletin, 1(129-137):92, 1938.
[22] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214-226. ACM, 2012. · Zbl 1348.91230
[23] Muriel Gevrey, Ioannis Dimopoulos, and Sovan Lek. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological modelling, 160(3):249-264, 2003.
[24] Baptiste Gregorutti, Bertrand Michel, and Philippe Saint-Pierre. Grouped variable importance with random forests and application to multiple functional data analysis. Computational Statistics & Data Analysis, 90:15-35, 2015. · Zbl 1468.62069
[25] Baptiste Gregorutti, Bertrand Michel, and Philippe Saint-Pierre. Correlation and variable importance in random forests. Statistics and Computing, 27(3):659-678, 2017. · Zbl 1505.62167
[26] Alexander Hapfelmeier, Torsten Hothorn, Kurt Ulm, and Carolin Strobl. A new variable importance measure for random forests with missing data. Statistics and Computing, 24(1):21-34, 2014. · Zbl 1325.62011
[27] T Hastie, R Tibshirani, and J Friedman. The elements of statistical learning 2nd edition. New York: Springer, 2009. · Zbl 1273.62005
[28] Karl G Heider. The Rashomon effect: when ethnographers disagree. American Anthropologist, 90(1):73-81, 1988.
[29] Wassily Hoeffding. A class of statistics with asymptotically normal distribution. The annals of mathematical statistics, pages 293-325, 1948. · Zbl 0032.04101
[30] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963. doi: 10.1080/01621459.1963.10500830. URL https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1963.10500830. · Zbl 0127.10602
[31] Giles Hooker. Generalized functional ANOVA diagnostics for high-dimensional functions of dependent variables. Journal of Computational and Graphical Statistics, 16(3):709-732, 2007.
[32] Reiner Horst and Nguyen V Thoai. DC programming: overview. Journal of Optimization Theory and Applications, 103(1):1-43, 1999. · Zbl 1073.90537
[33] Faisal Kamiran, Indrė Žliobaitė, and Toon Calders. Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowledge and information systems, 35(3):613-644, 2013.
[34] Jalil Kazemitabar, Arash Amini, Adam Bloniarz, and Ameet S Talwalkar. Variable importance using decision trees. In Advances in Neural Information Processing Systems, pages 425-434, 2017.
[35] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017. · Zbl 1402.68156
[36] Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the COMPAS recidivism algorithm. ProPublica, May 2016. URL https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.
[37] Guillaume Lecué. Interplay between concentration, complexity and geometry in learning theory with applications to high dimensional data analysis. PhD thesis, Université Paris-Est, 2011.
[38] Erich L Lehmann and George Casella. Theory of point estimation. Springer Science & Business Media, 2006.
[39] Benjamin Letham, Portia A Letham, Cynthia Rudin, and Edward P Browne. Prediction uncertainty and optimal experimental design for learning dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 26(6):063110, 2016. · Zbl 1374.37123
[40] Gilles Louppe, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. Understanding variable importances in forests of randomized trees. In Advances in neural information processing systems, pages 431-439, 2013.
[41] Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14-19, 2016.
[42] Nicolai Meinshausen and Peter Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417-473, 2010. · Zbl 1411.62142
[43] Lucas Mentch and Giles Hooker. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. The Journal of Machine Learning Research, 17(1):841-881, 2016. · Zbl 1360.62095
[44] John Monahan and Jennifer L Skeem. Risk assessment in criminal sentencing. Annual review of clinical psychology, 12:489-513, 2016.
[45] Razieh Nabi and Ilya Shpitser. Fair inference on outcomes. In Proceedings of the... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, volume 2018, page 1931. NIH Public Access, 2018.
[46] Daniel Nevo and Ya’acov Ritov. Identifying a minimal class of models for high-dimensional data. The Journal of Machine Learning Research, 18(1):797-825, 2017. · Zbl 1437.62280
[47] Julian D Olden, Michael K Joy, and Russell G Death. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecological Modelling, 178(3):389-397, 2004.
[48] Jaehyun Park and Stephen Boyd. General heuristics for nonconvex quadratically constrained quadratic programming. arXiv preprint arXiv:1703.07870, 2017.
[49] Raymond Paternoster and Robert Brame. Reassessing race disparities in Maryland capital cases. Criminology, 46(4):971-1008, 2008.
[50] Sarah Picard-Fritsche, Michael Rempel, Jennifer A. Tallon, Julian Adler, and Natalie Reyes. Demystifying risk assessment, key principles and controversies. Technical report, 2017. Available at https://www.courtinnovation.org/publications/demystifying-risk-assessment-key-principles-and-controversies.
[51] Imre Pólik and Tamás Terlaky. A survey of the S-lemma. SIAM review, 49(3):371-418, 2007. · Zbl 1128.90046
[52] Rajeev Ramchand, Rosalie Liccardo Pacula, and Martin Y Iguchi. Racial differences in marijuana-users’ risk of arrest in the United States. Drug and alcohol dependence, 84(3):264-272, 2006.
[53] Friedrich Recknagel, Mark French, Pia Harkonen, and Ken-Ichi Yabunaka. Artificial neural network approach for modelling and prediction of algal blooms. Ecological Modelling, 96(1-3):11-28, 1997.
[54] Wendy D Roth and Jal D Mehta. The Rashomon effect: combining positivist and interpretivist approaches in the analysis of contested events. Sociological Methods & Research, 31(2):131-173, 2002.
[55] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1:206-215, May 2019.
[56] Cynthia Rudin, Caroline Wang, and Beau Coker. The age of secrecy and unfairness in recidivism prediction. Harvard Data Science Review
[57] Michele Scardi and Lawrence W Harding. Developing an empirical model of phytoplankton primary production: a neural network case study. Ecological modelling, 120(2):213-223, 1999.
[58] Lesia Semenova and Cynthia Rudin. A study in Rashomon curves and volumes: a new perspective on generalization and model simplicity in machine learning. arXiv preprint arXiv:1908.01755, 2019.
[59] Robert J Serfling. Approximation theorems of mathematical statistics. John Wiley & Sons, 1980. · Zbl 0538.62002
[60] Cassia Spohn. Thirty years of sentencing reform: the quest for a racially neutral sentencing process. Criminal justice, 3:427-501, 2000.
[61] Alexander Statnikov, Nikita I Lytkin, Jan Lemeire, and Constantin F Aliferis. Algorithms for discovery of multiple Markov boundaries. Journal of Machine Learning Research, 14(Feb):499-566, 2013. · Zbl 1319.68194
[62] Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, and Torsten Hothorn. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC bioinformatics, 8(1):25, 2007.
[63] Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis. Conditional variable importance for random forests. BMC bioinformatics, 9(1):307, 2008. · Zbl 1452.62469
[64] Elizabeth A Stuart. Matching methods for causal inference: a review and a look forward. Statistical science: a review journal of the Institute of Mathematical Statistics, 25(1):1, 2010. · Zbl 1328.62007
[65] Laura Toloşi and Thomas Lengauer. Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics, 27(14):1986-1994, 2011.
[66] Theja Tulabandhula and Cynthia Rudin. Robust optimization using machine learning for uncertainty sets. arXiv preprint arXiv:1407.1097, 2014. · Zbl 1319.68198
[67] U.S. Department of Justice - Civil Rights Division. Investigation of the Baltimore City Police Department, August 2016. Available at https://www.justice.gov/crt/file/883296/download.
[68] Mark J van der Laan. Statistical inference for variable importance. The International Journal of Biostatistics, 2(1), 2006.
[69] Jay M Ver Hoef. Who invented the delta method? The American Statistician, 66(2):124-127, 2012. · Zbl 07649009
[70] Huazhen Wang, Fan Yang, and Zhiyuan Luo. An experimental study of the intrinsic stability of random forest variable importance measures. BMC bioinformatics, 17(1):60, 2016.
[71] Brian D Williamson, Peter B Gilbert, Noah Simon, and Marco Carone. Nonparametric variable importance assessment using machine learning techniques. bepress (unpublished preprint), 2017.
[72] Jingtao Yao, Nicholas Teng, Hean-Lee Poh, and Chew Lim Tan. Forecasting and analysis of marketing data using neural networks. J. Inf. Sci. Eng., 14(4):843-862, 1998.
[73] Ruoqing Zhu, Donglin Zeng, and Michael R Kosorok.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.