Mathematical optimization in classification and regression trees. (English) Zbl 1467.90021
Summary: Classification and regression trees, as well as their variants, are off-the-shelf methods in Machine Learning. In this paper, we review recent contributions within the Continuous Optimization and the Mixed-Integer Linear Optimization paradigms to develop novel formulations in this research area. We compare those in terms of the nature of the decision variables and the constraints required, as well as the optimization algorithms proposed. We illustrate how these powerful formulations enhance the flexibility of tree models, being better suited to incorporate desirable properties such as cost-sensitivity, explainability, and fairness, and to deal with complex data, such as functional data.
90C11 Mixed integer programming
90C30 Nonlinear programming
[1] Aghaei, S.; Azizi, MJ; Vayanos, P., Learning optimal and fair decision trees for non-discriminative decision-making, Proc AAAI Conf Artif Intell, 33, 1418-1426 (2019)
[2] Aghaei S, Gomez A, Vayanos P (2020) Learning optimal classification trees: strong max-flow formulations. arXiv:2002.09142
[3] Aglin G, Nijssen S, Schaus P (2020) Learning optimal decision trees using caching branch-and-bound search. In: Thirty-Fourth AAAI Conference on Artificial Intelligence
[4] Ahuja, RK; Magnanti, TL; Orlin, JB, Network flows: theory, algorithms, and applications (1993), New Jersey: Prentice Hall
[5] Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T., Permutation importance: a corrected feature importance measure, Bioinformatics, 26, 10, 1340-1347 (2010)
[6] Aouad A, Elmachtoub AN, Ferreira KJ, McNellis R (2019) Market segmentation trees. arXiv:1906.01174
[7] Apsemidis, A.; Psarakis, S.; Moguerza, JM, A review of machine learning kernel methods in statistical process monitoring, Comput Ind Eng, 142, 106376 (2020)
[8] Athanasopoulos, G.; Hyndman, RJ; Kourentzes, N.; Petropoulos, F., Forecasting with temporal hierarchies, Eur J Oper Res, 262, 1, 60-74 (2017) · Zbl 1403.62154
[9] Baesens, B.; Setiono, R.; Mues, C.; Vanthienen, J., Using neural network rule extraction and decision tables for credit-risk evaluation, Manage Sci, 49, 3, 312-329 (2003) · Zbl 1232.91684
[10] Balakrishnan S, Madigan D (2006) Decision trees for functional variables. In: Sixth international conference on data mining (ICDM’06), pp 798-802
[11] Barocas, S.; Selbst, AD, Big data’s disparate impact, California Law Rev, 104, 671 (2016)
[12] Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, García G, Gil-López S, Molina D, Benjamins R, Chatila R, Herrera F (2020) Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58:82-115
[13] Barros RC, Basgalupp MP, De Carvalho ACPLF, Freitas AA (2011) A survey of evolutionary algorithms for decision-tree induction. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42(3):291-312
[14] Barrow, DK; Crone, SF, A comparison of Adaboost algorithms for time series forecast combination, Int J Forecast, 32, 4, 1103-1119 (2016)
[15] Bénard C, Biau G, Da Veiga S, Scornet E (2019) SIRUS: making random forests interpretable. arXiv:1908.06852 · Zbl 1458.62126
[16] Bénard C, Biau G, Da Veiga S, Scornet E (2020) Interpretable random forests via rule extraction. arXiv:2004.14841 · Zbl 1458.62126
[17] Benítez-Peña, S.; Bogetoft, P.; Romero Morales, D., Feature selection in data envelopment analysis: a mathematical optimization approach, Omega, 96, 102068 (2020)
[18] Benítez-Peña S, Carrizosa E, Guerrero V, Jiménez-Gamero MD, Martín-Barragán B, Molero-Río C, Ramírez-Cobo P, Romero Morales D, Sillero-Denamiel MR (2020b) On sparse ensemble methods: an application to short-term predictions of the evolution of covid-19. Technical report, IMUS, Sevilla, Spain. https://www.researchgate.net/publication/341608874_On_Sparse_Ensemble_Methods_An_Application_to_Short-Term_Predictions_of_the_Evolution_of_COVID-19
[19] Bennett KP (1992) Decision tree construction via linear programming. In: Computer Sciences Department, University of Wisconsin, Center for Parallel Optimization
[20] Bennett KP, Blue J (1996) Optimal decision trees. In: Rensselaer Polytechnic Institute Math Report, p 214
[21] Bennett, KP; Mangasarian, OL, Robust linear programming discrimination of two linearly inseparable sets, Optim Methods Softw, 1, 23-24 (1992)
[22] Bertsimas, D.; Dunn, J., Optimal classification trees, Mach Learn, 106, 7, 1039-1082 (2017) · Zbl 1455.68159
[23] Bertsimas, D.; Dunn, J.; Mundru, N., Optimal prescriptive trees, INFORMS J Optim, 1, 2, 164-183 (2019)
[24] Bertsimas, D.; O’Hair, A.; Relyea, S.; Silberholz, J., An analytics approach to designing combination chemotherapy regimens for cancer, Manage Sci, 62, 5, 1511-1531 (2016)
[25] Bertsimas, D.; Shioda, R., Classification and regression via integer optimization, Oper Res, 55, 2, 252-271 (2007) · Zbl 1167.90593
[26] Biau, G.; Devroye, L.; Lugosi, G., Consistency of random forests and other averaging classifiers, J Mach Learn Res, 9, 2015-2033 (2008) · Zbl 1225.62081
[27] Biau, G.; Scornet, E., A random forest guided tour, TEST, 25, 2, 197-227 (2016) · Zbl 1402.62133
[28] Birbil SI, Edali M, Yüceoğlu B (2020) Rule covering for interpretation and boosting. arXiv:2007.06379
[29] Bixby RE (2012) A brief history of linear and mixed-integer programming computation. Documenta Math 2012:107-121 · Zbl 1270.90003
[30] Blake CL, Merz CJ (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html, University of California, Irvine, Department of Information and Computer Sciences
[31] Blanquero, R.; Carrizosa, E.; Jiménez-Cordero, A.; Martín-Barragán, B., Functional-bandwidth kernel for support vector machine with functional data: an alternating optimization algorithm, Eur J Oper Res, 275, 1, 195-207 (2019) · Zbl 1430.90450
[32] Blanquero, R.; Carrizosa, E.; Jiménez-Cordero, A.; Martín-Barragán, B., Selection of time instants and intervals with support vector regression for multivariate functional data, Comput Oper Res, 123, 105050 (2020)
[33] Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2021) Optimal randomized classification trees. Forthcoming in Comput Oper Res. doi:10.1016/j.cor.2021.105281 · Zbl 1441.62163
[34] Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2020a) On sparse optimal regression trees. Technical report, IMUS, Sevilla, Spain. https://www.researchgate.net/publication/326901224_Optimal_Randomized_Classification_Trees · Zbl 1441.62163
[35] Blanquero R, Carrizosa E, Molero-Río C, Romero Morales D (2020b) Sparsity in optimal randomized classification trees. Eur J Oper Res 284(1):255-272 · Zbl 1441.62163
[36] Botari T, Hvilshøj F, Izbicki R, de Carvalho ACPLF (2020) MeLIME: Meaningful local explanation for machine learning models. arXiv:2009.05818
[37] Bottou, L.; Curtis, F.; Nocedal, J., Optimization methods for large-scale machine learning, SIAM Rev, 60, 2, 223-311 (2018) · Zbl 1397.65085
[38] Breiman, L., Random forests, Mach Learn, 45, 1, 5-32 (2001) · Zbl 1007.68152
[39] Breiman, L.; Friedman, JH; Olshen, RA; Stone, CJ, Classification and regression trees (1984), Belmont: Wadsworth
[40] Brodley, CE; Utgoff, PE, Multivariate decision trees, Mach Learn, 19, 1, 45-77 (1995) · Zbl 0831.68091
[41] Carrizosa E, Galvis Restrepo M, Romero Morales D (2019) On clustering categories of categorical predictors in generalized linear models. Technical report, Copenhagen Business School, Denmark. https://www.researchgate.net/publication/349179679_On_Clustering_Categories_of_Categorical_Predictors_in_Generalized_Linear_Models
[42] Carrizosa, E.; Guerrero, V.; Hardt, D.; Romero Morales, D., On building online visualization maps for news data streams by means of mathematical optimization, Big Data, 6, 2, 139-158 (2018)
[43] Carrizosa, E.; Guerrero, V.; Romero Morales, D., Visualizing data as objects by DC (difference of convex) optimization, Math Program Ser B, 169, 119-140 (2018) · Zbl 1390.90616
[44] Carrizosa E, Guerrero V, Romero Morales D, Satorra A (2020a) Enhancing interpretability in factor analysis by means of mathematical optimization. Multivariate Behav Res 55(5):748-762
[45] Carrizosa E, Kurishchenko K, Marin A, Romero Morales D (2020b) Interpreting clusters by prototype optimization. Technical report, Copenhagen Business School, Denmark. https://www.researchgate.net/publication/349287282_Interpreting_Clusters_via_Prototype_Optimization
[46] Carrizosa E, Mortensen LH, Romero Morales D, Sillero-Denamiel MR (2020c) On linear regression models with hierarchical categorical variables. Technical report, IMUS, Sevilla, Spain. https://www.researchgate.net/publication/341042405_On_linear_regression_models_with_hierarchical_categorical_variables
[47] Carrizosa, E.; Nogales-Gómez, A.; Romero Morales, D., Clustering categories in support vector machines, Omega, 66, 28-37 (2017)
[48] Carrizosa, E.; Olivares-Nadal, AV; Ramírez-Cobo, P., Time series interpolation via global optimization of moments fitting, Eur J Oper Res, 230, 1, 97-112 (2013) · Zbl 1317.62068
[49] Carrizosa, E.; Romero Morales, D., Supervised classification and mathematical optimization, Comput Oper Res, 40, 1, 150-165 (2013) · Zbl 1349.68135
[50] Casalicchio G, Molnar C, Bischl B (2019) Visualizing the feature importance for black box models. In: Berlingerio M, Bonchi F, Gärtner T, Hurley N, Ifrim G (eds) Machine Learning and Knowledge Discovery in Databases, pp 655-670. Springer International Publishing, Cham
[51] Cerquitelli, T.; Quercia, D.; Pasquale, F., Transparent data mining for Big and small data (2017), Berlin: Springer
[52] Chen, D.; Fraiberger, SP; Moakler, R.; Provost, F., Enhancing transparency and control when drawing data-driven inferences about individuals, Big Data, 5, 3, 197-212 (2017)
[53] Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 785-794
[54] Cohen, S.; Dror, G.; Ruppin, E., Feature selection via coalitional game theory, Neural Comput, 19, 7, 1939-1961 (2007) · Zbl 1173.91305
[55] Cui Z, Chen W, He Y, Chen Y (2015) Optimal action extraction for random forests and boosted trees. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 179-188
[56] Dash S, Günlük O, Wei D (2018) Boolean decision rules via column generation. In: Advances in neural information processing systems, pp 4655-4665
[57] Demiriz, A.; Bennett, KP; Shawe-Taylor, J., Linear programming boosting via column generation, Mach Learn, 46, 225-254 (2002) · Zbl 0998.68105
[58] Demirović E, Lukina A, Hebrard E, Chan J, Bailey J, Leckie C, Ramamohanarao K, Stuckey PJ (2020) MurTree: optimal classification trees via dynamic programming and search. arXiv:2007.12652
[59] Demirović E, Stuckey PJ (2020) Optimal decision trees for nonlinear metrics. arXiv:2009.06921
[60] Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp 1-8, IEEE
[61] Deng, H.; Runger, G., Gene selection with guided regularized random forest, Pattern Recogn, 46, 12, 3483-3489 (2013)
[62] Denil M, Matheson D, Freitas N (2013) Consistency of online random forests. In: International Conference on Machine Learning, pp 1256-1264
[63] Dougherty J, Kohavi R, Sahami M (1995) Supervised and unsupervised discretization of continuous features. In: Machine Learning Proceedings 1995, pp 194-202, Elsevier
[64] Duarte Silva, AP, Optimization approaches to supervised classification, Eur J Oper Res, 261, 2, 772-788 (2017) · Zbl 1403.62114
[65] Dunn J (2018) Optimal trees for prediction and prescription. PhD thesis, Massachusetts Institute of Technology
[66] Esteve, M.; Aparicio, J.; Rabasa, A.; Rodriguez-Sala, JJ, Efficiency analysis trees: a new methodology for estimating production frontiers through decision trees, Expert Syst Appl, 162, 113783 (2020)
[67] European Commission (2020) White Paper on Artificial Intelligence: a European approach to excellence and trust. https://ec.europa.eu/info/sites/info/files/commission-white-paper-artificial-intelligence-feb2020_en.pdf
[68] Fang X, Liu Sheng OR, Goes P (2013) When is the right time to refresh knowledge discovered from data? Oper Res 61(1):32-44 · Zbl 1267.90166
[69] Fawagreh K, Medhat Gaber M, Elyan E (2014) Random forests: from early developments to recent advancements. Syst Sci Control Eng 2(1):602-609
[70] Fayyad UM, Irani KB (1992) The attribute selection problem in decision tree generation. In: AAAI, pp 104-110
[71] Fernández-Delgado, M.; Cernadas, E.; Barro, S.; Amorim, D., Do we need hundreds of classifiers to solve real world classification problems?, J Mach Learn Res, 15, 3133-3181 (2014) · Zbl 1319.62005
[72] Fernández, RR; Martín de Diego, I.; Aceña, V.; Fernández-Isabel, A.; Moguerza, JM, Random forest explainability using counterfactual sets, Inf Fusion, 63, 196-207 (2020)
[73] Firat, M.; Crognier, G.; Gabor, AF; Hurkens, CAJ; Zhang, Y., Column generation based heuristic for learning classification trees, Comput Oper Res, 116, 104866 (2020) · Zbl 1458.68201
[74] Fountoulakis, K.; Gondzio, J., A second-order method for strongly convex \(\ell_1\)-regularization problems, Math Program, 156, 1, 189-219 (2016) · Zbl 1364.90255
[75] Freitas, AA, Comprehensible classification models: a position paper, ACM SIGKDD Explor Newsl, 15, 1, 1-10 (2014)
[76] Freund, Y.; Schapire, RE, A decision-theoretic generalization of on-line learning and an application to boosting, J Comput Syst Sci, 55, 1, 119-139 (1997) · Zbl 0880.68103
[77] Friedman, JH, Greedy function approximation: a gradient boosting machine, Ann Stat, 29, 5, 1189-1232 (2001) · Zbl 1043.62034
[78] Friedman, JH, Stochastic gradient boosting, Comput Stat Data Anal, 38, 4, 367-378 (2002) · Zbl 1072.65502
[79] Fu, Z.; Golden, BL; Lele, S.; Raghavan, S.; Wasil, EA, A genetic algorithm-based approach for building accurate decision trees, INFORMS J Comput, 15, 1, 3-22 (2003) · Zbl 1238.90076
[80] Gambella C, Ghaddar B, Naoum-Sawaya J (2020) Optimization models for machine learning: a survey. Eur J Oper Res 290(3):807-828
[81] Genuer, R.; Poggi, J-M; Tuleau-Malot, C.; Villa-Vialaneix, N., Random forests for big data, Big Data Res, 9, 28-46 (2017)
[82] Georganos S, Grippa T, Gadiaga AN, Linard C, Lennert M, Vanhuysse S, Mboga N, Wolff E, Kalogirou S (2019) Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto International 36(2):121-136
[83] Gevrey, M.; Dimopoulos, I.; Lek, S., Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecol Model, 160, 3, 249-264 (2003)
[84] González, S.; García, S.; Del Ser, J.; Rokach, L.; Herrera, F., A practical tutorial on bagging and boosting based ensembles for machine learning: algorithms, software tools, performance study, practical perspectives and opportunities, Inf Fusion, 64, 205-237 (2020)
[85] Goodfellow, I.; Bengio, Y.; Courville, A., Deep learning (2016), Cambridge: MIT Press · Zbl 1373.68009
[86] Goodman, B.; Flaxman, S., European Union regulations on algorithmic decision-making and a “right to explanation”, AI Mag, 38, 3, 50-57 (2017)
[87] Grubinger, T.; Zeileis, A.; Pfeiffer, K-P, evtree: evolutionary learning of globally optimal classification and regression trees in R, J Stat Softw Articles, 61, 1, 1-29 (2014)
[88] Günlük O, Kalagnanam J, Menickelly M, Scheinberg K (2019) Optimal decision trees for categorical data via integer programming. arXiv:1612.03225v3
[89] Gunning, D.; Aha, DW, DARPA’s explainable artificial intelligence program, AI Mag, 40, 2, 44-58 (2019)
[90] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, J Mach Learn Res, 3, 1157-1182 (2003) · Zbl 1102.68556
[91] Hastie, T.; Rosset, S.; Zhu, J.; Zou, H., Multi-class AdaBoost, Stat Interface, 2, 3, 349-360 (2009) · Zbl 1245.62080
[92] Hastie, T.; Tibshirani, R.; Friedman, J., The elements of statistical learning (2009), New York: Springer
[93] Hastie, T.; Tibshirani, R.; Wainwright, M., Statistical learning with sparsity: the lasso and generalizations (2015), Boca Raton: CRC Press · Zbl 1319.68003
[94] Hofman, JM; Sharma, A.; Watts, DJ, Prediction and explanation in social systems, Science, 355, 6324, 486-488 (2017)
[95] Holter S, Gomez O, Bertini E (2018) FICO Explainable Machine Learning Challenge. https://community.fico.com/s/explainable-machine-learning-challenge
[96] Höppner S, Stripling E, Baesens B, vanden Broucke S, Verdonck T (2020) Profit driven decision trees for churn prediction. Eur J Oper Res 284(3):920-933 · Zbl 1441.62739
[97] Hu, X.; Rudin, C.; Seltzer, M., Optimal sparse decision trees, Adv Neural Inf Process Syst, 32, 7265-7273 (2019)
[98] Hyafil, L.; Rivest, RL, Constructing optimal binary decision trees is NP-complete, Inf Process Lett, 5, 1, 15-17 (1976) · Zbl 0333.68029
[99] Iosifidis V, Ntoutsi E (2019) Adafair: cumulative fairness adaptive boosting. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, pp 781-790, New York, NY, USA, Association for Computing Machinery
[100] Irsoy O, Yıldız OT, Alpaydın E (2012) Soft decision trees. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), pp 1819-1822
[101] Izza Y, Ignatiev A, Marques-Silva J (2020) On explaining decision trees. arXiv:2010.11034
[102] Jakaitiene A, Sangiovanni M, Guarracino MR, Pardalos PM (2016) Multidimensional scaling for genomic data, pp 129-139. Springer International Publishing, Cham · Zbl 1359.62478
[103] Jung J, Concannon C, Shroff R, Goel S, Goldstein DG (2017) Creating simple rules for complex decisions. Harvard Business Rev 2017:1
[104] Jung, J.; Concannon, C.; Shroff, R.; Goel, S.; Goldstein, DG, Simple rules to guide expert classifications, J R Stat Soc Ser A (Stat Soc), 183, 3, 771-800 (2020)
[105] Kaloudi, N.; Li, J., The AI-based cyber threat landscape: a survey, ACM Comput Surv (CSUR), 53, 1, 1-34 (2020)
[106] Kao, H-P; Tang, K., Cost-sensitive decision tree induction with label-dependent late constraints, INFORMS J Comput, 26, 2, 238-252 (2014) · Zbl 1357.68173
[107] Karimi A-H, Barthe G, Schölkopf B, Valera I (2020) A survey of algorithmic recourse: definitions, formulations, solutions, and prospects. arXiv:2010.04050
[108] Karmy, JP; Maldonado, S., Hierarchical time series forecasting via support vector regression in the European travel retail industry, Expert Syst Appl, 137, 59-73 (2019)
[109] Kass, GV, An exploratory technique for investigating large quantities of categorical data, J R Stat Soc Ser C (Appl Stat), 29, 2, 119-127 (1980)
[110] Katuwal, R.; Suganthan, PN; Zhang, L., Heterogeneous oblique random forest, Pattern Recogn, 99, 107078 (2020)
[111] Khalil EB, Le Bodic P, Song L, Nemhauser GL, Dilkina BN (2016) Learning to branch in mixed integer programming. In: AAAI, pp 724-731
[112] Kim, H.; Loh, W-Y, Classification trees with unbiased multiway splits, J Am Stat Assoc, 96, 454, 589-604 (2001)
[113] Kleinberg, J.; Lakkaraju, H.; Leskovec, J.; Ludwig, J.; Mullainathan, S., Human decisions and machine predictions, Q J Econ, 133, 1, 237-293 (2018) · Zbl 1405.91119
[114] Koenker, R.; Hallock, KF, Quantile regression, J Econ Perspect, 15, 4, 143-156 (2001)
[115] Kriegler B, Berk R (2010) Small area estimation of the homeless in Los Angeles: an application of cost-sensitive stochastic gradient boosting. Ann Appl Stat 2010:1234-1255 · Zbl 1202.62178
[116] Li, X-B; Sweigart, JR; Teng, JTC; Donohue, JM; Thombs, LA; Wang, SM, Multivariate decision trees using linear discriminants and tabu search, IEEE Trans Syst Man Cybern-Part A Syst Hum, 33, 2, 194-205 (2003)
[117] Liberti, L., Distance geometry and data science, TOP, 28, 271-339 (2020) · Zbl 07215401
[118] Lin J, Zhong C, Hu D, Rudin C, Seltzer M (2020) Generalized and scalable optimal sparse decision trees. arXiv:2006.08690
[119] Liu, H.; Hussain, F.; Tan, C.; Dash, M., Discretization: an enabling technique, Data Min Knowl Disc, 6, 4, 393-423 (2002)
[120] Lodi, A.; Zarpellon, G., On learning and branching: a survey, TOP, 25, 2, 207-236 (2017) · Zbl 1372.90003
[121] Loh, W-Y, Fifty years of classification and regression trees, Int Stat Rev, 82, 3, 329-348 (2014) · Zbl 1416.62347
[122] Loh, W-Y; Shih, Y-S, Split selection methods for classification trees, Stat Sin, 7, 4, 815-840 (1997) · Zbl 1067.62545
[123] Louppe G, Wehenkel L, Sutera A, Geurts P (2013) Understanding variable importances in forests of randomized trees. Adv Neural Inf Process Syst 2013:431-439
[124] Lucic A, Oosterhuis H, Haned H, de Rijke M (2020) FOCUS: Flexible optimizable counterfactual explanations for tree ensembles. arXiv:1911.12199
[125] Lundberg, SM; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, JM; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S-I, From local explanations to global understanding with explainable AI for trees, Nature Mach Intell, 2, 1, 56-67 (2020)
[126] Lundberg SM, Erion G, Lee S-I (2018) Consistent individualized feature attribution for tree ensembles. arXiv:1802.03888
[127] Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017:4765-4774
[128] Martens, D.; Baesens, B.; Gestel, TV; Vanthienen, J., Comprehensible credit scoring models using rule extraction from support vector machines, Eur J Oper Res, 183, 3, 1466-1476 (2007) · Zbl 1278.91177
[129] Martens, D.; Provost, F., Explaining data-driven document classifications, MIS Q, 38, 1, 73-99 (2014)
[130] Martínez Torres J, Iglesias Comesaña C, García-Nieto PJ (2019) Machine learning techniques applied to cybersecurity. Int J Mach Learn Cybern 10(10):2823-2836
[131] Meinshausen, N., Node harvest, Ann Appl Stat, 4, 4, 2049-2072 (2010) · Zbl 1220.62084
[132] Menze BH, Kelm BM, Splitthoff DN, Koethe U, Hamprecht FA (2011) On oblique random forests. In: Gunopulos D, Hofmann T, Malerba D, Vazirgiannis M (eds) Machine Learning and Knowledge Discovery in Databases, pp 453-469
[133] Miller, T., Explanation in artificial intelligence: insights from the social sciences, Artif Intell, 267, 1-38 (2019) · Zbl 07099170
[134] Miron M, Tolan S, Gómez E, Castillo C (2020) Addressing multiple metrics of group fairness in data-driven decision making. arXiv:2003.04794
[135] Mišić VV (2020) Optimization of Tree Ensembles. Oper Res 68(5):1605-1624 · Zbl 1457.90098
[136] Möller, A.; Tutz, G.; Gertheiss, J., Random forests for functional covariates, J Chemom, 30, 12, 715-725 (2016)
[137] Molnar, C.; Casalicchio, G.; Bischl, B., iml: an R package for interpretable machine learning, J Open Sourc Softw, 3, 26, 786 (2018)
[138] Molnar C, Casalicchio G, Bischl B (2020) Interpretable machine learning – a brief history, state-of-the-art and challenges. arXiv:2010.09337
[139] Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp 607-617
[140] Murthy, SK; Kasif, S.; Salzberg, S., A system for induction of oblique decision trees, J Artif Intell Res, 2, 1-32 (1994) · Zbl 0900.68335
[141] Narodytska N, Ignatiev A, Pereira F, Marques-Silva J (2018) Learning Optimal Decision Trees with SAT. In: Proceedings of the Twenty-Seventh international joint conference on artificial intelligence (IJCAI-18), pp 1362-1368 · Zbl 06958127
[142] Nijssen, S.; Fromont, E., Optimal constraint-based decision tree induction from itemset lattices, Data Min Knowl Disc, 21, 1, 9-51 (2010)
[143] Norouzi M, Collins M, Johnson MA, Fleet DJ, Kohli P (2015) Efficient non-greedy optimization of decision trees. Adv Neural Inf Process Syst 2015:1729-1737
[144] Orsenigo, C.; Vercellis, C., Multivariate classification trees based on minimum features discrete support vector machines, IMA J Manag Math, 14, 3, 221-234 (2003) · Zbl 1115.90406
[145] Óskarsdóttir M, Ahmed W, Antonio K, Baesens B, Dendievel R, Donas T, Reynkens T (2020) Social network analytics for supervised fraud detection in insurance. arXiv:2009.08313
[146] Palagi, L., Global optimization issues in deep network regression: an overview, J Global Optim, 73, 2, 239-277 (2019) · Zbl 1421.90154
[147] Pangilinan, JM; Janssens, GK, Pareto-optimality of oblique decision trees from evolutionary algorithms, J Global Optim, 51, 2, 301-311 (2011) · Zbl 1230.90171
[148] Pardalos PM, Boginski VL, Vazacopoulos A (eds) (2007) Data mining in biomedicine. Springer optimization and its applications, Springer
[149] Pfetsch ME, Pokutta S (2020) IPBoost—non-convex boosting via integer programming. arXiv:2002.04679
[150] Piccialli, V.; Sciandrone, M., Nonlinear optimization and support vector machines, 4OR, 16, 2, 111-149 (2018) · Zbl 1398.65126
[151] Pospisil T, Lee AB (2019) (f)RFCDE: Random forests for conditional density estimation and functional data. arXiv:1906.07177
[152] Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
[153] Rahman, R.; Dhruba, SR; Ghosh, S.; Pal, R., Functional random forest with applications in dose-response predictions, Sci Rep, 9, 1, 1-14 (2019)
[154] Ramon Y, Martens D, Provost F, Evgeniou T (2020) A comparison of instance-level counterfactual explanation algorithms for behavioral and textual data: SEDC, LIME-C and SHAP-C. Adv Data Anal Classif 2020:5
[155] Ribeiro MT, Singh S, Guestrin C (2016) Why Should I Trust You?: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135-1144
[156] Ridgeway, G., The pitfalls of prediction, Natl Inst Justice J, 271, 34-40 (2013)
[157] Romei, A.; Ruggieri, S., A multidisciplinary survey on discrimination analysis, Knowl Eng Rev, 29, 5, 582-638 (2014)
[158] Rudin, C., Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Mach Intell, 1, 5, 206-215 (2019)
[159] Rudin, C.; Ertekin, Ş., Learning customized and optimized lists of rules with mathematical programming, Math Program Comput, 10, 4, 659-702 (2018) · Zbl 1411.90234
[160] Ruggieri, S., Complete search for feature selection in decision trees, J Mach Learn Res, 20, 104, 1-34 (2019) · Zbl 1446.68141
[161] Saha A, Basu S, Datta A (2020) Random forests for dependent data. arXiv:2007.15421
[162] Savický P, Klaschka J, Antoch J (2000) Optimal classification trees. In: COMPSTAT, pp 427-432, Springer · Zbl 1455.62128
[163] Scornet, E., On the asymptotics of random forests, J Multivariate Anal, 146, 72-83 (2016) · Zbl 1337.62063
[164] Scornet, E.; Biau, G.; Vert, J-P, Consistency of random forests, Ann Stat, 43, 4, 1716-1741 (2015) · Zbl 1317.62028
[165] Sherali, HD; Hobeika, AG; Jeenanunta, C., An optimal constrained pruning strategy for decision trees, INFORMS J Comput, 21, 1, 49-61 (2009) · Zbl 1243.91029
[166] Sokol K, Flach PA (2019) Counterfactual explanations of machine learning predictions: opportunities and challenges for AI safety. In: SafeAI @ AAAI
[167] Souillard-Mandar, W.; Davis, R.; Rudin, C.; Au, R.; Libon, DJ; Swenson, R.; Price, CC; Lamar, M.; Penney, DL, Learning classification models of cognitive conditions from subtle behaviors in the digital clock drawing test, Mach Learn, 102, 3, 393-441 (2016)
[168] Street, WN, Oblique multicategory decision trees using nonlinear programming, INFORMS J Comput, 17, 1, 25-31 (2005) · Zbl 1239.68064
[169] Strobl, C.; Boulesteix, A-L; Kneib, T.; Augustin, T.; Zeileis, A., Conditional variable importance for random forests, BMC Bioinf, 9, 1, 307 (2008)
[170] Su, X.; Wang, M.; Fan, J., Maximum likelihood regression trees, J Comput Graph Stat, 13, 3, 586-598 (2004)
[171] Therneau T, Atkinson B, Ripley B (2015) rpart: recursive partitioning and regression trees. R package version 4.1-10
[172] Truong A (2009) Fast growing and interpretable oblique trees via logistic regression models. PhD thesis, University of Oxford, UK
[173] Tuncel, KS; Baydogan, MG, Autoregressive forests for multivariate time series modeling, Pattern Recogn, 73, 202-215 (2018)
[174] Turney, PD, Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm, J Artif Intell Res, 2, 369-409 (1995)
[175] Ustun, B.; Rudin, C., Supersparse linear integer models for optimized medical scoring systems, Mach Learn, 102, 3, 349-391 (2016) · Zbl 1406.62144
[176] Van Vlasselaer, V.; Eliassi-Rad, T.; Akoglu, L.; Snoeck, M.; Baesens, B., GOTCHA! Network-based fraud detection for social security fraud, Manage Sci, 63, 9, 3090-3110 (2017)
[177] Verhaeghe H, Nijssen S, Pesant G, Quimper C-G, Schaus P (2019) Learning optimal decision trees using constraint programming. In: The 25th International Conference on Principles and Practice of Constraint Programming (CP2019)
[178] Verma S, Dickerson J, Hines K (2020) Counterfactual explanations for machine learning: a review. arXiv:2010.10596
[179] Verwer S, Zhang Y (2017) Learning decision trees with flexible constraints and objectives using integer optimization. In Salvagnin D, Lombardi M (eds) Integration of AI and OR techniques in constraint programming: 14th International Conference, CPAIOR 2017, Padua, Italy. Proceedings, pp 94-103 · Zbl 06756578
[180] Verwer, S.; Zhang, Y.; Ye, QC, Auction optimization using regression trees and linear models as integer programs, Artif Intell, 244, 368-395 (2017) · Zbl 1404.68122
[181] Verwer, S.; Zhang, Y.; Ye, QC, Learning optimal classification trees using a binary linear program formulation, Proc AAAI Conf Artif Intell, 33, 1625-1632 (2019)
[182] Vidal T, Pacheco T, Schiffer M (2020) Born-again tree ensembles. arXiv:2003.11132
[183] Visani G, Bagli E, Chesani F, Poluzzi A, Capuzzo D (2020) Statistical stability indices for LIME: obtaining reliable explanations for machine learning models. arXiv:2001.11757
[184] Wachter, S.; Mittelstadt, B.; Russell, C., Counterfactual explanations without opening the black box: automated decisions and the GDPR, Harvard J Law Technol, 31, 841-887 (2017)
[185] Wager, S.; Athey, S., Estimation and inference of heterogeneous treatment effects using random forests, J Am Stat Assoc, 113, 523, 1228-1242 (2018) · Zbl 1402.62056
[186] Weston, J.; Elisseeff, A.; Schölkopf, B.; Tipping, M., Use of the zero norm with linear models and kernel methods, J Mach Learn Res, 3, 1439-1461 (2003) · Zbl 1102.68605
[187] Wickramarachchi, DC; Robertson, BL; Reale, M.; Price, CJ; Brown, J., HHCART: an oblique decision tree, Comput Stat Data Anal, 96, 12-23 (2016) · Zbl 06918560
[188] Wickramasuriya, SL; Athanasopoulos, G.; Hyndman, RJ, Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization, J Am Stat Assoc, 114, 526, 804-819 (2019) · Zbl 1420.62402
[189] Yang, L.; Liu, S.; Tsoka, S.; Papageorgiou, LG, A regression tree approach using mathematical programming, Expert Syst Appl, 78, 347-357 (2017)
[190] Yang Y, Garcia Morillo I, Hospedales TM (2018) Deep neural decision trees. arXiv:1806.06988
[191] Yu J, Ignatiev A, Stuckey PJ, Le Bodic P (2020) Computing Optimal Decision Sets with SAT. arXiv:2007.15140
[192] Zafar MB, Valera I, Gomez Rodriguez M, Gummadi KP (2017) Fairness constraints: mechanisms for fair classification. In: Artificial Intelligence and Statistics, pp 962-970, PMLR
[193] Zantedeschi V, Kusner MJ, Niculae V (2020) Learning binary trees via sparse relaxation. arXiv:2010.04627
[194] Zeng, J.; Ustun, B.; Rudin, C., Interpretable classification models for recidivism prediction, J R Stat Soc Ser A, 180, 3, 689-722 (2017)
[195] Zhang Y, Song K, Sun Y, Tan S, Udell M (2019) Why should you trust my explanation? Understanding Uncertainty in LIME Explanations. arXiv:1904.12991
[196] Zhu H, Murali P, Phan DT, Nguyen LM, Kalagnanam JR (2020) A scalable MIP-based method for learning optimal multivariate decision trees. Adv Neural Inf Process Syst 2020:33