
Optimization of tree ensembles. (English) Zbl 1457.90098

Summary: Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have typically been used to make predictions from exogenous, uncontrollable independent variables, they are increasingly being applied in settings where the independent variables are controllable and are themselves decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality, and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that it can efficiently solve large-scale instances to near or full optimality and that its solutions outperform those obtained by heuristic approaches.
The e-companion is available at https://doi.org/10.1287/opre.2019.1928.
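
To make the approach concrete, the following is a minimal sketch of a mixed-integer formulation of this kind; the notation (\(\lambda_t\), \(p_{t,\ell}\), \(v(s)\), \(c(s)\)) is ours for illustration and need not match the paper's. Trees \(t = 1, \dots, T\) carry weights \(\lambda_t\); binary variables \(x_{i,j}\) indicate whether independent variable \(i\) is set to a value at most its \(j\)-th smallest split threshold, and variables \(y_{t,\ell}\) indicate whether tree \(t\) routes the chosen point to leaf \(\ell\), whose prediction is \(p_{t,\ell}\):
\[
\begin{aligned}
\max_{x,y}\;\; & \sum_{t=1}^{T} \lambda_t \sum_{\ell \in \mathrm{leaves}(t)} p_{t,\ell}\, y_{t,\ell} \\
\text{s.t.}\;\; & \sum_{\ell \in \mathrm{leaves}(t)} y_{t,\ell} = 1, && t = 1, \dots, T, \\
& \sum_{\ell \in \mathrm{left}(s)} y_{t,\ell} \le x_{v(s),c(s)}, && \text{for each split } s \text{ of tree } t, \\
& \sum_{\ell \in \mathrm{right}(s)} y_{t,\ell} \le 1 - x_{v(s),c(s)}, && \text{for each split } s \text{ of tree } t, \\
& x_{i,j} \le x_{i,j+1}, && \text{consecutive thresholds } j \text{ of variable } i, \\
& x_{i,j} \in \{0,1\}, \quad y_{t,\ell} \ge 0,
\end{aligned}
\]
where \(v(s)\) is the variable queried at split \(s\) and \(c(s)\) the index of its threshold. The \(y\) variables may be left continuous: once the binary \(x\) fixes a point, each tree routes it to exactly one leaf, so integrality of \(y\) is implied.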

MSC:

90C11 Mixed integer programming
90C90 Applications of mathematical programming
