Near-optimal nonlinear regression trees. (English) Zbl 07331253

Summary: We propose Near-optimal Nonlinear Regression Trees with hyperplane splits (NNRTs), which use a polynomial prediction function in the leaf nodes and are trained by stochastic gradient methods. On synthetic data, we show experimentally that the algorithm converges to the global optimum. We compare NNRTs with ORT-LH, Multivariate Adaptive Regression Splines (MARS), Random Forests (RF), and XGBoost on 40 real-world datasets and show that, overall, NNRTs have a performance edge over all other methods.
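The core idea — a tree whose split is a hyperplane and whose leaves hold polynomial models, with all parameters fitted by gradient methods — can be sketched in miniature. The following is an illustrative toy (a single depth-1 split, quadratic leaves, and a sigmoid-smoothed split indicator so that gradients flow through the split); the soft-split relaxation and all names here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_features(X):
    """Degree-2 features per coordinate: [1, x1, x2, x1^2, x2^2]."""
    return np.hstack([np.ones((len(X), 1)), X, X ** 2])

def forward(X, a, b, wl, wr, tau=10.0):
    """Blend two leaf polynomials via a sigmoid-smoothed hyperplane split."""
    z = np.clip(tau * (X @ a + b), -50.0, 50.0)   # clip for numerical stability
    g = 1.0 / (1.0 + np.exp(-z))                  # soft indicator of the left leaf
    P = poly_features(X)
    return g, P, g * (P @ wl) + (1.0 - g) * (P @ wr)

# Synthetic target: piecewise quadratic, split by the hyperplane x1 + x2 = 0.
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] > 0.0, X[:, 0] ** 2, -X[:, 1] ** 2)

a, b = np.array([1.0, 0.0]), 0.0    # split parameters, deliberately misaligned
wl, wr = np.zeros(5), np.zeros(5)   # leaf polynomial coefficients
tau, lr, n = 10.0, 0.05, len(X)

_, _, pred0 = forward(X, a, b, wl, wr, tau)
mse_init = np.mean((pred0 - y) ** 2)

for _ in range(3000):               # full-batch gradient descent on squared loss
    g, P, pred = forward(X, a, b, wl, wr, tau)
    r = pred - y                    # residuals
    diff = P @ wl - P @ wr          # leaf disagreement drives the split gradient
    s = r * diff * tau * g * (1.0 - g)   # chain rule through the sigmoid gate
    wl -= lr * (2.0 / n) * (P.T @ (r * g))
    wr -= lr * (2.0 / n) * (P.T @ (r * (1.0 - g)))
    a  -= lr * (2.0 / n) * (X.T @ s)
    b  -= lr * (2.0 / n) * np.sum(s)

_, _, pred = forward(X, a, b, wl, wr, tau)
mse_final = np.mean((pred - y) ** 2)
```

Because the quadratic leaf models can represent each piece of the target exactly, training drives the error well below the zero-predictor baseline; the split direction is rotated toward the true hyperplane by the gradient flowing through the smoothed indicator.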


90-XX Operations research, mathematical programming
Full Text: DOI

