Greedy function approximation: A gradient boosting machine. (English) Zbl 1043.62034

Summary: Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression [P. Huber, Ann. Math. Stat. 35, 73–101 (1964; Zbl 0136.39805)], and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Y. Freund and R. E. Schapire [see J. Comput. Syst. Sci. 55, 119–139 (1997; Zbl 0880.68103)] and J. Friedman, T. Hastie and R. Tibshirani [Ann. Stat. 28, 337–407 (2000; Zbl 1106.62323)] are discussed.
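The core idea summarized above — stagewise additive fitting by steepest descent, where each base learner is fit to the negative gradient of the loss — can be sketched for the simplest case, least-squares loss, where the negative gradient is just the current residual. The sketch below is illustrative only: the function names, the use of one-split regression stumps as base learners, and the shrinkage value are assumptions for a minimal self-contained example, not the paper's TreeBoost implementation.

```python
# Minimal sketch of least-squares gradient boosting on 1-D data.
# Base learners are one-split regression stumps; for squared-error loss the
# negative gradient at each point is simply the residual y_i - F(x_i).
# All names and parameter choices here are illustrative assumptions.

def fit_stump(x, r):
    """Fit a least-squares stump to targets r: best split point plus
    left/right means, returned as a prediction function."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        s = (x[order[k - 1]] + x[order[k]]) / 2.0
        left = [r[i] for i in range(len(x)) if x[i] <= s]
        right = [r[i] for i in range(len(x)) if x[i] > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - ml) ** 2 for v in left)
               + sum((v - mr) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    _, s, ml, mr = best
    return lambda t, s=s, ml=ml, mr=mr: ml if t <= s else mr

def gradient_boost(x, y, n_stages=50, shrinkage=0.1):
    """Stagewise additive expansion: start from the constant fit, then
    repeatedly fit a stump to the current residuals (the negative gradient
    of squared-error loss) and add it with a small shrinkage factor."""
    f0 = sum(y) / len(y)
    stumps = []
    pred = [f0] * len(x)
    for _ in range(n_stages):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        h = fit_stump(x, resid)
        stumps.append(h)
        pred = [p + shrinkage * h(xi) for p, xi in zip(pred, x)]
    return lambda t: f0 + shrinkage * sum(h(t) for h in stumps)
```

With enough stages the ensemble drives the training residuals toward zero; other losses (least absolute deviation, Huber, multiclass logistic) follow the same template with the residual replaced by the corresponding negative gradient and a line-search step for the leaf values.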


62G08 Nonparametric regression and quantile regression
62-07 Data analysis (statistics) (MSC2010)
65C60 Computational problems in statistics (MSC2010)
62K10 Statistical block designs


Full Text: DOI


[1] Becker, R. A. and Cleveland, W. S. (1996). The design and control of Trellis display. J. Comput. Statist. Graphics 5 123-155.
[2] Breiman, L. (1997). Pasting bites together for prediction in large data sets and on-line. Technical report, Dept. Statistics, Univ. California, Berkeley.
[3] Breiman, L. (1999). Prediction games and arcing algorithms. Neural Comp. 11 1493-1517.
[4] Breiman, L., Friedman, J. H., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042
[5] Copas, J. B. (1983). Regression, prediction, and shrinkage (with discussion). J. Roy. Statist. Soc. Ser. B 45 311-354. JSTOR: · Zbl 0532.62048
[6] Donoho, D. L. (1993). Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. In Different Perspectives on Wavelets. Proceedings of Symposium in Applied Mathematics (I. Daubechies, ed.) 47 173-205. Amer. Math. Soc., Providence, RI. · Zbl 0786.62094
[7] Drucker, H. (1997). Improving regressors using boosting techniques. Proceedings of Fourteenth International Conference on Machine Learning (D. Fisher, Jr., ed.) 107-115. Morgan Kaufmann, San Francisco.
[8] Duffy, N. and Helmbold, D. (1999). A geometric approach to leveraging weak learners. In Computational Learning Theory. Proceedings of 4th European Conference EuroCOLT99 (P. Fischer and H. U. Simon, eds.) 18-33. Springer, New York. · Zbl 0997.68166
[9] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference 148-156. Morgan Kaufmann, San Francisco.
[10] Friedman, J. H. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141. · Zbl 0765.62064
[11] Friedman, J. H., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist. 28 337-407. · Zbl 1106.62323
[12] Griffin, W. L., Fisher, N. I., Friedman, J. H., Ryan, C. G. and O’Reilly, S. (1999). Cr-Pyrope garnets in lithospheric mantle. J. Petrology 40 679-704.
[13] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London. · Zbl 0747.62061
[14] Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73-101. · Zbl 0136.39805
[15] Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Processing 41 3397-3415. · Zbl 0842.94004
[16] Powell, M. J. D. (1987). Radial basis functions for multivariate interpolation: a review. In Algorithms for Approximation (J. C. Mason and M. G. Cox, eds.) 143-167. Clarendon Press, Oxford. · Zbl 0638.41001
[17] Rätsch, G., Onoda, T. and Müller, K.-R. (1998). Soft margins for AdaBoost. NeuroCOLT Technical Report NC-TR-98-021. · Zbl 0969.68128
[18] Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge Univ. Press. · Zbl 0853.62046
[19] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning representations by backpropagating errors. Nature 323 533-536. · Zbl 1369.68284
[20] Schapire, R. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory. ACM, New York. · Zbl 0945.68194
[21] Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York. · Zbl 0833.62008
[22] Warner, J. R., Toronto, A. E., Veasey, L. R. and Stephenson, R. (1961). A mathematical model for medical diagnosis: application to congenital heart disease. J. Amer. Med. Assoc. 177 177-184.