
Population theory for boosting ensembles. (English) Zbl 1105.62308

Summary: Tree ensembles are studied in distribution space, that is, in the limit case of “infinite” sample size. It is shown that the simplest kind of trees forms a complete family in \(L_2(P)\) on \(D\)-dimensional space whenever the number of terminal nodes \(T\) exceeds \(D\). For such trees it is shown that the AdaBoost algorithm produces an ensemble whose risk converges to the Bayes risk.
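In the notation commonly used for boosting in function space (the symbols below are standard conventions, not taken verbatim from the paper), the population setting can be sketched as follows. With labels \(y \in \{-1,+1\}\) and predictors \(x \in \mathbb{R}^D\) distributed according to \(P\), the ensemble after \(m\) steps is
\[ F_m(x) \;=\; \sum_{k=1}^{m} c_k f_k(x), \qquad f_k \in \mathcal{F}_T, \]
where \(\mathcal{F}_T\) denotes the class of trees with \(T\) terminal nodes. The completeness statement says that the linear span of \(\mathcal{F}_T\) is dense in \(L_2(P)\) whenever \(T > D\). Population AdaBoost greedily decreases the expected exponential loss \(\mathbb{E}_P\!\left[\exp\{-y\,F(x)\}\right]\), and convergence to the Bayes risk means
\[ P\bigl(y \neq \operatorname{sign} F_m(x)\bigr) \;\longrightarrow\; L^{*} \qquad (m \to \infty), \]
with \(L^{*}\) the minimal (Bayes) misclassification risk.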

MSC:

62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)
