Process consistency for AdaBoost. (English) Zbl 1105.62316

Summary: Recent experiments and theoretical studies show that AdaBoost can overfit in the limit of large time. If running the algorithm forever is suboptimal, a natural question is how low can the prediction error be during the process of AdaBoost? We show under general regularity conditions that during the process of AdaBoost a consistent prediction is generated, which has the prediction error approximating the optimal Bayes error as the sample size increases. This result suggests that, while running the algorithm forever can be suboptimal, it is reasonable to expect that some regularization method via truncation of the process may lead to a near-optimal performance for sufficiently large sample size.


62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)


Full Text: DOI Euclid


[1] Anthony, M. and Biggs, N. (1992). Computational Learning Theory : An Introduction . Cambridge Univ. Press. · Zbl 0755.68115
[2] Breiman, L. (1997). Prediction games and arcing classifiers. Technical Report 504, Dept. Statistics, Univ. California, Berkeley. · Zbl 0934.62064
[3] Breiman, L. (2000). Some infinity theory for predictor ensembles. Technical Report 579, Dept. Statistics, Univ. California, Berkeley.
[4] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York. · Zbl 0853.68150
[5] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139. · Zbl 0880.68103
[6] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337–407. · Zbl 1106.62323
[7] Grove, A. J. and Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proc. 15th National Conference on Artificial Intelligence 692–699. AAAI Press, Menlo Park, CA.
[8] Jiang, W. (2002). On weak base hypotheses and their implications for boosting regression and classification. Ann. Statist. 30 51–73. · Zbl 1012.62066
[9] Mason, L., Baxter, J., Bartlett, P. and Frean, M. (1999). Boosting algorithms as gradient descent in function space. Technical report, Dept. Systems Engineering, Australian National Univ.
[10] Schapire, R. E. (1999). Theoretical views of boosting. In Computational Learning Theory : Proc. Fourth European Conference 1–10.
[11] Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651–1686. · Zbl 0929.62069
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.