On the Bayes-risk consistency of regularized boosting methods. (English) Zbl 1105.62319

Summary: The probability of error of classification methods based on convex combinations of simple base classifiers by “boosting” algorithms is investigated. The main result of the paper is that certain regularized boosting algorithms provide Bayes-risk consistent classifiers under the sole assumption that the Bayes classifier may be approximated by a convex combination of the base classifiers. Nonasymptotic distribution-free bounds are also developed which offer interesting new insight into how boosting works and help explain its success in practical classification problems.
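In symbols (a minimal formalization; the notation below is introduced here for illustration rather than quoted from the paper): writing \(L(g)=\mathbb{P}\{g(X)\neq Y\}\) for the probability of error of a classifier \(g\), \(L^*\) for the Bayes risk, and \(\mathcal{C}\) for the set of convex combinations of the base classifiers, the consistency statement takes the form
\[
L(\hat f_n)\longrightarrow L^* \quad\text{almost surely as } n\to\infty,
\]
where \(\hat f_n\) denotes the regularized boosting classifier built from \(n\) training samples, provided \(\inf_{\lambda>0}\inf_{f\in\lambda\mathcal{C}}L(f)=L^*\), that is, provided scaled convex combinations of the base classifiers approximate the Bayes classifier arbitrarily well.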

MSC:

62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G99 Nonparametric inference

Software:

AdaBoost.MH
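To make the regularization device concrete, here is a minimal sketch in Python/NumPy. It is illustrative only, neither the paper's algorithm nor the AdaBoost.MH implementation: decision stumps serve as base classifiers, AdaBoost-style coordinate descent reduces the empirical exponential cost, and the fitted coefficients are then rescaled to have \(\ell_1\)-norm \(\lambda\), so the final classifier lies in \(\lambda\) times the convex hull of the base class. All names and parameters (regularized_boost, n_rounds, lam) are invented for the sketch.

import numpy as np

def stump_predict(X, feature, threshold, polarity):
    # Decision stump: polarity * (+1 if feature value <= threshold, else -1).
    return polarity * np.where(X[:, feature] <= threshold, 1.0, -1.0)

def best_stump(X, y, w):
    # Exhaustive search for the stump with the smallest weighted 0-1 error.
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                err = np.sum(w * (stump_predict(X, j, t, s) != y))
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best, best_err

def regularized_boost(X, y, n_rounds=50, lam=5.0):
    # AdaBoost-style coordinate descent on the empirical exponential cost;
    # afterwards the coefficients are rescaled so the combined function lies
    # in lam * (convex hull of the stumps), a stand-in for the paper's
    # regularization by scaled convex combinations.
    n = len(y)                       # labels y are assumed to be +1 / -1
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        (j, t, s), err = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)
        if err >= 0.5:               # no stump beats random guessing: stop
            break
        a = 0.5 * np.log((1.0 - err) / err)
        pred = stump_predict(X, j, t, s)
        w *= np.exp(-a * y * pred)   # reweight misclassified points upward
        w /= w.sum()
        stumps.append((j, t, s))
        alphas.append(a)
    alphas = np.array(alphas) * (lam / np.sum(alphas))  # l1-norm now equals lam

    def predict(Xnew):
        f = sum(a * stump_predict(Xnew, j, t, s)
                for a, (j, t, s) in zip(alphas, stumps))
        return np.where(f >= 0, 1.0, -1.0)
    return predict

# Toy usage: two Gaussian blobs labeled -1 and +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)), rng.normal(1.0, 1.0, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])
clf = regularized_boost(X, y, n_rounds=20, lam=3.0)
print("training accuracy:", np.mean(clf(X) == y))

In the paper's analysis the smoothing parameter grows with the sample size (\(\lambda=\lambda_n\to\infty\) slowly) to obtain consistency; here lam is simply a fixed constant chosen for illustration.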

References:

[1] Amit, Y. and Blanchard, G. (2001). Multiple randomized classifiers. Unpublished manuscript.
[2] Blanchard, G. (2001). Méthodes de mélange et d’agrégation d’estimateurs en reconnaissance de formes. Application aux arbres de décision [Mixing and aggregation methods for estimators in pattern recognition, with application to decision trees]. Ph.D. dissertation, Univ. Paris XIII. (In English.)
[3] Breiman, L. (1996a). Bagging predictors. Machine Learning 24 123–140. · Zbl 0858.68080
[4] Breiman, L. (1996b). Bias, variance, and arcing classifiers. Technical Report 460, Dept. Statistics, Univ. California, Berkeley.
[5] Breiman, L. (1997a). Arcing the edge. Technical Report 486, Dept. Statistics, Univ. California, Berkeley.
[6] Breiman, L. (1997b). Pasting bites together for prediction in large data sets and on-line. Technical report, Dept. Statistics, Univ. California, Berkeley.
[7] Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801–849. · Zbl 0934.62064 · doi:10.1214/aos/1024691079
[8] Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493–1517.
[9] Breiman, L. (2000). Some infinite theory for predictor ensembles. Technical Report 577, Dept. Statistics, Univ. California, Berkeley.
[10] Bühlmann, P. and Yu, B. (2000). Discussion of “Additive logistic regression: A statistical view of boosting” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 377–386. · Zbl 1106.62323 · doi:10.1214/aos/1016218223
[11] Bühlmann, P. and Yu, B. (2003). Boosting with the \(L_2\)-loss: Regression and classification. J. Amer. Statist. Assoc. 98 324–339. · Zbl 1041.62029 · doi:10.1198/016214503000125
[12] Collins, M., Schapire, R. E. and Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning 48 253–285. · Zbl 0998.68123 · doi:10.1023/A:1013912006537
[13] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York. · Zbl 0853.68150
[14] Devroye, L. and Lugosi, G. (2001). Combinatorial Methods in Density Estimation. Springer, New York. · Zbl 0964.62025
[15] Freund, Y. (1995). Boosting a weak learning algorithm by majority. Inform. and Comput. 121 256–285. · Zbl 0833.68109 · doi:10.1006/inco.1995.1136
[16] Freund, Y., Mansour, Y. and Schapire, R. E. (2001). Why averaging classifiers can protect against overfitting. In Proc. Eighth International Workshop on Artificial Intelligence and Statistics .
[17] Freund, Y. and Schapire, R. E. (1996a). Experiments with a new boosting algorithm. In Machine Learning: Proc. 13th International Conference 148–156. Morgan Kaufmann, San Francisco.
[18] Freund, Y. and Schapire, R. E. (1996b). Game theory, on-line prediction and boosting. In Proc. 9th Annual Conference on Computational Learning Theory 325–332. ACM Press, New York.
[19] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139. · Zbl 0880.68103 · doi:10.1006/jcss.1997.1504
[20] Freund, Y. and Schapire, R. E. (2000). Discussion of “Additive logistic regression: A statistical view of boosting,” by J. Friedman, T. Hastie and R. Tibshirani. Ann. Statist. 28 391–393. · Zbl 1106.62323 · doi:10.1214/aos/1016218223
[21] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337–407. · Zbl 1106.62323 · doi:10.1214/aos/1016218223
[22] Jiang, W. (2001). Some theoretical aspects of boosting in the presence of noisy data. In Proc. 18th International Conference on Machine Learning (ICML-2001) 234–241. Morgan Kaufmann, San Francisco.
[23] Jiang, W. (2004). Process consistency for AdaBoost. Ann. Statist. 32 13–29. · Zbl 1105.62316 · doi:10.1214/aos/1079120128
[24] Koltchinskii, V. and Panchenko, D. (2002). Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist. 30 1–50. · Zbl 1012.62004
[25] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces . Springer, New York. · Zbl 0748.60004
[26] Mannor, S. and Meir, R. (2001). Weak learners and improved convergence rate in boosting. In Advances in Neural Information Processing Systems 13. Proc. NIPS 2000 280–286. MIT Press, Cambridge, MA.
[27] Mannor, S., Meir, R. and Mendelson, S. (2001). On the consistency of boosting algorithms. Unpublished manuscript.
[28] Mannor, S., Meir, R. and Zhang, T. (2002). The consistency of greedy algorithms for classification. In Proc. 15th Annual Conference on Computational Learning Theory. Lecture Notes in Computer Science 2375 319–333. Springer, New York. · Zbl 1050.68581
[29] Mason, L., Baxter, J., Bartlett, P. L. and Frean, M. (2000). Functional gradient techniques for combining hypotheses. In Advances in Large Margin Classifiers (A. J. Smola, P. L. Bartlett, B. Schölkopf and D. Schuurmans, eds.) 221–247. MIT Press, Cambridge, MA.
[30] Schapire, R. E. (1990). The strength of weak learnability. Machine Learning 5 197–227. · Zbl 1142.62372
[31] Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651–1686. · Zbl 0929.62069 · doi:10.1214/aos/1024691352
[32] Steinwart, I. (2001). On the generalization ability of support vector machines. Unpublished manuscript.
[33] Zhang, T. (2004). Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32 56–85. · Zbl 1105.62323 · doi:10.1214/aos/1079120130