
On weak base hypotheses and their implications for boosting regression and classification. (English) Zbl 1012.62066

Summary: When studying the training error and the prediction error for boosting, it is often assumed that the hypotheses returned by the base learner are weakly accurate, that is, able to beat a random guesser by a certain amount. It has been an open question how large this amount can be: whether it eventually disappears in the boosting process or remains bounded below by a positive quantity. This question is crucial for the behavior of both the training error and the prediction error.
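For reference, the weak-accuracy assumption and its classical consequence for the training error can be written as follows. This is the standard formulation from Freund and Schapire [13], not notation taken from the paper under review; the edge parameter \(\gamma\) is introduced here only for illustration.
\[
\epsilon_t = \Pr_{i \sim D_t}\bigl[h_t(x_i) \neq y_i\bigr] \le \tfrac12 - \gamma_t,
\qquad \gamma_t \ge \gamma > 0,
\]
\[
\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\bigl[H(x_i) \neq y_i\bigr]
\;\le\; \prod_{t=1}^{T}\sqrt{1 - 4\gamma_t^2}
\;\le\; \exp\Bigl(-2\sum_{t=1}^{T}\gamma_t^2\Bigr)
\;\le\; e^{-2\gamma^2 T},
\]
where \(D_t\) is the weight distribution at round \(t\), \(h_t\) is the base hypothesis returned at that round, and \(H\) is the final voted classifier. If the edges \(\gamma_t\) stay bounded below by a positive \(\gamma\), the training error decays exponentially in \(T\); the open question described above is precisely whether such a positive lower bound persists.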
We study this problem and show affirmatively that the amount of improvement over the random guesser is at least a positive quantity for almost all possible sample realizations and for most of the commonly used base hypotheses. This has a number of implications for the prediction error, including, for example, that boosting forever may not be a good idea and that regularization may be necessary. The problem is studied by first considering an analog of AdaBoost in regression, where we examine similar properties and find that, for good performance, one cannot hope to avoid regularization simply by adapting the boosting device to regression.
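As a purely illustrative complement, the sketch below runs a plain AdaBoost loop with decision stumps and records the empirical edge gamma_t = 1/2 - eps_t at each round. The synthetic data set, the use of scikit-learn stumps, and the stopping rule are assumptions made for this demonstration; this is not the regression analog or the construction analyzed in the paper.

# Minimal AdaBoost loop recording the per-round edge gamma_t = 1/2 - eps_t,
# where eps_t is the weighted training error of the base hypothesis h_t.
# Illustrative sketch only; data and base learner are chosen for demonstration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)      # labels in {-1, +1}

w = np.full(n, 1.0 / n)                          # uniform initial weights
edges = []
for t in range(50):
    stump = DecisionTreeClassifier(max_depth=1)  # base learner: a decision stump
    stump.fit(X, y, sample_weight=w)
    h = stump.predict(X)
    eps = float(np.sum(w * (h != y)))            # weighted error under w
    gamma = 0.5 - eps                            # edge over random guessing
    edges.append(gamma)
    if eps <= 0.0 or eps >= 0.5:                 # degenerate round; stop boosting
        break
    alpha = 0.5 * np.log((1.0 - eps) / eps)      # AdaBoost voting weight
    w = w * np.exp(-alpha * y * h)               # up-weight misclassified points
    w = w / w.sum()

print("edge per round:", np.round(edges, 3))

If the recorded edges stay bounded away from zero as the rounds accumulate, the training error bound above keeps shrinking, which illustrates why running boosting forever without regularization can drive the fit toward an arbitrarily complex classifier.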

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
68T05 Learning and adaptive systems in artificial intelligence
62G99 Nonparametric inference

Software:

AdaBoost.MH

References:

[1] ANTHONY, M. and BIGGS, N. (1992). Computational Learning Theory: An Introduction. Cambridge Univ. Press. · Zbl 0755.68115
[2] BREIMAN, L. (1996). Bagging predictors. Machine Learning 24 123-140. · Zbl 0858.68080
[3] BREIMAN, L. (1997a). Prediction games and arcing classifiers. Technical Report 504, Dept. Statistics, Univ. California, Berkeley.
[4] BREIMAN, L. (1997b). Arcing the edge. Technical Report 486, Dept. Statistics, Univ. California, Berkeley.
[5] BREIMAN, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801-849. · Zbl 0934.62064
[6] BREIMAN, L. (1999). Using adaptive bagging to debias regressions. Technical Report 547, Dept. Statistics, Univ. California, Berkeley. · Zbl 1052.68109
[7] BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042
[8] BÜHLMANN, P. and YU, B. (2000). Explaining bagging. Technical report, Dept. Statistics, Univ. California, Berkeley.
[9] DEVROYE, L., GYÖRFI, L. and LUGOSI, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York. · Zbl 0853.68150
[10] DONOHO, D. L. and JOHNSTONE, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455. JSTOR: · Zbl 0815.62019
[11] FREUND, Y. (1995). Boosting a weak learning algorithm by majority. Inform. and Comput. 121 256-285. · Zbl 0833.68109
[12] FREUND, Y. and SCHAPIRE, R. E. (1996). Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual ACM Conference on Computational Learning Theory 325-332. ACM Press, New York.
[13] FREUND, Y. and SCHAPIRE, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139. · Zbl 0880.68103
[14] FRIEDMAN, J. H. (1999a). Greedy function approximation: a gradient boosting machine. Technical report, Dept. Statistics, Stanford Univ.
[15] FRIEDMAN, J. H. (1999b). Stochastic gradient boosting. Technical report, Dept. Statistics, Stanford Univ.
[16] FRIEDMAN, J., HASTIE, T. and TIBSHIRANI, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion). Ann. Statist. 28 337-407. · Zbl 1106.62323
[17] GOLDMANN, M., HASTAD, J. and RAZBOROV, A. (1992). Majority gates vs. general weighted threshold gates. Comput. Complexity 2 277-300. · Zbl 0770.68054
[18] GROVE, A. J. and SCHUURMANS, D. (1998). Boosting in the limit: maximizing the margin of learned ensembles. In Proceedings of the 15th National Conference on Artificial Intelligence. AAAI, Menlo Park, CA.
[19] HAUSSLER, D. (1992). Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inform. and Comput. 100 78-150. · Zbl 0762.68050
[20] JACOBS, R. A., JORDAN, M. I., NOWLAN, S. J. and HINTON, G. E. (1991). Adaptive mixtures of local experts. Neural Comput. 3 79-87.
[21] JIANG, W. (2000a). On weak base hypotheses and their implications for boosting regression and classification. Technical Report 00-01, Dept. Statistics, Northwestern Univ.
[22] JIANG, W. (2000b). Process consistency for AdaBoost. Technical Report 00-05, Dept. Statistics, Northwestern Univ.
[23] MALLAT, S. and ZHANG, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Processing 41 3397-3415. · Zbl 0842.94004
[24] MASON, L., BAXTER, J., BARTLETT, P. and FREAN, M. (1999). Boosting algorithms as gradient descent in function space. Technical report, Dept. Systems Engineering, Australian National Univ.
[25] SCHAPIRE, R. E. (1990). The strength of weak learnability. Machine Learning 5 197-227.
[26] SCHAPIRE, R. E. (1999). Theoretical views of boosting. In Computational Learning Theory. Lecture Notes in Comput. Sci. 1572 1-10. Springer, Berlin.
[27] SCHAPIRE, R. E., FREUND, Y., BARTLETT, P. and LEE, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651-1686. · Zbl 0929.62069
[28] VAPNIK, V. N. (1998). Statistical Learning Theory. Wiley, New York. · Zbl 0935.62007
[29] YANG, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45 2271-2284. · Zbl 0962.62026