Analysis of boosting algorithms using the smooth margin function. (English) Zbl 1132.68827

Summary: We introduce a useful tool for analyzing boosting algorithms called the “smooth margin function,” a differentiable approximation of the usual margin for boosting algorithms. We present two boosting algorithms based on this smooth margin, “coordinate ascent boosting” and “approximate coordinate ascent boosting,” which are similar to Freund and Schapire’s AdaBoost algorithm and Breiman’s arc-gv algorithm. We give convergence rates to the maximum margin solution for both of our algorithms and for arc-gv. We then study AdaBoost’s convergence properties using the smooth margin function. We precisely bound the margin attained by AdaBoost when the edges of the weak classifiers fall within a specified range. This shows that a previous bound proved by Rätsch and Warmuth is exactly tight. Furthermore, we use the smooth margin to capture explicit properties of AdaBoost in cases where cyclic behavior occurs.
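
For concreteness, the construction at the heart of the paper can be sketched numerically. The sketch below assumes the setup standard in this line of work: a matrix M with entries M[i, j] = y_i h_j(x_i) ∈ {-1, +1}, recording whether weak classifier j correctly labels training example i, and a nonnegative coefficient vector λ over the weak classifiers. The smooth margin is then G(λ) = -ln(Σ_i exp(-(Mλ)_i)) / Σ_j λ_j, which always lower-bounds the usual minimum margin min_i (Mλ)_i / Σ_j λ_j and approaches it as the ℓ1 norm of λ grows. The function names below are illustrative, not taken from the paper.

```python
import numpy as np

def smooth_margin(M, lam):
    """Smooth margin G(lam) = -ln(sum_i exp(-(M @ lam)_i)) / sum_j lam_j.

    M[i, j] = y_i * h_j(x_i) in {-1, +1}; lam is a nonnegative,
    not identically zero, coefficient vector.  G(lam) is positive
    exactly when the exponential loss is below 1 (which requires
    lam to separate the training data).
    """
    s = lam.sum()                    # l1 norm of the coefficients
    loss = np.exp(-(M @ lam)).sum()  # AdaBoost's exponential loss
    return -np.log(loss) / s

def minimum_margin(M, lam):
    """The usual normalized minimum margin that boosting seeks to maximize."""
    return (M @ lam).min() / lam.sum()

# Toy check: the smooth margin lower-bounds the minimum margin, and the
# gap closes as the l1 norm of lam grows (the minimum margin itself is
# invariant under rescaling of lam).
rng = np.random.default_rng(0)
M = rng.choice([-1.0, 1.0], size=(20, 5))
lam = rng.uniform(0.5, 1.5, size=5)
for scale in (1.0, 10.0, 100.0):
    assert smooth_margin(M, scale * lam) <= minimum_margin(M, scale * lam)
```

Read against this sketch, each step of the two algorithms named in the summary chooses a weak classifier (a coordinate of λ) and a step size so as to increase G, exactly or approximately, rather than to decrease the exponential loss as AdaBoost does; it is this change of objective that supports the stated convergence rates to the maximum margin solution.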

MSC:

68W40 Analysis of algorithms
68Q25 Analysis of algorithms and problem complexity
68Q32 Computational learning theory

Software:

AdaBoost.MH

References:

[1] Breiman, L. (1998). Arcing classifiers (with discussion). Ann. Statist. 26 801-849. · Zbl 0934.62064 · doi:10.1214/aos/1024691079
[2] Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493-1517.
[3] Caruana, R. and Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. In Proc. Twenty-Third International Conference on Machine Learning 161-168. ACM Press, New York.
[4] Collins, M., Schapire, R. E. and Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning 48 253-285. · Zbl 0998.68123 · doi:10.1023/A:1013912006537
[5] Drucker, H. and Cortes, C. (1996). Boosting decision trees. In Advances in Neural Information Processing Systems 8 479-485. MIT Press, Cambridge, MA.
[6] Duffy, N. and Helmbold, D. (1999). A geometric approach to leveraging weak learners. Computational Learning Theory (Nordkirchen, 1999). Lecture Notes in Comput. Sci. 1572 18-33. Springer, Berlin. · Zbl 0997.68166 · doi:10.1016/S0304-3975(01)00083-4
[7] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119-139. · Zbl 0880.68103 · doi:10.1006/jcss.1997.1504
[8] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337-407. · Zbl 1106.62323 · doi:10.1214/aos/1016218223
[9] Grove, A. J. and Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proc. Fifteenth National Conference on Artificial Intelligence 692-699.
[10] Koltchinskii, V. and Panchenko, D. (2005). Complexities of convex combinations and bounding the generalization error in classification. Ann. Statist. 33 1455-1496. · Zbl 1080.62045 · doi:10.1214/009053605000000228
[11] Kutin, S. (2002). Algorithmic stability and ensemble-based learning. Ph.D. dissertation, Univ. Chicago.
[12] Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems 12 512-518. MIT Press, Cambridge, MA.
[13] Meir, R. and Rätsch, G. (2003). An introduction to boosting and leveraging. Advanced Lectures on Machine Learning. Lecture Notes in Comput. Sci. 2600 119-183. Springer, Berlin. · Zbl 1019.68092
[14] Quinlan, J. R. (1996). Bagging, boosting, and C4.5. In Proc. Thirteenth National Conference on Artificial Intelligence 725-730. AAAI Press, Menlo Park, CA. · Zbl 1184.68423
[15] Rätsch, G. (2001). Robust boosting via convex optimization: Theory and applications. Ph.D. dissertation, Dept. Computer Science, Univ. Potsdam, Potsdam, Germany.
[16] Rätsch, G., Onoda, T. and Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning 42 287-320. · Zbl 0969.68128 · doi:10.1023/A:1007618119488
[17] Rätsch, G. and Warmuth, M. (2005). Efficient margin maximizing with boosting. J. Mach. Learn. Res. 6 2131-2152. · Zbl 1222.68285
[18] Reyzin, L. and Schapire, R. E. (2006). How boosting the margin can also boost classifier complexity. In Proc. Twenty-Third International Conference on Machine Learning 753-760. ACM Press, New York.
[19] Rosset, S., Zhu, J. and Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res. 5 941-973. · Zbl 1222.68290
[20] Rudin, C. (2004). Boosting, margins and dynamics. Ph.D. dissertation, Princeton Univ.
[21] Rudin, C., Cortes, C., Mohri, M. and Schapire, R. E. (2005). Margin-based ranking meets boosting in the middle. Learning Theory. Lecture Notes in Comput. Sci. 3559 63-78. Springer, Berlin. · Zbl 1137.68561 · doi:10.1007/b137542
[22] Rudin, C., Daubechies, I. and Schapire, R. E. (2004). The dynamics of AdaBoost: Cyclic behavior and convergence of margins. J. Mach. Learn. Res. 5 1557-1595. · Zbl 1222.68293
[23] Rudin, C., Daubechies, I. and Schapire, R. E. (2004). On the dynamics of boosting. In Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.
[24] Rudin, C. and Schapire, R. E. (2007). Margin-based ranking and why AdaBoost is actually a ranking algorithm.
[25] Rudin, C., Schapire, R. E. and Daubechies, I. (2004). Boosting based on a smooth margin. Learning Theory. Lecture Notes in Comput. Sci. 3120 502-517. Springer, Berlin. · Zbl 1078.68724 · doi:10.1007/b98522
[26] Rudin, C., Schapire, R. E. and Daubechies, I. (2007). Precise statements of convergence for AdaBoost and arc-gv. In Proc. AMS-IMS-SIAM Joint Summer Research Conference: Machine Learning, Statistics, and Discovery 131-145. · Zbl 1141.68722 · doi:10.1090/conm/443/08559
[27] Schapire, R. E. (2003). The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification. Lecture Notes in Statist. 171 149-171. Springer, New York. · Zbl 1142.62372
[28] Schapire, R. E., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651-1686. · Zbl 0929.62069 · doi:10.1214/aos/1024691352
[29] Zhang, T. and Yu, B. (2005). Boosting with early stopping: Convergence and consistency. Ann. Statist. 33 1538-1579. · Zbl 1078.62038 · doi:10.1214/009053605000000255