×

Bounding the generalization error of convex combinations of classifiers: Balancing the dimensionality and the margins. (English) Zbl 1073.62535

Summary: A problem of bounding the generalization error of a classifier \(f\in \text{conv}(\mathcal{H})\), where \(\mathcal{H}\) is a “base” class of functions (classifiers), is considered. This problem frequently occurs in computer learning, where efficient algorithms that combine simple classifiers into a complex one (such as boosting and bagging) have attracted a lot of attention. Using Talagrand’s concentration inequalities for empirical processes, we obtain new sharper bounds on the generalization error of combined classifiers that take into account both the empirical distribution of “classification margins” and an ”approximate dimension” of the classifiers, and study the performance of these bounds in several experiments with learning algorithms.

MSC:

62G99 Nonparametric inference
60F15 Strong limit theorems
62G20 Asymptotic properties of nonparametric inference

Software:

AdaBoost.MH; UCI-ml
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] ANTHONY, M. and BARTLETT, P. (1999). Neural Network Learning: Theoretical Foundations. Cambridge Univ. Press. · Zbl 0968.68126 · doi:10.1017/CBO9780511624216
[2] BARTLETT, P. (1998). The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. IEEE Trans. Inform. Theory 44 525-536. · Zbl 0901.68177 · doi:10.1109/18.661502
[3] BARTLETT, P., BOUCHERON, S. and LUGOSI, G. (2001). Model selection and error estimation. Machine Learning 48 85-113. · Zbl 0998.68117
[4] BLAKE, C. L. and MERZ, C. J. (1998). UCI repository of machine learning databases. Available at http://www.ics.uci.edu/ mlearn/MLRepository.html. URL:
[5] BREIMAN, L. (1996). Bagging predictors. Machine Learning 26 123-140. · Zbl 0858.68080
[6] BREIMAN, L. (1998). Arcing classifiers. Ann. Statist. 26 801-849. · Zbl 0934.62064 · doi:10.1214/aos/1024691079
[7] CORTES, C. and VAPNIK, V. (1995). Support vector networks. Machine Learning 24 273-297. · Zbl 0831.68098
[8] DEVROy E, L., GYÖRFI, L. and LUGOSI, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York. · Zbl 0853.68150
[9] DUFFY, N. and HELMBOLD, D. (1999). A geometric approach to leveraging weak learners. Computational Learning Theory. Lecture Notes in Comput. Sci. 18-33. Springer, New York. · Zbl 0997.68166 · doi:10.1016/S0304-3975(01)00083-4
[10] FREUND, Y. and SCHAPIRE, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Sy stem Sci. 55 119-139. · Zbl 0880.68103 · doi:10.1006/jcss.1997.1504
[11] GROVE, A. and SCHUURMANS, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the Fifteenth National Conference on Artificial Intelligence 692-699. AAAI Press, Menlo Park, California.
[12] HUSH, D. and HORNE, B. (1998). Efficient algorithms for function approximation with piecewise linear sigmoids. IEEE Trans. Neural Networks 9 1129-1141.
[13] KEARNS, M., MANSOUR, Y., NG, A. and RON, D. (1997). An experimental and theoretical comparison of model selection methods. Machine Learning 27 7-50.
[14] KOLTCHINSKII, V. (2001). Bounds on margin distributions in learning problems. Preprint. Available at http://www.math.unm.edu/ panchenk/. URL: · Zbl 1031.60017 · doi:10.1016/S0246-0203(03)00023-2/FLA
[15] KOLTCHINSKII, V. (2001). Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 1902-1914. · Zbl 1008.62614 · doi:10.1109/18.930926
[16] KOLTCHINSKII, V. and PANCHENKO, D. (2000). Rademacher processes and bounding the risk of function learning. In High Dimensional Probability II (D. Mason, E. Giné and J. Wellner, eds.) 443-457. Birkhäuser, Boston. · Zbl 1106.68385
[17] KOLTCHINSKII, V. and PANCHENKO, D. (2002). Empirical margin distribution and bounding the generalization error of combined classifiers. Ann. Statist. 30 1-50. · Zbl 1012.62004
[18] KOLTCHINSKII, V., PANCHENKO, D. and LOZANO, F. (2001). Bounding the generalization error of neural networks and combined classifiers. In Proceedings of Thirteenth International Conference on Advances in Neural Information Processing Sy stems 245-251. MIT Press.
[19] LOZANO, F. and KOLTCHINSKII, V. (2002). Direct optimization of simple cost functions of the margin. In Proceedings of the First International NAISO Congress on Neuro Fuzzy Technologies. Academic Press, Amsterdam.
[20] MASON, L., BARTLETT, P. and BAXTER, J. (2000). Improved generalization through explicit optimization of margins. Machine Learning 38 243-255. · Zbl 0954.68134 · doi:10.1023/A:1007697429651
[21] MASON, L., BAXTER, J., BARTLETT, P. and FREAN, M. (2000). Functional gradient techniques of combining hy potheses. In Advances in Large Margin Classifiers (A. J. Smol, P. Bartlett, B. Schölkopf and C. Schuurmans, eds.) 221-246. MIT Press.
[22] MASSART, P. (2000). About the constants in Talagrand’s concentration inequalities for empirical processes. Ann. Probab. 28 863-884. · Zbl 1140.60310 · doi:10.1214/aop/1019160263
[23] PISIER, G. (1981). Remarques sur un résultat non publié de B. Maurey. In Séminaire d’Analy se Fonctionelle 1980-1981, Exposé 5. Ecole Poly technique, Palaiseau. · Zbl 0491.46017
[24] SCHAPIRE, R. E., FREUND, Y., BARTLETT, P. and LEE, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 1651-1687. · Zbl 0929.62069 · doi:10.1214/aos/1024691352
[25] TALAGRAND, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505-563. · Zbl 0893.60001 · doi:10.1007/s002220050108
[26] VAN DER VAART, A. W. and WELLNER, J. (1996). Weak Convergence of Empirical Processes with Applications to Statistics. Springer, New York. · Zbl 0862.60002
[27] VAPNIK, V. (1998). Statistical Learning Theory. Wiley, New York. · Zbl 0935.62007
[28] ALBUQUERQUE, NEW MEXICO 87131-1141 E-MAIL: vlad@math.unm.edu panchenk@math.unm.edu F. LOZANO DEPARTMENT OF ELECTRONIC ENGINEERING UNIVERSIDAD JAVERIANA BOGOTA COLOMBIA E-MAIL: fernando.lozano@javeriana.edu.co
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.