Risk bounds for statistical learning. (English) Zbl 1108.62007

Summary: We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend A. B. Tsybakov’s [ibid. 32, No. 1, 135–166 (2004; Zbl 1105.62353)] analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with ways of measuring the ‘size’ of a class of classifiers other than entropy with bracketing as in Tsybakov’s work. In particular, we derive new risk bounds for the ERM when the classification rules belong to some Vapuis-Chervonenkis class under margin conditions and discuss the optimality of these bounds in a minimax sense.


62B10 Statistical aspects of information-theoretic topics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
60E15 Inequalities; stochastic orderings
94A17 Measures of information, entropy
62F15 Bayesian inference


Zbl 1105.62353
Full Text: DOI arXiv


[1] Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413. · Zbl 0946.62036
[2] Birgé, L. (2005). A new lower bound for multiple hypothesis testing. IEEE Trans. Inform. Theory 51 1611–1615. · Zbl 1283.62030
[3] Birgé, L. and Massart, P. (1998). Minimum contrast estimators on sieves: Exponential bounds and rates of convergence. Bernoulli 4 329–375. · Zbl 0954.62033
[4] Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495–500. · Zbl 1001.60021
[5] Devroye, L. and Lugosi, G. (1995). Lower bounds in pattern recognition and learning. Pattern Recognition 28 1011–1018.
[6] Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press. · Zbl 0951.60033
[7] Edelsbrunner, H. (1987). Algorithms in Combinatorial Geometry. Springer, Berlin. · Zbl 0634.52001
[8] Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean \(n\)-cube with bounded Vapnik–Chervonenkis dimension. J. Combin. Theory Ser. A 69 217–232. · Zbl 0818.60005
[9] Haussler, D., Littlestone, N. and Warmuth, M. (1994). Predicting \(\;0,1\$-functions on randomly drawn points. Inform. and Comput. 115 248--292.\) · Zbl 0938.68785
[10] Koltchinskii, V. I. (1981). On the central limit theorem for empirical measures. Theor. Probab. Math. Statist. 24 71–82. · Zbl 0504.60024
[11] Korostelev, A. P. and Tsybakov, A. B. (1993). Minimax Theory of Image Reconstruction . Lecture Notes in Statist. 82 . Springer, New York. · Zbl 0833.62039
[12] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces . Isoperimetry and Processes . Springer, Berlin. · Zbl 0748.60004
[13] Lugosi, G. (2002). Pattern classification and learning theory. In Principles of Nonparametric Learning (L. Györfi, ed.) 1–56. Springer, Vienna.
[14] Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829. · Zbl 0961.62058
[15] Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. ( 6 ) 9 245–303. · Zbl 0986.62002
[16] Massart, P. (2006). Concentration inequalities and model selection. Lectures on Probability Theory and Statistics. Ecole d ’ Eté de Probabilités de Saint Flour XXXIII . Lecture Notes in Math. 1896 . Springer, Berlin.
[17] Massart, P. and Rio, E. (1998). A uniform Marcinkiewicz–Zygmund strong law of large numbers for empirical processes. In Festschrift for Miklós Csörgő : Asymptotic Methods in Probability and Statistics (B. Szyszkowicz, ed.) 199–211. North-Holland, Amsterdam. · Zbl 0933.60015
[18] McDiarmid, C. (1989). On the method of bounded differences. In Surveys in Combinatorics 1989 (J. Siemons, ed.) 148–188. Cambridge Univ. Press. · Zbl 0712.05012
[19] Pollard, D. (1982). A central limit theorem for empirical processes. J. Austral. Math. Soc. Ser. A 33 235–248. · Zbl 0504.60023
[20] Reynaud-Bouret, P. (2003). Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields 126 103–153. · Zbl 1019.62079
[21] Talagrand, M. (1996). New concentration inequalities in product spaces. Invent. Math. 126 505–563. · Zbl 0893.60001
[22] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166. · Zbl 1105.62353
[23] Vapnik, V. N. (1982). Estimation of Dependences Based on Empirical Data . Springer, New York. · Zbl 0499.62005
[24] Vapnik, V. N. and Chervonenkis, A. Ya. (1974). Theory of Pattern Recognition . Nauka, Moscow. (In Russian.) · Zbl 0284.68070
[25] Yang, Y. and Barron, A. R. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564–1599. · Zbl 0978.62008
[26] Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam : Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423–435. Springer, New York. · Zbl 0896.62032
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.