zbMATH — the first resource for mathematics

On aggregation for heavy-tailed classes. (English) Zbl 1371.62032
Summary: We introduce an alternative to the notion of ‘fast rate’ in Learning Theory, which coincides with the optimal error rate when the given class happens to be convex and regular in some sense. While it is well known that such a rate cannot always be attained by a learning procedure (i.e., a procedure that selects a function in the given class), we introduce an aggregation procedure that attains that rate under rather minimal assumptions – for example, that the \(L_q\) and \(L_2\) norms are equivalent on the linear span of the class for some \(q>2\), and the target random variable is square-integrable. The key components in the proof include a two-sided isomorphic estimator on distances between class members, which is based on the median-of-means; and an almost isometric lower bound of the form \(N^{-1}\sum _{i=1}^N f^2(X_i) \geq (1-\zeta )\mathbb {E}f^2\) which holds uniformly in the class. Both results only require that the \(L_q\) and \(L_2\) norms are equivalent on the linear span of the class for some \(q>2\).

62G07 Density estimation
60G25 Prediction theory (aspects of stochastic processes)
60G50 Sums of independent random variables; random walks
68T05 Learning and adaptive systems in artificial intelligence
62G99 Nonparametric inference
PDF BibTeX Cite
Full Text: DOI arXiv
[1] Anthony, M., Bartlett, P.L.: Neural network learning: theoretical foundations. Cambridge University Press, Cambridge (1999) · Zbl 0968.68126
[2] Audibert, J.Y.: Proof of the optimality of the empirical star algorithm, unpublished note. Available at http://certis.enpc.fr/ audibert/Mes%20articles/NIPS07supplem2. Accessed 21 June 2016
[3] Audibert, JY, Fast learning rates in statistical inference through aggregation, Ann. Stat., 37, 1591-1646, (2009) · Zbl 1360.62167
[4] Boucheron, S., Lugosi, G., Massart, P.: Concentration inequalities: a nonasymptotic theory of independence. Oxford University Press, Oxford (2013) · Zbl 1279.60005
[5] Catoni, O.: Statistical learning theory and stochastic optimization, vol. 1851 of Lecture Notes in Mathematics, Springer, Berlin (2004) · Zbl 1076.93002
[6] de la Peña, V., Giné, E.: Decoupling: from dependence to independence. Springer-Verlag, Berlin (1999)
[7] Giné, E; Zinn, J, Some limit theorems for empirical processes, Ann. Probab., 12, 929-989, (1984) · Zbl 0553.60037
[8] Juditsky, A; Rigollet, P; Tsybakov, AB, Learning by mirror averaging, Ann. Stat., 36, 2183-2206, (2008) · Zbl 1274.62288
[9] Koltchinskii, V., Mendelson, S.: Bounding the smallest singular value of a random matrix without concentration. http://arxiv.org/abs/1312.3580 (preprint) · Zbl 1331.15027
[10] Lecué, G.: HDR Thesis. Available at http://www.cmap.polytechnique.fr/ lecue/HDR. Accessed 21 June 2016 · Zbl 0553.60037
[11] Lecué, G; Mendelson, S, Aggregation via empirical risk minimization, Probab. Theor. Relat. Fields, 145, 591-613, (2009) · Zbl 1206.62094
[12] Lecué, G., Mendelson, S.: Learning subgaussian classes: upper and minimax bounds. Available at http://arxiv.org/abs/1305.4825 (preprint) · Zbl 1206.62094
[13] Lecué, G., Mendelson, S.: Performance of empirical risk minimization in linear aggregation. Bernoulli 22(3), 1520-1534 (2016). doi:10.3150/15-BEJ701 · Zbl 0798.60051
[14] Ledoux, M.: The concentration of measure phenomenon. In: Mathematical Surveys and Monographs, vol. 89, p. x\(+\)181. American Mathematical Society, Providence, RI (2001) · Zbl 0995.60002
[15] Ledoux, M., Talagrand, M.: Probability in banach spaces. Isoperimetry and processes, Ergebnisse der Mathematik und ihrer Grenzgebiete (3), vol. 23. Springer-Verlag, Berlin (1991) · Zbl 0748.60004
[16] Mendelson, S; Pajor, A; Tomczak-Jaegermann, N, Reconstruction and Subgaussian operators, Geom. Funct. Anal., 17, 1248-1282, (2007) · Zbl 1163.46008
[17] Mendelson, S.: A remark on the diameter of random sections of convex bodies, geometric aspects of functional analysis (GAFA Seminar Notes), In: Klartag, B., Milman, E. (eds) Lecture notes in Mathematics 2116, pp. 395-404 (2014) · Zbl 1317.52011
[18] Mendelson, S, Learning without concentration, J ACM, 62(3), 1-25, (2015) · Zbl 1333.68232
[19] Mendelson, S.: Learning without concentration for a general loss function. http://arxiv.org/abs/1410.3192 (preprint) · Zbl 1274.62288
[20] Mendelson, S.: Upper bounds on product and multiplier empirical processes. Available at http://arxiv.org/abs/1410.8003 (preprint) · Zbl 1386.60077
[21] Mossel, E; O’Donnell, R; Oleszkiewicz, K, Noise stability of functions with low influences: invariance and optimality, Ann. Math., 171, 295-341, (2010) · Zbl 1201.60031
[22] Nemirovski, A.: Topics in non-parametric statistics, In: Lectures on probability theory and statistics (Saint-Flour, 1998), vol 1738 of Lecture Notes in Math., pp. 85-277. Springer, Berlin (2000)
[23] Pisier, G.: The volume of convex bodies and banach space geometry. In: Cambridge Tracts in Mathematics, vol. 94, p. xvi\(+\)250. Cambridge University Press, Cambridge (1989). doi:10.1017/CBO9780511662454 · Zbl 0698.46008
[24] Talagrand, M, Sharper bounds for Gaussian and empirical processes, Ann. Probab., 22, 28-76, (1994) · Zbl 0798.60051
[25] Tsybakov, A.B.: Introduction to nonparametric estimation. Springer, New York (2009) · Zbl 1176.62032
[26] Van der Vaart, A.W., Wellner, J.A.: Weak convergence and empirical processes. Springer Verlag, Berlin (1996) · Zbl 0862.60002
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.