
Robust high dimensional learning for Lipschitz and convex losses. (English) Zbl 07307489

Summary: We establish risk bounds for Regularized Empirical Risk Minimizers (RERM) when the loss is Lipschitz and convex and the regularization function is a norm. In the first part, we obtain these results in the i.i.d. setup under subgaussian assumptions on the design. In the second part, we consider a more general framework in which the design may have heavier tails and the data may be corrupted by outliers, both in the design and in the response variables. In this situation, RERM performs poorly in general, so we analyse an alternative procedure based on median-of-means principles, called "minmax MOM", and show optimal subgaussian deviation rates for these estimators in the relaxed setting. The main results are meta-theorems allowing a wide range of applications to various problems in learning theory. As a non-exhaustive sample of these potential applications, they are applied to classification problems with the logistic loss regularized by LASSO and SLOPE, and to regression problems with the Huber loss regularized by Group LASSO and Total Variation. Another advantage of the minmax MOM formulation is that it suggests a systematic way to slightly modify descent-based algorithms used in high-dimensional statistics to make them robust to outliers. We illustrate this principle in a simulations section, where "minmax MOM" versions of classical proximal descent algorithms are turned into outlier-robust algorithms.
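To make the descent-modification principle concrete, here is a minimal Python sketch, not the paper's implementation: an ISTA-style proximal gradient iteration for the logistic loss with LASSO regularization, in which the full-sample gradient is replaced by the gradient computed on a median-of-means block, as is common in the minmax MOM literature. The block count K, the step size, and all function names are illustrative assumptions, not taken from the paper.

import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (the LASSO penalty).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def logistic_loss(w, X, y):
    # Average logistic loss with labels y in {-1, +1}.
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def logistic_grad(w, X, y):
    # Gradient of the average logistic loss.
    sig = 1.0 / (1.0 + np.exp(y * (X @ w)))   # sigmoid(-y <x, w>)
    return -(X.T @ (y * sig)) / len(y)

def mom_gradient(w, X, y, K, rng):
    # Median-of-means gradient oracle: split the sample into K disjoint
    # blocks, evaluate the empirical loss on each block, and return the
    # gradient computed on the block achieving the median loss. Outliers
    # can only corrupt a minority of blocks, so the median block remains
    # informative.
    blocks = np.array_split(rng.permutation(len(y)), K)
    losses = [logistic_loss(w, X[b], y[b]) for b in blocks]
    median_block = blocks[int(np.argsort(losses)[K // 2])]
    return logistic_grad(w, X[median_block], y[median_block])

def mom_proximal_descent(X, y, lam, K=11, step=0.1, n_iter=500, seed=0):
    # ISTA-style proximal gradient descent in which the usual empirical
    # gradient is replaced by the MOM block gradient above.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = mom_gradient(w, X, y, K, rng)
        w = soft_threshold(w - step * g, step * lam)
    return w

With K = 1 this reduces to ordinary proximal gradient descent; increasing K trades some statistical efficiency on clean data for robustness to a larger number of outliers.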

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

gglasso
Full Text: arXiv Link
