Confidence regions and minimax rates in outlier-robust estimation on the probability simplex. (English) Zbl 1447.62031

The paper studies estimation of the mean of a random vector lying on the probability simplex. Minimax rates for the expected estimation error under a sparsity assumption are established when accuracy is measured by the total variation, Hellinger, Euclidean, and Wasserstein distances. The efficiency of the proposed methods is illustrated on a dataset consisting of books written by Alexandre Dumas and Émile Zola.
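As a brief illustrative sketch (not code from the paper under review), three of the accuracy measures mentioned above can be computed for two discrete distributions on the probability simplex as follows; the vectors `p` and `q` are made-up examples:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance: half the L1 distance between the vectors."""
    return 0.5 * np.abs(p - q).sum()

def hellinger_distance(p, q):
    """Hellinger distance: scaled L2 distance between elementwise square roots."""
    return np.sqrt(0.5) * np.linalg.norm(np.sqrt(p) - np.sqrt(q))

def euclidean_distance(p, q):
    """Plain Euclidean (L2) distance."""
    return np.linalg.norm(p - q)

# Two probability vectors on the 2-simplex (nonnegative entries summing to 1).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

print(tv_distance(p, q))         # 0.1
print(euclidean_distance(p, q))  # sqrt(0.02) ~ 0.1414
print(hellinger_distance(p, q))
```

All three distances take values in [0, 1] for probability vectors (the Euclidean distance is bounded by the total variation's L1 bound); the Wasserstein distance additionally requires a metric on the support, so it is omitted from this sketch.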


62F35 Robustness and adaptive procedures (parametric inference)
62H12 Estimation in multivariate analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62C20 Minimax procedures in statistical decision theory

