Density estimation with contamination: minimax rates and theory of adaptation. (English) Zbl 1429.62130

Summary: This paper studies density estimation under pointwise loss in the setting of a contamination model. The goal is to estimate \(f(x_0)\) at some \(x_0\in\mathbb{R}\) from i.i.d. contaminated observations \[ X_1,\dots,X_n\sim (1-\epsilon)f+\epsilon g,\] where \(g\) stands for a contamination distribution. We closely track the effect of contamination through the following model indices: the contamination proportion \(\epsilon\), the smoothness \(\beta_0\) of the target density, the smoothness \(\beta_1\) of the contamination density, and the local level of contamination \(m\), such that \(g(x_0)\leq m\). The local effect of contamination is shown to depend intricately on the interplay of these parameters. In particular, under a minimax framework, the cost \[[\epsilon^2(1\wedge m)^2]\vee[n^{-\frac{2\beta_1}{2\beta_1+1}}\epsilon^{\frac{2}{2\beta_1+1}}]\] is shown to be the optimal cost of contamination relative to the usual minimax rate without contamination. The lower bound relies on a novel construction that perturbs a density function at two different resolutions. Such a construction may be of independent interest for studying the local effect of contamination in other nonparametric estimation problems. We also study the setting without any assumption on the contamination distribution, where the minimax cost of contamination is shown to be \[\epsilon^{\frac{2\beta_0}{\beta_0+1}}.\] Finally, the minimax cost of adaptation is established for both smooth and arbitrary contamination. Under arbitrary contamination, we show that while adaptation to either the contamination proportion or the smoothness costs only a logarithmic factor, adaptation to both is impossible.
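Evaluating the rate expressions numerically can make the interplay of \(\epsilon\), \(m\), \(\beta_0\), \(\beta_1\) and \(n\) more concrete. The following is a minimal Python sketch with all constants and logarithmic factors dropped. The function names are hypothetical. The baseline rate \(n^{-\frac{2\beta_0}{2\beta_0+1}}\) is an assumption added here: it is the standard pointwise rate under squared loss over a Hölder class of smoothness \(\beta_0\), and the summary itself does not state it.

```python
"""Order-of-magnitude evaluation of the contamination costs in the summary.

All constants and logarithmic factors are dropped; only the rate expressions are computed.
"""


def smooth_contamination_cost(n: int, eps: float, beta1: float, m: float) -> float:
    """[eps^2 (1 ∧ m)^2] ∨ [n^{-2 beta1/(2 beta1+1)} eps^{2/(2 beta1+1)}]."""
    local_term = eps**2 * min(1.0, m) ** 2
    smoothness_term = n ** (-2 * beta1 / (2 * beta1 + 1)) * eps ** (2 / (2 * beta1 + 1))
    return max(local_term, smoothness_term)


def arbitrary_contamination_cost(eps: float, beta0: float) -> float:
    """eps^{2 beta0/(beta0+1)}: the cost when nothing is assumed about g."""
    return eps ** (2 * beta0 / (beta0 + 1))


def baseline_rate(n: int, beta0: float) -> float:
    """Assumed uncontaminated rate n^{-2 beta0/(2 beta0+1)}.

    Assumption: squared pointwise loss over a Hölder class of smoothness beta0.
    """
    return n ** (-2 * beta0 / (2 * beta0 + 1))


if __name__ == "__main__":
    n, eps, beta0, beta1, m = 10_000, 0.01, 1.0, 1.0, 0.5
    # With these values the smoothness term (about 1e-4) dominates the local term (2.5e-5).
    print("smooth contamination cost:", smooth_contamination_cost(n, eps, beta1, m))
    print("arbitrary contamination cost:", arbitrary_contamination_cost(eps, beta0))
    print("baseline rate:", baseline_rate(n, beta0))
```

The illustrative values are \(n=10^4\), \(\epsilon=0.01\), \(\beta_0=\beta_1=1\) and \(m=0.5\). With them, the smooth-contamination cost is of order \(10^{-4}\), which is below the assumed baseline rate \(n^{-2/3}\approx 2.2\times 10^{-3}\). The arbitrary-contamination cost \(\epsilon=10^{-2}\) exceeds that baseline. This is one way to see why assumptions on \(g\) matter.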


62G07 Density estimation
62C20 Minimax procedures in statistical decision theory

