A new perspective on robust \(M\)-estimation: finite sample theory and applications to dependence-adjusted multiple testing. (English) Zbl 1409.62154

Consider an ordinary linear regression model with a vector of regression coefficients and a random noise variable with mean zero and finite variance. When the normality assumption is violated, robust alternatives to the method of least squares, typified by the Huber estimator, are needed.
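For orientation, the standard textbook form of the model and of the Huber estimator can be written as follows. The notation (\(y_i\), \(x_i\), \(\beta\), \(\varepsilon_i\), \(\tau\), \(\ell_\tau\)) is chosen here for illustration and is not taken from the review.

\[
y_i = x_i^{\top}\beta + \varepsilon_i, \qquad \mathbb{E}\,\varepsilon_i = 0, \quad \mathbb{E}\,\varepsilon_i^{2} = \sigma^{2} < \infty, \qquad i = 1, \dots, n,
\]
\[
\ell_\tau(u) =
\begin{cases}
\tfrac{1}{2}u^{2}, & |u| \le \tau, \\
\tau |u| - \tfrac{1}{2}\tau^{2}, & |u| > \tau,
\end{cases}
\qquad
\widehat{\beta}_\tau \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{d}} \sum_{i=1}^{n} \ell_\tau\bigl(y_i - x_i^{\top}\beta\bigr).
\]

The loss is quadratic for small residuals and linear for large ones, so \(\tau\) controls the trade-off between bias and robustness. Letting \(\tau \to \infty\) recovers least squares.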
In this paper, the Huber estimator with a tuning parameter adapted to the sample size, the dimension and the variance of the noise is considered. A Berry-Esseen inequality and a Cramér-type moderate deviation result are also developed.
As a special case, a sub-Gaussian type deviation inequality and a non-asymptotic Bahadur representation are established when the noise variables have only finite second moments.
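The idea of a Huber estimator whose tuning parameter grows with the sample size can be illustrated numerically. The sketch below is not the authors' implementation. It fits the Huber loss by iteratively reweighted least squares, and the function names `huber_regression` and `adaptive_tau`, the constant `c`, and the particular scaling \(\tau = c\,\sigma\sqrt{n/(d+\log n)}\) are assumptions made here for illustration only; the review states only that the parameter is adapted to sample size, dimension and noise variance.

```python
import numpy as np


def huber_regression(X, y, tau, n_iter=200, tol=1e-8):
    """Huber M-estimator computed by iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 for |r| <= tau, tau/|r| for larger residuals
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        Xw = X * w[:, None]
        # Weighted normal equations: (X^T W X) beta = X^T W y
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta


def adaptive_tau(sigma, n, d, c=1.0):
    """Hypothetical scaling of the tuning parameter (an assumption, see text)."""
    return c * sigma * np.sqrt(n / (d + np.log(n)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
    beta_true = np.array([1.0, -2.0, 0.5])
    # Heavy-tailed noise with finite variance: Student t with 2.5 degrees of freedom
    noise = rng.standard_t(df=2.5, size=n)
    y = X @ beta_true + noise
    sigma = np.sqrt(2.5 / (2.5 - 2.0))  # true noise s.d.; unknown in practice
    tau = adaptive_tau(sigma, n, d)
    print("tau:", tau)
    print("Huber estimate:", huber_regression(X, y, tau))
    print("Least squares: ", np.linalg.lstsq(X, y, rcond=None)[0])
```

In practice the noise standard deviation is unknown and would have to be estimated or the parameter calibrated from the data; the sketch plugs in the true value only to keep the example short.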


62J15 Paired and multiple comparisons; multiple testing
62H15 Hypothesis testing in multivariate analysis
62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models
62E20 Asymptotic distribution theory in statistics



