zbMATH — the first resource for mathematics

On false discovery rate thresholding for classification under sparsity. (English) Zbl 1373.62315
Summary: We study the properties of false discovery rate (FDR) thresholding, viewed as a classification procedure. The “\(0\)”-class (null) is assumed to have a known density while the “\(1\)”-class (alternative) is obtained from the “\(0\)”-class either by translation or by scaling. Furthermore, the “\(1\)”-class is assumed to have a small number of elements w.r.t. the “\(0\)”-class (sparsity). We focus on densities of the Subbotin family, including Gaussian and Laplace models. Nonasymptotic oracle inequalities are derived for the excess risk of FDR thresholding. These inequalities lead to explicit rates of convergence of the excess risk to zero, as the number \(m\) of items to be classified tends to infinity and in a regime where the power of the Bayes rule is away from \(0\) and \(1\). Moreover, these theoretical investigations suggest an explicit choice for the target level \(\alpha_{m}\) of FDR thresholding, as a function of \(m\). Our oracle inequalities show theoretically that the resulting FDR thresholding adapts to the unknown sparsity regime contained in the data. This property is illustrated with numerical experiments.
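To make the setting of the summary concrete, the following is a minimal sketch, not the authors' implementation, of FDR thresholding used as a classifier in the Gaussian translation model. It assumes the standard Benjamini-Hochberg step-up rule, one-sided p-values, and a user-supplied level `alpha`; the paper's recommended choice of \(\alpha_{m}\) is not stated in the summary and is not reproduced here. The function names (`bh_threshold_classify`, `simulate`) and all parameter values are hypothetical and serve only as illustration.

```python
import math
import random


def gaussian_upper_tail(x):
    """One-sided p-value P(Z >= x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bh_threshold_classify(p_values, alpha):
    """Benjamini-Hochberg step-up rule viewed as a classifier.

    Items with the k smallest p-values receive label 1, where
    k = max{i : p_(i) <= alpha * i / m} (k = 0 if no such i exists).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank  # step-up: keep the largest rank that passes
    labels = [0] * m
    for idx in order[:k]:
        labels[idx] = 1
    return labels


def simulate(m, n_alt, shift, alpha, seed=0):
    """Sparse Gaussian translation model (illustrative assumption).

    n_alt of the m items are drawn from N(shift, 1) (class 1),
    the rest from N(0, 1) (class 0). Returns the true labels, the
    predicted labels, and the empirical misclassification rate.
    """
    rng = random.Random(seed)
    truth = [1] * n_alt + [0] * (m - n_alt)
    xs = [rng.gauss(shift if t == 1 else 0.0, 1.0) for t in truth]
    p_values = [gaussian_upper_tail(x) for x in xs]
    labels = bh_threshold_classify(p_values, alpha)
    errors = sum(1 for t, l in zip(truth, labels) if t != l)
    return truth, labels, errors / m


if __name__ == "__main__":
    # Hypothetical settings: 1% of 10,000 items belong to class 1.
    _, _, risk = simulate(m=10_000, n_alt=100, shift=4.0, alpha=0.05)
    print(f"empirical misclassification rate: {risk:.4f}")
```

The sketch reports only the raw misclassification rate. The excess risk studied in the paper would additionally require the risk of the Bayes rule, which depends on the sparsity level that is unknown in practice.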

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62J15 Paired and multiple comparisons; multiple testing
62H15 Hypothesis testing in multivariate analysis
62F15 Bayesian inference