
Some optimality properties of FDR controlling rules under sparsity. (English) Zbl 1337.62184

Summary: The False Discovery Rate (FDR) and the Bayes risk are two different statistical measures that can be used to evaluate and compare multiple testing procedures. Recent results show that, under sparsity, FDR-controlling procedures such as the popular Benjamini-Hochberg (BH) procedure also perform very well in terms of the Bayes risk. In particular, asymptotic Bayes optimality under sparsity (ABOS) of BH was shown previously for location and scale models based on log-concave densities. This article extends previous work to a substantially larger set of distributions of effect sizes under the alternative, where the alternative distribution of true signals does not change with the number of tests \(m\), while the sample size \(n\) slowly increases. ABOS is proved for BH and for the corresponding step-down procedure, both based on FDR levels proportional to \(n^{-1/2}\). A simulation study shows that these asymptotic results are already relevant for relatively small values of \(m\) and \(n\). Apart from showing asymptotic optimality of BH, our results on the optimal FDR level provide a natural extension of the well-known results on the significance levels of Bayesian tests.
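
As a rough illustration of the two procedures named in the summary, the following Python sketch implements the standard BH step-up rule and a step-down counterpart, together with an FDR level that shrinks like \(n^{-1/2}\). It is not taken from the article. The function names, the constant `c`, and the choice to give the step-down rule the same critical values \(i\alpha/m\) as BH are assumptions made for this example only.

```python
import math


def fdr_level(n, c=1.0):
    """FDR level proportional to n**(-1/2); the constant c is hypothetical."""
    return min(1.0, c / math.sqrt(n))


def bh_step_up(p_values, alpha):
    """BH step-up: reject the k smallest p-values, where k is the largest
    rank i such that p_(i) <= i * alpha / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


def bh_step_down(p_values, alpha):
    """Step-down counterpart (assumed to share the critical values
    i * alpha / m): reject in increasing order of p-value and stop at the
    first p-value that exceeds its critical value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] > rank * alpha / m:
            break
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Illustrative p-values only; not data from the article.
    p = [0.03, 0.035, 0.05, 0.9]
    alpha = fdr_level(n=100, c=0.8)
    print("alpha:", alpha)
    print("step-up:  ", bh_step_up(p, alpha))
    print("step-down:", bh_step_down(p, alpha))
```

With shared critical values, the step-down rule never rejects more hypotheses than the step-up rule, because it stops at the first failure while the step-up rule looks for the last success.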

MSC:

62J15 Paired and multiple comparisons; multiple testing
62C10 Bayesian problems; characterization of Bayes procedures
62C12 Empirical decision procedures; empirical Bayes procedures
62C20 Minimax procedures in statistical decision theory
62C25 Compound decision problems in statistical decision theory
62F15 Bayesian inference
