On spike and slab empirical Bayes multiple testing. (English) Zbl 1455.62035

The authors consider the multiple testing of hypotheses, a problem closely related to identifying the active variables among a large number of candidates in high-dimensional statistical models. High-dimensional data typically involve thousands of variables or more, of which only a small fraction are significant.
Bayesian multiple testing methodology is applied, in which testing is based on comparing the posterior probabilities of the hypotheses under consideration. The prior is chosen by empirical Bayes approaches, which aim to calibrate it in a fully automatic, data-driven way.
The paper explores a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control. In the Gaussian sequence model, the results show that spike and slab priors produce posterior distributions with particularly suitable multiple testing properties. The authors demonstrate that uniform FDR control is possible at the target level up to a multiplicative constant. This constant is very close to \(1\) in simulations, and can even be shown to be \(1\) asymptotically for some subclass of sparse vectors.
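For concreteness, the setting can be written as follows; the notation (\(X_i\), \(\theta_i\), \(w\), \(\gamma\), \(g\), \(\ell_i\), \(\phi\), \(R\)) is a standard choice assumed here for illustration and is not fixed by the review. The Gaussian sequence model observes
\[
X_i = \theta_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1), \qquad i = 1, \dots, n,
\]
and a spike and slab prior draws the coordinates independently from
\[
\theta_i \sim (1-w)\,\delta_0 + w\,\gamma,
\]
with weight \(w \in (0,1)\) and slab density \(\gamma\). The posterior probability of the null hypothesis \(\theta_i = 0\) is then
\[
\ell_i(X) = \frac{(1-w)\,\phi(X_i)}{(1-w)\,\phi(X_i) + w\,g(X_i)}, \qquad g = \gamma * \phi,
\]
where \(\phi\) is the standard normal density, and the FDR of a procedure rejecting the hypotheses indexed by \(R \subseteq \{1, \dots, n\}\) is
\[
\mathrm{FDR} = \mathbb{E}_{\theta}\left[ \frac{\#\{ i \in R : \theta_i = 0 \}}{\max(\#R, 1)} \right].
\]
In an empirical Bayes approach, \(w\) is replaced by a data-driven estimate, for instance a maximiser of the marginal likelihood \(\prod_{i=1}^{n} \left[ (1-w)\,\phi(X_i) + w\,g(X_i) \right]\).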
In this way, a theoretical validation is provided for the common practical use of posterior-based quantities in frequentist FDR control.
The theoretical results are illustrated with numerical experiments.
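A small simulation in the same spirit can be sketched as below. This is not the authors' experiment: it assumes a Gaussian slab \(\mathcal{N}(0, \tau^2)\), chosen only because the marginal density is then available in closed form, fits the prior weight by a grid search over the marginal likelihood, and rejects a hypothesis when its posterior null probability falls below a threshold \(t\). All function names and parameter values are hypothetical.

```python
import math
import random


def normal_pdf(x, var=1.0):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def fit_weight(xs, slab_var, grid_size=200):
    """Empirical Bayes step: grid search for the weight w in (0, 1)
    maximising the marginal log-likelihood of the observations."""
    null_d = [normal_pdf(x) for x in xs]
    # Assumption: Gaussian slab, so the slab marginal is N(0, 1 + slab_var).
    slab_d = [normal_pdf(x, 1.0 + slab_var) for x in xs]
    best_w, best_ll = None, -math.inf
    for k in range(1, grid_size):
        w = k / grid_size
        ll = sum(math.log((1.0 - w) * a + w * b) for a, b in zip(null_d, slab_d))
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w


def l_values(xs, w, slab_var):
    """Posterior probability that theta_i = 0, for each observation."""
    out = []
    for x in xs:
        null_part = (1.0 - w) * normal_pdf(x)
        slab_part = w * normal_pdf(x, 1.0 + slab_var)
        out.append(null_part / (null_part + slab_part))
    return out


def fdp(rejected, is_null):
    """False discovery proportion: false rejections over max(rejections, 1)."""
    false_rej = sum(1 for r, z in zip(rejected, is_null) if r and z)
    return false_rej / max(sum(rejected), 1)


def simulate(n=2000, n_signal=100, signal=4.0, slab_var=16.0, t=0.2, seed=0):
    """One run: sparse mean vector, fitted weight, thresholded l-values."""
    rng = random.Random(seed)
    theta = [signal] * n_signal + [0.0] * (n - n_signal)
    xs = [th + rng.gauss(0.0, 1.0) for th in theta]
    w_hat = fit_weight(xs, slab_var)
    rejected = [l <= t for l in l_values(xs, w_hat, slab_var)]
    return w_hat, fdp(rejected, [th == 0.0 for th in theta]), sum(rejected)


if __name__ == "__main__":
    w_hat, prop, n_rej = simulate()
    print(f"fitted weight: {w_hat:.3f}, rejections: {n_rej}, FDP: {prop:.3f}")
```

Averaging the false discovery proportion over many seeds gives a Monte Carlo estimate of the FDR, which can then be compared with the threshold \(t\) for different sparsity levels and signal strengths.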

MSC:

62C12 Empirical decision procedures; empirical Bayes procedures
62G10 Nonparametric hypothesis testing
62J15 Paired and multiple comparisons; multiple testing

Software:

EBayesThresh
