Controlling the false discovery exceedance for heterogeneous tests. (English) Zbl 1455.62107

Summary: Several classical methods exist for controlling the false discovery exceedance (FDX) for large-scale multiple testing problems, among them the Lehmann-Romano procedure [E. L. Lehmann and J. P. Romano, Ann. Stat. 33, No. 3, 1138–1154 (2005; Zbl 1072.62060)] ([LR] below) and the Guo-Romano procedure [W. Guo and J. Romano, Stat. Appl. Genet. Mol. Biol. 6, No. 1, Paper No. 3, 34 p. (2007; Zbl 1166.62316)] ([GR] below). While these two procedures are the most prominent, they were originally designed for homogeneous test statistics, that is, when the null distribution functions \(F_i\), \(1\leq i\leq m\), of the \(p\)-values are all equal. In many applications, however, the data are heterogeneous, which leads to heterogeneous null distribution functions. Ignoring this heterogeneity results in a loss of power. In this paper, we develop three new procedures that incorporate the \(F_i\)’s while maintaining rigorous FDX control. The heterogeneous version of [LR], denoted [HLR], is based on the arithmetic average of the \(F_i\)’s, while the heterogeneous version of [GR], denoted [HGR], is based on the geometric average of the \(F_i\)’s. We also introduce a procedure, [PB], that is based on the Poisson-binomial distribution and uniformly improves on [HLR] and [HGR], at the price of higher computational complexity. Perhaps surprisingly, this shows that, contrary to the known theory of false discovery rate (FDR) control under heterogeneity, the way to incorporate the \(F_i\)’s can be particularly simple in the case of FDX control and does not require any further correction term.
The performance of the newly proposed procedures is illustrated on real and simulated data in two important heterogeneous settings: first, when the test statistics are continuous but the \(p\)-values are weighted by some known independent weight vector, e.g., coming from co-data sets; second, when the test statistics are discretely distributed, as is the case for data representing frequencies or counts. Our new procedures are implemented in the R package FDX, see F. Junge and S. Döhler [“FDX: FDX controlling multiple testing procedures for heterogeneous and discrete tests”, R package version 0.1-3 (2020)].
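
Editorial illustration (not part of the summary): for orientation, the homogeneous baseline [LR] can be sketched as a step-down procedure with the critical values \(\alpha_k = (\lfloor\gamma k\rfloor+1)\alpha/(m+\lfloor\gamma k\rfloor+1-k)\) from Lehmann and Romano (2005), which aim at \(P(\mathrm{FDP}>\gamma)\leq\alpha\). The second helper computes the Poisson-binomial distribution (the law of a sum of independent Bernoulli variables with unequal success probabilities) on which [PB] is said to be based. All function names are hypothetical and are not the API of the R package FDX; the heterogeneous procedures [HLR], [HGR] and [PB] themselves are not reproduced here.

```python
import math


def lr_critical_values(m, alpha=0.05, gamma=0.5):
    """Lehmann-Romano step-down critical values for m hypotheses.

    alpha_k = (floor(gamma*k) + 1) * alpha / (m + floor(gamma*k) + 1 - k)
    Note: floor(gamma*k) is evaluated in floating point, so a gamma that is
    not exactly representable (e.g. 0.29) may need rounding care.
    """
    values = []
    for k in range(1, m + 1):
        g = math.floor(gamma * k)
        values.append((g + 1) * alpha / (m + g + 1 - k))
    return values


def step_down_rejections(pvalues, critical_values):
    """Indices rejected by a generic step-down procedure.

    Sorted p-values are compared with the critical values in increasing
    order; rejection stops at the first p-value above its critical value.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, idx in enumerate(order):
        if pvalues[idx] <= critical_values[rank]:
            n_reject += 1
        else:
            break
    return sorted(order[:n_reject])


def poisson_binomial_pmf(probs):
    """PMF of a sum of independent Bernoulli(p_i), by dynamic programming.

    Runs in O(n^2) time, which hints at why a procedure built on this
    distribution is computationally heavier than one using an average.
    """
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


if __name__ == "__main__":
    pvals = [0.03, 0.001, 0.2, 0.02]
    crit = lr_critical_values(len(pvals), alpha=0.05, gamma=0.5)
    print(step_down_rejections(pvals, crit))
    print(poisson_binomial_pmf([0.1, 0.2]))
```

The plain quadratic recursion is chosen for readability only; dedicated packages use faster algorithms for large numbers of hypotheses.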

MSC:

62H15 Hypothesis testing in multivariate analysis
62J15 Paired and multiple comparisons; multiple testing
62G10 Nonparametric hypothesis testing

References:

[1] Basu, P., Fu, L., Saretto, A., and Sun, W. (2020). Empirical Bayes control of the false discovery exceedance. Personal communication.
[2] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing., Journal of the Royal Statistical Society. Series B, 57(1):289-300. · Zbl 0809.62014 · doi:10.1111/j.2517-6161.1995.tb02031.x
[3] Blanchard, G., Neuvial, P., and Roquain, E. (2020). Post hoc confidence bounds on false positives using reference families., Ann. Statist., 48(3):1281-1303. · Zbl 1450.62038 · doi:10.1214/19-AOS1847
[4] Blanchard, G. and Roquain, E. (2008). Two simple sufficient conditions for FDR control., Electron. J. Stat., 2:963-992. · Zbl 1320.62179 · doi:10.1214/08-EJS180
[5] Chen, X., Doerge, R. W., and Heyse, J. F. (2018). Multiple testing with discrete data: proportion of true null hypotheses and two adaptive FDR procedures., Biom. J., 60(4):761-779. · Zbl 1400.62253 · doi:10.1002/bimj.201700157
[6] Chen, X., Doerge, R. W., and Sarkar, S. K. (2020). A weighted FDR procedure under discrete and heterogeneous null distributions., Biom. J., 62(6):1544-1563. · Zbl 1448.62155 · doi:10.1002/bimj.201900216
[7] Delattre, S. and Roquain, E. (2011). On the false discovery proportion convergence under Gaussian equi-correlation., Statist. Probab. Lett., 81(1):111-115. · Zbl 1206.62132 · doi:10.1016/j.spl.2010.09.025
[8] Delattre, S. and Roquain, E. (2015). New procedures controlling the false discovery proportion via Romano-Wolf’s heuristic., Ann. Statist., 43(3):1141-1177. · Zbl 1320.62128 · doi:10.1214/14-AOS1302
[9] Delattre, S. and Roquain, E. (2016). On empirical distribution function of high-dimensional Gaussian vector components with an application to multiple testing., Bernoulli, 22(1):302-324. · Zbl 1332.62057 · doi:10.3150/14-BEJ659
[10] Dickhaus, T., Straßburger, K., Schunk, D., Morcillo-Suarez, C., Illig, T., and Navarro, A. (2012). How to analyze many contingency tables simultaneously in genetic association studies., Statistical Applications in Genetics and Molecular Biology, 11(4). · Zbl 1296.92027 · doi:10.1515/1544-6115.1776
[11] Ditzhaus, M. and Janssen, A. (2019). Variability and stability of the false discovery proportion., Electron. J. Statist., 13(1):882-910. · Zbl 1432.62127 · doi:10.1214/19-EJS1544
[12] Döhler, S. (2016). A discrete modification of the Benjamini–Yekutieli procedure., Econometrics and Statistics.
[13] Döhler, S., Durand, G., and Roquain, E. (2018). New FDR bounds for discrete and heterogeneous tests., Electron. J. Statist., 12(1):1867-1900. · Zbl 1392.62227 · doi:10.1214/18-EJS1441
[14] Dudoit, S. and van der Laan, M. J. (2007)., Multiple Testing Procedures and Applications to Genomics. Springer Series in Statistics. Springer. ISBN: 978-0-387-49316-9.
[15] Durand, G. (2019). Adaptive \(p\)-value weighting with power optimality., Electron. J. Statist., 13(2):3336-3385. · Zbl 1432.62232 · doi:10.1214/19-EJS1578
[16] Durand, G. and Junge, F. (2019)., DiscreteFDR: Multiple Testing Procedures with Adaptation for Discrete Tests. R package version 1.2.
[17] Durand, G., Junge, F., Döhler, S., and Roquain, E. (2019). DiscreteFDR: An R package for controlling the false discovery rate for discrete test statistics., arXiv e-prints, arXiv:1904.02054.
[18] Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control., Ann. Statist., 32(3):1035-1061. · Zbl 1092.62065 · doi:10.1214/009053604000000283
[19] Genovese, C. R., Roeder, K., and Wasserman, L. (2006). False discovery control with p-value weighting., Biometrika, 93(3):509-524. · Zbl 1108.62070 · doi:10.1093/biomet/93.3.509
[20] Genovese, C. R. and Wasserman, L. (2006). Exceedance control of the false discovery proportion., J. Amer. Statist. Assoc., 101(476):1408-1417. · Zbl 1171.62338 · doi:10.1198/016214506000000339
[21] Gilbert, P. (2005). A modified false discovery rate multiple-comparisons procedure for discrete data, applied to human immunodeficiency virus genetics., Journal of the Royal Statistical Society. Series C, 54(1):143-158. · Zbl 1490.62358 · doi:10.1111/j.1467-9876.2005.00475.x
[22] Goeman, J. J. and Solari, A. (2011). Multiple testing for exploratory research., Statist. Sci., 26(4):584-597. · Zbl 1331.62369 · doi:10.1214/11-STS356
[23] Guo, W., He, L., and Sarkar, S. K. (2014). Further results on controlling the false discovery proportion., Ann. Statist., 42(3):1070-1101. · Zbl 1305.62271 · doi:10.1214/14-AOS1214
[24] Guo, W. and Romano, J. (2007). A generalized Sidak-Holm procedure and control of generalized error rates under independence., Stat. Appl. Genet. Mol. Biol., 6:Art. 3, 35 pp. (electronic). · Zbl 1166.62316 · doi:10.2202/1544-6115.1247
[25] Habiger, J. D. (2015). Multiple test functions and adjusted \(p\)-values for test statistics with discrete distributions., J. Statist. Plann. Inference, 167:1-13. · Zbl 1326.62043 · doi:10.1016/j.jspi.2015.06.003
[26] Heller, R. and Gur, H. (2011). False discovery rate controlling procedures for discrete tests., ArXiv e-prints.
[27] Heller, R., Gur, H., and Yaacoby, S. (2012)., discreteMTP: Multiple testing procedures for discrete test statistics. R package version 0.1-2.
[28] Hemerik, J., Solari, A., and Goeman, J. J. (2019). Permutation-based simultaneous confidence bounds for the false discovery proportion., Biometrika, 106(3):635-649. · Zbl 1464.62276
[29] Heyse, J. F. (2011). A false discovery rate procedure for categorical data. In Recent Advances in Biostatistics: False Discovery Rates, Survival Analysis, and Related Topics, pages 43-58.
[30] Holm, S. (1979). A simple sequentially rejective multiple test procedure., Scand. J. Statist., 6(2):65-70. · Zbl 0402.62058
[31] Hu, J. X., Zhao, H., and Zhou, H. H. (2010). False discovery rate control with groups., J. Amer. Statist. Assoc., 105(491):1215-1227. · Zbl 1390.62143 · doi:10.1198/jasa.2010.tm09329
[32] Ignatiadis, N., Klaus, B., Zaugg, J., and Huber, W. (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing., Nature Methods, 13:577-580.
[33] Junge, F. (2020)., PoissonBinomial: Efficient Computation of Ordinary and Generalized Poisson Binomial Distributions. R package version 1.1.1.
[34] Junge, F. and Döhler, S. (2020)., FDX: FDX Controlling Multiple Testing Procedures for Heterogeneous and Discrete Tests. R package version 0.1-3.
[35] Katsevich, E. and Ramdas, A. (2020). Simultaneous high-probability bounds on the false discovery proportion in structured, regression, and online settings., Ann. Statist. to appear. · Zbl 1460.62118
[36] Korn, E. L., Troendle, J. F., McShane, L. M., and Simon, R. (2004). Controlling the number of false discoveries: application to high-dimensional genomic data., J. Statist. Plann. Inference, 124(2):379-398. · Zbl 1074.62070 · doi:10.1016/S0378-3758(03)00211-8
[37] Lehmann, E. L. and Romano, J. P. (2005). Generalizations of the familywise error rate., Ann. Statist., 33:1138-1154. · Zbl 1072.62060 · doi:10.1214/009053605000000084
[38] Neuvial, P. (2008). Asymptotic properties of false discovery rate controlling procedures under independence., Electron. J. Stat., 2:1065-1110. · Zbl 1320.62181 · doi:10.1214/08-EJS207
[39] Perone Pacifico, M., Genovese, C., Verdinelli, I., and Wasserman, L. (2004). False discovery control for random fields., J. Amer. Statist. Assoc., 99(468):1002-1014. · Zbl 1055.62105 · doi:10.1198/0162145000001655
[40] Ramdas, A., Foygel Barber, R., Wainwright, M. J., and Jordan, M. I. (2019). A unified treatment of multiple testing with prior knowledge using the p-filter., Ann. Statist., 47(5):2790-2821. · Zbl 1433.62204 · doi:10.1214/18-AOS1765
[41] Romano, J. P. and Wolf, M. (2007). Control of generalized error rates in multiple testing., Ann. Statist., 35(4):1378-1408. · Zbl 1127.62063 · doi:10.1214/009053606000001622
[42] Roquain, E. (2011). Type I error rate control for testing many hypotheses: a survey with proofs., J. Soc. Fr. Stat., 152(2):3-38. · Zbl 1316.62115
[43] Roquain, E. and van de Wiel, M. (2009). Optimal weighting for false discovery rate control., Electron. J. Stat., 3:678-711. · Zbl 1326.62164 · doi:10.1214/09-EJS430
[44] Roquain, E. and Villers, F. (2011). Exact calculations for false discovery proportion with application to least favorable configurations., Ann. Statist., 39(1):584-612. · Zbl 1209.62164 · doi:10.1214/10-AOS847
[45] Rubin, D., Dudoit, S., and van der Laan, M. (2006). A method to increase the power of multiple testing procedures through sample splitting., Stat. Appl. Genet. Mol. Biol., 5:Art. 19, 20 pp. (electronic). · Zbl 1166.62318 · doi:10.2202/1544-6115.1148
[46] Shaked, M. and Shanthikumar, J.G. (2007)., Stochastic Orders. Springer Series in Statistics. Springer New York. · Zbl 1111.62016
[47] Tan, X., Liu, G. F., Zeng, D., Wang, W., Diao, G., Heyse, J. F., and Ibrahim, J. G. (2019). Controlling false discovery proportion in identification of drug-related adverse events from multiple system organ classes., Statistics in Medicine, 38(22):4378-4389.
[48] Tarone, R. E. (1990). A modified Bonferroni method for discrete data., Biometrics, 46(2):515-522. · Zbl 0715.62140 · doi:10.2307/2531456
[49] Wasserman, L. and Roeder, K. (2006). Weighted hypothesis testing. Technical report, Dept. of Statistics, Carnegie Mellon University. · Zbl 1329.62435 · doi:10.1214/09-STS289
[50] Westfall, P. and Wolfinger, R. (1997). Multiple tests with discrete distributions., The American Statistician, 51(1):3-8.
[51] Zhao, H. · Zbl 1288.62111 · doi:10.1016/j.jspi.2014.04.004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.