
Single-index modulated multiple testing. (English) Zbl 1297.62217
Summary: In large-scale multiple testing, hypotheses are often accompanied by prior information. In this paper, we present a single-index modulated (SIM) multiple testing procedure that maintains control of the false discovery rate while incorporating prior information. The procedure assumes that a bivariate \(p\)-value, \((p_{1},p_{2})\), is available for each hypothesis, where \(p_{1}\) is a preliminary \(p\)-value from prior information and \(p_{2}\) is the primary \(p\)-value for the ultimate analysis. To find the optimal rejection region for the bivariate \(p\)-value, we propose a criterion based on the ratio of the probability density functions of \((p_{1},p_{2})\) under the true null and the nonnull. In the bivariate normal setting, this criterion further motivates us to project the bivariate \(p\)-value onto a single index, \(p(\theta)\), for a wide range of directions \(\theta\). The true null distribution of \(p(\theta)\) is estimated via parametric and nonparametric approaches, leading to two procedures for estimating and controlling the false discovery rate. To derive the optimal projection direction \(\theta\), we propose a new approach based on power comparison, which is further shown to be consistent under some mild conditions. Simulation evaluations indicate that the SIM multiple testing procedure improves the detection power significantly while controlling the false discovery rate. An analysis of a real dataset illustrates the procedure.
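
The projection step can be made concrete with a minimal Python sketch. The summary does not state the form of \(p(\theta)\), so the code assumes \(p(\theta)=\Phi(\cos\theta\,\Phi^{-1}(p_{1})+\sin\theta\,\Phi^{-1}(p_{2}))\), a form suggested by the bivariate normal setting. It also assumes that \(p(\theta)\) is uniform under the true null, which holds when the two null \(z\)-scores are independent standard normal, so that the Benjamini-Hochberg step-up rule applies directly. The paper instead estimates the true null distribution of \(p(\theta)\), and that estimation is not reproduced here. The function names `single_index_pvalue` and `benjamini_hochberg` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def single_index_pvalue(p1, p2, theta, eps=1e-12):
    """Project (p1, p2) onto direction theta on the z-score scale.

    Assumed form, not taken from the summary:
    p(theta) = Phi(cos(theta) * Phi^{-1}(p1) + sin(theta) * Phi^{-1}(p2)).
    """
    # Clip away from 0 and 1 so that inv_cdf stays finite.
    p1 = min(max(p1, eps), 1.0 - eps)
    p2 = min(max(p2, eps), 1.0 - eps)
    z = (math.cos(theta) * _STD_NORMAL.inv_cdf(p1)
         + math.sin(theta) * _STD_NORMAL.inv_cdf(p2))
    return _STD_NORMAL.cdf(z)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices rejected by the BH step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value falls below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Usage: combine each (p1, p2) pair along one direction, then apply BH.
pairs = [(0.01, 0.002), (0.6, 0.4), (0.03, 0.04), (0.9, 0.7)]
theta = math.pi / 4  # equal weight on prior and primary evidence
projected = [single_index_pvalue(a, b, theta) for a, b in pairs]
rejected = benjamini_hochberg(projected, alpha=0.05)
```

With \(\theta=0\) the index reduces to \(p_{1}\) and with \(\theta=\pi/2\) to \(p_{2}\), so intermediate directions trade off the weight given to the prior and the primary evidence.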

62P10 Applications of statistics to biology and medical sciences; meta analysis
62G10 Nonparametric hypothesis testing
62H15 Hypothesis testing in multivariate analysis
DR-Integrator; DAVID