A framework for Monte Carlo based multiple testing. (English) Zbl 1373.62389

Summary: We are concerned with a situation in which we would like to test multiple hypotheses with tests whose \(p\)-values cannot be computed explicitly but can be approximated using Monte Carlo simulation. This scenario occurs widely in practice. We are interested in obtaining the same rejections and non-rejections as the ones obtained if the \(p\)-values for all hypotheses had been available. The present article introduces a framework for this scenario by providing a generic algorithm for a general multiple testing procedure. We establish conditions that guarantee that the rejections and non-rejections obtained through Monte Carlo simulations are identical to the ones obtained with the \(p\)-values. Our framework is applicable to a general class of step-up and step-down procedures, which includes many established multiple testing corrections such as the ones of Bonferroni, Holm, Sidak, Hochberg or Benjamini-Hochberg. Moreover, we show how to use our framework to improve algorithms available in the literature in such a way as to yield theoretical guarantees on their results. These modifications can easily be implemented in practice and lead to a particular way of reporting multiple testing results as three sets together with an error bound on their correctness, which we demonstrate on a real biological dataset.
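
The idea of reporting three sets with an error bound can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it assumes a fixed number `n` of Monte Carlo samples per hypothesis, Hoeffding confidence intervals with a union bound over the hypotheses, and the Benjamini-Hochberg step-up procedure, and all function names (`benjamini_hochberg`, `hoeffding_interval`, `classify`) are hypothetical. It relies on the monotonicity of Benjamini-Hochberg: lowering every \(p\)-value can only enlarge the rejection set, so applying the procedure to the upper and lower interval bounds brackets the rejections obtained with the exact \(p\)-values.

```python
import math


def benjamini_hochberg(pvals, alpha):
    """Return the index set rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose ordered p-value lies below its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return set(order[:k])


def hoeffding_interval(exceedances, n, delta):
    """Two-sided Hoeffding interval for a p-value estimated from n samples.

    Assumption of this sketch: n is fixed in advance (no sequential stopping).
    The interval covers the true p-value with probability at least 1 - delta.
    """
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    p_hat = exceedances / n
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def classify(exceedances, n, alpha, delta):
    """Split hypotheses into (rejected, non_rejected, undecided).

    With probability at least 1 - delta all intervals cover their p-values
    (union bound), and then the first two sets agree with the decisions
    Benjamini-Hochberg would make on the exact p-values.
    """
    m = len(exceedances)
    intervals = [hoeffding_interval(x, n, delta / m) for x in exceedances]
    lower = [lo for lo, _ in intervals]
    upper = [hi for _, hi in intervals]
    rejected = benjamini_hochberg(upper, alpha)           # rejected even in the worst case
    possibly_rejected = benjamini_hochberg(lower, alpha)  # rejected in the best case
    undecided = possibly_rejected - rejected
    non_rejected = set(range(m)) - possibly_rejected
    return rejected, non_rejected, undecided


# Illustrative counts of simulated statistics at least as extreme as the observed one.
rejected, non_rejected, undecided = classify(
    [10, 400, 5000, 9000], n=10000, alpha=0.1, delta=0.05
)
print(rejected, non_rejected, undecided)
```

With these made-up counts, hypothesis 0 is rejected, hypotheses 2 and 3 are not rejected, and hypothesis 1 remains undecided because its interval straddles the Benjamini-Hochberg threshold. Drawing more samples would shrink the intervals and could resolve it.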

MSC:

62J15 Paired and multiple comparisons; multiple testing
65C05 Monte Carlo methods
62P10 Applications of statistics to biology and medical sciences; meta analysis

References:

[1] Benjamini, Controlling the false discovery rate: A practical and powerful approach to multiple testing, J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 (1) pp 289– (1995) · Zbl 0809.62014
[2] Benjamini, The control of the false discovery rate in multiple testing under dependency, Ann. Statist. 29 (4) pp 1165– (2001) · Zbl 1041.62061
[3] Besag, Sequential Monte Carlo p-values, Biometrika 78 (2) pp 301– (1991)
[4] Bonferroni, Teoria statistica delle classi e calcolo delle probabilità, Pubbl. d. R. Ist. Super. di Sci. Econom. e Commerciali di Firenze 8 pp 3– (1936) · Zbl 0016.41103
[5] Chen, Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis, Biostat. 14 (2) pp 244– (2013)
[6] Cheng, Internal validation inferences of significant genomic features in genome-wide screening, Comput. Statist. Data Anal. 53 pp 788– (2009) · Zbl 1452.62800
[7] Cohen, Uncovering the co-evolutionary network among prokaryotic genes, Bioinform. 28 (ECCB) pp i389– (2012)
[8] Finner, Controlling the familywise error rate with plug-in estimator for the proportion of true null hypotheses, J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 (5) pp 1031– (2009)
[9] Friguet, Estimation of the proportion of true null hypotheses in high-dimensional data under dependence, Comput. Statist. Data Anal. 55 (9) pp 2665– (2011) · Zbl 1464.62072
[10] Gandy, MMCTest - A safe algorithm for implementing multiple Monte Carlo tests, Scand. J. Stat. 41 (4) pp 1083– (2014) · Zbl 1305.62270
[11] Guo, Adaptive choice of the number of bootstrap samples in large scale multiple testing, Stat. Appl. Genet. Mol. Biol. 7 (1) pp 1– (2008) · Zbl 1276.62072
[12] Gusenleitner, iBBiG: Iterative binary bi-clustering of gene sets, Bioinform. 28 (19) pp 2484– (2012)
[13] Han, A Bernstein-type estimator for decreasing density with application to p-value adjustments, Comput. Statist. Data Anal. 56 pp 427– (2012) · Zbl 1239.62036
[14] Hochberg, A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75 (4) pp 800– (1988) · Zbl 0661.62067
[15] Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (301) pp 13– (1963) · Zbl 0127.10602
[16] Holm, A simple sequentially rejective multiple test procedure, Scand. J. Stat. 6 (2) pp 65– (1979) · Zbl 0402.62058
[17] Hommel, A stagewise rejective multiple test procedure based on a modified Bonferroni test, Biometrika 75 (2) pp 383– (1988) · Zbl 0639.62025
[18] Jiang, Statistical properties of an early stopping rule for resampling-based multiple testing, Biometrika 99 (4) pp 973– (2012) · Zbl 1452.62557
[19] Jupiter, TreeHugger: A new test for enrichment of gene ontology terms, INFORMS Journal on Computing 22 (2) pp 210– (2010) · Zbl 1243.62133
[20] Knijnenburg, Fewer permutations, more accurate P-values, Bioinform. 25 (12) pp i161– (2009) · Zbl 05744024
[21] Lai, On confidence sequences, Ann. Statist. 4 (2) pp 265– (1976) · Zbl 0346.62035
[22] Langaas, Estimating the proportion of true null hypotheses, with application to DNA microarray data, J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 (4) pp 555– (2005) · Zbl 1095.62037
[23] Li, BaySTDetect: Detecting unusual temporal patterns in small area data via Bayesian model choice, Biostat. 13 (4) pp 695– (2012)
[24] Lin, An efficient Monte Carlo approach to assessing statistical significance in genomic studies, Bioinform. 21 (6) pp 781– (2005)
[25] Lu, The panorama of physiological responses and gene expression of whole plant of maize inbred line YQ7-96 at the three-leaf stage under water deficit and re-watering, Theor. Appl. Genet. 123 pp 943– (2011)
[26] Meinshausen, False discovery control for multiple tests of association under general dependence, Scand. J. Stat. 33 (2) pp 227– (2006) · Zbl 1125.62077
[27] Nusinow, Network-based inference from complex proteomic mixtures using SNIPE, Bioinform. 28 (23) pp 3115– (2012)
[28] Pekowska, A unique H3K4me2 profile marks tissue-specific gene regulation, Genome Research 20 (11) pp 1493– (2010)
[29] Pounds, Robust estimation of the false discovery rate, Bioinform. 22 (16) pp 1979– (2006)
[30] Rahmatallah, Gene set analysis for self-contained tests: Complex null and specific alternative hypotheses, Bioinform. 28 (23) pp 3073– (2012)
[31] Rom, A sequentially rejective test procedure based on a modified Bonferroni inequality, Biometrika 77 (3) pp 663– (1990)
[32] Romano, Stepup procedures for control of generalizations of the familywise error rate, Ann. Statist. 34 (4) pp 1850– (2006) · Zbl 1246.62172
[33] Roth, Multiple comparison procedures for discrete test statistics, J. Statist. Plann. Inference 82 pp 101– (1999) · Zbl 1079.62527
[34] Sandve, Sequential Monte Carlo multiple testing, Bioinform. 27 (23) pp 3235– (2011)
[35] Schweder, Plots of p-values to evaluate many tests simultaneously, Biometrika 69 (3) pp 493– (1982)
[36] Shaffer, Modified sequentially rejective multiple test procedures, J. Amer. Statist. Assoc. 81 (395) pp 826– (1986) · Zbl 0603.62087
[37] Sidak, Rectangular confidence regions for the means of multivariate normal distributions, J. Amer. Statist. Assoc. 62 (318) pp 626– (1967) · Zbl 0158.17705
[38] Simes, An improved Bonferroni procedure for multiple tests of significance, Biometrika 73 (3) pp 751– (1986) · Zbl 0613.62067
[39] Storey, A direct approach to false discovery rates, J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 (3) pp 479– (2002) · Zbl 1090.62073
[40] Tamhane, On weighted Hochberg procedures, Biometrika 95 pp 279– (2008) · Zbl 1437.62623
[41] Wieringen, A test for partial differential expression, J. Amer. Statist. Assoc. 103 (483) pp 1039– (2008) · Zbl 1205.62189
[42] Westfall, Resampling-based multiple testing: Examples and methods for p-value adjustment (1993)
[43] Westfall, Multiple testing with minimal assumptions, Biom. J. 50 (5) pp 745– (2008) · Zbl 05361932
[44] Zhou, Empirical pathway analysis, without permutation, Biostat. 14 (3) pp 573– (2013)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.