Distributions associated with simultaneous multiple hypothesis testing. (English) Zbl 1472.62156

Summary: We develop the distribution for the number of hypotheses found to be statistically significant using the rule from R. J. Simes [Biometrika 73, 751–754 (1986; Zbl 0613.62067)] for controlling the family-wise error rate (FWER). We find the distribution of the number of statistically significant \(p\)-values under the null hypothesis and show this follows a normal distribution under the alternative. We propose a parametric distribution \(\Psi_I( \cdot )\) to model the marginal distribution of \(p\)-values sampled from a mixture of null uniform and non-uniform distributions under different alternative hypotheses. The \(\Psi_I\) distribution is useful when there are many different alternative hypotheses and these are not individually well understood. We fit \(\Psi_I\) to data from three cancer studies and use it to illustrate the distribution of the number of notable hypotheses observed in these examples. We model dependence in sampled \(p\)-values using a latent variable. These methods can be combined to illustrate a power analysis in planning a larger study on the basis of a smaller pilot experiment.
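As a rough illustration of the counting problem described in the summary, the following Python sketch simulates the number of significant \(p\)-values under the global null. It assumes that the number of significant hypotheses is the largest \(k\) with \(p_{(k)} \le k\alpha/m\), that the \(p\)-values are independent, and that they are uniform under the null. The record does not spell out this counting rule, and the function names `simes_count` and `simulate_null_counts` are hypothetical, so this is an illustrative sketch and not the authors' implementation.

```python
import numpy as np


def simes_count(pvalues, alpha=0.05):
    """Return the largest k with p_(k) <= k * alpha / m, or 0 if there is none.

    Assumption: this step-up count stands in for "the number of hypotheses
    found to be statistically significant"; the paper's exact definition
    may differ.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    m = p.size
    if m == 0:
        return 0
    thresholds = alpha * np.arange(1, m + 1) / m
    hits = np.nonzero(p <= thresholds)[0]
    return int(hits[-1] + 1) if hits.size else 0


def simulate_null_counts(m, alpha=0.05, n_sim=10000, seed=0):
    """Monte Carlo estimate of the distribution of the count under the null.

    Assumption: p-values are independent Uniform(0, 1). Entry k of the
    returned array estimates P(count = k) for k = 0, ..., m.
    """
    rng = np.random.default_rng(seed)
    counts = np.array(
        [simes_count(rng.uniform(size=m), alpha) for _ in range(n_sim)]
    )
    return np.bincount(counts, minlength=m + 1) / n_sim


if __name__ == "__main__":
    freq = simulate_null_counts(m=10, alpha=0.05, n_sim=2000, seed=1)
    for k, f in enumerate(freq):
        print(f"P(count = {k}) ~ {f:.4f}")
```

Under independence, the estimated probability of a zero count should be close to \(1 - \alpha\), which gives a quick sanity check on the simulation.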

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62J15 Paired and multiple comparisons; multiple testing
62F03 Parametric hypothesis testing
62H15 Hypothesis testing in multivariate analysis
62H10 Multivariate distribution of statistics

Citations:

Zbl 0613.62067

Software:

Rmpfr; FAMT

References:

[1] Benjamini, Y., Discovering the false discovery rate, J. R. Stat. Soc. B., 72, 405-16 (2010) · Zbl 1411.62043 · doi:10.1111/j.1467-9868.2010.00746.x
[2] Benjamini, Y.; Hochberg, Y., Controlling the false discovery rate: A practical and powerful approach to multiple testing, J. R. Stat. Soc. B., 57, 289-300 (1995) · Zbl 0809.62014
[3] Benjamini, Y.; Hochberg, Y., On the adaptive control of the false discovery rate in multiple testing with independent statistics, J. Educ. Behav. Stat., 25, 1, 60-83 (2000) · doi:10.3102/10769986025001060
[4] Broberg, P., A comparative review of estimates of the proportion unchanged genes and the false discovery rate, BMC Bioinformatics, 6, 199-218 (2005) · doi:10.1186/1471-2105-6-199
[5] Cancer Genome Atlas Research Network, Comprehensive genomic characterization of squamous cell lung cancers, Nature, 489, 519-25 (2012) · doi:10.1038/nature11404
[6] Donoho, D.; Jin, J., Higher criticism for detecting sparse heterogeneous mixtures, Ann. Stat., 32, 962-94 (2004) · Zbl 1092.62051 · doi:10.1214/009053604000000265
[7] Efron, B.; Tibshirani, R.; Storey, J. D.; Tusher, V., Empirical Bayes analysis of a microarray experiment, J. Am. Stat. Assoc., 96, 1151-60 (2001) · Zbl 1073.62511 · doi:10.1198/016214501753382129
[8] Efron, B., Large-scale simultaneous hypothesis testing: The choice of a null hypothesis, J. Am. Stat. Assoc., 99, 96-104 (2004) · Zbl 1089.62502 · doi:10.1198/016214504000000089
[9] Friguet, C.; Kloareg, M.; Causeur, D., A factor model approach to multiple testing under dependence, J. Am. Stat. Assoc., 104, 1406-15 (2009) · Zbl 1205.62071 · doi:10.1198/jasa.2009.tm08332
[10] Genovese, C.; Wasserman, L., A stochastic process approach to false discovery control, Ann. Stat., 32, 1035-61 (2004) · Zbl 1092.62065 · doi:10.1214/009053604000000283
[11] Haynes, B. F.; Gilbert, P. B.; McElrath, M. J.; Zolla-Pazner, S.; Tomaras, G. D.; Alam, S. M., Immune-correlates analysis of an HIV-1 vaccine efficacy trial, N. Engl. J. Med., 366, 1275-1286 (2012) · doi:10.1056/NEJMoa1113425
[12] Hedenfalk, I.; Duggan, D.; Chen, Y., Gene-expression profiles in hereditary breast cancer, N. Engl. J. Med., 344, 539-48 (2001) · doi:10.1056/NEJM200102223440801
[13] Huang, H. -L.; Wu, Y. -C.; Su, L. -J., Discovery of prognostic biomarkers for predicting lung cancer metastasis using microarray and survival data, BMC Bioinformatics, 16, 54 (2015) · doi:10.1186/s12859-015-0463-x
[14] Jin, J.; Cai, T. T., Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons, J. Am. Stat. Assoc., 102, 495-506 (2007) · Zbl 1172.62319 · doi:10.1198/016214507000000167
[15] Jolley, L. B. W., Summation of Series (1961), New York: Dover · Zbl 0101.28602
[16] Koziol, J. A.; Tuckwell, H. C., A Bayesian method for combining statistical tests, J. Stat. Plan. Infer., 78, 317-23 (1999) · Zbl 0956.62021 · doi:10.1016/S0378-3758(98)00222-5
[17] Langaas, M.; Lindqvist, B. H.; Ferkingstad, E., Estimating the proportion of true null hypotheses, with application to DNA microarray data, J. R. Stat. Soc. B., 67, 555-72 (2005) · Zbl 1095.62037 · doi:10.1111/j.1467-9868.2005.00515.x
[18] Maechler, M.: Rmpfr: R MPFR - Multiple Precision Floating-Point Reliable (2019). R package version 0.7-2. https://CRAN.R-project.org/package=Rmpfr.
[19] Owen, A. B., Variance of the number of false discoveries, J. R. Stat. Soc. Ser. B, 67, 411-26 (2005) · Zbl 1069.62102 · doi:10.1111/j.1467-9868.2005.00509.x
[20] Pounds, S.; Morris, S. W., Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values, Bioinformatics, 19, 1236-42 (2003) · doi:10.1093/bioinformatics/btg148
[21] Ruiz, S. M., An algebraic identity leading to Wilson’s Theorem, Math. Gaz., 80, 489, 579-82 (1996) · doi:10.2307/3618534
[22] Simes, R. J., An improved Bonferroni procedure for multiple tests of significance, Biometrika, 73, 3, 751-754 (1986) · Zbl 0613.62067 · doi:10.1093/biomet/73.3.751
[23] Storey, J. D.; Tibshirani, R., Statistical significance for genomewide studies, Proc Natl Acad Sci USA, 100, 9440-5 (2003) · Zbl 1130.62385 · doi:10.1073/pnas.1530509100
[24] Sun, W.; Cai, T. T., Large-scale multiple testing under dependence, J. R. Stat. Soc. Ser. B, 71, 393-424 (2009) · Zbl 1248.62005 · doi:10.1111/j.1467-9868.2008.00694.x
[25] Tang, Y.; Ghosal, S.; Roy, A., Nonparametric Bayesian estimation of positive false discovery rates, Biometrics, 63, 1126-34 (2007) · Zbl 1141.62091 · doi:10.1111/j.1541-0420.2007.00819.x
[26] Tanner, J. C., A derivation of the Borel distribution, Biometrika, 48, 222-4 (1961) · Zbl 0139.35101 · doi:10.1093/biomet/48.1-2.222
[27] Wu, W., On false discovery control under dependence, Ann. Stat., 36, 364-80 (2008) · Zbl 1139.62040 · doi:10.1214/009053607000000730
[28] Yu, C.; Zelterman, D., A parametric model to estimate the proportion from true null using a distribution for p-values, Comput Stat Data Anal., 114, 105-18 (2017) · Zbl 1464.62192 · doi:10.1016/j.csda.2017.04.008
[29] Yu, C.; Zelterman, D., A parametric meta-analysis, Stat. Med., 38, 4013-25 (2019) · doi:10.1002/sim.8278
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.