Size, power and false discovery rates. (English) Zbl 1123.62008

Summary: Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the false discovery rate (fdr) analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr’s, theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, the power diagnostics showing why nonnull cases might easily fail to appear on a list of “significant” discoveries.
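
As an illustration of the tail-area quantity mentioned in the summary, the following minimal sketch estimates a tail-area false discovery rate as the ratio of the null tail probability to the observed tail proportion. It is not the paper's code: it assumes a theoretical N(0,1) null, uses the conservative choice p0 = 1 for the null proportion, and runs on simulated z-values rather than the paper's microarray data. The function names `upper_tail_null`, `tail_area_fdr`, and `simulate_z` are hypothetical.

```python
import math
import random


def upper_tail_null(cutoff):
    """P(Z >= cutoff) for Z ~ N(0, 1), the assumed theoretical null."""
    return 0.5 * math.erfc(cutoff / math.sqrt(2.0))


def tail_area_fdr(z_values, cutoff, p0=1.0):
    """Estimate the tail-area Fdr for the region {z >= cutoff}.

    Estimate = p0 * (null tail probability) / (observed tail proportion),
    capped at 1. p0 = 1 is an assumed conservative upper bound on the
    proportion of null cases.
    """
    n_tail = sum(1 for z in z_values if z >= cutoff)
    if n_tail == 0:
        return float("nan")  # no cases in the tail, so the ratio is undefined
    empirical_tail = n_tail / len(z_values)
    return min(1.0, p0 * upper_tail_null(cutoff) / empirical_tail)


def simulate_z(n_null=950, n_nonnull=50, shift=3.0, seed=0):
    """Simulated z-values (an assumption for illustration, not real data)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_null)]
    z += [rng.gauss(shift, 1.0) for _ in range(n_nonnull)]
    return z


if __name__ == "__main__":
    z = simulate_z()
    for cutoff in (1.0, 2.0, 3.0):
        print(cutoff, tail_area_fdr(z, cutoff))
```

In this simulated setting the estimate should fall as the cutoff moves further into the tail, since null cases become rare there relative to nonnull ones. A local fdr would replace the tail proportions with density estimates, which this sketch does not attempt.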

MSC:

62C12 Empirical decision procedures; empirical Bayes procedures
62P10 Applications of statistics to biology and medical sciences; meta analysis

References:

[1] Allison, D., Gadbury, G., Heo, M., Fernández, J., Lee, C.-K., Prolla, T. and Weindruch, R. (2002). A mixture model approach for the analysis of microarray gene expression data. Comput. Statist. Data Anal. 39 1–20. · Zbl 1119.62371 · doi:10.1016/S0167-9473(01)00046-9
[2] Aubert, J., Bar-Hen, A., Daudin, J. and Robin, S. (2004). Determination of the differentially expressed genes in microarray experiments using local FDR. BMC Bioinformatics 5 125.
[3] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300. · Zbl 0809.62014
[4] Broberg, P. (2004). A new estimate of the proportion unchanged genes in a microarray experiment. Genome Biology 5 (5) P10.
[5] Do, K.-A., Müller, P. and Tang, F. (2005). A Bayesian mixture model for differential gene expression. Appl. Statist. 54 627–644. · Zbl 1490.62353 · doi:10.1111/j.1467-9876.2005.05593.x
[6] Dudoit, S., Shaffer, J. and Boldrick, J. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci. 18 71–103. · Zbl 1048.62099 · doi:10.1214/ss/1056397487
[7] Dudoit, S., van der Laan, M. and Pollard, K. (2004). Multiple testing. I. Single-step procedures for the control of general type I error rates. Stat. Appl. Genet. Mol. Biol. 3 article 13. Available at www.bepress.com/sagmb/vol3/iss1/art13. · Zbl 1166.62338 · doi:10.2202/1544-6115.1040
[8] Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104. · Zbl 1089.62502 · doi:10.1198/016214504000000089
[9] Efron, B. (2005). Local false discovery rates. Available at www-stat.stanford.edu/~brad/papers/False.pdf.
[10] Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102 93–103. · Zbl 1284.62340 · doi:10.1198/016214506000001211
[11] Efron, B. and Gous, A. (2001). Scales of evidence for model selection: Fisher versus Jeffreys (with discussion). In Model Selection (P. Lahiri, ed.) 208–256. IMS, Beachwood, OH. · doi:10.1214/lnms/1215540972
[12] Efron, B. and Tibshirani, R. (1996). Using specially designed exponential families for density estimation. Ann. Statist. 24 2431–2461. · Zbl 0878.62028 · doi:10.1214/aos/1032181161
[13] Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23 70–86.
[14] Efron, B., Tibshirani, R., Storey, J. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160. · Zbl 1073.62511 · doi:10.1198/016214501753382129
[15] Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035–1061. · Zbl 1092.62065 · doi:10.1214/009053604000000283
[16] Gottardo, R., Raftery, A., Yee Yeung, K. and Bumgarner, R. (2006). Bayesian robust inference for differential gene expression in microarrays with multiple samples. Biometrics 62 10–18. · Zbl 1099.62128 · doi:10.1111/j.1541-0420.2005.00397.x
[17] Heller, G. and Qing, J. (2003). A mixture model approach for finding informative genes in microarray studies. Unpublished manuscript.
[18] Johnstone, I. and Silverman, B. (2004). Needles and straw in haystacks: Empirical Bayes estimates of sparse sequences. Ann. Statist. 32 1594–1649. · Zbl 1047.62008 · doi:10.1214/009053604000000030
[19] Kendziorski, C., Newton, M., Lan, H. and Gould, M. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 22 3899–3914.
[20] Kerr, M., Martin, M. and Churchill, G. (2000). Analysis of variance for gene expression microarray data. J. Comput. Biol. 7 819–837.
[21] Langaas, M., Lindqvist, B. and Ferkingstad, E. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 555–572. · Zbl 1095.62037 · doi:10.1111/j.1467-9868.2005.00515.x
[22] Lee, M.-L. T., Kuo, F., Whitmore, G. and Sklar, J. (2000). Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. USA 97 9834–9839. · Zbl 0955.92016 · doi:10.1073/pnas.97.18.9834
[23] Liao, J., Lin, Y., Selvanayagam, Z. and Weichung, J. (2004). A mixture model for estimating the local false discovery rate in DNA microarray analysis. Bioinformatics 20 2694–2701.
[24] Lindsey, J. (1974). Comparison of probability distributions. J. Roy. Statist. Soc. Ser. B 36 38–47. · Zbl 0282.62064
[25] Lindsey, J. (1974). Construction and comparison of statistical models. J. Roy. Statist. Soc. Ser. B 36 418–425. · Zbl 0291.62005
[26] Newton, M., Kendziorski, C., Richmond, C., Blattner, F. and Tsui, K. (2001). On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8 37–52.
[27] Newton, M., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture model. Biostatistics 5 155–176. · Zbl 1096.62124 · doi:10.1093/biostatistics/5.2.155
[28] Pan, W., Lin, J. and Le, C. (2003). A mixture model approach to detecting differentially expressed genes with microarray data. Functional and Integrative Genomics 3 117–124.
[29] Pawitan, Y., Michiels, S., Koscielny, S., Gusnanto, A. and Ploner, A. (2005). False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics 21 3017–3024.
[30] Pounds, S. and Morris, S. (2003). Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of \(p\)-values. Bioinformatics 19 1236–1242.
[31] Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D’Amico, A., Richie, J., Lander, E., Loda, M., Kantoff, P., Golub, T. and Sellers, R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1 203–209.
[32] Storey, J. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 479–498. · Zbl 1090.62073 · doi:10.1111/1467-9868.00346
[33] Storey, J., Taylor, J. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–206. · Zbl 1061.62110 · doi:10.1111/j.1467-9868.2004.00439.x
[34] van’t Wout, A., Lehrman, G., Mikheeva, S., O’Keeffe, G., Katze, M., Bumgarner, R., Geiss, G. and Mullins, J. (2003). Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4\(^+\)-T-cell lines. J. Virology 77 1392–1402.