Zhou, Xi Kathy; Liu, Fei; Dannenberg, Andrew J. A Bayesian model averaging approach for observational gene expression studies. (English) Zbl 1243.62139 Ann. Appl. Stat. 6, No. 2, 497-520 (2012).

Summary: Identifying differentially expressed (DE) genes associated with a sample characteristic is the primary objective of many microarray studies. As more and more studies are carried out with observational rather than well-controlled experimental samples, it becomes important to evaluate and properly control the impact of sample heterogeneity on DE gene findings. Typical methods for identifying DE genes require ranking all the genes according to a preselected statistic based on a single model for two or more group comparisons, with or without adjustment for other covariates. Such single-model approaches unavoidably result in model misspecification, which can lead to increased errors due to bias for some genes and reduced efficiency for others. We evaluated the impact of model misspecification from such approaches on detecting DE genes and identified parameters that affect the magnitude of the impact. To properly control for sample heterogeneity and to provide a flexible and coherent framework for simultaneously identifying DE genes associated with a single or multiple sample characteristics and/or their interactions, we proposed a Bayesian model averaging approach which corrects the model misspecification by averaging over the model space formed by all relevant covariates. An empirical approach is suggested for specifying prior model probabilities. We demonstrated through simulated microarray data that this approach resulted in improved performance in DE gene identification compared to the single-model approaches. The flexibility of this approach is demonstrated through our analysis of data from two observational microarray studies.
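The idea of averaging over the model space formed by all relevant covariates can be illustrated with a minimal sketch for a single gene. This is not the authors' implementation: the summary does not state their priors or computations, so the sketch swaps in the common BIC approximation to the marginal likelihood and a fixed prior inclusion probability (the paper instead suggests an empirical approach for prior model probabilities). The function names, the `prior_inclusion` parameter, and the simulated covariates `smoking` and `age` are all hypothetical.

```python
import itertools

import numpy as np


def bic_linear(y, X):
    """BIC of a Gaussian linear model fitted by ordinary least squares."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def inclusion_probabilities(y, covariates, prior_inclusion=0.5):
    """Posterior inclusion probability of each covariate for one gene.

    Enumerates every subset of covariates (intercept always included),
    weights each model by exp(-BIC / 2) times its prior probability,
    and sums the normalized weights of the models containing each covariate.
    The BIC weighting is an assumption of this sketch, not the paper's method.
    """
    names = list(covariates)
    n = len(y)
    log_weights, models = [], []
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            X = np.column_stack([np.ones(n)] + [covariates[v] for v in subset])
            log_prior = (len(subset) * np.log(prior_inclusion)
                         + (len(names) - len(subset)) * np.log(1 - prior_inclusion))
            log_weights.append(-0.5 * bic_linear(y, X) + log_prior)
            models.append(set(subset))
    log_weights = np.array(log_weights)
    weights = np.exp(log_weights - log_weights.max())  # stabilize before normalizing
    weights /= weights.sum()
    return {v: float(sum(w for w, m in zip(weights, models) if v in m))
            for v in names}


# Simulated expression for one gene: a strong effect of smoking, none of age.
rng = np.random.default_rng(0)
n = 60
smoking = rng.integers(0, 2, n).astype(float)
age = rng.normal(50, 10, n)
expression = 3.0 * smoking + rng.normal(0, 1, n)

probs = inclusion_probabilities(expression, {"smoking": smoking, "age": age})
print(probs)
```

In a genome-wide analysis the same calculation would be repeated per gene, and genes would be ranked by the posterior probability that the characteristic of interest is in the model, rather than by a statistic from one preselected model.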
Cited in 2 Documents

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
92D10 Genetics and epigenetics
65C60 Computational problems in statistics (MSC2010)
62A09 Graphical methods in statistics

Keywords: differential gene expression; microarrays; observational study

Software: GSA; GlobalAncova