A Bayesian graphical model for genome-wide association studies (GWAS). (English) Zbl 1400.62251

Summary: The analysis of GWAS data has long been restricted to simple models that cannot fully capture the genetic architecture of complex human diseases. As a shift from standard approaches, we propose a general statistical framework for multi-SNP analysis of GWAS data based on a Bayesian graphical model. Our goal is to develop an approach applicable to a wide range of genetic association problems, including GWAS and fine-mapping studies, and more specifically to: (1) assess the joint effect of multiple SNPs that may be linked or unlinked and may or may not interact; (2) explore the multi-SNP model space efficiently using the Mode Oriented Stochastic Search (MOSS) algorithm and determine the best models. We illustrate the methodology with an application to the CGEM breast cancer GWAS data. Our algorithm selected several SNPs embedded in multi-locus models with high posterior probabilities. Most of the selected SNPs are biologically relevant, and several of them have never been detected in standard single-SNP analyses. Finally, our approach has been implemented in the open-source R package genMOSS.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
62H17 Contingency tables
92D20 Protein sequences, DNA sequences
