Multiple comparisons in genetic association studies: a hierarchical modeling approach.

*(English)* Zbl 1296.92178

Summary: Multiple comparisons, or multiple testing, has been viewed as a thorny issue in genetic association studies that aim to detect disease-associated genetic variants among a large number of genotyped variants. We alleviate the problem of multiple comparisons by proposing a hierarchical modeling approach that is fundamentally different from the existing methods. The proposed hierarchical models simultaneously fit as many variables as possible and shrink unimportant effects towards zero. As a result, the hierarchical models yield more efficient parameter estimates than traditional methods that analyze genetic variants separately. They also address the multiple comparisons problem coherently, because they substantially reduce both the effective number of genetic effects and the number of statistically “significant” effects. We develop a method for computing the effective number of genetic effects in hierarchical generalized linear models, and propose a new adjustment for multiple comparisons, the hierarchical Bonferroni correction, based on the effective number of genetic effects. Our approach not only increases the power to detect disease-associated variants but also controls the Type I error. We illustrate and evaluate our method with real and simulated data sets from genetic association studies. The method has been implemented in our freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
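The idea of correcting for an effective number of effects rather than the raw number of variants can be illustrated with a minimal sketch. It is not the paper's algorithm and does not use the BhGLM package. It assumes a simple normal-normal shrinkage model in which each effect contributes a weight tau^2 / (tau^2 + s^2) to the effective number of parameters, and it assumes the hierarchical Bonferroni threshold is the significance level divided by that effective number. The function names and the toy variances are hypothetical.

```python
# Illustrative sketch only: a normal-normal shrinkage stand-in, NOT the
# method of the paper and NOT the BhGLM API. All names and numbers here
# are hypothetical.

def effective_number(prior_vars, sampling_vars):
    """Sum of shrinkage weights tau^2 / (tau^2 + s^2), one per effect.

    A weight near 1 means the effect is barely shrunk and counts as a full
    parameter; a weight near 0 means it is shrunk almost to zero.
    """
    return sum(t / (t + s) for t, s in zip(prior_vars, sampling_vars))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold alpha / n_tests."""
    return alpha / n_tests


# Toy setup (assumed): 100 variants, 90 with a tight prior (heavy shrinkage)
# and 10 with a diffuse prior (little shrinkage).
prior_vars = [0.01] * 90 + [1.0] * 10
sampling_vars = [0.04] * 100

m_raw = len(prior_vars)
m_eff = effective_number(prior_vars, sampling_vars)

standard = bonferroni_threshold(0.05, m_raw)      # divides by the raw count
hierarchical = bonferroni_threshold(0.05, m_eff)  # divides by the effective count

print(f"raw tests: {m_raw}, effective effects: {m_eff:.2f}")
print(f"standard threshold: {standard:.6f}")
print(f"effective-number threshold: {hierarchical:.6f}")
```

Because heavily shrunk effects contribute little to the effective number, the resulting threshold is less stringent than the standard Bonferroni threshold, which is consistent with the power gain the summary describes.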

##### Keywords:

Bayesian inference; effective number of parameters; effective number of hypothesis tests; generalized linear models; genetic association studies; hierarchical modeling; hierarchical Bonferroni correction; multiple comparisons

*N. Yi* et al., Stat. Appl. Genet. Mol. Biol. 13, No. 1, 35-48 (2014; Zbl 1296.92178)

