Pooled association tests for rare genetic variants: a review and some new results. (English) Zbl 1332.62410

Summary: In the search for genetic factors that are associated with complex heritable human traits, considerable attention is now being focused on rare variants that individually have small effects. In response, numerous recent papers have proposed testing strategies to assess association between a group of rare variants and a trait, with competing claims about the performance of various tests. The power of a given test in fact depends on the nature of any association and on the rareness of the variants in question. We review such tests within a general framework that covers a wide range of genetic models and types of data. We study the performance of specific tests through exact or asymptotic power formulas and through novel simulation studies of over 10,000 different models. The tests considered are also applied to real sequence data from the 1000 Genomes Project, as provided by Genetic Analysis Workshop 17 (GAW17). We recommend a testing strategy, but our results show that power to detect association in plausible genetic scenarios is low for studies of medium size unless a high proportion of the chosen variants are causal. Consequently, considerable attention must be given to relevant biological information that can guide the selection of variants for testing.


62P10 Applications of statistics to biology and medical sciences; meta analysis
62H15 Hypothesis testing in multivariate analysis
92D10 Genetics and epigenetics

