Bayesian group Lasso for nonparametric varying-coefficient models with application to functional genome-wide association studies. (English) Zbl 1397.62260

Summary: Although genome-wide association studies (GWAS) have proven powerful for comprehending the genetic architecture of complex traits, they are challenged by the high dimension of single-nucleotide polymorphisms (SNPs) as predictors, the presence of complex environmental factors, and the longitudinal or functional nature of many complex traits or diseases. To address these challenges, we propose a high-dimensional varying-coefficient model that incorporates functional aspects of phenotypic traits into GWAS, yielding a so-called functional GWAS or \(f\mathrm{GWAS}\). The Bayesian group Lasso and the associated MCMC algorithms are developed to identify significant SNPs and to estimate how they affect longitudinal traits through time-varying genetic actions. The model is generalized to analyze the genetic control of complex traits using subject-specific sparse longitudinal data. The statistical properties of the new model are investigated through simulation studies. We use the new model to analyze a real GWAS data set from the Framingham Heart Study, leading to the identification of several significant SNPs associated with age-specific changes in body mass index. The \(f\mathrm{GWAS}\) model, equipped with the Bayesian group Lasso, will provide a useful tool for genetic and developmental analysis of complex traits or diseases.
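
For orientation, a varying-coefficient model of the kind the summary describes can be sketched as follows. This is a generic textbook-style formulation, not necessarily the paper's exact specification; all symbols (\(y_i\), \(x_{ij}\), \(\beta_j\), \(B_l\), \(\gamma_j\), \(m\), \(\lambda\), \(\tau_j^2\), \(\sigma^2\)) are assumptions introduced here for illustration. Subject \(i\) is measured at times \(t_{ik}\), \(x_{ij}\) codes the genotype at SNP \(j\), and each time-varying effect is expanded in a fixed basis:

\[
y_i(t_{ik}) = \mu(t_{ik}) + \sum_{j=1}^{p} \beta_j(t_{ik})\, x_{ij} + \varepsilon_i(t_{ik}),
\qquad
\beta_j(t) \approx \sum_{l=1}^{m} \gamma_{jl}\, B_l(t) = B(t)^{\top} \gamma_j .
\]

Under this expansion, SNP \(j\) has no effect at any time exactly when the whole coefficient vector \(\gamma_j\) is zero, which is why selection is done on groups rather than on single coefficients. One commonly used Bayesian group Lasso hierarchy that shrinks each \(\gamma_j\) as a block is

\[
\gamma_j \mid \tau_j^2, \sigma^2 \sim N_m\!\left(0,\ \sigma^2 \tau_j^2 I_m\right),
\qquad
\tau_j^2 \sim \mathrm{Gamma}\!\left(\tfrac{m+1}{2},\ \text{rate} = \tfrac{\lambda^2}{2}\right),
\]

which, after integrating out \(\tau_j^2\), gives a prior density proportional to \(\exp\!\left(-\lambda \lVert \gamma_j \rVert_2 / \sigma\right)\), the Bayesian counterpart of the group Lasso penalty \(\lambda \sum_j \lVert \gamma_j \rVert_2\). The conditionally Gaussian form is what makes Gibbs-type MCMC updates convenient in such models.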


62J07 Ridge regression; shrinkage estimators (Lasso)
62G08 Nonparametric regression and quantile regression
62P10 Applications of statistics to biology and medical sciences; meta analysis

