Modeling hybrid traits for comorbidity and genetic studies of alcohol and nicotine co-dependence. (English) Zbl 1412.62190

Summary: We propose a novel multivariate model for analyzing hybrid traits and identifying genetic factors for comorbid conditions. Comorbidity, in which an individual suffers from multiple disorders simultaneously, is common in mental health. For example, in the Study of Addiction: Genetics and Environment (SAGE), alcohol and nicotine addiction were recorded through multiple assessments, which we refer to as hybrid traits. Statistical inference for the genetic basis of hybrid traits is not well developed. Recent rank-based methods have been used for association analyses of hybrid traits, but they do not convey the strength or direction of effects. Overcoming this limitation requires a parametric modeling framework. Although such frameworks have been proposed in theory, they are neither well developed nor widely used in practice because they rely on complicated likelihood functions that are computationally expensive. Many existing parametric frameworks instead use pseudo-likelihoods to reduce the computational burden. Here, we develop a model-fitting algorithm for the full likelihood. Our extensive simulation studies demonstrate that inference based on the full likelihood can control the type-I error rate, gains power, and improves effect size estimation compared with several existing methods for hybrid models. These advantages remain even if the distribution of the latent variables is misspecified. In our analysis of the SAGE data, we identify three genetic variants (rs7672861, rs958331, rs879330) that are significantly associated with the comorbidity of alcohol and nicotine addiction at the chromosome-wide level. Moreover, our approach has greater power in this analysis than several existing methods for hybrid traits. Although the analysis of the SAGE data motivated the model, it can be applied broadly to any hybrid responses.


62P10 Applications of statistics to biology and medical sciences; meta analysis
62N05 Reliability and life testing
62H20 Measures of association (correlation, canonical correlation, etc.)

