Network assisted analysis to reveal the genetic basis of autism. (English) Zbl 1454.62354

Summary: While studies show that autism is highly heritable, the nature of the genetic basis of this disorder remains elusive. Based on the idea that highly correlated genes are functionally interrelated and more likely to affect risk, we develop a novel statistical tool to find additional potential autism risk genes by combining genetic association scores with gene co-expression in specific brain regions and periods of development. The gene dependence network is estimated using a novel partial neighborhood selection (PNS) algorithm, in which node-specific properties are incorporated into network estimation for improved statistical and computational efficiency. We then adopt a hidden Markov random field (HMRF) model to combine the estimated network and the genetic association scores in a systematic manner. The proposed modeling framework can be naturally extended to incorporate additional structural information concerning the dependence between genes. Using currently available genetic association data from whole exome sequencing studies and brain gene expression levels, the proposed algorithm successfully identified 333 genes that plausibly affect autism risk.
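
Illustration (not part of the original summary): the sketch below shows, in generic form, how a two-state hidden Markov random field can combine per-gene association scores with a gene network. It is a minimal sketch under stated assumptions, not the authors' implementation. The Ising-type prior, the normal mixture for z-scores, the parameter values `b`, `c`, `mu`, `sd`, the function name `icm_hmrf`, and the use of iterated conditional modes for fitting are all assumptions made for this example; the summary does not specify them.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def icm_hmrf(z, edges, b=-2.0, c=2.0, mu=2.0, sd=1.0, max_iter=50):
    """Label genes as risk (1) or non-risk (0) by iterated conditional modes.

    z      : list of per-gene association z-scores
    edges  : list of (i, j) index pairs taken from an estimated gene network
    b, c   : assumed Ising prior parameters (baseline log-odds, neighbor coupling)
    mu, sd : assumed mean and standard deviation of z-scores for risk genes;
             non-risk genes are assumed to follow N(0, 1)
    """
    n = len(z)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Log-likelihood ratio of "risk" versus "non-risk" for each gene.
    lik_log_odds = [
        log_normal_pdf(zi, mu, sd) - log_normal_pdf(zi, 0.0, 1.0) for zi in z
    ]

    # Initialize from the scores alone, ignoring the network.
    labels = [1 if b + l > 0 else 0 for l in lik_log_odds]

    for _ in range(max_iter):
        changed = False
        for i in range(n):
            # Conditional log-odds given the current labels of the neighbors.
            prior_log_odds = b + c * sum(labels[j] for j in neighbors[i])
            new_label = 1 if prior_log_odds + lik_log_odds[i] > 0 else 0
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return labels


# Genes 1 and 3 have the same moderate score, but only gene 1 is linked
# to the strongly associated gene 0.
scores = [3.5, 1.2, 0.1, 1.2]
with_network = icm_hmrf(scores, edges=[(0, 1)])
without_network = icm_hmrf(scores, edges=[])
```

In this toy setting the network changes the label of a gene with a moderate score only when it is connected to a strongly associated gene, which is the intuition the summary describes for borrowing strength across correlated genes.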

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62M05 Markov processes: estimation; hidden Markov models
62M40 Random fields; image analysis

Software:

WGCNA; glasso; hglasso; HotNet
