Estimation of multiple networks in Gaussian mixture models. (English) Zbl 1335.62098

Summary: We aim to estimate multiple networks in the presence of sample heterogeneity, where the independent samples (i.e., observations) may come from different and unknown populations or distributions. Specifically, we consider penalized estimation of multiple precision matrices in the framework of a Gaussian mixture model. A major innovation is to take advantage of the commonalities across the multiple precision matrices through possibly nonconvex fusion regularization, which, for example, makes it possible to achieve simultaneous discovery of unknown disease subtypes and detection of differential gene (dys)regulations in functional genomics. We embed into the EM algorithm one of two recently proposed methods for estimating multiple precision matrices in Gaussian graphical models. We demonstrate the feasibility and potential usefulness of the proposed methods in an application to glioblastoma subtype discovery and differential gene network analysis with a microarray gene expression data set. We also conduct realistic simulation studies to evaluate and compare the performance of various methods.
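The following is a minimal sketch, not the authors' implementation, of the general idea of embedding penalized precision-matrix estimation into the M-step of the EM algorithm for a Gaussian mixture. It rests on several simplifying assumptions: each cluster's precision matrix is estimated separately by a graphical lasso solved with ADMM (in the spirit of the sparse inverse covariance example in [1]), the fusion penalty that links the clusters is omitted, a single penalty level `lam` is applied to each cluster's size-normalized M-step objective, and the names `glasso_admm`, `penalized_gmm_em`, and the farthest-point initialization are hypothetical choices for illustration.

```python
import numpy as np


def glasso_admm(S, lam, rho=1.0, n_iter=200):
    """Graphical lasso via ADMM (sketch, following the sparse inverse covariance
    example in Boyd et al., 2011):
        minimize  -log det(Theta) + tr(S Theta) + lam * sum_{j != l} |Theta_jl|.
    The diagonal is left unpenalized (an assumption of this sketch)."""
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    off = ~np.eye(p, dtype=bool)  # mask of off-diagonal entries
    for _ in range(n_iter):
        # Theta-update: solve rho*Theta - Theta^{-1} = rho*(Z - U) - S
        # in closed form via an eigendecomposition of the right-hand side.
        w, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (w + np.sqrt(w ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-update: soft-threshold the off-diagonal entries at lam / rho.
        A = Theta + U
        Z = A.copy()
        Z[off] = np.sign(A[off]) * np.maximum(np.abs(A[off]) - lam / rho, 0.0)
        # Scaled dual update.
        U = U + Theta - Z
    return Theta  # positive definite by construction of the Theta-update


def log_gauss(X, mu, Theta):
    """Row-wise log N(x | mu, Theta^{-1}), parameterized by the precision Theta."""
    p = X.shape[1]
    D = X - mu
    _, logdet = np.linalg.slogdet(Theta)
    quad = np.einsum("ij,jk,ik->i", D, Theta, D)
    return 0.5 * (logdet - quad - p * np.log(2.0 * np.pi))


def penalized_gmm_em(X, K, lam, n_iter=50):
    """EM for a K-component Gaussian mixture with a graphical-lasso M-step
    (hypothetical sketch; no fusion across clusters)."""
    n, p = X.shape
    # Deterministic farthest-point initialization of the cluster means.
    idx = [0]
    for _ in range(1, K):
        d = np.min([((X - X[i]) ** 2).sum(axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(d)))
    mu = X[idx].copy()
    Theta = np.array([np.eye(p) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau[i, k].
        logp = np.column_stack(
            [np.log(pi[k]) + log_gauss(X, mu[k], Theta[k]) for k in range(K)]
        )
        logp -= logp.max(axis=1, keepdims=True)  # for numerical stability
        tau = np.exp(logp)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: mixing weights and means in closed form ...
        nk = tau.sum(axis=0)
        pi = nk / n
        mu = (tau.T @ X) / nk[:, None]
        # ... then a penalized precision matrix for each cluster.
        for k in range(K):
            D = X - mu[k]
            S_k = (tau[:, k, None] * D).T @ D / nk[k]  # weighted covariance
            Theta[k] = glasso_admm(S_k, lam)
    return pi, mu, Theta, tau
```

A version closer to the summary would replace the separate `glasso_admm` calls with a joint solver over all K precision matrices that also penalizes their differences, for example the fused penalty of the joint graphical lasso [4]; that joint step is exactly what links subtype discovery to differential network detection.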


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62A09 Graphical methods in statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis


HdBCS; glasso; mclust


[1] Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1-122. · Zbl 1229.90122
[2] Brennan, C. W., Verhaak, R. G., McKenna, A., Campos, B., Noushmehr, H., Salama, S. R., Zheng, S., Chakravarty, D., Sanborn, J. Z., Berman, S. H., et al. (2013). The somatic genomic landscape of glioblastoma. Cell, 155(2), 462-477.
[3] Cantley, L. C. and Neel, B. G. (1999). New insights into tumor suppression: PTEN suppresses tumor formation by restraining the phosphoinositide 3-kinase/AKT pathway. Proceedings of the National Academy of Sciences, 96(8), 4240-4245.
[4] Danaher, P., Wang, P., and Witten, D. M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society, Series B, 76(2), 373-397.
[5] de Souto, M. C., Costa, I. G., de Araujo, D. S., Ludermir, T. B., and Schliep, A. (2008). Clustering cancer gene expression data: a comparative study. BMC Bioinformatics, 9(1), 497.
[6] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1-38. · Zbl 0364.62022
[7] Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G., and West, M. (2004). Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis, 90(1), 196-212. · Zbl 1047.62104
[8] Fraley, C. and Raftery, A. E. (2006). MCLUST version 3 for R: normal mixture modeling and model-based clustering. Technical Report no. 504, Department of Statistics, University of Washington.
[9] Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science, 305(5659), 799-805.
[10] Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441. · Zbl 1143.62076
[11] Guo, J., Levina, E., Michailidis, G., and Zhu, J. (2011). Joint estimation of multiple graphical models. Biometrika, 98, 1-15. · Zbl 1214.62058
[12] Hill, S. M. and Mukherjee, S. (2013). Network-based clustering with mixtures of L1-penalized Gaussian graphical models: an empirical investigation.
[13] Huang, S., Li, J., Sun, L., Ye, J., Fleisher, A., Wu, T., Chen, K., Reiman, E., and Alzheimer’s Disease Neuroimaging Initiative (2010). Learning brain connectivity of Alzheimer’s disease by sparse inverse covariance estimation. NeuroImage, 50(3), 935-949.
[14] Kerr, G., Ruskin, H. J., Crane, M., and Doolan, P. (2008). Techniques for clustering gene expression data. Computers in Biology and Medicine, 38(3), 283-293.
[15] Kolar, M., Liu, H., and Xing, E. P. (2014). Graph estimation from multi-attribute data. Journal of Machine Learning Research, 15(1), 1713-1750. · Zbl 1319.62113
[16] Liu, X. and Ling, Z. Q. (2015). Role of isocitrate dehydrogenase 1/2 (IDH 1/2) gene mutations in human tumors. Histology and Histopathology, 30(10), 1155-1160.
[17] McLachlan, G. J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Applied Statistics, 318-324.
[18] McLachlan, G. and Peel, D. (2001). Finite Mixture Models. Wiley.
[19] McLendon, R., Friedman, A., Bigner, D., Van Meir, E. G., Brat, D. J., Mastrogianakis, G. M., Olson, J. J., Mikkelsen, T., Lehman, N., Aldape, K., et al. (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216), 1061-1068.
[20] Mirzaa, G., Parry, D. A., Fry, A. E., Giamanco, K. A., Schwartzentruber, J., Vanstone, M., Logan, C. V., Roberts, N., Johnson, C. A., Singh, S., and Kholmanskikh, S. S. (2014). De novo CCND2 mutations leading to stabilization of cyclin D2 cause megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome. Nature Genetics, 46(5), 510.
[21] Mohan, K., London, P., Fazel, M., Witten, D., and Lee, S. I. (2014). Node-based learning of multiple Gaussian graphical models. Journal of Machine Learning Research, 15(1), 445-488. · Zbl 1318.62181
[22] Narita, Y., Nagane, M., Mishima, K., Huang, H. S., Furnari, F. B., and Cavenee, W. K. (2002). Mutant epidermal growth factor receptor signaling down-regulates p27 through activation of the phosphatidylinositol 3-kinase/Akt pathway in glioblastomas. Cancer Research, 62(22), 6764-6769.
[23] Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. Journal of Machine Learning Research, 8, 1145-1164. · Zbl 1222.68279
[24] Peterson, C., Stingo, F. C., and Vannucci, M. (2015). Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association, 110(509), 159-174. · Zbl 1373.62106
[25] Qiu, H., Han, F., Liu, H., and Caffo, B. (2015). Joint estimation of multiple graphical models from high dimensional time series. Journal of the Royal Statistical Society, Series B, 78(2), 487-504.
[26] Reynolds, J. H. and Templin, W. D. (2004). Comparing mixture estimates by parametric bootstrapping likelihood ratios. Journal of Agricultural, Biological, and Environmental Statistics, 9(1), 57-74.
[27] Rozenblatt-Rosen, O., Mosonego-Ornan, E., Sadot, E., Madar-Shapiro, L., Sheinin, Y., Ginsberg, D., and Yayon, A. (2002). Induction of chondrocyte growth arrest by FGF: transcriptional and cytoskeletal alterations. Journal of Cell Science, 115(3), 553-562.
[28] Shen, X., Pan, W., and Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association, 107(497), 223-232. · Zbl 1261.62020
[29] Snuderl, M., Triscott, J., Northcott, P. A., Shih, H. A., Kong, E., Robinson, H., Dunn, S. E., Iafrate, A. J., and Yip, S. (2015). Deep sequencing identifies IDH1 R132S mutation in adult medulloblastoma. Journal of Clinical Oncology, 33(6), 27-31.
[30] Telesca, D., Müller, P., Kornblau, S. M., Suchard, M. A., and Ji, Y. (2012). Modeling protein expression and protein signaling pathways. Journal of the American Statistical Association, 107(500), 1372-1384. · Zbl 1258.62110
[31] Thalamuthu, A., Mukhopadhyay, I., Zheng, X., and Tseng, G. C. (2006). Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics, 22(19), 2405-2412.
[32] Turkalp, Z., Karamchandani, J., and Das, S. (2014). IDH mutation in glioma: new insights and promises for the future. JAMA Neurology, 71(10), 1319-1325.
[33] Verhaak, R. G., Hoadley, K. A., Purdom, E., Wang, V., Qi, Y., Wilkerson, M. D., Miller, C. R., Ding, L., Golub, T., Mesirov, J. P., et al. (2010). Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell, 17(1), 98-110.
[34] Wang, S. and Zhu, J. (2008). Variable selection for model-based high-dimensional clustering and its application to microarray data. Biometrics, 64, 440-448. · Zbl 1137.62041
[35] Wu, M.-Y., Dai, D.-Q., Zhang, X.-F., and Zhu, Y. (2013). Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm. PLoS ONE, 8(6), e66256.
[36] Xie, B., Pan, W., and Shen, X. (2008). Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables. Electronic Journal of Statistics, 2, 168-212. · Zbl 1135.62055
[37] Zhou, H., Pan, W., and Shen, X. (2009). Penalized model-based clustering with unconstrained covariance matrices. Electronic Journal of Statistics, 3, 1473. · Zbl 1326.62143
[38] Zhu, Y., Shen, X., and Pan, W. (2014). Structural pursuit over multiple undirected graphs. Journal of the American Statistical Association, 109, 1683-1696. · Zbl 1368.62181