zbMATH — the first resource for mathematics

Graphical model selection for Gaussian conditional random fields in the presence of latent variables. (English) Zbl 1420.62244
Summary: We consider the problem of learning a conditional Gaussian graphical model in the presence of latent variables. Building on recent advances in this field, we suggest a method that decomposes the parameters of a conditional Markov random field into the sum of a sparse matrix and a low-rank matrix. We derive convergence bounds for this estimator and show that it is well-behaved in the high-dimensional regime as well as “sparsistent” (i.e., capable of recovering the graph structure). We then show how proximal gradient algorithms and semidefinite programming techniques can be employed to fit the model to thousands of variables. Through extensive simulations, we illustrate the conditions required for identifiability and show that there is a wide range of situations in which this model performs significantly better than its counterparts, for example, by accommodating more latent variables. Finally, the suggested method is applied to two datasets comprising individual-level data on genetic variants and metabolite levels. We show that our results replicate better than those of alternative approaches and exhibit enriched biological signal.
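The sparse-plus-low-rank idea in the summary can be illustrated with the two proximal operators that such penalties typically rely on. The sketch below is not the authors' estimator: it assumes a toy least-squares objective, 0.5*||M - S - L||_F^2 + lam_sparse*||S||_1 + lam_rank*||L||_*, in place of the conditional Gaussian likelihood, and it uses plain alternating minimisation rather than the proximal gradient or semidefinite programming solvers mentioned above. The function names and the parameters `lam_sparse`, `lam_rank` and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1: shrink every entry towards zero."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||A||_* (nuclear norm): shrink singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def split_sparse_low_rank(M, lam_sparse, lam_rank, n_iter=200):
    """Toy decomposition M ~ S + L with S sparse and L low-rank.

    Assumption: a least-squares fit stands in for the model likelihood.
    Each step minimises the objective exactly in one block (S or L)
    while the other block is held fixed.
    """
    M = np.asarray(M, dtype=float)
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(n_iter):
        S = soft_threshold(M - L, lam_sparse)
        L = singular_value_threshold(M - S, lam_rank)
    return S, L
```

Larger values of `lam_sparse` push more entries of `S` to exactly zero, and larger values of `lam_rank` push more singular values of `L` to exactly zero, which is the mechanism by which penalties of this kind separate a sparse graph structure from a low-rank latent component.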

62H12 Estimation in multivariate analysis
62H99 Multivariate analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
Software: glasso; SDPT3