Empirical comparison study of approximate methods for structure selection in binary graphical models. (English) Zbl 1441.62521

Summary: Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine, and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. In the binary case, however, exact inference is generally very slow or even intractable because of the form of the so-called log-partition function. In this paper, we review various approximate methods for structure selection in binary graphical models that have recently been proposed in the literature and compare them through an extensive simulation study. We also propose a modification of one existing method, which is shown to achieve good performance and to be generally very fast. We conclude with an application in which we search for associations among causes of death recorded on French death certificates.
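
For orientation, the intractability mentioned in the summary can be sketched under an assumed parameterization. The record itself does not state the model, so the pairwise (Ising-type) form below and the symbols \(p\), \(x\), \(\theta\) and \(A(\theta)\) are illustrative assumptions, not necessarily the authors' notation. For \(x \in \{0,1\}^p\),

\[
P_\theta(x) = \exp\Bigl( \sum_{j=1}^{p} \theta_{jj} x_j + \sum_{1 \le j < k \le p} \theta_{jk} x_j x_k - A(\theta) \Bigr),
\qquad
A(\theta) = \log \sum_{x \in \{0,1\}^p} \exp\Bigl( \sum_{j=1}^{p} \theta_{jj} x_j + \sum_{1 \le j < k \le p} \theta_{jk} x_j x_k \Bigr).
\]

The sum defining the log-partition function \(A(\theta)\) runs over all \(2^p\) binary configurations (about \(1.07 \times 10^9\) terms for \(p = 30\)), which is why exact likelihood evaluation quickly becomes impractical. In this assumed form, structure selection amounts to deciding which interaction parameters \(\theta_{jk}\), \(j < k\), are nonzero.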

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

MIM; glasso; glmnet
