## High-dimensional Ising model selection using $$\ell_{1}$$-regularized logistic regression (English) Zbl 1189.62115

Summary: We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on $$\ell _{1}$$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an $$\ell_{1}$$-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes $$p$$ and maximum neighborhood size $$d$$ are allowed to grow as a function of the number of observations $$n$$. Our main results provide sufficient conditions on the triple $$(n, p, d)$$ and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously.
With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes $$n=\Omega (d^{3} \log p)$$ with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of $$n=\Omega (d^{2} \log p)$$ suffices for the method to estimate neighborhoods consistently. Although this paper focuses on binary graphical models, we indicate how a generalization of the method would apply to general discrete Markov random fields.
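The neighborhood-selection scheme described in the summary can be illustrated end to end on a toy instance. The sketch below is an illustration, not the authors' implementation: the Gibbs sampler, the ISTA solver, the 4-node chain graph, the coupling strength, and the fixed regularization level `lam` are all assumptions made here. It draws $$\pm 1$$ samples from a small Ising chain, regresses each node on all the others with $$\ell_{1}$$-penalized logistic regression, and keeps an edge only when both endpoints select each other (the AND rule):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_ising(theta, n_samples, rng, burn=200):
    """Gibbs-sample +/-1 spin vectors from an Ising model with symmetric
    coupling matrix `theta` (zero diagonal); one sample kept per sweep."""
    p = theta.shape[0]
    x = rng.choice([-1.0, 1.0], size=p)
    out = np.empty((n_samples, p))
    for it in range(burn + n_samples):
        for s in range(p):
            # Ising conditional: P(x_s = +1 | x_rest) = sigmoid(2 * theta[s] @ x)
            x[s] = 1.0 if rng.random() < sigmoid(2.0 * (theta[s] @ x)) else -1.0
        if it >= burn:
            out[it - burn] = x
    return out

def l1_logistic(X, y, lam, step=0.25, iters=800):
    """ISTA (proximal gradient) for
    min_w  mean(log(1 + exp(-y * (X @ w)))) + lam * ||w||_1,
    with labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = -(X * (y * sigmoid(-y * (X @ w)))[:, None]).mean(axis=0)
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft-threshold
    return w

def select_graph(samples, lam):
    """Estimate each node's neighborhood by l1-logistic regression of that
    node on all the others; keep an edge only if both endpoints agree."""
    n, p = samples.shape
    nbr = np.zeros((p, p), dtype=bool)
    for s in range(p):
        others = [t for t in range(p) if t != s]
        w = l1_logistic(samples[:, others], samples[:, s], lam)
        nbr[s, others] = np.abs(w) > 1e-8
    return nbr & nbr.T  # AND rule

# Toy instance: a 4-node chain 0 - 1 - 2 - 3 with coupling 0.7 on each edge.
rng = np.random.default_rng(0)
p = 4
theta = np.zeros((p, p))
for s, t in [(0, 1), (1, 2), (2, 3)]:
    theta[s, t] = theta[t, s] = 0.7
samples = gibbs_ising(theta, 4000, rng)
adj = select_graph(samples, lam=0.1)
edges = sorted((s, t) for s in range(p) for t in range(s + 1, p) if adj[s, t])
print(edges)
```

In the regime analyzed in the paper, the regularization level is taken to scale as $$\sqrt{\log p / n}$$; the fixed value used above is hand-picked for this small instance.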

### MSC:

62J12 Generalized linear models (logistic models)
62F12 Asymptotic properties of parametric estimators
05C90 Applications of graph theory
68T99 Artificial intelligence
62M40 Random fields; image analysis

### Software:

TETRAD; pcalg; spatial