×

Exact recovery in the Ising blockmodel. (English) Zbl 1420.62268

Summary: We consider the problem associated to recovering the block structure of an Ising model given independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the mean field approximation of the Ising model known as the Curie-Weiss model: the sites are partitioned into two blocks of equal size and the interaction between those of the same block is stronger than across blocks, to account for more order within each block. We study probabilistic, statistical and computational aspects of this model in the high-dimensional case when the number of sites may be much larger than the sample size.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
82B20 Lattice systems (Ising, dimer, Potts, etc.) and systems on graphs arising in equilibrium statistical mechanics
PDFBibTeX XMLCite
Full Text: DOI arXiv Euclid

References:

[1] Abbe, E. (2017). Community detection and stochastic block models: Recent developments. Preprint. Available at arXiv:1703.10146. · Zbl 1403.62110
[2] Abbe, E., Bandeira, A. S. and Hall, G. (2016). Exact recovery in the stochastic block model. IEEE Trans. Inform. Theory62 471-487. · Zbl 1359.94047 · doi:10.1109/TIT.2015.2490670
[3] Abbe, E. and Sandon, C. (2015). Detection in the stochastic block model with multiple clusters: Proof of the achievability conjectures, acyclic BP, and the information-computation gap. Preprint. Available at arXiv:1512.09080.
[4] Abbe, E. and Sandon, C. (2016a). Achieving the ks threshold in the general stochastic block model with linearized acyclic belief propagation. In Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon and R. Garnett, eds.) 1334-1342. Curran Associates, Inc., New York.
[5] Abbe, E. and Sandon, C. (2016b). Crossing the KS threshold in the stochastic block model with information theory. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT) 840-844. IEEE, New York.
[6] Alon, N., Krivelevich, M. and Sudakov, B. (1998). Finding a large hidden clique in a random graph. In Proceedings of the 1998 ACM-SIAM Symposium on Discrete Algorithms 594-598. SIAM, Philadelphia, PA. · Zbl 0930.68109
[7] Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res.9 485-516. · Zbl 1225.68149
[8] Banks, J., Moore, C., Neeman, J. and Netrapalli, P. (2016). Information-theoretic thresholds for community detection in sparse networks. Preprint. Available at arXiv:1601.02658.
[9] Berthet, Q., Rigollet, P. and Srivastava, P. (2019). Supplement to “Exact recovery in the Ising blockmodel.” DOI:10.1214/17-AOS1620SUPP. · Zbl 1420.62268
[10] Besag, J. (1986). On the statistical analysis of dirty pictures. J. Roy. Statist. Soc. Ser. B48 259-302. · Zbl 0609.62150 · doi:10.1111/j.2517-6161.1986.tb01412.x
[11] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. · Zbl 1058.90049
[12] Bresler, G. (2015). Efficiently learning Ising models on arbitrary graphs [extended abstract]. In Proceedings of the 2015 ACM Symposium on Theory of Computing 771-782. ACM, New York. · Zbl 1321.68397
[13] Bresler, G., Gamarnik, D. and Shah, D. (2014). Learning graphical models from the glauber dynamics. Preprint. Available at arXiv:1410.7659. · Zbl 1395.62254 · doi:10.1109/TIT.2017.2713828
[14] Bresler, G., Mossel, E. and Sly, A. (2008). Reconstruction of Markov random fields from samples: Some observations and algorithms. In Approximation, Randomization and Combinatorial Optimization. Lecture Notes in Computer Science5171 343-356. Springer, Berlin. · Zbl 1159.68636
[15] Bunea, F., Giraud, C. and Luo, X. (2015). Minimax optimal variable clustering in \(G\)-models via Cord. Preprint. Available at arXiv:1508.01939.
[16] Bunea, F., Giraud, C., Royer, M. and Verzelen, N. (2016). PECOK: A convex optimization approach to variable clustering. Preprint. Available at arXiv:1606.05100.
[17] Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E (3) 84 066106.
[18] Diaconis, P., Goel, S. and Holmes, S. (2008). Horseshoes in multidimensional scaling and local kernel methods. Ann. Appl. Stat.2 777-807. · Zbl 1149.62316 · doi:10.1214/08-AOAS165
[19] Dyer, M. E. and Frieze, A. M. (1989). The solution of some random NP-hard problems in polynomial expected time. J. Algorithms10 451-489. · Zbl 0689.68049 · doi:10.1016/0196-6774(89)90001-1
[20] Fedele, M. and Unguendoli, F. (2012). Rigorous results on the bipartite mean-field model. J. Phys. A45 385001. · Zbl 1260.82087 · doi:10.1088/1751-8113/45/38/385001
[21] Feige, U. and Krauthgamer, R. (2002). A polylogarithmic approximation of the minimum bisection. SIAM J. Comput.31 1090-1118 (electronic). · Zbl 1015.68240 · doi:10.1137/S0097539701387660
[22] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2015). Achieving optimal misclassification proportion in stochastic block model. Preprint. Available at arXiv:1505.03772. · Zbl 1440.62244
[23] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2016). Community detection in degree-corrected block models. Preprint. Available at arXiv:1607.06993. · Zbl 1408.62116 · doi:10.1214/17-AOS1615
[24] Garey, M. R., Johnson, D. S. and Stockmeyer, L. (1976). Some simplified NP-complete graph problems. Theoret. Comput. Sci.1 237-267. · Zbl 0338.05120 · doi:10.1016/0304-3975(76)90059-1
[25] Goemans, M. X. and Williamson, D. P. (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach.42 1115-1145. · Zbl 0885.68088 · doi:10.1145/227683.227684
[26] Hajek, B., Wu, Y. and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefinite programming. IEEE Trans. Inform. Theory62 2788-2797. · Zbl 1359.94222 · doi:10.1109/TIT.2016.2546280
[27] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw.5 109-137.
[28] Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Z. Phys.31 253-258. · Zbl 1439.82056
[29] Laurent, M. and Poljak, S. (1996). On the facial structure of the set of correlation matrices. SIAM J. Matrix Anal. Appl.17 530-547. · Zbl 0855.15011 · doi:10.1137/0617031
[30] Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series17. Oxford Univ. Press, New York.
[31] Lauritzen, S. L. and Sheehan, N. A. (2003). Graphical models for genetic analyses. Statist. Sci.18 489-514. · Zbl 1055.62126 · doi:10.1214/ss/1081443232
[32] Lesieur, T., Krzakala, F. and Zdeborová, L. (2017). Constrained low-rank matrix estimation: Phase transitions, approximate message passing and applications. J. Stat. Mech. Theory Exp.2017 073403. · Zbl 1462.62324
[33] Li, L., Lu, P. and Yin, Y. (2012). Correlation decay up to uniqueness in spin systems. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms 67-84. SIAM, Philadelphia, PA. · Zbl 1422.68302
[34] Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA. · Zbl 0951.68158
[35] Massoulié, L. (2014). Community detection thresholds and the weak Ramanujan property. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing. ACM, New York. · Zbl 1315.68210
[36] McSherry, F. (2001). Spectral partitioning of random graphs. In 42nd IEEE Symposium on Foundations of Computer Science (Las Vegas, NV, 2001) 529-537. IEEE Computer Soc., Los Alamitos, CA.
[37] Moitra, A., Perry, W. and Wein, A. S. (2016). How robust are reconstruction thresholds for community detection? In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing 828-841. · Zbl 1381.62190
[38] Montanari, A. and Saberi, A. (2010). The spread of innovations in social networks. Proc. Natl. Acad. Sci. USA107 20196-20201.
[39] Mossel, E., Neeman, J. and Sly, A. (2013). A proof of the block model threshold conjecture. Preprint. Available at arXiv:1311.4115. · Zbl 1424.05272 · doi:10.1007/s00493-016-3238-8
[40] Mossel, E., Neeman, J. and Sly, A. (2015). Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields162 431-461. · Zbl 1320.05113 · doi:10.1007/s00440-014-0576-6
[41] Mossel, E., Neeman, J. and Sly, A. (2016). Belief propagation, robust reconstruction and optimal recovery of block models. Ann. Appl. Probab.26 2211-2256. · Zbl 1350.05154 · doi:10.1214/15-AAP1145
[42] Mukherjee, R., Mukherjee, S. and Yuan, M. (2016). Global testing against sparse alternatives under Ising models. Preprint. Available at arXiv:1611.08293. · Zbl 1407.62160 · doi:10.1214/17-AOS1612
[43] Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using \(\ell_1\)-regularized logistic regression. Ann. Statist.38 1287-1319. · Zbl 1189.62115 · doi:10.1214/09-AOS691
[44] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist.39 1878-1915. · Zbl 1227.62042 · doi:10.1214/11-AOS887
[45] Schneidman, E., Berry, M. J., Segev, R. and Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature440 1007-1012.
[46] Sebastiani, P., Ramoni, M. F., Nolan, V., Baldwin, C. T. and Steinberg, M. H. (2005). Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nat. Genet.37 435-440.
[47] Sinclair, A., Srivastava, P. and Thurley, M. (2014). Approximation algorithms for two-state anti-ferromagnetic spin systems on bounded degree graphs. J. Stat. Phys.155 666-686. · Zbl 1297.82009 · doi:10.1007/s10955-014-0947-5
[48] Sly, A. and Sun, N. (2014). Counting in two-spin models on \(d\)-regular graphs. Ann. Probab.42 2383-2416. · Zbl 1311.60117 · doi:10.1214/13-AOP888
[49] Tropp, J. A. (2015). An introduction to matrix concentration inequalities. Found. Trends Mach. Learn.8 1-230. · Zbl 1391.15071 · doi:10.1561/2200000048
[50] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York. Revised and extended from the 2004 French original, Translated by Vladimir Zaiats. · Zbl 1029.62034
[51] Wang, T., Berthet, Q. and Samworth, R. J. (2016). Statistical and computational trade-offs in estimation of sparse principal components. Ann. Statist.44 1896-1930. · Zbl 1349.62254 · doi:10.1214/15-AOS1369
[52] Weitz, D. (2006). Counting independent sets up to the tree threshold. In Proceedings of the 2006 ACM Symposium on the Theory of Computing 140-149. ACM, New York. · Zbl 1301.68276
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.