##
**Consistency of community detection in networks under degree-corrected stochastic block models.**
*(English)*
Zbl 1257.62095

Ann. Stat. 40, No. 4, 2266-2292 (2012); correction ibid. 43, No. 1, 462-466 (2015).

Summary: Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not. On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs.

### MSC:

62M45 | Neural nets and related approaches to inference from stochastic processes |

62G20 | Asymptotic properties of nonparametric inference |

65C60 | Computational problems in statistics (MSC2010) |

### References:

[1] | Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US Election: Divided they blog. In Proceedings of the 3 rd International Workshop on Link Discovery 36-43. ACM, New York. |

[2] | Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981-2014. · Zbl 1225.68143 |

[3] | Airoldi, E. M. and Choi, D. (2011). Summary of proof in “A nonparametric view of network models and Newman-Girvan and other modularities.” Personal communication. |

[4] | Beasley, J. E. (1998). Heuristic algorithms for the unconstrained binary quadratic programming problem. Technical report, Management School, Imperial College, London, UK. |

[5] | Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068-21073. · Zbl 1359.62411 |

[6] | Bickel, P. J. and Chen, A. (2012). Weak consistency of community detection criteria under the stochastic block model. Unpublished manuscript. |

[7] | Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with growing number of classes. Biometrika 99 273-284. · Zbl 1318.62207 · doi:10.1093/biomet/asr053 |

[8] | Coja-Oghlan, A. and Lanka, A. (2010). Finding planted partitions in random graphs with general degree distributions. SIAM J. Discrete Math. 23 1682-1714. · Zbl 1207.05178 · doi:10.1137/070699354 |

[9] | Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal Complex Systems 1695. |

[10] | Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2012). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84 066106. |

[11] | Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75-174. · doi:10.1016/j.physrep.2009.11.002 |

[12] | Fruchterman, T. M. J. and Reingold, E. M. (1991). Graph drawing by force-directed placement. Software : Practice and Experience 21 1129-1164. |

[13] | Getoor, L. and Diehl, C. P. (2005). Link mining: A survey. ACM SIGKDD Explorations Newsletter 7 3-12. |

[14] | Glover, F. W. and Lagunas, M. (1997). Tabu Search . Kluwer Academic, Norwell. · Zbl 0930.90083 |

[15] | Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning 2 129-233. · Zbl 1184.68030 · doi:10.1561/2200000005 |

[16] | Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301-354. · doi:10.1111/j.1467-985X.2007.00471.x |

[17] | Hoff, P. D. (2007). Modeling homophily and stochastic equivalence in symmetric relational data. In Advances in Neural Information Processing Systems , 19 MIT Press, Cambridge, MA. |

[18] | Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109-137. · doi:10.1016/0378-8733(83)90021-7 |

[19] | Hubert, L. and Arabie, P. (1985). Comparing partitions. J. Classification 2 193-218. · Zbl 0587.62128 |

[20] | Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107. · doi:10.1103/PhysRevE.83.016107 |

[21] | McCulloch, C. E. and Searle, S. R. (2001). Generalized , Linear , and Mixed Models . Wiley-Interscience, New York. · Zbl 0964.62061 |

[22] | Newman, M. E. J. (2004). Detecting community structure in networks. Eur. Phys. J. B 38 321-330. |

[23] | Newman, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577-8582. |

[24] | Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E (3) 74 036104, 19. · doi:10.1103/PhysRevE.74.036104 |

[25] | Newman, M. E. J. (2010). Networks : An Introduction . Oxford Univ. Press, Oxford. · Zbl 1195.94003 |

[26] | Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E 69 026113. · Zbl 1032.91716 |

[27] | Newman, M. E. J. and Leicht, E. A. (2007). Mixture models and exploratory analysis in networks. Proc. Natl. Acad. Sci. USA 104 9564-9569. · Zbl 1155.91026 · doi:10.1073/pnas.0610537104 |

[28] | Ng, A., Jordan, M. and Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Neural Information Processing Systems 14 (T. Dietterich, S. Becker and Z. Ghahramani, eds.) 849-856. MIT Press, Cambridge. |

[29] | Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077-1087. · Zbl 1072.62542 · doi:10.1198/016214501753208735 |

[30] | Perry, P. O. and Wolfe, P. J. (2012). Null models for network data. Available at . |

[31] | Robins, G., Snijders, T., Wang, P., Handcock, M. and Pattison, P. (2007). Recent developments in exponential random graphs models (\(p^\ast\)) for social networks. Social Networks 29 192-215. |

[32] | Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878-1915. · Zbl 1227.62042 · doi:10.1214/11-AOS887 |

[33] | Schlitt, T. and Brazma, A. (2007). Current approaches to gene regulatory network modelling. BMC Bioinformatics 8 S9. Suppl 6. |

[34] | Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 22 888-905. |

[35] | Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classification 14 75-100. · Zbl 0896.62063 · doi:10.1007/s003579900004 |

[36] | Wang, Y. J. and Wong, G. Y. (1987). Stochastic blockmodels for directed graphs. J. Amer. Statist. Assoc. 82 8-19. · Zbl 0613.62146 · doi:10.2307/2289119 |

[37] | Wasserman, S. and Faust, K. (1994). Social Network Analysis : Methods and Applications ( Structural Analysis in the Social Sciences ). Cambridge Univ. Press, Cambridge. · Zbl 0926.91066 |

[38] | Wei, Y. C. and Cheng, C. K. (1989). Toward efficient hierarchical designs by ratio cut partitioning. In Proceedings of the IEEE International Conference on Computer Aided Design 298-301. IEEE, New York. |

[39] | Zhang, S. and Zhao, H. (2012). Community identification in networks with unbalanced structure. Phys. Rev. E 85 066114. |

[40] | Zhao, Y., Levina, E. and Zhu, J. (2011). Community extraction for social networks. Proc. Natl. Acad. Sci. USA 108 7321-7326. |

[41] | Zhao, Y., Levina, E. and Zhu, J. (2012). Supplement to “Consistency of community detection in networks under degree-corrected stochastic block models.” . · Zbl 1257.62095 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.