##
**Bayesian degree-corrected stochastic blockmodels for community detection.**
*(English)*
Zbl 1395.62370

Summary: Community detection in networks has drawn much attention in diverse fields, especially social sciences. Given its significance, there has been a large body of literature with approaches from many fields. Here we present a statistical framework that is representative, extensible, and that yields an estimator with good properties. Our proposed approach considers a stochastic blockmodel based on a logistic regression formulation with node correction terms. We follow a Bayesian approach that explicitly captures the community behavior via prior specification. We further adopt a data augmentation strategy with latent Pólya-Gamma variables to obtain posterior samples. We conduct inference based on a principled, canonically mapped centroid estimator that formally addresses label non-identifiability and captures representative community assignments. We demonstrate the proposed model and estimation on real-world as well as simulated benchmark networks and show that the proposed model and estimator are more flexible, representative, and yield smaller error rates when compared to the MAP estimator from classical degree-corrected stochastic blockmodels.

### MSC:

62P25 | Applications of statistics to social sciences |

62F15 | Bayesian inference |

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

91D30 | Social networks; opinion dynamics |

### Keywords:

community detection; label non-identifiability; canonical remapping; centroid estimation; community detection in networks
PDF
BibTeX
XML
Cite

\textit{L. Peng} and \textit{L. Carvalho}, Electron. J. Stat. 10, No. 2, 2746--2779 (2016; Zbl 1395.62370)

### References:

[1] | Adamic, L. and N. Glance (2005). The political blogosphere and the 2004 US election: Divided they blog. In, Proceedings of the 3rd International Workshop on Link Discovery , pp. 36-43. |

[2] | Airoldi, E. M., D. M. Blei, S. E. Fienberg, and E. P. Xing (2008). Mixed membership stochastic blockmodels., The Journal of Machine Learning Research 9 , 1981-2014. · Zbl 1225.68143 |

[3] | Albert, R. and A.-L. Barabási (2002, Jan). Statistical mechanics of complex networks., Rev. Mod. Phys. 74 , 47-97. · Zbl 1205.82086 |

[4] | Anderson, C., S. Wasserman, and K. Faust (1992). Building stochastic blockmodels., Social Networks 14 (1), 137-161. |

[5] | Barbieri, M. and J. Berger (2004). Optimal predictive model selection., The Annals of Statistics 32 (3), 870-897. · Zbl 1092.62033 |

[6] | Barnes, E. (1982). An algorithm for partitioning the nodes of a graph., SIAM Journal on Algebraic Discrete Methods 3 (4), 541-550. · Zbl 0505.05050 |

[7] | Besag, J. (1986). On the statistical analysis of dirty pictures., Journal of the Royal Statistical Society. Series B (Methodological) 48 (3), 259-302. · Zbl 0609.62150 |

[8] | Bickel, P. and A. Chen (2009). A nonparametric view of network models and Newman-Girvan and other modularities., Proceedings of the National Academy of Sciences 106 (50), 21068-21073. · Zbl 1359.62411 |

[9] | Bickel, P., D. Choi, X. Chang, and H. Zhang (2013, 08). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels., The Annals of Statistics 41 (4), 1922-1943. · Zbl 1292.62042 |

[10] | Binder, D. A. (1978). Bayesian cluster analysis., Biometrika 65 (1), 31-38. · Zbl 0376.62007 |

[11] | Binder, D. A. (1981). Approximations to Bayesian clustering rules., Biometrika 68 (1), 275-285. |

[12] | Blondel, V. D., J.-L. Guillaume, R. Lambiotte, and E. Lefebvre (2008). Fast unfolding of communities in large networks., Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008. |

[13] | Brandes, U., D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner (2007). On finding graph clusterings with maximum modularity. In, Graph-Theoretic Concepts in Computer Science , pp. 121-132. Springer. · Zbl 1141.68519 |

[14] | Carvalho, L. and C. Lawrence (2008). Centroid estimation in discrete high-dimensional spaces with applications in biology., Proceedings of the National Academy of Sciences 105 (9), 3209-3214. |

[15] | Celisse, A., J.-J. Daudin, and L. Pierre (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model., Electronic Journal of Statistics 6 , 1847-1899. · Zbl 1295.62028 |

[16] | Choi, D. S., P. J. Wolfe, and E. M. Airoldi (2012). Stochastic blockmodels with a growing number of classes., Biometrika . · Zbl 1318.62207 |

[17] | Clauset, A., M. E. J. Newman, and C. Moore (2004, August). Finding community structure in very large networks., Physical Review E 70 (6), 066111. |

[18] | Danon, L., A. Díaz-Guilera, and J. Duch (2005). Comparing community structure identification., Journal of Statistical Mechanics: Theory and Experiment , 09008. |

[19] | Daudin, J. J., F. Picard, and S. Robin (2008, June). A mixture model for random graphs., Statistics and Computing 18 (2), 173-183. |

[20] | Donath, E. and J. Hoffman (1973). Lower bounds for the partitioning of graphs., IBM J. Res. Dev. 17 (5), 420-425. · Zbl 0259.05112 |

[21] | Duch, J. and A. Arenas (2005). Community identification using extremal optimization., Physical Review E 72 , 027104. |

[22] | Fienberg, S. E., M. M. Meyer, and S. S. Wasserman (1985). Statistical analysis of multiple sociometric relations., Journal of the American Statistical Association 80 (389), 51-67. |

[23] | Fienberg, S. E. and S. Wasserman (1981). An exponential family of probability distributions for directed graphs: Comment., Journal of the American Statistical Association 76 (373), 54-57. · Zbl 0457.62090 |

[24] | Fortunato, S. and M. Barthelemy (2007). Resolution limit in community detection., Proceedings of the National Academy of Sciences 104 (1), 36-41. |

[25] | Fosdick, B. and P. Hoff (2013). Testing and modeling dependencies between a network and nodal attributes., . · Zbl 1373.62273 |

[26] | Fritsch, A. and K. Ickstadt (2009). Improved criteria for clustering based on the posterior similarity matrix., Bayesian Analysis 4 (2), 367-392. · Zbl 1330.62249 |

[27] | Gelfand, A. E. and S. K. Ghosh (1998). Model choice: a minimum posterior predictive loss approach., Biometrika 85 (1), 1-11. · Zbl 0904.62036 |

[28] | Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (2003)., Bayesian Data Analysis . CRC press. · Zbl 1117.62343 |

[29] | Geman, S. and D. Geman (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images., IEEE Transactions on Pattern Analysis and Machine Intelligence 6 , 721-741. · Zbl 0573.62030 |

[30] | Goodnight, J. H. (1979). A tutorial on the sweep operator., The American Statistician 33 (3), 149-158. · Zbl 0439.62047 |

[31] | Gower, J. (1966). Some distance properties of latent root and vector methods used in multivariate analysis., Biometrika 53 (3-4), 325-338. · Zbl 0192.26003 |

[32] | Hancock, T., I. Takigawa, and H. Mamitsuka (2010). Mining metabolic pathways through gene expression., Bioinformatics 26 (17), 2128-2135. |

[33] | Handcock, M., A. Raftery, and J. Tantrum (2007). Model-based clustering for social networks., Journal of the Royal Statistical Society: Series A 170 (2), 301-354. · Zbl 05273954 |

[34] | Hastie, T., R. Tibshirani, and J. Friedman (2001). Maximum likelihood from incomplete data via the em algorithm., The Elements of Statistical Learning , 520-528. · Zbl 0973.62007 |

[35] | Hoff, P., A. Raftery, and M. Handcock (2002). Latent space approaches to social network analysis., Journal of the American Statistical Association 97 (460), 1090-1098. · Zbl 1041.62098 |

[36] | Hoff, P., A. Raftery, and M. Handcock (2005). Bilinear mixed-effects models for dyadic data., Journal of the American Statistical Association 100 (469), 286-295. · Zbl 1117.62353 |

[37] | Hofman, J. and C. Wiggins (2008). Bayesian approach to network modularity., Physical Review Letters 100 (25), 258701. |

[38] | Holland, P. and S. Leinhardt (1981). An exponential family of probability distributions for directed graphs., Journal of the American Statistical Association 76 (373), 33-50. · Zbl 0457.62090 |

[39] | Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps., Social networks 5 (2), 109-137. |

[40] | Jin, J. (2015). Fast community detection by SCORE., The Annals of Statistics 43 (1), 57-89. · Zbl 1310.62076 |

[41] | Karrer, B. and M. Newman (2011). Stochastic blockmodels and community structure in networks., Physical Review E 83 (1), 016107. |

[42] | Kernighan, B. and S. Lin (1970). An efficient heuristic procedure for partitioning graphs., Bell Sys. Tech. J. 49 (2), 291-308. · Zbl 0333.05001 |

[43] | Kim, M. and J. Leskovec (2011). Modeling social networks with node attributes using the multiplicative attribute graph model., UAI 7AUAI Press , 400-409. |

[44] | Lancichinetti, A., S. Fortunato, and F. Radicchi (2008). Benchmark graphs for testing community detection algorithms., Physical Review E 78 (1), 046110. |

[45] | Lau, J. W. and P. J. Green (2007). Bayesian model-based clustering procedures., Journal of Computational and Graphical Statistics 16 (3), 526-558. |

[46] | Lorrain, F. and H. C. White (1971). Structural equivalence of individuals in social networks., The Journal of Mathematical Sociology 1 (1), 49-80. |

[47] | Mariadassou, M., S. Robin, and C. Vacher (2010, 06). Uncovering latent structure in valued graphs: A variational approach., The Annals of Applied Statistics 4 (2), 715-742. · Zbl 1194.62125 |

[48] | McCullagh, P. and J. A. Nelder (1989)., Generalized Linear Models , Volume 37. CRC Press. · Zbl 0744.62098 |

[49] | Newman, M. (2002, Oct). Assortative mixing in networks., Phys. Rev. Lett. 89 , 208701. |

[50] | Newman, M. (2004). Fast algorithm for detecting community structure in networks., Physical Review E 69 (6), 066133. |

[51] | Newman, M. (2006). Modularity and community structure in networks., Proceedings of the National Academy of Sciences 103 (23), 8577-8582. |

[52] | Newman, M. and M. Girvan (2004). Finding and evaluating community structure in networks., Physical Review E 69 (2), 026113. · Zbl 1032.91716 |

[53] | Newman, M. E. J. (2003). Mixing patterns in networks., Phys. Rev. E (67). · Zbl 1151.91746 |

[54] | Nocedal, J. and S. J. Wright (2006)., Numerical Optimization (2nd ed.). Springer-Verlag. · Zbl 1104.65059 |

[55] | Nowicki, K. and T. A. B. Snijders (2001). Estimation and prediction for stochastic blockstructures., Journal of the American Statistical Association 96 (455), 1077-1087. · Zbl 1072.62542 |

[56] | Parthasarathy, S., Y. Ruan, and V. Satuluri (2011). Community discovery in social networks: Applications, methods and emerging trends. In, Social Network Data Analytics , pp. 79-113. Springer. |

[57] | Peng, L. and Carvalho, L. (2016). Supplementary material for “Bayesian degree-corrected stochastic blockmodels for community detection”. DOI:, . · Zbl 1395.62370 |

[58] | Polson, N. G., J. G. Scott, and J. Windle (2012). Bayesian inference for logistic models using polya-gamma latent variables., . · Zbl 1283.62055 |

[59] | Pons, P. and M. Latapy (2004). Computing communities in large networks using random walks., J. of Graph Alg. and App. 10 , 284-293. · Zbl 1161.68694 |

[60] | Qin, T. and K. Rohe (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. In, Advances in Neural Information Processing Systems , pp. 3120-3128. |

[61] | Raghavan, U. N., R. Albert, and S. Kumara (2007). Near linear time algorithm to detect community structures in large-scale networks., Physical Review E 76 (3). |

[62] | Robert, C. and G. Casella (1999)., Monte Carlo Statistical Methods . Springer New York. · Zbl 0935.62005 |

[63] | Robins, G., P. Pattison, Y. Kalish, and D. Lusher (2007). An introduction to exponential random graph (\(p^*\)) models for social networks., Social networks 29 (2), 173-191. |

[64] | Rohe, K., S. Chatterjee, and B. Yu (2011). Spectral clustering and the high-dimensional stochastic blockmodel., The Annals of Statistics 39 (4), 1878-1915. · Zbl 1227.62042 |

[65] | Sampson, S. F. (1968)., A novitiate in a period of change: an experimental and case study of social relationships . Ph. D. thesis, Cornell University, September. |

[66] | Snijders, T. A. and K. Nowicki (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure., Journal of Classification 14 (1), 75-100. · Zbl 0896.62063 |

[67] | Stephens, M. (2000). Dealing with label switching in mixture models., Journal of the Royal Statistical Society. Series B 62 (4), 795-809. · Zbl 0957.62020 |

[68] | Tallberg, C. (2005). A Bayesian approach to modeling stochastic blockstructures with covariates., Journal of Mathematical Sociology 29 , 1-23. · Zbl 1101.91067 |

[69] | Vázquez, A. (2003). Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations., Physical Review E 67 (5), 056104. |

[70] | Von Luxburg, U. (2007). A tutorial on spectral clustering., Statistics and computing 17 (4), 395-416. |

[71] | Vu, D. Q., D. R. Hunter, and M. Schweinberger (2013, 06). Model-based clustering of large networks., Annals of Applied Statistics 7 (2), 1010-1039. · Zbl 1288.62106 |

[72] | Wang, Y. J. and G. Y. Wong (1987). Stochastic blockmodels for directed graphs., Journal of the American Statistical Association 82 (397), 8-19. · Zbl 0613.62146 |

[73] | Yan, X., C. Shalizi, J. E. Jensen, F. Krzakala, C. Moore, L. Zdeborová, P. Zhang, and Y. Zhu (2014). Model selection for degree-corrected block models., Journal of Statistical Mechanics: Theory and Experiment , P05007. |

[74] | Zachary, W. W. (1977). An information flow model for conflict and fission in small groups., Journal of Anthropological Research 33 (4), 452-473. |

[75] | Zanghi, H., F. Picard, V. Miele, and C. Ambroise (2010, 06). Strategies for online inference of model-based clustering in large and growing networks., The Annals of Applied Statistics 4 (2), 687-714. · Zbl 1194.62096 |

[76] | Zhao, Y., E. Levina, J. Zhu, et al. (2012). Consistency of community detection in networks under degree-corrected stochastic block models., The Annals of Statistics 40 (4), 2266-2292. · Zbl 1257.62095 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.