Extended stochastic block models with application to criminal networks. (English) Zbl 1498.62323

Summary: Reliably learning group structures among nodes in network data is challenging in several applications. We are particularly motivated by studying covert networks that encode relationships among criminals. These data are subject to measurement errors, and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may unveil key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of routinely-used community detection algorithms, and requires extensions of model-based solutions to realistically characterize the node partition process, incorporate information from node attributes, and provide improved strategies for estimation and uncertainty quantification. To cover these gaps, we develop a new class of extended stochastic block models (esbm) that infer groups of nodes having common connectivity patterns via Gibbs-type priors on the partition process. This choice encompasses many realistic priors for criminal networks, covering solutions with fixed, random and infinite number of possible groups, and facilitates the inclusion of node attributes in a principled manner. Among the new alternatives in our class, we focus on the Gnedin process as a realistic prior that allows the number of groups to be finite, random and subject to a reinforcement process coherent with criminal networks. A collapsed Gibbs sampler is proposed for the whole esbm class, and refined strategies for estimation, prediction, uncertainty quantification and model selection are outlined. The esbm performance is illustrated in realistic simulations and in an application to an Italian mafia network, where we unveil key complex block structures, mostly hidden from state-of-the-art alternatives.


62P25 Applications of statistics to social sciences
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G05 Nonparametric estimation
62F15 Bayesian inference
91D30 Social networks; opinion dynamics
Full Text: DOI arXiv


[1] ABBE, E. (2017). Community detection and stochastic block models: Recent developments. J. Mach. Learn. Res. 18 1-86. · Zbl 1403.62110
[2] AGRESTE, S., CATANESE, S., DE MEO, P., FERRARA, E. and FIUMARA, G. (2016). Network structure and resilience of mafia syndicates. Inform. Sci. 351 30-47.
[3] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981-2014. · Zbl 1225.68143
[4] AMINI, A., CHEN, A., BICKEL, P. J. and LEVINA, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41 2097-2122. · Zbl 1277.62166 · doi:10.1214/13-AOS1138
[5] ATHREYA, A., FISHKIND, D. E., TANG, M., PRIEBE, C. E., PARK, Y., VOGELSTEIN, J. T., LEVIN, K., LYZINSKI, V., QIN, Y. and SUSSMAN, D. L. (2017). Statistical inference on random dot product graphs: A survey. J. Mach. Learn. Res. 18 1-92. · Zbl 1473.05279
[6] BICKEL, P. J., CHOI, D., CHANG, X. and ZHANG, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist. 41 1922-1943. · Zbl 1292.62042 · doi:10.1214/13-AOS1124
[7] BINKIEWICZ, N., VOGELSTEIN, J. T. and ROHE, K. (2017). Covariate-assisted spectral clustering. Biometrika 104 361-377. · Zbl 1506.62319 · doi:10.1093/biomet/asx008
[8] BLONDEL, V. D., GUILLAUME, J. L., LAMBIOTTE, R. and LEFEBVRE, E. (2008). Fast unfolding of communities in large networks. J. Stat. Mech. 10 P10008. · Zbl 1459.91130
[9] CALDERONI, F., BRUNETTO, D. and PICCARDI, C. (2017). Communities in criminal networks: A case study. Soc. Netw. 48 116-125.
[10] CALDERONI, F. and PICCARDI, C. (2014). Uncovering the structure of criminal organizations by community analysis: The Infinito network. In 2014 Tenth International Conference on Signal-Image Technology and Internet-Based Systems 301-308. IEEE, Marrakech, Morocco.
[11] CAMPANA, P. (2016). Explaining criminal networks: Strategies and potential pitfalls. Methodol. Innov. 9 1-10.
[12] CAMPANA, P. and VARESE, F. (2022). Studying organized crime networks: Data sources, boundaries and the limits of structural measures. Soc. Netw. 69 149-159.
[13] CARLEY, K. M., LEE, J.-S. and KRACKHARDT, D. (2002). Destabilizing networks. Connections 24 79-92.
[14] CATINO, M. (2014). How do mafias organize? Conflict and violence in three mafia organizations. Eur. J.Sociol. 55 177-220.
[15] CAVALLARO, L., FICARA, A., DE MEO, P., FIUMARA, G., CATANESE, S., BAGDASAR, O., SONG, W. and LIOTTA, A. (2020). Disrupting resilient criminal networks through data analysis: The case of Sicilian Mafia. PLoS ONE 15 1-22.
[16] Chen, K. and Lei, J. (2018). Network cross-validation for determining the number of communities in network data. J. Amer. Statist. Assoc. 113 241-251. · Zbl 1398.62159 · doi:10.1080/01621459.2016.1246365
[17] Côme, E. and Latouche, P. (2015). Model selection and clustering in stochastic block models based on the exact integrated complete data likelihood. Stat. Model. 15 564-589. · Zbl 07259003 · doi:10.1177/1471082X15577017
[18] CÔME, E., JOUVIN, N., LATOUCHE, P. and BOUVEYRON, C. (2021). Hierarchical clustering with discrete latent variable models and the integrated classification likelihood. Adv. Data Anal. Classif. 15 957-986. · Zbl 07538936 · doi:10.1007/s11634-021-00440-z
[19] DE BLASI, P., LIJOI, A. and PRÜNSTER, I. (2013). An asymptotic analysis of a class of discrete nonparametric priors. Statist. Sinica 23 1299-1321. · Zbl 1534.62024
[20] DE BLASI, P., FAVARO, S., LIJOI, A., MENA, R. H., PRÜNSTER, I. and RUGGIERO, M. (2015). Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 37 212-229.
[21] DIVIÁK, T. (2022). Key aspects of covert networks data collection: Problems, challenges, and opportunities. Soc. Netw. 69 160-169.
[22] FAUST, K. and TITA, G. E. (2019). Social networks and crime: Pitfalls and promises for advancing the field. Annu. Rev. Criminol. 2 99-122.
[23] FERRARA, E., DE MEO, P., CATANESE, S. and FIUMARA, G. (2014). Detecting criminal organizations in mobile phone networks. Expert Syst. Appl. 41 5733-5750.
[24] FORTUNATO, S. and HRIC, D. (2016). Community detection in networks: A user guide. Phys. Rep. 659 1-44. · doi:10.1016/j.physrep.2016.09.002
[25] Fosdick, B. K., McCormick, T. H., Murphy, T. B., Ng, T. L. J. and Westling, T. (2019). Multiresolution network models. J. Comput. Graph. Statist. 28 185-196. · Zbl 07499021 · doi:10.1080/10618600.2018.1505633
[26] FRUCHTERMAN, T. M. and REINGOLD, E. M. (1991). Graph drawing by force-directed placement. Softw. Pract. Exp. 21 1129-1164.
[27] Gelman, A., Hwang, J. and Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Stat. Comput. 24 997-1016. · Zbl 1332.62090 · doi:10.1007/s11222-013-9416-2
[28] Geng, J., Bhattacharya, A. and Pati, D. (2019). Probabilistic community detection with unknown number of communities. J. Amer. Statist. Assoc. 114 893-905. · Zbl 1420.62271 · doi:10.1080/01621459.2018.1458618
[29] Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821-7826. · Zbl 1032.91716 · doi:10.1073/pnas.122653799
[30] GNEDIN, A. (2010). A species sampling model with finitely many types. Electron. Commun. Probab. 15 79-88. · Zbl 1202.60056 · doi:10.1214/ECP.v15-1532
[31] GNEDIN, A. and PITMAN, J. (2005). Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. (POMI) S.-Peterburg. 325 83-102. · Zbl 1293.60010 · doi:10.1007/s10958-006-0335-z
[32] GORMLEY, I. C. and MURPHY, T. B. (2010). A mixture of experts latent position cluster model for social network data. Stat. Methodol. 7 385-405. · Zbl 1233.62205 · doi:10.1016/j.stamet.2010.01.002
[33] GRASSI, R., CALDERONI, F., BIANCHI, M. and TORRIERO, A. (2019). Betweenness to assess leaders in criminal networks: New evidence using the dual projection approach. Soc. Netw. 56 23-32.
[34] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301-354. · doi:10.1111/j.1467-985X.2007.00471.x
[35] HARTIGAN, J. A. (1990). Partition models. Comm. Statist. Theory and Methods 19 2745-2756. · doi:10.1080/03610929008830345
[36] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw. 5 109-137. · doi:10.1016/0378-8733(83)90021-7
[37] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107, 10. · doi:10.1103/PhysRevE.83.016107
[38] Kass, R. E. and Raftery, A. E. (1995). Bayes factors. J. Amer. Statist. Assoc. 90 773-795. · Zbl 0846.62028 · doi:10.1080/01621459.1995.10476572
[39] KEMP, C., TENENBAUM, J. B., GRIFFITHS, T. L., YAMADA, T. and UEDA, N. (2006). Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence 381-388.
[40] KIM, D., HUGHES, M. and SUDDERTH, E. (2012). The nonparametric metadata dependent relational model. In ICML’12: Proceedings of the 29th International Conference on Machine Learning 1411-1418. IEEE, Edinburgh, UK.
[41] KREBS, V. E. (2002). Mapping networks of terrorist cells. Connections 24 43-52.
[42] LE, V. (2012). Organised crime typologies: Structure, activities and conditions. Int. J. Criminol. Sociol. 1 121-131.
[43] LE, C. M. and LEVINA, E. (2015). Estimating the number of communities in networks by spectral methods. Preprint. Available at arXiv:1507.00827. · Zbl 1493.62313
[44] LEE, C. and WILKINSON, D. J. (2019). A review of stochastic block models and extensions for graph clustering. Appl. Netw. Sci. 4 1-50.
[45] LEGRAMANTI, S., RIGON, T. and DURANTE, D. (2020). Bayesian testing for exogenous partition structures in stochastic block models. Sankhya A. In press. · Zbl 1490.62142
[46] LEGRAMANTI, S., RIGON, T., DURANTE, D. and DUNSON, D. B (2022). Supplement to “Extended stochastic block models with application to criminal networks.” https://doi.org/10.1214/21-AOAS1595SUPP · Zbl 1490.62142
[47] Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 215-237. · Zbl 1308.62041 · doi:10.1214/14-AOS1274
[48] LENK, P. (2009). Simulation pseudo-bias correction to the harmonic mean estimator of integrated likelihoods. J. Comput. Graph. Statist. 18 941-960. · doi:10.1198/jcgs.2009.08022
[49] LI, T., LEVINA, E. and ZHU, J. (2020). Network cross-validation by edge sampling. Biometrika 107 257-276. · Zbl 1441.62049 · doi:10.1093/biomet/asaa006
[50] LIJOI, A., MENA, R. H. and PRÜNSTER, I. (2007a). Controlling the reinforcement in Bayesian non-parametric mixture models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 69 715-740. · Zbl 07555373 · doi:10.1111/j.1467-9868.2007.00609.x
[51] LIJOI, A., MENA, R. H. and PRÜNSTER, I. (2007b). Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 94 769-786. · Zbl 1156.62374 · doi:10.1093/biomet/asm061
[52] LIJOI, A., PRÜNSTER, I. and WALKER, S. G. (2008). Bayesian nonparametric estimators derived from conditional Gibbs structures. Ann. Appl. Probab. 18 1519-1547. · Zbl 1142.62333 · doi:10.1214/07-AAP495
[53] LIU, F., CHOI, D., XIE, L. and ROEDER, K. (2018). Global spectral clustering in dynamic networks. Proc. Natl. Acad. Sci. USA 115 927-932. · Zbl 1418.91430 · doi:10.1073/pnas.1718449115
[54] MAGALINGAM, P., DAVIS, S. and RAO, A. (2015). Using shortest path to discover criminal community. Digit. Investig. 15 1-17.
[55] MALM, A. and BICHLER, G. (2011). Networks of collaborating criminals: Assessing the structural vulnerability of drug markets. J. Res. Crime Delinq. 48 271-297.
[56] MEILĂ, M. (2007). Comparing clusterings—an information based distance. J. Multivariate Anal. 98 873-895. · Zbl 1298.91124 · doi:10.1016/j.jmva.2006.11.013
[57] MILLER, J. W. and HARRISON, M. T. (2014). Inconsistency of Pitman-Yor process mixtures for the number of components. J. Mach. Learn. Res. 15 3333-3370. · Zbl 1319.62100
[58] Miller, J. W. and Harrison, M. T. (2018). Mixture models with a prior on the number of components. J. Amer. Statist. Assoc. 113 340-356. · Zbl 1398.62066 · doi:10.1080/01621459.2016.1255636
[59] MORSELLI, C. (2009). Hells Angels in springtime. Trends Organ. Crime 12 145-158.
[60] MORSELLI, C., GIGUÈRE, C. and PETIT, K. (2007). The efficiency/security trade-off in criminal networks. Soc. Netw. 29 143-153.
[61] Müller, P., Quintana, F. and Rosner, G. L. (2011). A product partition model with regression on covariates. J. Comput. Graph. Statist. 20 260-278. · doi:10.1198/jcgs.2011.09066
[62] NEWMAN, M. E. J. and GIRVAN, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E 69 026113.
[63] NEWMAN, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577-8582.
[64] NEWMAN, M. E. J. and CLAUSET, A. (2016). Structure and inference in annotated networks. Nat. Commun. 7 1-11.
[65] NOROOZI, M. and PENSKY, M. (2020). Statistical inference in heterogeneous block model. Preprint. Available at arXiv:2002.02610. · Zbl 1490.62143
[66] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077-1087. · Zbl 1072.62542 · doi:10.1198/016214501753208735
[67] Olhede, S. C. and Wolfe, P. J. (2014). Network histograms and universality of blockmodel approximation. Proc. Natl. Acad. Sci. USA 111 14722-14727.
[68] PAJOR, A. (2017). Estimating the marginal likelihood using the arithmetic mean identity. Bayesian Anal. 12 261-287. · Zbl 1384.62096 · doi:10.1214/16-BA1001
[69] PAOLI, L. (2007). Mafia and organised crime in Italy: The unacknowledged successes of law enforcement. West Eur. Polit. 30 854-880.
[70] PARK, J.-H. and DUNSON, D. B. (2010). Bayesian generalized product partition model. Statist. Sinica 20 1203-1226. · Zbl 1507.62242
[71] Quintana, F. A. and Iglesias, P. L. (2003). Bayesian clustering and product partition models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 65 557-574. · Zbl 1065.62115 · doi:10.1111/1467-9868.00402
[72] RAFTERY, A. E., NEWTON, M. A., SATAGOPAN, J. M. and KRIVITSKY, P. N. (2007). Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. In Bayesian Statistics 8. 1-45. Oxford Univ. Press, Oxford. · Zbl 1252.62038
[73] RANCIATI, S., VINCIOTTI, V. and WIT, E. C. (2020). Identifying overlapping terrorist cells from the Noordin Top actor-event network. Ann. Appl. Stat. 14 1516-1534. · Zbl 1470.62178 · doi:10.1214/20-AOAS1358
[74] RASTELLI, R., LATOUCHE, P. and FRIEL, N. (2018). Choosing the number of groups in a latent stochastic blockmodel for dynamic networks. Netw. Sci. 6 469-493.
[75] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878-1915. · Zbl 1227.62042 · doi:10.1214/11-AOS887
[76] Saldaña, D. F., Yu, Y. and Feng, Y. (2017). How many communities are there? J. Comput. Graph. Statist. 26 171-181. · doi:10.1080/10618600.2015.1096790
[77] SANGKARAN, T., ABDULLAH, A. and JHANJHI, N. (2020). Criminal community detection based on isomorphic subgraph analytics. Open Comput. Sci. 10 164-174.
[78] Sarkar, P. and Bickel, P. J. (2015). Role of normalization in spectral clustering for stochastic blockmodels. Ann. Statist. 43 962-990. · Zbl 1320.62150 · doi:10.1214/14-AOS1285
[79] SCHMIDT, M. N. and MORUP, M. (2013). Nonparametric Bayesian modeling of complex networks: An introduction. IEEE Signal Process. Mag. 30 110-128.
[80] SENGUPTA, S. and CHEN, Y. (2018). A block model for node popularity in networks with community structure. J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 365-386. · Zbl 1458.62166 · doi:10.1111/rssb.12245
[81] Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 583-639. · Zbl 1067.62010 · doi:10.1111/1467-9868.00353
[82] STANLEY, N., BONACCI, T., KWITT, R., NIETHAMMER, M. and MUCHA, P. J. (2019). Stochastic block models with multiple continuous attributes. Appl. Netw. Sci. 4 1-22.
[83] SUSSMAN, D. L., TANG, M., FISHKIND, D. and PRIEBE, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc. 107 1119-1128. · Zbl 1443.62188 · doi:10.1080/01621459.2012.699795
[84] TALLBERG, C. (2004). A Bayesian approach to modeling stochastic blockstructures with covariates. J. Math. Sociol. 29 1-23. · Zbl 1101.91067
[85] von Luxburg, U. (2007). A tutorial on spectral clustering. Stat. Comput. 17 395-416. · doi:10.1007/s11222-007-9033-z
[86] WADE, S. and GHAHRAMANI, Z. (2018). Bayesian cluster analysis: Point estimation and credible balls (with discussion). Bayesian Anal. 13 559-626. · Zbl 1407.62241 · doi:10.1214/17-BA1073
[87] Wang, Y. X. R. and Bickel, P. J. (2017). Likelihood-based model selection for stochastic block models. Ann. Statist. 45 500-528. · Zbl 1371.62017 · doi:10.1214/16-AOS1457
[88] WANG, Y.-B., CHEN, M.-H., KUO, L. and LEWIS, P. O. (2018). A new Monte Carlo method for estimating marginal likelihoods. Bayesian Anal. 13 311-333. · Zbl 1407.62412 · doi:10.1214/17-BA1049
[89] Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11 3571-3594. · Zbl 1242.62024
[90] Watanabe, S. (2013). A widely applicable Bayesian information criterion. J. Mach. Learn. Res. 14 867-897. · Zbl 1320.62058
[91] WHITE, A. and MURPHY, T. B. (2016). Mixed-membership of experts stochastic blockmodel. Netw. Sci. 4 48-80.
[92] XU, Z., KE, Y., WANG, Y., CHENG, H. and CHENG, J. (2012). A model-based approach to attributed graph clustering. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data 505-516.
[93] YANG, J., MCAULEY, J. and LESKOVEC, J. (2013). Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining 1151-1156.
[94] ZHANG, Y., LEVINA, E. and ZHU, J. (2016). Community detection in networks with node features. Electron. J. Stat. 10 3153-3178. · Zbl 1359.62271 · doi:10.1214/16-EJS1206
[95] ZHAO, H., DU, L. and BUNTINE, W. (2017). Leveraging node attributes for incomplete relational data. In International Conference on Machine Learning 4072-4081.
[96] ZHAO, Y., LEVINA, E. and ZHU, J. (2012). Consistency of community detection in networks under degree corrected stochastic block models. Ann. Statist. 40 2266-2292. · Zbl 1257.62095 · doi:10.1214/12-AOS1036
[97] ZHOU, Z. and AMINI, A. (2019). Analysis of spectral clustering algorithms for community detection: The general bipartite setting. J. Mach. Learn. Res. 20 1-47 · Zbl 1484.62088
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.