## Variational inference for stochastic block models from sampled data.(English)Zbl 1437.62072

Summary: This article deals with nonobserved dyads during the sampling of a network and consecutive issues in the inference of the stochastic block model (SBM). We review sampling designs and recover missing at random (MAR) and not missing at random (NMAR) conditions for the SBM. We introduce variants of the variational EM algorithm for inferring the SBM under various sampling designs (MAR and NMAR) all available as an R package. Model selection criteria based on integrated classification likelihood are derived for selecting both the number of blocks and the sampling design. We investigate the accuracy and the range of applicability of these algorithms with simulations. We explore two real-world networks from ethnology (seed circulation network) and biology (protein-protein interaction network), where the interpretations considerably depend on the sampling designs considered.

### MSC:

 62D10 Missing data 68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.) 62K10 Statistical block designs 62F15 Bayesian inference 65K10 Numerical optimization and variational techniques

### Software:

missSBM; STRING; R
Full Text:

### References:

 [1] Aicher, C.; Jacobs, A. Z.; Clauset, A., “Learning Latent Block Structure in Weighted Networks,”, Journal of Complex Networks, 3, 221-248 (2014) · Zbl 1397.68151 [2] Airoldi, E. M.; Blei, D. M.; Fienberg, S. E.; Xing, E. P., “Mixed Membership Stochastic Blockmodels,”, Journal of Machine Learning Research, 9, 1981-2014 (2008) · Zbl 1225.68143 [3] Ashburner, M.; Ball, C. A.; Blake, J. A.; Botstein, D.; Butler, H.; Cherry, J. M.; Davis, A. P.; Dolinski, K.; Dwight, S. S.; Eppig, J. T.; Harris, M. A., “Gene Ontology: Tool for the Unification of Biology,”, Nature Genetics, 25, 25 (2000) [4] Balachandran, P.; Kolaczyk, E. D.; Viles, W. D., “On the Propagation of Low-Rate Measurement Error to Subgraph Counts in Large Networks,”, Journal of Machine Learning Research, 18, 1-33 (2017) · Zbl 1440.62226 [5] Barbillon, P.; Donnet, S.; Lazega, E.; Bar-Hen, A., “Stochastic Block Models for Multiplex Networks: An Application to Networks of Researchers,”, Journal of the Royal Statistical Society, Series A, 180, 295-314 (2015) [6] Bickel, P.; Choi, D.; Chang, X.; Zhang, H., “Asymptotic Normality of Maximum Likelihood and Its Variational Approximation for Stochastic Blockmodels,”, The Annals of Statistics, 41, 1922-1943 (2013) · Zbl 1292.62042 [7] Biernacki, C.; Celeux, G.; Govaert, G., “Assessing a Mixture Model for Clustering With the Integrated Completed Likelihood,”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 719-725 (2000) [8] Celisse, A.; Daudin, J.-J.; Pierre, L., “Consistency of Maximum-Likelihood and Variational Estimators in the Stochastic Block Model,”, Electronic Journal of Statistics, 6, 1847-1899 (2012) · Zbl 1295.62028 [9] Chatterjee, S., “Matrix Estimation by Universal Singular Value Thresholding,”, The Annals of Statistics, 43, 177-214 (2015) · Zbl 1308.62038 [10] Daudin, J.-J.; Picard, F.; Robin, S., “A Mixture Model for Random Graphs,”, Statistics and Computing, 18, 173-183 (2008) [11] Davenport, M. A.; Plan, Y.; van den Berg, E.; Wootters, M., “1-Bit Matrix Completion,”, Information and Inference, 3, 189-223 (2014) · Zbl 1309.62124 [12] Frank, O.; Harary, F., “Cluster Inference by Using Transitivity Indices in Empirical Graphs,”, Journal of the American Statistical Association, 77, 835-840 (1982) · Zbl 0505.62043 [13] Goldenberg, A.; Zheng, A. X.; Fienberg, S. E.; Airoldi, E. M., “A Survey of Statistical Network Models,”, Foundations and Trends® in Machine Learning, 2, 129-233 (2010) · Zbl 1184.68030 [14] Handcock, M. S.; Gile, K. J., “Modeling Social Networks From Sampled Data,”, The Annals of Applied Statistics, 4, 5-25 (2010) · Zbl 1189.62187 [15] Holland, P. W.; Laskey, K. B.; Leinhardt, S., “Stochastic Blockmodels: First Steps,”, Social Networks, 5, 109-137 (1983) [16] Jordan, M. I.; Ghahramani, Z.; Jaakkola, T. S.; Saul, L. K., Learning in Graphical Models, “An Introduction to Variational Methods for Graphical Models,”, 105-161 (1998), Dordrecht: Springer, Dordrecht · Zbl 0910.68175 [17] Karrer, B.; Newman, M. E. J., “Stochastic Blockmodels and Community Structure in Networks,”, Physical Review E, 83, 016107 (2011) [18] Kolaczyk, E. D., Statistical Analysis of Network Data, Methods and Models (2009), New York: Springer, New York · Zbl 1277.62021 [19] Labeyrie, V.; Deu, M.; Barnaud, A.; Calatayud, C.; Buiron, M.; Wambugu, P.; Manel, S.; Glaszmann, J.-C.; Leclerc, C., “Influence of Ethnolinguistic Diversity on the Sorghum Genetic Patterns in Subsistence Farming Systems in Eastern Kenya,”, PLoS One, 9, e92178 (2014) [20] Labeyrie, V.; Thomas, M.; Muthamia, Z. K.; Leclerc, C., “Seed Exchange Networks, Ethnicity, and Sorghum Diversity,”, Proceedings of the National Academy of Sciences of the United States of, 113, 98-103 (2016) [21] Latouche, P.; Birmelé, É.; Ambroise, C., “Overlapping Stochastic Block Models With Application to the French Political Blogosphere,”, The Annals of Applied Statistics, 5, 309-336 (2011) · Zbl 1220.62083 [22] Latouche, P.; Birmelé, É.; Ambroise, C., “Variational Bayesian Inference and Complexity Control for Stochastic Block Models,” Statistical Modelling, 12, 93-115 (2012) · Zbl 1420.62114 [23] Latouche, P.; Robin, S.; Ouadah, S., Goodness of Fit of Logistic Models for Random Graphs (2017) [24] Little, R. J.; Rubin, D. B., Statistical Analysis With Missing Data (2014), Hoboken, NJ: Wiley, Hoboken, NJ [25] Mariadassou, M.; Robin, S.; Vacher, C., “Uncovering Latent Structure in Valued Graphs: A Variational Approach,”, The Annals of Applied Statistics, 4, 715-742 (2010) · Zbl 1194.62125 [26] Matias, C.; Miele, V., “Statistical Clustering of Temporal Networks Through a Dynamic Stochastic Block Model,”, Journal of the Royal Statistical Society, Series B, 79, 1119-1141 (2016) · Zbl 1373.62312 [27] Matias, C.; Robin, S., “Modeling Heterogeneity in Random Graphs Through Latent Space Models: A Selective Review,”, ESAIM: Proceedings and Surveys, 47, 55-74 (2014) · Zbl 1335.05002 [28] Molenberghs, G.; Beunckens, C.; Sotto, C.; Kenward, G. M., “Every Missing Not at Random Model Has Got a Missing at Random Counterpart With Equal Fit,”, Journal of the Royal Statistical Society, Series B, 70, 371-388 (2008) · Zbl 1148.62046 [29] Nowicki, K.; Snijders, T. A. B., “Estimation and Prediction for Stochastic Blockstructures,”, Journal of the American Statistical Association, 96, 1077-1087 (2001) · Zbl 1072.62542 [30] Priebe, C. E.; Sussman, D. L.; Tang, M.; Vogelstein, J. T., “Statistical Inference on Errorfully Observed Graphs,”, Journal of Computational and Graphical Statistics, 24, 930-953 (2015) [31] Rand, W. M., “Objective Criteria for the Evaluation of Clustering Methods,”, Journal of the American Statistical Association, 66, 846-850 (1971) [32] Rubin, D. B., “Inference and Missing Data,”, Biometrika, 63, 581-592 (1976) · Zbl 0344.62034 [33] Snijders, T. A., “Statistical Models for Social Networks,”, Annual Review of Sociology, 37, 131-153 (2011) [34] Snijders, T. A.; Nowicki, K., “Estimation and Prediction for Stochastic Blockmodels for Graphs With Latent Block Structure,”, Journal of Classification, 14, 75-100 (1997) · Zbl 0896.62063 [35] Szklarczyk, D.; Franceschini, A.; Wyder, S.; Forslund, K.; Heller, D.; Huerta-Cepas, J.; Simonovic, M.; Roth, A.; Santos, A.; Tsafou, K. P.; Kuhn, M., “String v10: Protein-Protein Interaction Networks, Integrated Over the Tree of Life,”, Nucleic Acids Research, 43, D447-D452 (2015) [36] Thompson, S. K.; Frank, O., “Model-Based Estimation With Link-Tracing Sampling Designs,”, Survey Methodology, 26, 87-98 (2000) [37] Thompson, S. K.; Seber, G., Adaptive Sampling (1996), New York: Wiley, New York [38] Vinayak, R. K.; Oymak, S.; Hassibi, B., Advances in Neural Information Processing, “Graph Clustering With Missing Data: Convex Algorithms and Analysis (2014) [39] Vincent, K.; Thompson, S., Estimating the Size and Distribution of Networked Populations With Snowball Sampling (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.