zbMATH — the first resource for mathematics

Vertex nomination schemes for membership prediction. (English) Zbl 1454.62180
Summary: Suppose that a graph is realized from a stochastic block model where one of the blocks is of interest, but many or all of the vertices’ block labels are unobserved. The task is to order the vertices with unobserved block labels into a “nomination list” such that, with high probability, vertices from the interesting block are concentrated near the list’s beginning. We propose several vertex nomination schemes. Our basic – but principled – setting and development yields a best nomination scheme (which is a Bayes-Optimal analogue), and also a likelihood maximization nomination scheme that is practical to implement when there are a thousand vertices, and which is empirically near-optimal when the number of vertices is small enough to allow comparison to the best nomination scheme. We then illustrate the robustness of the likelihood maximization nomination scheme to the modeling challenges inherent in real data, using examples which include a social network involving human trafficking, the Enron Graph, a worm brain connectome and a political blog network.

62H22 Probabilistic graphical models
62H30 Classification and discrimination; cluster analysis (statistical aspects)
91D30 Social networks; opinion dynamics
mclust; lda
Full Text: DOI Euclid arXiv
[1] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the 3 rd International Workshop on Link Discovery LinkKDD’ 05 36-43. ACM, New York. · Zbl 0980.13002
[2] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in Neural Information Processing Systems 21 (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 33-40. Curran, Red Hook, New York. · Zbl 1225.68143
[3] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068-21073. · Zbl 1359.62411
[4] Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993-1022. · Zbl 1112.68379
[5] Chang, J. and Dai, A. (2010). LDA: Collapsed Gibbs sampling methods for topic models, 2010. R Package Version 1.
[6] Conte, D., Foggia, P., Sansone, C. and Vento, M. (2004). Thirty years of graph matching in pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 18 265-298.
[7] Coppersmith, G. (2014). Vertex nomination. Wiley Interdisciplinary Reviews : Computational Statistics 6 144-153.
[8] Coppersmith, G. A. and Priebe, C. E. (2012). Vertex nomination via content and context. Preprint. Available at . arXiv:1201.4118
[9] Erdős, P. and Rényi, A. (1963). Asymmetric graphs. Acta Math. Acad. Sci. Hungar 14 295-315. · Zbl 0118.18901
[10] Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl. 34 23-39. · Zbl 1314.05186
[11] Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75-174.
[12] Fraley, C. and Raftery, A. E. (1999). MCLUST: Software for model-based cluster analysis. J. Classification 16 297-306. · Zbl 0951.91500
[13] Fraley, C. and Raftery, A. E. (2003). Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST. J. Classification 20 263-286. · Zbl 1055.62071
[14] Garey, M. R. and Johnson, D. S. (1979). Computer and Intractability : A Guide to the Theory of NP-Completeness . W. H. Freeman, New York. · Zbl 0411.68039
[15] Hubert, L. and Arabie, P. (1985). Comparing partitions. J. Classification 2 193-218. · Zbl 0587.62128
[16] Lee, D. S. and Priebe, C. E. (2012). Bayesian vertex nomination. Preprint. Available at . arXiv:1205.5082
[17] Lyzinski, V., Fishkind, D. E. and Priebe, C. E. (2014). Seeded graph matching for correlated Erdos-Renyi graphs. J. Mach. Learn. Res. 15 3513-3540.
[18] Lyzinski, V., Sussman, D. L., Fishkind, D. E., Pao, H., Chen, L., Vogelstein, J. T., Park, Y. and Priebe, C. E. (2015a). Spectral clustering for divide-and-conquer graph matching. Parallel Comput. 47 70-87.
[19] Lyzinski, V., Fishkind, D., Fiori, M., Vogelstein, J. T., Priebe, C. E. and Sapiro, G. (2015b). Graph matching: Relax at your own risk. Preprint. IEEE Trans. Pattern Anal. Mach. Intell.
[20] Lyzinski, V., Sussman, D. L., Tang, M., Athreya, A. and Priebe, C. E. (2014b). Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding. Electron. J. Stat. 8 2905-2922. · Zbl 1308.62131
[21] Newman, M. E. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E (3) 69 026113. · Zbl 1032.91716
[22] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077-1087. · Zbl 1072.62542
[23] Pólya, G. (1937). Kombinatorische Anzahlbestimmungen für Gruppen, Graphen und chemische Verbindungen. Acta Math. 68 145-254. · Zbl 0017.23202
[24] Priebe, C. E., Conroy, J. M., Marchette, D. J. and Park, Y. (2005). Scan statistics on enron graphs. Comput. Math. Organ. Theory 11 229-247. · Zbl 1086.68562
[25] Read, R. C. and Corneil, D. G. (1977). The graph isomorphism disease. J. Graph Theory 1 339-363. · Zbl 1055.62071
[26] Sussman, D. L., Tang, M., Fishkind, D. E. and Priebe, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc. 107 1119-1128. · Zbl 1443.62188
[27] Vogelstein, J. T., Conroy, J. M., Lyzinski, V., Podrazik, L. J., Kratzer, S. G., Harley, E. T., Fishkind, D. E., Vogelstein, R. J. and Priebe, C. E. (2015). Fast approximate quadratic programming for graph matching. PLOS One 10 e0121002.
[28] Zaslavskiy, M., Bach, F. and Vert, J.-P. (2009). A path following algorithm for the graph matching problem. IEEE Trans. Pattern Anal. Mach. Intell. 31 2227-2242.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.