zbMATH — the first resource for mathematics

On consistent vertex nomination schemes. (English) Zbl 07064049
Summary: Given a vertex of interest in a network \(G_1\), the vertex nomination problem seeks to find the corresponding vertex of interest (if it exists) in a second network \(G_2\). A vertex nomination scheme produces a list of the vertices in \(G_2\), ranked according to how likely they are judged to be the corresponding vertex of interest in \(G_2\). The vertex nomination problem and related information retrieval tasks have attracted much attention in the machine learning literature, with numerous applications to social and biological networks. However, the current framework has often been confined to a comparatively small class of network models, and the concept of statistically consistent vertex nomination schemes has been only shallowly explored. In this paper, we extend the vertex nomination problem to a very general statistical model of graphs. Further, drawing inspiration from the long-established classification framework in the pattern recognition literature, we provide definitions for the key notions of Bayes optimality and consistency in our extended vertex nomination framework, including a derivation of the Bayes optimal vertex nomination scheme. In addition, we prove that no universally consistent vertex nomination schemes exist. Illustrative examples are provided throughout.

68T05 Learning and adaptive systems in artificial intelligence
Full Text: Link
[1] A. Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining, pages 14-23, 2006.
[2] S. Agarwal. Learning to rank on graphs.Machine Learning, 10(3):333-357, 2010.
[3] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels.The Journal of Machine Learning Research, 9:1981-2014, 2008. · Zbl 1225.68143
[4] R. Albert and A.-L. Barab´asi.Statistical mechanics of complex networks.Reviews of Modern Physics, 74(1):47, 2002.
[5] D. Asta and C. R. Shalizi. Geometric network comparison. In M. Meila and T. Haskes, editors,Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, pages 102-110, 2015.
[6] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation.Neural Computation, 15(6):1373-1396, 2003. · Zbl 1085.68119
[7] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold Regularization: A Geometric Framework for Learning from Examples.Journal of Machine Learning Research, 7:2399-2434, 2006. · Zbl 1222.68144
[8] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities.Proc. National Academy of Sciences, USA, 106:21068-21073, 2009. · Zbl 1359.62411
[9] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th International World Wide Web Conference, 1998.
[10] L. Chen, C. Shen, J. T. Vogelstein, and C. E. Priebe. Robust vertex classification.IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3):578-590, 2016a.
[11] L. Chen, J. T. Vogelstein, V. Lyzinski, and C. E. Priebe. A joint graph inference case study: The C. elegans chemical and electrical connectomes.Worm, 5(2), 2016b.
[12] R. R. Coifman and S. Lafon. Diffusion maps.Applied and Computational Harmonic Analysis, 21:5-30, 2006. · Zbl 1095.68094
[13] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition.International Journal of Pattern Recognition and Artificial Intelligence, 18 (03):265-298, 2004.
[14] G. Coppersmith.Vertex nomination.Wiley Interdisciplinary Reviews: Computational Statistics, 6(2):144-153, 2014.
[15] G. A. Coppersmith and C. E. Priebe. Vertex nomination via content and context.arXiv preprint arXiv:1201.4118, 2012.
[16] L. Devroye, L. Gy¨orfi, and G. Lugosi.A Probabilistic Theory of Pattern Recognition. Springer, 1997.
[17] K. K. Duh.Learning to Rank with Partially-Labeled Data.PhD thesis, University of Washington, 2009.
[18] P. Erd˝os and A. R´enyi. On random graphs, I.Publicationes Mathematicae, 6:290-297, 1959.
[19] P. Erd˝os and A. R´enyi. Asymmetric graphs.Acta Mathematica Academiae Scientiarum Hungarica, 14(3-4):295-315, 1963.
[20] D. E. Fishkind, V. Lyzinski, H. Pao, L. Chen, and C. E. Priebe. Vertex nomination schemes for membership prediction.The Annals of Applied Statistics, 9(3):1510-1532, 2015. · Zbl 1454.62180
[21] D. E. Fishkind, S. Adali, H. G. Patsolic, L. Meng, D. Singh, V. Lyzinski, and C. E. Priebe. Seeded graph matching.Pattern Recognition, 87:203-215, 2019.
[22] O. Frank and D. Strauss. Markov graphs.Journal of the american Statistical association, 81(395):832-842, 1986. · Zbl 0607.05057
[23] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis.Journal of the American Statistical Association, 97(460):1090-1098, 2002. · Zbl 1041.62098
[24] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps.Social Networks, 5(2):109-137, 1983.
[25] H. Jeong, S. P. Mason, A.-L. Barab´asi, and Z. N. Oltvai. Lethality and centrality in protein networks.Nature, 411(6833):41-42, 2001.
[26] B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks.Physical Review E, 83, 2011.
[27] J. H. Kim, B. Sudakov, and V. H. Vu. On the asymmetry of random regular graphs and random graphs.Random Structures and Algorithms, 21:216-224, 2002. · Zbl 1012.05143
[28] H. Li. Learning to rank for information retrieval and natural language processing.Synthesis Lectures on Human Language Technologies, 4(1), 2011.
[29] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks.Journal of the Association for Information Science and Technology, 58(7):1019-1031, 2007.
[30] T.-Y. Liu. Learning to rank for information retrieval.Foundations and Trends in Information Retrieval, 3(3):225-331, 2009.
[31] L. L¨u and T. Zhou. Link prediction in complex networks: A survey.Physica A: Statistical Mechanics and its Applications, 390(6):1150-1170, 2011.
[32] C. Luo, W. Pang, Z. Wang, and C. Lin.Hete-CF: Social-based collaborative filtering recommendation using heterogeneous relations. InProceedings of the IEEE International Conference on Data Mining, pages 917-922, 2014.
[33] D. Lusseau and M. E. J. Newman. Identifying the role that animals play in their social networks.Proceedings of the Royal Society of London B: Biological Sciences, 271(Suppl 6):S477-S481, 2004.
[34] U. Von Luxburg. A tutorial on spectral clustering.Statistics and computing, 17(4):395-416, 2007.
[35] V. Lyzinski. Information recovery in shuffled graphs via graph matching.IEEE Transactions on Information Theory, 64(5):3254-3273, 2018. · Zbl 1395.94422
[36] V. Lyzinski and D. L. Sussman.Matchability of heterogeneous networks pairs.arXiv preprint arXiv:1705.02294, 2018.
[37] V. Lyzinski, D. L. Sussman, M. Tang, A. Athreya, and C. E. Priebe. Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding.Electronic Journal of Statistics, 8:2905-2922, 2014. · Zbl 1308.62131
[38] V. Lyzinski, D. Fishkind, M. Fiori, J.T. Vogelstein, C.E. Priebe, and G. Sapiro. Graph matching: Relax at your own risk.IEEE Transactions on Pattern Analysis and Machine Intelligence, In press, 2015.
[39] V. Lyzinski, K. Levin, D. E. Fishkind, and C. E. Priebe. On the consistency of the likelihood maximization vertex nomination scheme: Bridging the gap between maximum likelihood estimation and graph matching.Journal of Machine Learning Research, 17(179):1-34, 2016. · Zbl 1392.62189
[40] H. Ma, I. King, and M. R. Lyu. Mining web graphs for recommendations.IEEE Transactions on Knowledge and Data Engineering, 24(6):1051-1064, 2012.
[41] C. Manning, P. Raghavan, and H. Sch¨utze.Introduction to Information Retrieval. Cambridge University Press, 2008.
[42] D. Marchette, C. E. Priebe, and G. Coppersmith. Vertex nomination via attributed random dot product graphs. InProceedings of the 57th ISI World Statistics Congress, volume 6, page 16, 2011.
[43] R. Mihalcea and D. Radev.Graph-based Natural Language Processing and Information Retrieval. Cambridge University Press, 2011. · Zbl 1246.68009
[44] M. E. J. Newman. A measure of betweenness centrality based on random walks.Social Networks, 27(1):39-54, 2005.
[45] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices.Phys. Rev. E, 74(3):036104, 2006.
[46] M. E. J. Newman and A. Clauset. Structure and inference in annotated networks.Nature Communications, 7(11863), 2016.
[47] H. G. Patsolic, Y. Park, V. Lyzinski, and C. E. Priebe. Vertex nomination via local neighborhood matching.arXiv preprint arXiv:1705.00674, 2017.
[48] T. A. N. Pham, X. Li, G. Cong, and Z. Zhang. A general recommendation model for heterogeneous networks.IEEE Transactions on Knowledge and Data Engineering, 28 (12):3140-3153, 2016.
[49] P. Resnick and H. R. Varian. Recommender systems.Communications of the ACM, 40(3): 56-58, 1997.
[50] F. Ricci, L. Rokach, and B. Shapira.Introduction to Recommender Systems Handbook. Springer, 2011. · Zbl 1214.68392
[51] G. Robins, P. Pattison, Y. Kalish, and D. Lusher. An introduction to exponential random graph (p∗) models for social networks.Social Networks, 29(2):173-191, 2007.
[52] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel.Annals of Statistics, 39:1878-1915, 2011. · Zbl 1227.62042
[53] S. Rothe and H. Sch¨utze. CoSimRank: A flexible & efficient graph-theoretic similarity measure. InProceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1392-1402, 2014.
[54] T. Snijders, P. Pattison, G. Robins, and M. Handcock. New specifications for exponential random graph models.Sociological Methodology, 36(1):99-153, 2006.
[55] I. Steinwart. Support vector machines are universally consistent.Journal of Complexity, 18(3):768-791, 2002. · Zbl 1030.68074
[56] C. Stone. Consistent nonparametric regression.Annals of Statistics, 5:595-645, 1977. · Zbl 0366.62051
[57] M. Sun and C. E. Priebe. Efficiency investigation of manifold matching for text document classification.Pattern Recognition Letters, 34(11):1263-1269, 2013.
[58] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs.Journal of the American Statistical Association, 107(499):1119-1128, 2012. · Zbl 1443.62188
[59] D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. Matched filters for noisy induced subgraph detection.arXiv preprint arXiv:1803.02423, 2018.
[60] S. Suwan, D. S. Lee, and C. E. Priebe. Bayesian vertex nomination using content and context.Wiley Interdisciplinary Reviews: Computational Statistics, 7(6):400-416, 2015.
[61] M. Tang, D. L. Sussman, and C. E. Priebe. Universally consistent vertex classification for latent positions graphs.The Annals of Statistics, 41(3):1406-1430, 2013. · Zbl 1273.62147
[62] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. A semiparametric two-sample hypothesis testing problem for random graphs.Journal of Computational and Graphical Statistics, 26(2):344-354, 2017a. · Zbl 1450.62040
[63] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe. A nonparametric two-sample hypothesis testing problem for random dot product graphs.Bernoulli, 23(3): 1599-1630, 2017b. · Zbl 1450.62040
[64] D. Watts and S. H. Strogatz. Collective dynamics of small-world networks.Nature, 393 (6684):440, 1998. · Zbl 1368.05139
[65] J. Yoder, L. Chen, H. Pao, E. Bridgeford, K. Levin, D. E. Fishkind, C. E. Priebe, and V. Lyzinski.Vertex nomination: The canonical sampling and the extended spectral nomination schemes.arXiv preprint arXiv:1802.04960, 2018.
[66] S. Young and E. Scheinerman. Random dot product graph models for social networks. InProceedings of the 5th International Conference on Algorithms and Models for the Web-graph, pages 138-149, 2007. · Zbl 1136.05322
[67] D.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.