
Link prediction in dynamic networks using random dot product graphs. (English) Zbl 07417903

Summary: The problem of predicting links in large networks is an important task in a variety of practical applications, including social sciences, biology and computer security. In this paper, statistical techniques for link prediction based on the popular random dot product graph model are carefully presented, analysed and extended to dynamic settings. Motivated by a practical application in cyber-security, this paper demonstrates that random dot product graphs not only represent a powerful tool for inferring differences between multiple networks, but are also efficient for prediction purposes and for understanding the temporal evolution of the network. The probabilities of links are obtained by fusing information at two stages: spectral methods provide estimates of latent positions for each node, and time series models are used to capture temporal dynamics. In this way, traditional link prediction methods, usually based on decompositions of the entire network adjacency matrix, are extended using temporal information. The methods presented in this article are applied to a number of simulated and real-world graphs, showing promising results.
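The two-stage pipeline sketched in the summary (spectral estimates of latent positions for each node, followed by time series modelling of their evolution) can be illustrated with a short example. The sketch below is not the authors' implementation: it uses a plain adjacency spectral embedding of each snapshot, aligns the embeddings with an orthogonal Procrustes rotation, and replaces the paper's time series models with a simple linear-drift forecast of each latent coordinate; all function names and parameters (adjacency_spectral_embedding, predict_next_links, the embedding dimension d) are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from scipy.linalg import orthogonal_procrustes


def adjacency_spectral_embedding(A, d):
    """d-dimensional adjacency spectral embedding of a symmetric adjacency matrix."""
    vals, vecs = eigsh(A.astype(float), k=d, which="LA")   # top-d eigenpairs
    order = np.argsort(-vals)
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))        # n x d latent position estimates


def predict_next_links(adjacency_series, d=2):
    """Embed each snapshot, align the embeddings over time, forecast the latent
    positions one step ahead with a simple linear drift, and score future links
    by dot products of the forecast positions."""
    embeddings = [adjacency_spectral_embedding(A, d) for A in adjacency_series]
    # Spectral embeddings are identified only up to an orthogonal transformation,
    # so align every snapshot to the first one before comparing them over time.
    ref = embeddings[0]
    aligned = [ref] + [E @ orthogonal_procrustes(E, ref)[0] for E in embeddings[1:]]
    X = np.stack(aligned)                                   # shape (T, n, d)
    drift = (X[-1] - X[0]) / max(len(X) - 1, 1)             # crude stand-in for a time series model
    X_next = X[-1] + drift                                  # forecast latent positions
    return np.clip(X_next @ X_next.T, 0.0, 1.0)             # estimated link probabilities


if __name__ == "__main__":
    # Toy data: three noisy snapshots of a two-community stochastic block model.
    rng = np.random.default_rng(0)
    n = 100
    z = rng.integers(0, 2, n)
    B = np.array([[0.6, 0.1], [0.1, 0.5]])
    snapshots = []
    for _ in range(3):
        A = (rng.random((n, n)) < B[z][:, z]).astype(float)
        A = np.triu(A, 1)
        snapshots.append(A + A.T)                           # symmetric, no self-loops
    P_hat = predict_next_links(snapshots, d=2)
    print(P_hat.shape)                                      # (100, 100) matrix of link scores
```

In a full implementation the drift step would be replaced by a proper forecasting model fitted per latent coordinate (for example ARIMA-type models), and the embedding dimension d would be chosen by inspecting the scree plot of the adjacency spectrum rather than fixed in advance.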

MSC:

62M20 Inference from stochastic processes and prediction
05C80 Random graphs (graph-theoretic aspects)
