×

Statistical inference on random dot product graphs: a survey. (English) Zbl 1473.05279

Summary: The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the graph-inferential analogues of several canonical tenets of classical Euclidean inference. In particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference.

MSC:

05C80 Random graphs (graph-theoretic aspects)
05C76 Graph operations (line graphs, products, etc.)
05-02 Research exposition (monographs, survey articles) pertaining to combinatorics
62H10 Multivariate distribution of statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H15 Hypothesis testing in multivariate analysis
05C90 Applications of graph theory
05C50 Graphs and linear algebra (matrices, eigenvalues, etc.)
05C82 Small world graphs, complex networks (graph-theoretic aspects)

Software:

energy; mclust
PDFBibTeX XMLCite
Full Text: arXiv Link

References:

[1] E. Abbe and C. Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In Proceedings of the 56th Annual Symposium on Foundations of Computer Science, pages 670–688, 2015.
[2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62:471–487, 2016. · Zbl 1359.94047
[3] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. The Journal of Machine Learning Research, 9:1981–2014, 2008. · Zbl 1225.68143
[4] H. Akaike.A new look at the statistical model identification.IEEE Transactions on Automatic Control, 19:716–723, 1974. · Zbl 0314.62039
[5] S. M. Ali and S. D. Shelvey. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society, Series B., 28:121–132, 1966.
[6] N. Alon and J. Spencer. The Probabilistic Method. John Wiley, 3 edition, 2008. · Zbl 1148.05001
[7] E. Arias-Castro and N. Verzelen. Community detection in dense random networks. Annals of Statistics, 42:940–969, 2014. · Zbl 1305.62035
[8] A. Athreya, V. Lyzinski, D. J. Marchette, C. E. Priebe, D. L. Sussman, and M. Tang. A limit theorem for scaled eigenvectors of random dot product graphs. Sankhya A, 78:1–18, 2016. · Zbl 1338.62061
[9] R. Bhatia. Matrix Analysis. Springer, 1997. · Zbl 0863.15001
[10] P. Bickel and P. Sarkar. Role of normalization for spectral clustering in stochastic blockmodels. Annals of Statistics, 43:962–990, 2015. · Zbl 1320.62150
[11] P. Bickel, D. Choi, X. Chang, and H. Zhang. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Annals of Statistics, 41: 1922–1943, 2013. · Zbl 1292.62042
[12] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America, 106:21068–21073, 2009. 86 · Zbl 1359.62411
[13] P. J. Bickel, A. Chen, and E. Levina. The method of moments and degree distributions for network models. Annals of Statistics, 39:38–59, 2011. · Zbl 1232.91577
[14] B. Bollob´as, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs. Random Structures and Algorithms, 31:3–122, 2007.
[15] J. Cape, M. Tang, and C. E. Priebe. The Kato-Temple inequality and eigenvalue concentration. Electronic Journal of Statistics, 11:3954–3978, 2017a. · Zbl 1403.15011
[16] J. Cape, M. Tang, and C. E. Priebe.The two-to-infinity norm and singular subspace geometry with applications to high-dimensional statistics.Arxiv preprint at https: //arxiv.org/abs/1705.10735, 2017b.
[17] S. Chatterjee. Matrix estimation by universal singular value thresholding. Annals of Statistics, 43:177–214, 2015. · Zbl 1308.62038
[18] L. Chen, J.T. Vogelstein, V. L. Lyzinski, and C. E. Priebe. A joint graph inference case study: the C. Elegans chemical and electrical connectomes. Worm, 5:e1142041, 2016.
[19] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952. · Zbl 0048.11804
[20] H. Chernoff. Large sample theory: Parametric case. Annals of Mathematical Statistics, 27: 1–22, 1956. · Zbl 0072.35703
[21] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997. · Zbl 0867.05046
[22] K. L. Chung. A course in probability theory. Academic Press, 3 edition, 2001.
[23] G. A. Coppersmith. Vertex nomination. Wiley Interdisciplinary Reviews: Computational Statistics, 6:144–153, 2014.
[24] I. Csiz´ar. Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2:229–318, 1967.
[25] C. Davis and W. Kahan. The rotation of eigenvectors by a pertubation. III. SIAM Journal on Numerical Analysis, 7:1–46, 1970. · Zbl 0198.47201
[26] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1:211–218, 1936. · JFM 62.1075.02
[27] K. Eichler, F. Li, A. L. Kumar, Y. Park, I. Andrade, C. Schneider-Mizell, T. Saumweber, A. Huser, D. Bonnery, B. Gerber, R. D. Fetter, J. W. Truman, C. E. Priebe, L. F. Abbott, A. Thum, M. Zlatic, and A. Cardona. The complete wiring diagram of a highorder learning and memory center, the insect mushroom body. Nature, 548:175–182, 2017.
[28] P. Erd˝os and A. R´enyi. On the evolution of random graphs. Proceedings of the Hungarian Academy of Sciences, pages 17–61, 1960. 87 · Zbl 0103.16301
[29] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23:298– 305, 1973. · Zbl 0265.05119
[30] D. E. Fishkind, D. L. Sussman, M. Tang, J. T. Vogelstein, and Carey E Priebe. Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM Journal on Matrix Analysis and Applications, 34:23–39, 2013. · Zbl 1314.05186
[31] D. E. Fishkind, V. Lyzinski, H. Pao, L. Chen, and C. E. Priebe. Vertex nomination schemes for membership prediction. Annals of Applied Statistics, 9:1510–1532, 2015a. · Zbl 1454.62180
[32] D. E. Fishkind, C. Shen, and C. E. Priebe.On the incommensurability phenomenon. Journal of Classification, 33:185–209, 2015b. · Zbl 1349.62249
[33] S. Fortunato. Community detection in graphs. Physics Reports, 486:75–174, 2010.
[34] C. Fraley and A. E. Raftery. MCLUST: Software for model-based cluster analysis. Journal of Classification, 16:297–306, 1999. · Zbl 0951.91500
[35] C. Fraley and A. E. Raftery. Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97:611–631, 2002. · Zbl 1073.62545
[36] Z. F¨uredi and J. Koml´os. The eigenvalues of random symmetric matrices. Combinatorica, 1:233–241, 1981.
[37] E. N. Gilbert. Random graphs. Annals of Statistics, 30:1141–1144, 1959. · Zbl 0168.40801
[38] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2:129–233, 2010. · Zbl 1184.68030
[39] A. Gretton, K. M. Borgwadt, M. J. Rasch, B. Sch¨olkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012.
[40] P. Hoff. Modeling homophily and stochastic equivalence in symmetric relational data. In Proceedings of the 20th Conference on Neural Information Processing Systems, pages 657–664, 2007.
[41] P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, 97:1090–1098, 2002. · Zbl 1041.62098
[42] P. W. Holland, K. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5:109–137, 1983.
[43] R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 1985. · Zbl 0576.15001
[44] J. E. Jackson. A User’s Guide to Principal Components. John Wiley & Sons, Inc., 2004. · Zbl 0743.62047
[45] A. K. Jain, R. P. W. Duin, and J. Mao. Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:4–37, 2000.
[46] B. Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in networks. Physical Review E, 83, 2011. 88
[47] T. Kato. On the upper and lower bounds of eigenvalues. Physical Review Letters, 77: 334–339, 1950.
[48] T. Kawamoto and Y. Kabashima. Limitations in the spectral method for graph partitioning: detectability threshold and localization of eigenvectors. Physical Review E, 91, 2015.
[49] E. D. Kolaczyk. Statistical Analysis of Network Data. Springer-Verlag, 2009. · Zbl 1277.62021
[50] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborov´a, and P. Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences of the United States of America, 110:20935–20940, 2013. · Zbl 1359.62252
[51] B. A. Landman, A. J. Huang, A. Gifford, D. S. Vikram, I. A. Lim, J. A. Farrell, et al. Multi-parametric neuroimaging reproducibility: a 3-t resource study. Neuroimage, 54: 2854–2866, 2011.
[52] C. Le, E. Levina, and R. Vershynin. Concentration and regularization of random graphs. Random Structures & Algorithms, 51:538–561, 2017. · Zbl 1373.05179
[53] C. C. Leang and D. H. Johnson. On the asymptotics of M-hypothesis bayesian detection. IEEE Transactions on Information Theory, 43:280–282, 1997. · Zbl 0868.94013
[54] J. Lei. A goodness-of-fit test for stochastic block models. Annals of Statistics, 44:401–424, 2016. · Zbl 1331.62283
[55] J. Lei and A. Rinaldo. Consistency of spectral clustering in stochastic blockmodels. Annals of Statistics, 43:215–237, 2015. · Zbl 1308.62041
[56] K. Levin and V. Lyzinski. Laplacian eigenmaps from sparse, noisy similarity measurements. IEEE Transactions on Signal Processing, 65:1988–2003, 2017.
[57] K. Levin, A. Athreya, M. Tang, V. Lyzinski, and C. E. Priebe. A central limit theorem for an omnibus embedding of random dot product graphs. arXiv preprint arXiv:1705.09355, 2017.
[58] F. Liese and I. Vadja. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52:4394–4412, 2006. · Zbl 1287.94025
[59] B. G. Lindsay. The geometry of mixture likelihoods: A general theory. Annals of Statistics, 11:86–94, 1983. · Zbl 0512.62005
[60] L. Lu and X. Peng. Spectra of edge-independent random graphs. Electronic Journal of Combinatorics, 20, 2013. · Zbl 1295.05211
[61] R. Lyons. Distance covariance in metric spaces. Annals of Probability, 41:3284–3305, 2013. · Zbl 1292.62087
[62] V. Lyzinski, D. L. Sussman, M. Tang, A. Athreya, and C. E. Priebe. Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding. Electronic Journal of Statistics, 8:2905–2922, 2014. 89 · Zbl 1308.62131
[63] V. Lyzinski, D. E. Fishkind, M. Fiori, J. T. Vogelstein, C. E. Priebe, and G. Sapiro. Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38:60–73, 2016a.
[64] V. Lyzinski, K. Levin, D. Fishkind, and C. E. Priebe. On the consistency of the likelihood maximization vertex nomination scheme: Bridging the gap between maximum likelihood estimation and graph matching. Journal of Machine Learning Research, 17, 2016b. · Zbl 1392.62189
[65] V. Lyzinski, M. Tang, A. Athreya, Y. Park, and C. E. Priebe.Community detection and classification in hierarchical stochastic blockmodels. IEEE Transactions in Network Science and Engineering, 4:13–26, 2017.
[66] J.-F. Maa, D. K. Pearl, and R. Bartoszy´nski. Reducing multidimensional two-sample data to one-dimensional interpoint comparisons. Annals of Statistics, 24:1067–1074, 1996. · Zbl 0862.62047
[67] D. Marchette, C. E. Priebe, and G. Coppersmith. Vertex nomination via attributed random dot product graphs. In Proceedings of the 57th ISI World Statistics Congress, 2011.
[68] E. Mossel, J. Neeman, and A. Sly. Belief propagation, robust reconstruction and optimal recovery of block models. In Proceedings of the 27th Annual Conference on Learning Theory, pages 356–370, 2014. · Zbl 1350.05154
[69] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. Combinatorica, 2018. In press.
[70] F. Mosteller and R. A. Fisher. Questions and answer. The American Statistician, 2:30–31, 1948.
[71] M. E. J. Newman. Modularity and community structure in networks. Proccedings of the National Academy of Sciences of the United States of America, 103:8577–8582, 2006.
[72] T. Ohyama, C. M. Schneider-Mizell, R. D. Fetter J. V. Aleman, R. Franconville, M. RiveraAlba, B. D. Mensh, K. M. Branson, J. H. Simpson, J. W. Truman, A. Cardona, and M. Zlatic. A multilevel multimodal circuit enhances action selection in Drosophila. Nature, 520:633–639, 2015.
[73] R. I. Oliveira. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. http://arxiv.org/abs/0911.0600, 2009.
[74] D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9:135–140, 1981. · Zbl 0451.62048
[75] C. E. Priebe, D. L. Sussman, M. Tang, and J. T. Vogelstein.Statistical inference on errorfully observed graphs. Journal of Computational and Graphical Statistics, 24:930– 953, 2015.
[76] C. E. Priebe, Y. Park, M. Tang, A. Athreya, V. Lyzinski, J. T. Vogelstein, Y. Qin, B. Cocanougher, K. Eichler, M. Zlatic, and A. Cardona. Semiparametric spectral modeling of the drosophila connectome. arXiv preprint at https://arxiv.org/abs/1705.03297, 2017. 90
[77] J. Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978. · Zbl 0418.93079
[78] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic blockmodel. Annals of Statistics, 39:1878–1915, 2011. · Zbl 1227.62042
[79] W. G. Roncal, Z. H. Koterba, D. Mhembere, D. M. Kleissas, J. T. Vogelstein, R. Burns, et al. Migraine: MRI graph reliability analysis and inference for connectomics. Arxiv preprint at http://arxiv.org/abs/1312.4875, 2013.
[80] P. Rubin-Delanchy, C. E. Priebe, and M. Tang. The generalised random dot product graph. Arxiv preprint available at http://arxiv.org/abs/1709.05506, 2017.
[81] E. R. Scheinerman and K. Tucker. Modeling graphs using dot product representations. Computational statistics, 25:1–16, 2010. · Zbl 1223.62109
[82] C. M. Schneider-Mizell, S. Gerhard, M. Longair, T. Kazimiers, F. Li, M. F. Zwart, A. Champion, F. M. Midgley, R. D. Fetter, S. Saalfeld, and A. Cardona.Quantitative neuroanatomy for connectomics in Drosophila. Elife, 5:e12059, 2016.
[83] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978. · Zbl 0379.62005
[84] D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu. Equivalence of distancebased and RKHS-based statistics in hypothesis testing. Annals of Statistics, 41:2263– 2291, 2013. · Zbl 1281.62117
[85] T. A. B. Snijders and K.Nowicki. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14:75–100, 1997. · Zbl 0896.62063
[86] B. K. Sriperumbudur, K. Fukumizu, and G. R. G. Lanckriet. Universality, characteristic kernels and RKHS embeddings of measures. Journal of Machine Learning Research, 12: 2389–2410, 2011. · Zbl 1280.68198
[87] I. Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2:67–93, 2001. · Zbl 1009.68143
[88] D. L. Sussman, M. Tang, D. E. Fishkind, and C. E. Priebe. A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107:1119–1128, 2012. · Zbl 1443.62188
[89] D. L. Sussman, M. Tang, and C. E. Priebe. Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36:48–57, 2014.
[90] S. Suwan, D. S Lee, R. Tang, D.L. Sussman, M. Tang, and C.E. Priebe. Empirical Bayes estimation for the stochastic blockmodel. Electronic Journal of Statistics, 10:761–782, 2016. · Zbl 1333.62088
[91] G. J. Sz´ekely and M. L. Rizzo. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143:1249–1272, 2013. 91 · Zbl 1278.62072
[92] M. Tang and C. E. Priebe. Limit theorems for eigenvectors of the normalized laplacian for random graphs. Annals of Statistics, 2018. In press. · Zbl 1408.62120
[93] M. Tang, D. L. Sussman, and C. E. Priebe. Universally consistent vertex classification for latent position graphs. Annals of Statistics, 41:1406 – 1430, 2013. · Zbl 1273.62147
[94] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, Y. Park, and C. E. Priebe. A semiparametric two-sample hypothesis testing problem for random dot product graphs. Journal of Computational and Graphical Statistics, 26:344–354, 2017a.
[95] M. Tang, A. Athreya, D. L. Sussman, V. Lyzinski, and C. E. Priebe. A nonparametric two-sample hypothesis testing problem for random dot product graphs. Bernoulli, 23: 1599–1630, 2017b. · Zbl 1450.62040
[96] R. Tang, M. Ketcha, J. T. Vogelstein, C. E. Priebe, and D. L. Sussman. Laws of large graphs. arXiv preprint at http://arxiv.org/abs/1609.01672, 2016.
[97] T. Tao and V. Vu. Random matrices: Universal properties of eigenvectors. Random Matrices: Theory and Applications, 1, 2012. · Zbl 1248.15031
[98] J. A. Tropp. An introduction to matrix concentration inequalities. Foundations and Trends in Machine Learning, 8:1–230, 2015. · Zbl 1391.15071
[99] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17:395–416, 2007.
[100] Y. J. Wang and G. Y. Wong. Stochastic blockmodels for directed graphs. Journal of the American Statistical Association, 82:8–19, 1987. · Zbl 0613.62146
[101] P. J. Wolfe and S. C. Olhede.Nonparametric graphon estimation.arXiv preprint at http://arxiv.org/abs/1309/5936, 2013.
[102] S. Young and E. Scheinerman. Random dot product graph models for social networks. In Proceedings of the 5th international conference on algorithms and models for the webgraph, pages 138–149, 2007. · Zbl 1136.05322
[103] Y. Yu, T. Wang, and R. J. Samworth. A useful variant of the Davis-Kahan theorem for statisticians. Biometrika, 102:315–323, 2015. · Zbl 1452.15010
[104] D. Zheng, D. Mhembere, R. Burns, J. T. Vogelstein, C. E. Priebe, and A. S. Szalay. Flashgraph: Processing billion-node graphs on an array of commodity SSDs. In Proceedings of the 13th USENIX Conference on File and Storage Technologies, 2015.
[105] M. Zhu and A. Ghodsi. Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics and Data Analysis, 51:918–930, 2006. · Zbl 1157.62429
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.