×

zbMATH — the first resource for mathematics

Review of statistical network analysis: models, algorithms, and software. (English) Zbl 07260329
Summary: The analysis of network data is an area that is rapidly growing, both within and outside of the discipline of statistics.
This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős-Renyi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate dataset. Software routines available for implementing methods are emphasized throughout.
The aim of this paper is to provide a review with enough detail about many common classes of network models to whet the appetite and to point the way to further reading.

MSC:
62 Statistics
68 Computer science
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] N. A. Christakis and J. H. Fowler, The spread of obesity in a large social network over 32 years, New Engl J Med 357(4) (2007), 370-379.
[2] N. A. Christakis and J. H. Fowler, The collective dynamics of smoking in a large social network, New Engl J Med 358(21) (2008), 2249-2258.
[3] J. H. Fowler and N. A. Christakis, The dynamic spread of happiness in a large social network: Longitudinal analysis over 20 years in the Framingham heart study, Brit Med J 337 (2008), a2338.
[4] J. N. Rosenquist, J. Murabito, J. H. Fowler, and N. A. Christakis, The spread of alcohol consumption behavior in a large social network, Ann Intern Med 152(7) (2010), 426-433.
[5] R. Lyons, The spread of evidence-poor medicine via flawed social-network analysis, Stat Polit Policy 2(1) (2011), Article 2.
[6] A. L. Traud, E. D. Kelsic, P. J. Mucha, and M. A. Porter, Comparing community structure to characteristics in online collegiate social networks, SIAM Rev 53(3) (2011), 526-543.
[7] J. Wu, T. Vallenius, K. Ovaska, J. Westermarck, T. P. Makela, and S. Hautaniemi, Integrated network analysis platform for protein-protein interactions, Nat Meth 6 (2009), 75-77.
[8] V. E. Krebs, Mapping networks of terrorist cells, Connections 24 (2002), 43-52.
[9] P. De, A. Singh, T. Wong, W. Yacoub, and A. Jolly, Sexual network analysis of a gonorrhoea outbreak, Sexually Transmit Infect 80 (2004), 280-285.
[10] T. E. Senator, H. G. Goldberg, J. Wooton, M. A. Cottini, A. F. Umar Khan, C. D. Klinger, W. M. Llamas, M. P. Marrone, and R. W. H. Wong, The FinCEN artificial intelligence system: identifying potential money laundering from reports of large cash transactions, In Proceedings Of The Seventh Conference On Innovative Applications Of Artificial Intelligence, J. Atkins and H. Shrobe, eds, 1995. 156-170.
[11] D. Lazer, I. Mergel, and A. Friedman, Co-citation of prominent social network articles in sociology journals: the evolving canon, Connections 29 (2009), 43-64.
[12] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge, UK, Cambridge University Press, 1994. · Zbl 0926.91066
[13] P. J. Carrington, J. Scott, and S. Wasserman, Models and Methods in Social Network Analysis, Cambridge, UK, Cambridge University Press, 2005.
[14] S. Wasserman, G. Robins, and D. Steinley, Statistical models for networks: a brief review of some recent research, Proceedings of the 2006 Conference on Statistical Network Analysis, ICML’06, Berlin/Heidelberg, SpringerVerlag, 2007, 45-56.
[15] E. M. Airoldi, D. M. Blei, S. E. Fienberg, A. Goldberg, E. P. Xing, and A. X. Zheng, Statistical Network Analysis: Models, Issues and New Directions, volume 4503, Lecture Notes in Computer Science, Berlin, Springer, 2007.
[16] A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi, A survey of statistical network models, Foundat Trend Mach Learn 2 (2010), 129-233. · Zbl 1184.68030
[17] W. Zachary, An information flow model for conflict and fission in small groups, J Anthropol Res 33(4) (1977), 452-473.
[18] J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics 9(3) (2008), 432-441. · Zbl 1143.62076
[19] N. Meinshausen and P. B¨uhlmann, High-dimensional graphs and variable selection with the Lasso, Ann Stat 34(3) (2006), 1436-1462. · Zbl 1113.62082
[20] D. R. Hunter and M. S. Handcock, Inference in curved exponential family models for networks, J Comput Graph Stat 15 (2006), 565-583.
[21] G. Robins, T. A. B. Snijders, P. Wang, and M. S. Handcock, Recent developments in exponential random graph (p∗) models for social networks, Social Netw 29(2) (2006), 192-215.
[22] K. A. Stephenson and M. Zelen, Rethinking centrality: methods and examples, Social Netw 11 (1989), 1-37.
[23] P. Bonacich, Power and centrality: a family of measures, Am J Sociol 92(5) (1987), 1170-1182.
[24] P. Bonacich, Some unique properties of eigenvector centrality, Social Netw 29(4) (2007), 555-564.
[25] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Netw ISDN Syst 30(1-7) (1998), 107-117.
[26] J. M. Kleinberg, Authoritative sources in a hyperlinked environment, J ACM 46 (1999), 668-677. · Zbl 0930.68047
[27] M. Girvan and E. J. Newman, Community structure in social and biological networks, Proc Natl Acad Sci 99(12) (2002), 7821-7826. · Zbl 1032.91716
[28] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Trans Pattern Anal Mach Intell 22(8) (2000), 888-905.
[29] A. Y. Ng, M. I. Jordan, and Y. Weiss, On spectral clustering: analysis and an algorithm, Adv Neural Inform Process Syst, Cambridge, MA, USA, MIT Press, 2001, 849-856.
[30] U. von Luxburg, A tutorial on spectral clustering, Stat Comput 17 (2007), 395-416.
[31] S. White and P. Smyth, A spectral clustering approach to finding communities in graphs, In Proceedings of the Fifth SIAM International Conference on Data Mining, vol. 119, 2005, 274-285.
[32] M. E. J. Newman, Modularity and community structure in networks, Proc Natl Acad Sci 103(23) (2006), 8577-8582.
[33] L. Danon, A. D´ıaz-Guilera, J. Duch, and A. Arenas, Comparing community structure identification, J Stat Mech Theory Exper 2005(9) (2005), P09008-. · Zbl 07077983
[34] U. Brandes, D. Delling, M. Gaertler, R. G¨orke, M. Hoefer, Z. Nikoloski, and D. Wagner, On finding graph clusterings with maximum modularity, In Graph-Theoretic Concepts in Computer Science, vol. 4769 Lecture Notes in Computer Science, Andreas Brandst¨adt, Dieter Kratsch, and Haiko M¨uller, eds, 2007, 121-132. · Zbl 1141.68519
[35] U. Brandes, D. Delling, M. Gaertler, R. G¨orke, M. Hoefer, Z. Nikoloski, and D. Wagner, On modularity clustering, IEEE Trans Knowled Data Eng 20(2) (2008), 172-188.
[36] S. Fortunato and M. Barth´elemy, Resolution limit in community detection, Proc Natl Acad Sci 104(1) (2007), 36-41.
[37] G. Palla, I. Der´enyi, I. Farkas, and T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435(7043) (2005), 814-818.
[38] B. Adamcsek, G. Palla, I. J. Farkas, I. Der´enyi, and T. Vicsek, CFinder: locating cliques and overlapping modules in biological networks, Bioinformatics 22(8) (2006), 1021-1023.
[39] P. Berkhin, A survey on PageRank computing, Internet Math 2 (2005), 73-120. · Zbl 1100.68504
[40] P. Sargolzaei and F. Soleymani, Pagerank problem, survey and future research directions, Int Math Forum 5(19) (2010), 937-956. · Zbl 1192.68205
[41] D. Zhou and B. Sch¨olkopf, A regularization framework for learning from graph data, In ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields, 2004, 132-137.
[42] D. Zhou, B. Sch¨olkopf, and T. Hofmann, Semi-supervised learning on directed graphs, Biol Cyber 17 (2005), 1633-1640.
[43] J. Callut, K. Franc¸oisse, M. Saerens, and P. Dupont, Semi-supervised classification from discriminative random walks, Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I, Berlin/Heidelberg, Springer-Verlag, ECML PKDD’08, 2008, 162-177.
[44] X. Zhu, Semi-supervised learning literature survey (revised edition), Technical Report 1530, Department of Computer Sciences, University of Wisconsin, Madison, 2008.
[45] X. Zhu and A. B. Goldberg, Introduction to semisupervised learning, Synth Lect Artif Intell Mach Learn 3(1) (2009), 1-130.
[46] A. Mantrach, N. van Zeebroeck, P. Francq, M. Shimbo, H. Bersini, and M. Saerens, Semi-supervised classification and betweenness computation on large, sparse, directed graphs, Pattern Recogn 44(6) (2011), 1212-1224. · Zbl 1209.68418
[47] G. Csardi and T. Nepusz, The igraph software package for complex network research, InterJournal Complex Syst 1695 (2006).
[48] R Development Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria, R Foundation for Statistical Computing, 2010, ISBN:3-900051-07-0.
[49] M. S. Handcock, D. R. Hunter, C. T. Butts, S. M. Goodreau, P. N. Krivitsky, and M. Morris, statnet: Software Tools for the Statistical Modeling of Network Data, Seattle, WA, 2003, version 2.6. Project home page at http://statnet. org.
[50] C. T. Butts, sna: Tools for Social Network Analysis, R package version 2.1-0, Irvine, University of California, 2010.
[51] M. S. Handcock, D. R. Hunter, C. T. Butts, S. M. Goodreau, P. N. Krivitsky, and M. Morris, ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks, version 2.4-2, Seattle, WA, 2011.
[52] C. T. Butts, M. S. Handcock, and D. R. Hunter, network: Classes for Relational Data, R package version 1.7, Irvine, CA, 2011.
[53] M. Culp, spa: Semi-supervised semi-parametric graphbased estimation in R, J Stat Softw 40(10) (2011), 1-29.
[54] P. Arabie and L. J. Hubert, An overview of combinatorial data analysis, In Clustering and Classification, pages P. Arabie, L. J. Hubert, and G. De Soete, eds. River Edge, NJ, World Scientific, 1996, 5-63. · Zbl 0902.62005
[55] E. M. Reingold and J. S. Tilford, Tidier drawings of trees, IEEE Transactions on Software Engineering 7 (1981), 223-228.
[56] T. M. J. Fruchterman and E. M. Reingold, Graph drawing by force-directed placement, Softw Pract Exper 21(11) (1991), 1129-1164.
[57] T. Kamada and S. Kawai, An algorithm for drawing general undirected graphs, Inform Process Lett 31 (1989), 7-15. · Zbl 0679.68128
[58] N. Elmqvist and J.-D. Fekete, Hierarchical aggregation for information visualization: overview, techniques, and design guidelines, IEEE Trans Visual Comput Graph 16(3) (2010), 439-454.
[59] P. Simonetto, D. Auber, and D. Archambault, Fully automatic visualisation of overlapping sets, Comput Graph Forum 28(3) (2009), 967-974.
[60] I. Herman, G. Melancon, and M. S. Marshall, Graph visualization and navigation in information visualization: a survey, IEEE Trans Visual Comput Graph 6(1) (2000), 24-43.
[61] M. Kaufmann and D. Wagner, Drawing Graphs: Methods and Models, vol. 2025, Lecture Notes in Computer Science, Berlin/Heidelberg, Springer, 2001. · Zbl 0977.68644
[62] T. von Landesberger, A. Kuijper, T. Schreck, J. Kohlhammer, J. J. van Wijk, J.-D. Fekete, and D. W. Fellner, Visual analysis of large graphs: state-of-the-art and future research challenges, Comput Graph Forum 30(6) (2011), 1719-1749.
[63] Various. RSiena: Siena - Simulation Investigation for Empirical Network Analysis, R package version 1.0.12.167, 2011.
[64] M. Baur, M. Benkert, U. Brandes, S. Cornelsen, M. Gaertler, B. K¨opf, J. Lerner, and D. Wagner, Software for visual social network analysis, In Graph Drawing, vol. 2265, Lecture Notes in Computer Science, P. Mutzel, M. J¨unger, and S. Leipert, eds. Berlin/Heidelberg, Springer, 2002, 554-557.
[65] V. Batagelj, and A. Mrvar, Pajek-analysis and visualization of large networks, In Graph Drawing, vol. 2265, Lecture Notes in Computer Science, pages P. Mutzel, M. J¨unger, and S. Leipert, eds. Berlin/Heidelberg, Springer, 2002, 8-11. · Zbl 1054.68564
[66] D. Auber, Tulip : A huge graph visualisation framework, In Graph Drawing Softwares, Mathematics and Visualization, P. Mutzel and M. J¨unger, eds. Berlin/Heidelberg, SpringerVerlag, 2003, 105-126.
[67] M. Bastian, S. Heymann, and M. Jacomy, Gephi: An open source software for exploring and manipulating networks, In Proceedings of the Third International Conference on Weblogs and Social Media (ICWSM 2009), E. Adar, M. Hurst, T. Finin, N. S. Glance, N. Nicolov, and B. L. Tseng, eds. California, USA, The AAAI Press, 2009.
[68] P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, and T. Ideker, Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Res 13 (2003), 2498-2504.
[69] P. Erd˝os and A. R´enyi, On random graphs I, Publ Math Debrecen 6 (1959), 290-297.
[70] P. Erd˝os and A. R´enyi, On the evolution of random graphs, Publ Math Inst Hungarian Acad Sci 5 (1960), 17-61.
[71] M. Draief and L. Massouil´e, Epidemics and Rumours in Complex Networks, Number 369 in London Mathematical Society Lecture Notes Series, Cambridge, UK, Cambridge University Press, 2009.
[72] P. W. Holland, and S. Leinhardt, An exponential family of probability distributions for directed graphs, J Am Stat Assoc 76(373) (1981), 33-50. · Zbl 0457.62090
[73] S. E. Fienberg, and S. Wasserman, Discussion of “An exponential family of probability distributions for directed graphs” by Holland and Leinhardt, J Am Stat Assoc 76(373) (1981), 54-57.
[74] M. A. J. van Duijn, T. A. B. Snijders, and B. J. H. Zijlstra, p2: a random effects model with covariates for directed graphs, Stat Neerland 58(2) (2004), 234-254. · Zbl 1050.62117
[75] M. Huisman and M. A. J. van Duijn, Software for social network analysis, In Models and methods in social network analysis, P. J. Carrington, J. Scott, and S. Wasserman, eds. Cambridge, UK, Cambridge University Press, 2004.
[76] P. W. Holland, K. B. Laskey, and S. Leinhardt, Stochastic blockmodels: First steps, Social Netw 5(2) (1983), 109-137.
[77] F. Lorrain and H. C. White, Structural equivalence of individuals in social networks, J Math Sociol 1(1) (1971), 49-80.
[78] Y. J. Wang and G. Y. Wong, Stochastic blockmodels for directed graphs, J Am Stat Assoc 82(397) (1987), 8-19. · Zbl 0613.62146
[79] K. Nowicki and T. A. B. Snijders, Estimation and prediction of stochastic blockstructures, J Am Stat Assoc 96(455) (2001), 1077-1087. · Zbl 1072.62542
[80] O. Frank, and D. Strauss, Markov graphs, J Am Stat AssocJ Am Stat Assoc 81(395) (1986), 832-842.
[81] S. Wasserman, and P. Pattison, Logit models and logistic regressions for social networks: I. An introduction to Markov graphs andp∗, Psychometrika 61 (1996), 401-425. doi:10.1007/BF02294547. · Zbl 0866.92029
[82] P. N. Krivitsky, M. S. Handcock, and M. Morris, Adjusting for network size and composition effects in exponentialfamily random graph models, Statistical Methodology 8(4) (2011), 319-339. · Zbl 1215.91069
[83] C. J. Anderson, S. Wasserman, and B. Crouch, A p* primer: logit models for social networks, Social Netw 21(1) (1999), 37-66.
[84] G. Robins, P. Pattison, Y. Kalish, and D. Lusher, An introduction to exponential random graph (p∗) models for social networks, Social Net 29(2) (2007), 173-191.
[85] S. Chatterjee, P. Diaconis, and A. Sly, Random graphs with a given degree sequence, Ann Appl Probab 21(4) (2011), 1400-1435. · Zbl 1234.05206
[86] D. Strauss, and M. Ikeda, Pseudolikelihood estimation for social networks, J Am Stat Assoc 85(409) (1990), 204-212.
[87] M. A. J. van Duijn, K. J. Gile, and M. S. Handcock, A framework for the comparison of maximum pseudolikelihood and maximum likelihood estimation of exponential family random graph models, Social Netw 31 (2009), 52-62.
[88] A. Caimo and N. Friel, Bayesian inference for exponential random graph models, Social Netw 33(1) (2011), 41-55.
[89] S. Chatterjee and P. Diaconis, Estimating and understanding exponential random graph models, Technical Report, Stanford University, 2012. Available at arXiv.org:1102. 2650. · Zbl 1293.62046
[90] J. Besag, Statistical analysis of non-lattice data, J R Stat Soc Ser D 24(3) (1975), 179-195.
[91] T. A. B. Snijders, Markov chain Monte Carlo estimation of exponential random graph models, J Social Struct 3(2) (2002), 1-40.
[92] C. Geyer and E. Thompson, Constrained monte carlo maximum likelihood for dependent data, J R Stat Soc Ser B 54 (1992), 657-699.
[93] M. S. Handcock, G. Robins, T. A. B. Snijders, J. Moody, and J. Besag, Assessing degeneracy in statistical models of social networks, J Am Stat Assoc 76 (2003), 33-50.
[94] D. R. Hunter, S. M. Goodreau, and M. S. Handcock, Goodness of fit of social network models, J Am Stat Assoc 103(481) (2008), 248-258. · Zbl 05564484
[95] S. Hanneke, W. Fu, and E. P. Xing, Discrete temporal models of social networks, Electron J Stat 4 (2010), 585-605. · Zbl 1329.91113
[96] A. Caimo, and N. Friel, Bergm: Bayesian inference for exponential random graph models, R package version 1.4, 2010.
[97] P. Hoff, Modeling homophily and stochastic equivalence in symmetric relational data, In Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. Roweis, eds. Cambridge, MA, MIT Press, 2008, 657-664.
[98] M. I. Jordan, Graphical models, Stat Science 19(1) (2004), pp. 140-155. · Zbl 1057.62001
[99] M. I. Jordan, Learning in Graphical Models, Adaptive Computation and Machine Learning. Cambridge, MA, USA, MIT Press, 1998.
[100] M. J. Wainwright, and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference, Delft, The Netherlands, Now Publishers, 2008. · Zbl 1193.62107
[101] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. · Zbl 1107.68072
[102] P. Hoff, A. E. Raftery, and M. S. Handcock, Latent space approaches to social network analysis, J Am Stat Assoc 97(460) (2002), 1090-1098. · Zbl 1041.62098
[103] M. S. Handcock, A. E. Raftery, and J. M. Tantrum, Modelbased clustering for social networks, J R Stat Soc Ser A 170(2) (2007), 1-22.
[104] P. Hoff, Bilinear mixed-effects models for dyadic data, J Am Stat Assoc 100(469) (2005), 286-295. · Zbl 1117.62353
[105] P. N. Krivitsky, M. S. Handcock, A. E. Raftery, and P. D. Hoff, Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models, Social Netw 31(3) (2009), 204-213.
[106] P. N. Krivitsky and M. S. Handcock, latentnet: Latent position and cluster models for statistical networks, 2010.
[107] I. C. Gormley and T. B. Murphy, A mixture of experts latent position cluster model for social network data, Stat Methodol 7(3) (2010), 385-405. · Zbl 1233.62205
[108] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, Mixed-membership stochastic blockmodels, J Mach Learn Res 9 (2008), 1981-2014. · Zbl 1225.68143
[109] M. Salter-Townshend, and T. B. Murphy, Variational Bayesian inference for the latent position cluster model, In NIPS Workshop on Analyzing Networks and Learning with Graphs, 2010. · Zbl 1365.62246
[110] A. E. Raftery, X. Niu, P. D. Hoff, and K. Y. Yeung, Fast inference for the latent space network model using a casecontrol approximate likelihood, J Comput Graph Stat, in press.
[111] P. N. Krivitsky and M. S. Handcock, Fitting latent cluster models for networks with latentnet, J Stat Softw 24(5) (2008), 1-23.
[112] S. Shortreed, M. S. Handcock, and P. Hoff, Positional estimation within a latent space model for networks, Methodol Eur J Res Meth Behav Social Sci 2(1) (2006), 24-33.
[113] M. Salter-Townshend, VBLPCM: Variational Bayes Latent Position Cluster Model for networks, R package version 2.0., 2012.
[114] T. A. B. Snijders and K. Nowicki, Estimation and prediction for stochastic bockmodels for graphs with latent block structure, J Classif 14(1) (1997), 75-100. · Zbl 0896.62063
[115] P. J. Bickel and A. Chen, A nonparametric view of network models and Newman-Girvan and other modularities, Proc Natl Acad Sci 106(50) (2009), 21068-21073. · Zbl 1359.62411
[116] P. J. Bickel, A. Chen, and E. Levina, The method of moments and degree distributions for network models, Ann Stat 39(5) (2011), 2280-2301. · Zbl 1232.91577
[117] K. Rohe, S. Chatterjee, and B. Yu, Spectral clustering and the high-dimensional stochastic blockmodel, Ann Stat 39(4) (2011), 1878-1915. · Zbl 1227.62042
[118] D. S. Choi, P. J. Wolfe, and E. M. Airoldi, Stochastic blockmodels with growing number of classes, Biometrika To appear, (2011. · Zbl 1318.62207
[119] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, J Mach Learn Res 3 (2003), 993-1022. · Zbl 1112.68379
[120] E. P. Xing, W. Fu, and L. Song, A state-space mixed membership blockmodel for dynamic network tomography, Ann Appl Stat 4(2) (2010), 535-566. · Zbl 1194.62133
[121] P. Latouche, E. Birmel´e, and C. Ambroise, Overlapping stochastic block models with application to the french political blogosphere, Ann Appl Stat 5(1) (2011), 309-336. · Zbl 1220.62083
[122] A. McDaid and N. J. Hurley, Detecting highly overlapping communities with model-based overlapping seed expansion, In International Conference on Advances in Social Networks Analysis and Mining (ASONAM), N. Memon and R. Alhajj, eds. Washington, DC, USA, IEEE Computer Society, 2010, 112-119.
[123] J. Chang, lda: Collapsed Gibbs sampling methods for topic models, R package version 1.2.1., 2010.
[124] I. Porteous, A. Asuncion, D. Newman, P. Smyth, A. Ihler, and M. Welling, Fast collapsed Gibbs sampling for latent Dirichlet allocation, InProceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2008. ACM.569-577. New York, NY, USA,
[125] P. N. Krivitsky, Exponential-family random graph models for valued networks, Technical report, Pennsylvania State University, 2011. Available on arXiv:1101.1359v1. · Zbl 1264.91105
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.