×

Edge exchangeable models for interaction networks. (English) Zbl 1402.90027

Summary: Many modern network datasets arise from processes of interactions in a population, such as phone calls, email exchanges, co-authorships, and professional collaborations. In such interaction networks, the edges comprise the fundamental statistical units, making a framework for edge-labeled networks more appropriate for statistical analysis. In this context, we initiate the study of edge exchangeable network models and explore its basic statistical properties. Several theoretical and practical features make edge exchangeable models better suited to many applications in network analysis than more common vertex-centric approaches. In particular, edge exchangeable models allow for sparse structure and power law degree distributions, both of which are widely observed empirical properties that cannot be handled naturally by more conventional approaches. Our discussion culminates in the Hollywood model, which we identify here as the canonical family of edge exchangeable distributions. The Hollywood model is computationally tractable, admits a clear interpretation, exhibits good theoretical properties, and performs reasonably well in estimation and prediction as we demonstrate on real network datasets. As a generalization of the Hollywood model, we further identify the vertex components model as a nonparametric subclass of models with a convenient stick breaking construction.

MSC:

90B15 Stochastic network models in operations research
90C35 Programming involving graphs or networks
PDFBibTeX XMLCite
Full Text: DOI Link

References:

[1] Airoldi, E. M.; Choi, D. S.; Wolfe, P. J., Confidence sets for network structure, Statistical Analysis and Data Mining, 4, 461-469, (2011)
[2] Aldous, D. J., Representations for partially exchangeable arrays of random variables, Journal of Multivariate Analysis, 11, 581-598, (1981) · Zbl 0474.60044
[3] Barabási, A.-L.; Albert, R., Emergence of scaling in random networks, Science, 286, 509-512, (1999) · Zbl 1226.05223
[4] Bickel, P.; Chen, A., A nonparametric view of network models and Newman–girvan and other modularities, Proceedings of the National Academy of Sciences of the United States of America, 106, 21068-21073, (2009) · Zbl 1359.62411
[5] Borgs, C.; Chayes, J. T.; Cohn, H.; Ganguly, S., Consistent nonparametric estimation for heavy-tailed sparse graphs,, (2015)
[6] Borgs, C.; Chayes, J.; Cohn, H.; Zhao, Y., An theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions,, (2014)
[7] Borgs, C.; Chayes, J. T.; Cohn, H.; Holden, N., Sparse exchangeable graphs and their limits via graphon processes,, (2016)
[8] Britton, T.; Deijfen, M.; Lagerås, A. N.; Lindholm, M., Epidemics on random graphs with tunable clustering, Journal of Applied Probability, 45, 743-756, (2008) · Zbl 1147.92034
[9] Butts, C. T., A relational event framework for social action, Sociological Methodology, 38, 155-200, (2008)
[10] Caron, F.; Fox, E. B., Sparse graphs using exchangeable random measures,, (2014)
[11] Sparse graphs using exchangeable random measures,, Journal of the Royal Statistical Society, 79, 1-44, (2017)
[12] Chatterjee, S.; Diaconis, P., Estimating and understanding exponential random graph models, Annals of Statistics, 41, 2428-2461, (2013) · Zbl 1293.62046
[13] Crane, H., The ubiquitous Ewens sampling formula (with discussion and a rejoinder by the author), Statistical Science, 31, 1-39, (2016)
[14] Rejoinder: the ubiquitous Ewens sampling formula, Statistical Science, 31, 37-39, (2016) · Zbl 1442.60011
[15] Crane, H.; Dempsey, W., A framework for statistical network modeling,, (2015)
[16] Deijfen, M.; Kets, W., Random intersection graphs with tunable degree distribution and clustering,, Probability in the Engineering and Informational Sciences, 23, 661-674, (2009) · Zbl 1180.68212
[17] Ewens, W. J., The sampling theory of selectively neutral alleles,, Theoretical Population Biology, 3, 87-112, (1972) · Zbl 0245.92009
[18] Feng, S., The Poisson–Dirichlet Distribution and Related Topics, (2010), Springer-Verlag, Berlin
[19] Ferguson, T., A Bayesian analysis of some nonparametric problems, The Annals of Statistics, 1, 209-230, (1973) · Zbl 0255.62037
[20] Fill, J.; Scheinerman, E.; Singer-Cohen, K., Random intersection graphs when : an equivalence theorem relating the evolution of the and models,, Random Structures & Algorithms, 16, 156-176, (2000) · Zbl 0951.05096
[21] Godehardt, E.; Jaworski, J.; Schwaiger, M.; Opitz, O., Exploratory Data Analysis in Empirical Research. Studies in Classification, Data Analysis, and Knowledge Organization, Two models of random intersection graphs for classification,, (2003), Springer, Berlin, Heidelberg
[22] Holland, P. W.; Leinhardt, S., An exponential family of probability distributions for directed graphs,, Journal of the American Statistical Association, 33-65, (1981) · Zbl 0457.62090
[23] Holland, P. W.; Laskey, K. B.; Leinhardt, S., Stochastic blockmodels: first steps, Social Networks, 5, 109-137, (1983)
[24] Hoover, D. N., Relations on probability spaces and arrays of random variables,, Preprint, Institute for Advanced Studies, (1979)
[25] Ishwaran, H.; James, L. F., Gibbs sampling methods for stick-breaking priors,, Journal of the American Statistical Association, 96, 161-173, (2001) · Zbl 1014.62006
[26] Kallenberg, O., Exchangeable random measures in the plane, Journal of Theoretical Probability, 3, 81-136, (1990) · Zbl 0694.60030
[27] Karónski, M.; Scheinerman, E.; Singer-Cohen, K., On random intersection graphs: the subgraphs problem,, Combinatorics Probability & Computing, 8, 131-159, (1999) · Zbl 0924.05059
[28] Klimt, B.; Yang, Y., Introducing the enron corpus,, Conference – CEAS 2004 - First Conference on Email and Anti-Spam, (2004) · Zbl 1132.68562
[29] Latouche, P.; Robin, S.; Ouadah, S., Goodness of fit of logistic models for random graphs,, (2015)
[30] Leskovec, J.; Huttenlocher, D.; Kleinberg, J., Signed networks in social media,, ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), (2010)
[31] Lovász, L.; Szegedy, B., Limits of dense graph sequences,, Journal of Combinatorial Theory, 96, 933-957, (2006) · Zbl 1113.05092
[32] Mariadassou, M.; Robin, S.; Vacher, C., Uncovering latent structure in valued graphs: A variational approach,, Annals of Applied Statistics, 715-742, (2010) · Zbl 1194.62125
[33] McAuley, J. J.; Leskovec, J., Neural Information Processing Systems, Learning to discover social circles in ego networks,, 539-547, (2012)
[34] Nesetril, J.; Ossona de Mendez, P., Sparsity: Graphs, Structures, and Algorithms, (2012), Springer, Heidelberg · Zbl 1268.05002
[35] Newman, M. E. J., The structure of scientific collaboration networks, Proceedings of the National Academy of Sciences, 98, 404-409, (2001) · Zbl 1065.00518
[36] Opsahl, T., Why anchorage is not (that) important: binary ties and sample selection,, (2011)
[37] Opsahl, T.; Panzarasa, P., Clustering in weighted networks, Social Networks, 31, 155-163, (2009)
[38] Orbanz, P.; Roy, D. M., Bayesian models of graphs, arrays and other exchangeable random structures, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 437-461, (2015)
[39] Perry, P. O.; Wolfe, P. J., Point process modelling for directed interaction networks,, Journal of the Royal Statistical Society, 75, 821-849, (2013)
[40] Pitman, J., Combinatorial Stochastic Processes, 1875, (2005), Springer, Berlin
[41] Rossi, R. A.; Ahmed, N. K., Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, The network data repository with interactive graph analytics and visualization,, (2015)
[42] Sethuraman, J., A constructive definition of Dirichlet priors, Statistica Sinica, 4, 639-650, (1994) · Zbl 0823.62007
[43] Shalizi, C. R.; Rinaldo, A., Consistency under subsampling of exponential random graph models, Annals of Statistics, 41, 508-535, (2013) · Zbl 1269.91066
[44] Singer, K., Random Intersection Graphs, unpublished, (1995), PhD thesis, Johns Hopkins University
[45] Snijders, T. A. B.; Nowicki, K., Estimation and prediction for stochastic blockmodels for graphs with latent block structure, Journal of Classification, 14, 75-100, (1997) · Zbl 0896.62063
[46] Sweet, T. M., Incorporating covariates into stochastic blockmodels,, Journal of Educational and Behavioral Statistics, 40, 635-664, (2015)
[47] Tallberg, C., A Bayesian approach to modeling stochastic block-structures with covariates, Journal of Mathematical Sociology, 29, 1-23, (2004) · Zbl 1101.91067
[48] Veitch, V.; Roy, D., The class of random graphs arising from exchangeable random measures,, (2015)
[49] Wolfe, P. J.; Olhede, S. C., Nonparametric graphon estimation,, (2014)
[50] Zachary, W. W., An information flow model for conflict and fission in small groups, Journal of Anthropological Research, 33, 452-473, (1977)
[51] Zhang, Y.; Levina, E.; Zhu, J., Community detection in networks with node features,, (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.