Consistency under sampling of exponential random graph models.(English)Zbl 1269.91066

Summary: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM’s expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

MSC:

 91D30 Social networks; opinion dynamics 62B05 Sufficient statistics and fields 60G51 Processes with independent increments; Lévy processes 62M99 Inference from stochastic processes 62M09 Non-Markovian processes: estimation 62P25 Applications of statistics to social sciences

statnet
Full Text:

References:

 [1] Achlioptas, D., Clauset, A., Kempe, D. and Moore, C. (2005). On the bias of traceroute sampling (or: Why almost every network looks like it has a power law). In Proceedings of the 37 th ACM Symposium on Theory of Computing . · Zbl 1192.68065 [2] Ackland, R. and O’Neil, M. (2011). Online collective identity: The case of the environmental movement. Social Networks 33 177-190. [3] Ahmed, N. K., Neville, J. and Kompella, R. (2010). Reconsidering the foundations of network sampling. In Proceedings of the 2 nd Workshop on Information in Networks [ WIN 2010] (S. Aral, F. Provost and A. Sundararajan, eds.). [4] Anderson, C. J., Wasserman, S. and Crouch, B. (1999). A $$p^\ast$$ primer: Logit models for social networks. Social Networks 21 37-66. [5] Bahadur, R. R. (1971). Some Limit Theorems in Statistics . SIAM, Philadelphia. · Zbl 0257.62015 [6] Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory . Wiley, Chichester. · Zbl 0387.62011 [7] Barvinok, A. and Hartigan, J. A. (2010). The number of graphs and a random graph with a given degree sequence. Available at . · Zbl 1213.05015 [8] Besag, J. (1989). A candidate’s formula: A curious result in Bayesian prediction. Biometrika 76 183. · Zbl 0664.62028 [9] Bhamidi, S., Bresler, G. and Sly, A. (2011). Mixing time of exponential random graphs. Ann. Appl. Probab. 21 2146-2170. · Zbl 1238.60011 [10] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068-21073. · Zbl 1359.62411 [11] Brown, L. D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Institute of Mathematical Statistics Lecture Notes-Monograph Series 9 . IMS, Hayward, CA. · Zbl 0685.62002 [12] Butler, R. W. (1986). Predictive likelihood inference with applications. J. Roy. Statist. Soc. Ser. B 48 1-38. · Zbl 0599.62009 [13] Chatterjee, S. and Dey, P. S. (2010). Applications of Stein’s method for concentration inequalities. Ann. Probab. 38 2443-2485. · Zbl 1203.60023 [14] Chatterjee, S. and Diaconis, P. (2011). Estimating and understanding exponential random graph models. Available at . · Zbl 1293.62046 [15] Chatterjee, S., Diaconis, P. and Sly, A. (2011). Random graphs with a given degree sequence. Ann. Appl. Probab. 21 1400-1435. · Zbl 1234.05206 [16] Daraganova, G., Pattison, P., Koskinen, J., Mitchell, B., Bill, A., Watts, M. and Baum, S. (2012). Networks and geography: Modelling community network structure as the outcome of both spatial and network processes. Social Networks 34 6-17. [17] de la Haye, K., Robins, G., Mohr, P. and Wilson, C. (2010). Obesity-related behaviors in adolescent friendship networks. Social Networks 32 161-167. [18] den Hollander, F. (2000). Large Deviations. Fields Institute Monographs 14 . Amer. Math. Soc., Providence, RI. · Zbl 0949.60001 [19] Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. (7) 28 33-61. · Zbl 1162.60009 [20] Easley, D. and Kleinberg, J. (2010). Networks , Crowds , and Markets : Reasoning About a Highly Connected World . Cambridge Univ. Press, Cambridge. · Zbl 1205.91007 [21] Faust, K. and Skvoretz, J. (2002). Comparing networks across space and time, size and species. Sociological Methodology 32 267-299. [22] Frank, O. and Strauss, D. (1986). Markov graphs. J. Amer. Statist. Assoc. 81 832-842. · Zbl 0607.05057 [23] Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2009). A survey of statistical network models. Foundations and Trends in Machine Learning 2 1-117. · Zbl 1184.68030 [24] Gondal, N. (2011). The local and global structure of knowledge production in an emergent research field: An exponential random graph analysis. Social Networks 33 20-30. [25] Gonzalez-Bailon, S. (2009). Opening the black box of link formation: Social factors underlying the structure of the web. Social Networks 31 271-280. [26] Goodreau, S. M., Kitts, J. A. and Morris, M. (2009). Birds of a feather, or friend of a friend?: Using exponential random graph models to investigate adolescent social networks. Demography 46 103-125. [27] Grünwald, P. D. (2007). The Minimum Description Length Principle . MIT Press, Cambridge, MA. [28] Handcock, M. S. and Gile, K. J. (2010). Modeling social networks from sampled data. Ann. Appl. Stat. 4 5-25. · Zbl 1189.62187 [29] Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M. and Morris, M. (2008). statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software 24 1-11. Special issue on statnet. [30] Hanneke, S., Fu, W. and Xing, E. P. (2010). Discrete temporal models of social networks. Electron. J. Stat. 4 585-605. · Zbl 1329.91113 [31] Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc. 76 33-65. · Zbl 0457.62090 [32] Jona-Lasinio, G. (2001). Renormalization group and probability theory. Phys. Rep. 352 439-458. · Zbl 0979.82026 [33] Kallenberg, O. (2002). Foundations of Modern Probability , 2nd ed. Springer, New York. · Zbl 0996.60001 [34] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107, 10. [35] Kolaczyk, E. D. (2009). Statistical Analysis of Network Data : Methods and Models . Springer, New York. · Zbl 1277.62021 [36] Kossinets, G. (2006). Effects of missing data in social networks. Social Networks 28 247-268. [37] Krivitsky, P. N., Handcock, M. S. and Morris, M. (2011). Adjusting for network size and composition effects in exponential-family random graph models. Stat. Methodol. 8 319-339. · Zbl 1215.91069 [38] Landau, L. D. and Lifshitz, E. M. (1980). Statistical Physics . Pergamon Press, Oxford. · Zbl 0080.19702 [39] Lauritzen, S. L. (1974). Sufficiency, prediction and extreme models. Scand. J. Stat. 1 128-134. · Zbl 0297.62068 [40] Lauritzen, S. L. (1988). Extremal Families and Systems of Sufficient Statistics. Lecture Notes in Statistics 49 . Springer, New York. · Zbl 0681.62009 [41] Lauritzen, S. L. (2008). Exchangeable Rasch matrices. Rend. Mat. Appl. (7) 28 83-95. · Zbl 1223.60008 [42] Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. J. Combin. Theory Ser. B 96 933-957. · Zbl 1113.05092 [43] Lubbers, M. J. and Snijders, T. A. B. (2007). A comparison of various approaches to the exponential random graph model: A reanalysis of 102 student networks in school classes. Social Networks 29 489-507. [44] Mandelbrot, B. (1962). The role of sufficiency and of estimation in thermodynamics. Ann. Math. Statist. 33 1021-1038. · Zbl 0111.33705 [45] McPherson, M., Smith-Lovin, L. and Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology 27 415-444. [46] Nauenberg, M. (2003). Critique of $$q$$-entropy for thermal statistics. Phys. Rev. E 67 036114. · Zbl 1038.01010 [47] Newman, M. E. J. (2010). Networks : An Introduction . Oxford Univ. Press, Oxford. · Zbl 1195.94003 [48] Orbanz, P. (2011). Projective limit techniques in Bayesian nonparametrics. Unpublished manuscript. · Zbl 1274.62076 [49] Park, J. and Newman, M. E. J. (2004). Solution of the 2-star model of a network. Phys. Rev. E (3) 70 066146. [50] Park, J. and Newman, M. E. J. (2004). Statistical mechanics of networks. Phys. Rev. E (3) 70 066117, 13. [51] Park, J. and Newman, M. E. J. (2006). Solution for the properties of a clustered network. Phys. Rev. E (3) 72 026136. [52] Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electron. J. Stat. 3 446-484. · Zbl 1326.62071 [53] Rinaldo, A., Petrović, S. and Fienberg, S. E. (2011). Maximum likelihood estimation in network models. Available at . · Zbl 1292.62052 [54] Robins, G., Snijders, T., Wang, P., Handcock, M. and Pattison, P. (2007). Recent developments in exponential random graph ($$p^\ast$$) models for social networks. Social Networks 29 192-215. [55] Schaefer, D. R. (2012). Youth co-offending networks: An investigation of social and spatial effects. Social Networks 34 141-149. [56] Schervish, M. J. (1995). Theory of Statistics . Springer, New York. · Zbl 0834.62002 [57] Shalizi, C. R. and Rinaldo, A. (2013). Supplement to “Consistency under sampling of exponential random graph models.” . · Zbl 1269.91066 [58] Snijders, T. A. B. (2005). Models for longitudinal network data. In Models and Methods in Social Network Analysis (P. J. Carrington, J. Scott and S. Wasserman, eds.) 215-247. Cambridge Univ. Press, Cambridge. [59] Snijders, T. A. B., Pattison, P. E., Robins, G. L. and Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology 36 99-153. [60] Stumpf, M. P. H., Wiuf, C. and May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proc. Natl. Acad. Sci. USA 102 4221-4224. [61] Touchette, H. (2009). The large deviation approach to statistical mechanics. Phys. Rep. 478 1-69. [62] Vermeij, L., van Duijin, M. A. J. and Baerveldt, C. (2009). Ethnic segregation in context: Social discrimination among native Dutch pupils and their ethnic minority classmates. Social Networks 31 230-239. [63] Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1 1-305. · Zbl 1193.62107 [64] Wasserman, S. and Pattison, P. (1996). Logit models and logistic regressions for social networks. I. An introduction to Markov graphs and $$p$$. Psychometrika 61 401-425. · Zbl 0866.92029 [65] Wasserman, S. and Robins, G. (2005). An introduction to random graphs, dependence graphs, and $$p^\ast$$. In Models and Methods in Social Network Analysis (P. J. Carrington, J. Scott and S. Wasserman, eds.) 148-161. Cambridge Univ. Press, Cambridge, England. [66] Xiang, R. and Neville, J. (2011). Relational learning with one network: An asymptotic analysis. In Proceedings of the 14 th International Conference on Artificial Intelligence and Statistics [ AISTATS 2011] (G. Gordon, D. Dunson and M. Dudík, eds.). Journal of Machine Learning Research : Workshops and Conference Proceedings 15 779-788. Clarendon Press, Oxford. [67] Yeomans, J. M. (1992). Statistical Mechanics of Phase Transitions . Clarendon Press, Oxford.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.