Bootstrapping exchangeable random graphs. (English) Zbl 1493.62147

Summary: We introduce two new bootstraps for exchangeable random graphs. One, the “empirical graphon bootstrap”, is based purely on resampling, while the other, the “histogram bootstrap”, is a model-based “sieve” bootstrap. We show that both of them accurately approximate the sampling distributions of motif densities, i.e., of the normalized counts of the number of times fixed subgraphs appear in the network. These densities characterize the distribution of (infinite) exchangeable networks. Our bootstraps therefore give a valid quantification of uncertainty in inferences about fundamental network statistics, and so of parameters identifiable from them.


62F40 Bootstrap, jackknife and other resampling methods
62G09 Nonparametric statistical resampling methods
62G05 Nonparametric estimation
05C80 Random graphs (graph-theoretic aspects)


Full Text: DOI arXiv Link


[1] Bearman, P. S., Moody, J. and Stovel, K. (2004). Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks. American Journal of Sociology 110 44-91.
[2] Bhattacharyya, S. and Bickel, P. J. (2015). Subsampling Bootstrap of Count Features of Networks. Annals of Statistics 43 2384-2411. · Zbl 1326.62067 · doi:10.1214/15-AOS1338
[3] Bickel, P. J., Chen, A. and Levina, E. (2011). The method of moments and degree distributions for network models. Annals of Statistics 39 38-59. · Zbl 1232.91577 · doi:10.1214/11-AOS904
[4] Bickel, P. J., Götze, F. and van Zwet, W. R. (1997). Resampling fewer than n observations: gains, losses, and remedies for losses. Statistica Sinica 1-31. · Zbl 0927.62043
[5] Borgs, C., Chayes, J. T., Lovász, L., Sós, V. T. and Vesztergombi, K. (2008). Convergent Sequences of Dense Graphs I: Subgraph Frequencies, Metric Properties and Testing. Advances in Mathematics 219 1801-1851. · Zbl 1213.05161 · doi:10.1016/j.aim.2008.07.008
[6] Borgs, C., Chayes, J. T., Cohn, H. and Zhao, Y. (2019). An \[{L^p}\] Theory of Sparse Graph Convergence I: Limits, Sparse Random Graph Models, and Power Law Distributions. Transactions of the American Mathematical Society 372 3019-3062. · Zbl 1417.05186 · doi:10.1090/tran/7543
[7] Callaert, H. and Janssen, P. (1978). The Berry-Esseen Theorem for \(U\) Statistics. Annals of Statistics 6 417-421. · Zbl 0393.60022 · doi:10.1214/aos/1176344132
[8] Dynkin, E. B. (1978). Sufficient Statistics and Extreme Points. Annals of Probability 6 705-730. · Zbl 0403.62009 · doi:10.1214/aop/1176995424
[9] Eden, T., Levi, A., Ron, D. and Seshadhri, C. (2017). Approximately Counting Triangles in Sublinear Time. SIAM Journal on Computing 46 1603-1646. · Zbl 1380.68445 · doi:10.1137/15M1054389
[10] Eldardiry, H. and Neville, J. (2008). A Resampling Technique for Relational Data Graphs. In Proceedings of the 2nd SNA Workshop, 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York.
[11] Fienberg, S. E., Meyer, M. M. and Wasserman, S. S. (1985). Statistical Analysis of Multiple Sociometric Relations. Journal of the American Statistical Association 80 51-67. · doi:10.1080/01621459.1985.10477129
[12] Fienberg, S. E. and Wasserman, S. (1981). Categorical Data Analysis of Single Sociometric Relations. In Sociological Methodology 1981 (S. Leinhardt, ed.) 156-192. Jossey-Bass, San Francisco. · doi:10.2307/270741
[13] Gao, C., Lu, Y. and Zhou, H. H. (2015). Rate-Optimal Graphon Estimation. Annals of Statistics 43 2624-2652. · Zbl 1332.60050 · doi:10.1214/15-AOS1354
[14] Gao, C., Lu, Y., Ma, Z. and Zhou, H. H. (2016). Optimal Estimation and Completion of Matrices with Biclustering Structures. Journal of Machine Learning Research 17 1-29. · Zbl 1392.62151
[15] Geyer, C. J. (2013). Asymptotics of Maximum Likelihood without the LLN or CLT or Sample Size Going to Infinity. In Advances in Modern Statistical Theory and Applications: A Festschrift in honor of Morris L. Eaton (G. Jones and X. Shen, eds.) 1-24. Institute of Mathematical Statistics, Beachwood, Ohio. · Zbl 1327.62123 · doi:10.1214/12-IMSCOLL1001
[16] Gonen, M., Ron, D. and Shavitt, Y. (2011). Counting Stars and Other Small Subgraphs in Sublinear-Time. SIAM Journal on Discrete Mathematics 25 1365-1411. · Zbl 1234.68138 · doi:10.1137/100783066
[17] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social networks 5 109-137.
[18] Janssen, P. (1981). Rate of Convergence in the Central Limit Theorem and in the Strong Law of Large Numbers for von Mises Statistics. Metrika 28 35-46. · Zbl 0458.62015 · doi:10.1007/BF01902875
[19] Kallenberg, O. (2005). Probabilistic Symmetries and Invariance Principles. Springer-Verlag, New York. · Zbl 1084.60003
[20] Klopp, O., Tsybakov, A. B. and Verzelen, N. (2017). Oracle inequalities for network models and sparse graphon estimation. The Annals of Statistics 45 316-354. · Zbl 1367.62090 · doi:10.1214/16-aos1454
[21] Kolaczyk, E. D. (2009). Statistical Analysis of Network Data. Springer-Verlag, New York. · Zbl 1277.62021
[22] Lahiri, S. N. (2003). Resampling Methods for Dependent Data. Springer-Verlag, New York. · Zbl 1028.62002
[23] Lauritzen, S. L. (1984). Extreme Point Models in Statistics. Scandinavian Journal of Statistics 11 65-91. With discussion and response. · Zbl 0542.62002
[24] Leger, J.-B. (2016). Blockmodels: A R-package for estimating in Latent Block Model and Stochastic Block Model, with various probability functions, with or without covariates. E-print, arxiv:1602.07587.
[25] Levin, K. and Levina, E. (2019a). Bootstrapping Networks with Latent Space Structure. E-print, arXiv:1907.10821.
[26] Levin, K. and Levina, E. (2019b). Bootstrapping networks with latent space structure. arXiv preprint arXiv:1907.10821.
[27] Lovász, L. (2009). Very Large Graphs. In Current Developments in Mathematics, 2008 (D. Jerison, B. Mazur, T. Mrowka, W. Schmid, R. P. Stanley and S.-T. Yau, eds.) 67-128. International Press, Somerville, Massachusetts. · Zbl 1179.05100 · doi:10.4310/CDM.2008.v2008.n1.a2
[28] Lunde, R. and Sarkar, P. (2019). Subsampling sparse graphons under minimal assumptions. arXiv preprint arXiv:1907.12528.
[29] McCullagh, P. (2000). Resampling and exchangeable arrays. Bernoulli 6 285-301. · Zbl 0976.62035 · doi:10.2307/3318577
[30] Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. and Alon, U. (2002). Network motifs: Simple building blocks of complex networks. Science 298 824-827.
[31] Olding, B. P. and Wolfe, P. J. (2014). Inference for graphs and networks: Extending classical tools to modern data. In Data Analysis for Network Cyber-Security (N. Adams and N. Heard, eds.) 1-31. World Scientific, Singapore. · doi:10.1142/9781783263752_0001
[32] Owen, A. B. and Eckles, D. (2012). Bootstrapping Data Arrays of Arbitrary Order. Annals of Applied Statistics 6 895-927. · Zbl 1454.62131 · doi:10.1214/12-AOAS547
[33] Rosvall, M. and Bergstrom, C. T. (2010). Mapping Change in Large Networks. PLoS ONE 5 e8694. · doi:10.1371/journal.pone.0008694
[34] Shalizi, C. R. and Rinaldo, A. (2013). Consistency Under Sampling of Exponential Random Graph Models. Annals of Statistics 41 508-535. · Zbl 1269.91066 · doi:10.1214/12-AOS1044
[35] Wang, L. (2016). Network Comparisons using Sample Splitting, PhD thesis, Carnegie Mellon University.
[36] Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. E-print, arxiv:1309.5936.
[37] Xu, J. (2018). Rates of convergence of spectral methods for graphon estimation. In International Conference on Machine Learning 5433-5442. PMLR.
[38] Zhang, Y. and Xia, D. (2020). Edgeworth expansions for network moments. arXiv preprint arXiv:2004.06615.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.