×

Are extreme value estimation methods useful for network data? (English) Zbl 1460.62085

Let \(G(n)=(V(n),E(n))\) be a directed network, where \(V(n)\) is the set of nodes, \(E(n)\) is the set of edges, and \(n\) is the number of edges. Let \(N(n)\) denote the number of nodes in \(G(n)\) and \(N_n(i,j)\) be the number of nodes with in-degree \(i\) and out-degree \(j\). The marginal counts of nodes with in-degree \(i\) and out-degree \(j\) are the following \[ N_i^{\mathrm{in}}(n)=\sum_{j=0}^\infty N_n(i,j), \quad N_j^{\mathrm{out}}(n)=\sum_{i=0}^\infty N_n(i,j). \]
It is supposed that the empirical degree frequency converges almost surely, i.e. \[ \frac{N_n(i,j)}{N(n)}\mathop{\rightarrow}_{n\longrightarrow\infty}p_{ij}\ \mathrm{ a.s.}, \] where \(p_{ij}\) are local probabilities of a bivariate integer-valued random variable.
Also it is supposed that the network exhibits the power-law behavior, i.e. the following requirements hold \[ p_i^{\mathrm{in}}=\sum_{j=0}^\infty p_{ij}\mathop{\sim}_{i\rightarrow\infty}C_{\mathrm{in}}i^{-(1+\imath_{\mathrm{in}})},\quad p_j^{\mathrm{out}}=\sum_{i=0}^\infty p_{ij}\mathop{\sim}_{j\rightarrow\infty}C_{\mathrm{out}}j^{-(1+\imath_{\mathrm{out}})}, \] for some positive constants \(C_{\mathrm{in}}\), \(C_{\mathrm{out}}\), \(\imath_{\mathrm{in}}\) and \(\imath_{\mathrm{out}}\).
The authors of the paper describe two classes of preferential attachment models that generate networks with power-law degree distributions. In addition, they consider semiparametric estimation of the model parameters based on an extreme value approach, compare the extreme value method with the existing parametric approaches and demonstrate how it can provide more robust estimates of parameters associated with the network when the data are corrupted or when the model is misspecified.

MSC:

62H12 Estimation in multivariate analysis
60G70 Extreme value theory; extremal stochastic processes
05C80 Random graphs (graph-theoretic aspects)
62F35 Robustness and adaptive procedures (parametric inference)
62E20 Asymptotic distribution theory in statistics
91D30 Social networks; opinion dynamics

Software:

plfit; KONECT; poweRlaw; ismev
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Bhamidi, S.: Universal techniques to analyze preferential attachment trees: global and local analysis. available: http://www.unc.edu/bhamidi/preferent.pdf. Preprint (2007)
[2] Bhamidi, S.; Steele, Jm; Zaman, T., Twitter event networks and the superstar model, Ann. Appl. Probab., 10, 5, 2462-2502 (2015) · Zbl 1334.60204 · doi:10.1214/14-AAP1053
[3] Bollobás, B., Borgs, C., Chayes, J., Riordan, O.: Directed scale-free graphs. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, 2003), pp 132-139. ACM, New York (2003) · Zbl 1094.68605
[4] Chandler, Re; Bate, S., Inference for clustered data using the independence log- likelihood, Biometrika, 95, 167-183 (2007) · Zbl 1142.62367 · doi:10.1093/biomet/asm015
[5] Clauset, A.; Shalizi, Cr; Newman, Mej, Power-law distributions in empirical data, SIAM Rev., 51, 4, 661-703 (2009) · Zbl 1176.62001 · doi:10.1137/070710111
[6] Coles, Sg, An introduction to statistical modeling of extreme values. Springer Series in Statistics, xiv 210-xiv 210 (2001), London: Springer, London · Zbl 0980.62043
[7] Das, B.; Mitra, A.; Resnick, S., Living on the multi-dimensional edge: seeking hidden risks using regular variation, Adv. Appl. Probab., 45, 1, 139-163 (2013) · Zbl 1276.60041 · doi:10.1239/aap/1363354106
[8] De Haan, L.; Ferreira, A., Extreme value theory: an introduction (2006), New York: Springer, New York · Zbl 1101.62002
[9] Drees, H., Janßen, A., Resnick, S.I., Wang, T.: On a minimum distance procedure for threshold selection in tail analysis. ArXiv e-prints. Submitted (2018) · Zbl 1484.62057
[10] Durrett, Rt, Random graph dynamics. Cambridge series in statistical and probabilistic mathematics (2010), Cambridge: Cambridge University Press, Cambridge · Zbl 1202.60001
[11] Easley, D.; Kleinberg, J., Networks, crowds, and markets (2010), Cambridge: Cambridge University Press, Cambridge · Zbl 1205.91007
[12] Gao, F.; Van Der Vaart, A., On the asymptotic normality of estimating the affine preferential attachment network models with random initial degrees, Stochastic Process Appl., 127, 11, 3754-3775 (2017) · Zbl 1491.62080 · doi:10.1016/j.spa.2017.03.008
[13] Gillespie, Cs, Fitting heavy tailed distributions: the poweRlaw package, J. Stat. Softw., 64, 2, 1-16 (2015) · doi:10.18637/jss.v064.i02
[14] Hill, Bm, A simple general approach to inference about the tail of a distribution, Ann. Statist., 3, 1163-1174 (1975) · Zbl 0323.62033 · doi:10.1214/aos/1176343247
[15] Hult, H.; Lindskog, F., Regular variation for measures on metric spaces, Publ. Inst Math. (Beograd) (N.S.), 80, 94, 121-140 (2006) · Zbl 1164.28005 · doi:10.2298/PIM0694121H
[16] Kolaczyk, Ed; Csárdi, G., Statistical analysis of network data with R. Use R! (2014), New York: Springer, New York · Zbl 1290.62002
[17] Krapivsky, P.; Rodgers, G.; Redner, S., Degree distributions of growing networks, Phys. Rev. Lett, 86, 5401-5404 (2001) · doi:10.1103/PhysRevLett.86.5401
[18] Krapivsky, Pl; Redner, S., Organization of growing random networks, Physical Review E, 63, 6, 066123:1-14 (2001) · doi:10.1103/PhysRevE.63.066123
[19] Kunegis, J.: Konect: the Koblenz network collection. In: Proceedings of the 22nd International Conference on World Wide Web, pp 1343-1350. ACM (2013)
[20] Lindskog, F.; Resnick, Si; Roy, J., Regularly varying measures on metric spaces: hidden regular variation and hidden jumps, Probab. Surv., 11, 270-314 (2014) · Zbl 1317.60007 · doi:10.1214/14-PS231
[21] Resnick, Si, Heavy-tail phenomena: probabilistic and statistical modeling. Springer series in operations research and financial engineering (2007), New York: Springer, New York · Zbl 1152.62029
[22] Resnick, Si; Samorodnitsky, G., Tauberian theory for multivariate regularly varying distributions with application to preferential attachment networks, Extremes, 18, 3, 349-367 (2015) · Zbl 1345.60118 · doi:10.1007/s10687-015-0216-2
[23] Samorodnitsky, G.; Resnick, S.; Towsley, D.; Davis, R.; Willis, A.; Wan, P., Nonstandard regular variation of in-degree and out-degree in the preferential attachment model, J. Appl. Probab., 53, 1, 146-161 (2016) · Zbl 1343.60138 · doi:10.1017/jpr.2015.15
[24] Van Der Hofstad, R., Random graphs and complex networks, vol. 1. Cambridge Series in Statistical and Probabilistic Mathematics (2017), Cambridge: Cambridge University Press, Cambridge · Zbl 1361.05002
[25] Varin, C.; Reid, N.; Firth, D., An overview of composite likelihood methods Statist, Sinica, 21, 5-42 (2011) · Zbl 05849508
[26] Wan, P.; Wang, T.; Davis, Ra; Resnick, Si, Fitting the linear preferential attachment model, Electron. J Statist., 11, 2, 3738-3780 (2017) · Zbl 1387.62074 · doi:10.1214/17-EJS1327
[27] Wang, T.; Resnick, Si, Multivariate regular variation of discrete mass functions with applications to preferential attachment networks, Methodol. Comput. Appl. Probab., 20, 3, 1029-104 (2018) · Zbl 1401.28006 · doi:10.1007/s11009-016-9503-x
[28] Wang, T., Resnick, S.I.: Consistency of Hill estimators in a linear preferential attachment model extremes. 10.1007/s10687-018-0335-7 (2018) · Zbl 1432.60056
[29] Wang, T., Resnick, S.I.: Degree growth rates and index estimation in a directed preferential attachment model. ArXiv e-prints. Submitted (2018) · Zbl 1443.60078
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.