Consistent estimation in general sublinear preferential attachment trees.

*(English)*Zbl 1387.62064Recently there has been much interest in studying large-scale real-world networks and attempting to model their properties using random graphs. This paper is a good contribution to statistical inference in models of randomly growing networks.

The following is the model analysed. “The network starts at stage \(n=2\) with two nodes \(v_1\) and \(v_2\) connected by a single edge. Next it evolves recursively by adding nodes \(v_3,v_4,\dots\) which each connect by a single edge to a single node of the existing network. The incoming nodes choose the node to which they connect by a probabilistic mechanism. Given the network with nodes \(v_1,\dots,v_n\), having degrees \(d_1(n),\dots,d_n(n)\) (the degree of a node is the number of edges connected to that node), the node \(v_{n+1}\) connects to the existing node \(v_i\in\{v_1,\dots,v_n\}\) with probability proportional to \(f(d_i(n))\), i.e., with probability \[ \frac{f(d_i(n))}{\sum_{j=1}^nf(d_j(n))}. \] Here \(f:\mathbb{N}_+\to\mathbb{R}_+\) is a given function, which we call the preferential attachment (PA) function, and we refer to \(f(d_i(n))\) as the preference for node \(v_i\) at stage \(n\). The PA function \(f\) is typically assumed to be non-decreasing, so that nodes of higher degrees inspire more incoming connections. This explains the name preferential attachment model. After the incoming node \(v_{n+1}\) has made its choice, the network evolves to the stage of \(n+1\) nodes and the scheme repeats with the set of existing nodes \(\{v_1,\dots,v_n\}\) and the incoming node \(v_{n+2}\). The recursive procedure may be repeated to reach any number of nodes.”

Consider \(r_k\) as the rescaled version of \(f(k)\) for each \(k\), that is \[ r_k:=\frac{f(k)}{\sum_{j=1}^{\infty}f(j)p_{j}}, \] where \((p_{j})_{j=1}^{\infty}\) is the limiting degree distribution. A new estimator of a PA sublinear function \(f\), from an observed realization of a network, is proposed: For each \(n\), and \(k=1,\dots,n\) define \[ \hat{r}_{k}(n)=\frac{N_{>k}(n)}{N_{k}(n)}, \] where \(N_{>k}(n)\) denotes the number of nodes of degree strictly larger than \(k\) at time \(n\), and \(N_{k}(n)\) denotes the number of nodes of degree \(k\). The main result of this paper is the following theorem. Define a function \(\rho_{f}:(0,\infty)\to[0,\infty]\) by \[ \rho_{f}(\lambda):=\sum_{l=1}^{\infty}\prod_{k=1}^{l}\frac{f(k)}{\lambda+f(k)}. \] Theorem. If there exists an open neighborhood \(U\) of \(1\), such that \(U\subset\{f(\lambda)\mid \lambda\in(0,\infty)\}\), then for any \(k\), \[ \hat{r}_{k}(n)\mathop{\longrightarrow}\limits_{n\to\infty}r_{k}\text{ almost surely.} \] The authors present a numerical illustration of the behavior of the empirical estimator. To summarize, “the exact performance of the estimator depends on the true PA function and the degrees of interest – if the true PA function increases slowly with respect to the degree, then it is easier to estimate the preference of low degrees, and harder to estimate the preference of high degrees and vice versa.”

The paper ends with the exposition of some open problems among which it is conjectured that for fixed \(k\), \[ \sqrt{n}(\hat{r}_{k}(n)-r_{k})\mathop{\longrightarrow}\limits^{d}N(0,\sigma_{k}^2), \] where \(\sigma_{k}^2\) only depends on \(f\) and \(k\) and \(\mathop{\longrightarrow}\limits^{d}\) denotes convergence in distribution. The paper is hard to understand because it is not well organized. Concepts are used before their definition. However, the proofs of the mathematical results are rigorous.

The following is the model analysed. “The network starts at stage \(n=2\) with two nodes \(v_1\) and \(v_2\) connected by a single edge. Next it evolves recursively by adding nodes \(v_3,v_4,\dots\) which each connect by a single edge to a single node of the existing network. The incoming nodes choose the node to which they connect by a probabilistic mechanism. Given the network with nodes \(v_1,\dots,v_n\), having degrees \(d_1(n),\dots,d_n(n)\) (the degree of a node is the number of edges connected to that node), the node \(v_{n+1}\) connects to the existing node \(v_i\in\{v_1,\dots,v_n\}\) with probability proportional to \(f(d_i(n))\), i.e., with probability \[ \frac{f(d_i(n))}{\sum_{j=1}^nf(d_j(n))}. \] Here \(f:\mathbb{N}_+\to\mathbb{R}_+\) is a given function, which we call the preferential attachment (PA) function, and we refer to \(f(d_i(n))\) as the preference for node \(v_i\) at stage \(n\). The PA function \(f\) is typically assumed to be non-decreasing, so that nodes of higher degrees inspire more incoming connections. This explains the name preferential attachment model. After the incoming node \(v_{n+1}\) has made its choice, the network evolves to the stage of \(n+1\) nodes and the scheme repeats with the set of existing nodes \(\{v_1,\dots,v_n\}\) and the incoming node \(v_{n+2}\). The recursive procedure may be repeated to reach any number of nodes.”

Consider \(r_k\) as the rescaled version of \(f(k)\) for each \(k\), that is \[ r_k:=\frac{f(k)}{\sum_{j=1}^{\infty}f(j)p_{j}}, \] where \((p_{j})_{j=1}^{\infty}\) is the limiting degree distribution. A new estimator of a PA sublinear function \(f\), from an observed realization of a network, is proposed: For each \(n\), and \(k=1,\dots,n\) define \[ \hat{r}_{k}(n)=\frac{N_{>k}(n)}{N_{k}(n)}, \] where \(N_{>k}(n)\) denotes the number of nodes of degree strictly larger than \(k\) at time \(n\), and \(N_{k}(n)\) denotes the number of nodes of degree \(k\). The main result of this paper is the following theorem. Define a function \(\rho_{f}:(0,\infty)\to[0,\infty]\) by \[ \rho_{f}(\lambda):=\sum_{l=1}^{\infty}\prod_{k=1}^{l}\frac{f(k)}{\lambda+f(k)}. \] Theorem. If there exists an open neighborhood \(U\) of \(1\), such that \(U\subset\{f(\lambda)\mid \lambda\in(0,\infty)\}\), then for any \(k\), \[ \hat{r}_{k}(n)\mathop{\longrightarrow}\limits_{n\to\infty}r_{k}\text{ almost surely.} \] The authors present a numerical illustration of the behavior of the empirical estimator. To summarize, “the exact performance of the estimator depends on the true PA function and the degrees of interest – if the true PA function increases slowly with respect to the degree, then it is easier to estimate the preference of low degrees, and harder to estimate the preference of high degrees and vice versa.”

The paper ends with the exposition of some open problems among which it is conjectured that for fixed \(k\), \[ \sqrt{n}(\hat{r}_{k}(n)-r_{k})\mathop{\longrightarrow}\limits^{d}N(0,\sigma_{k}^2), \] where \(\sigma_{k}^2\) only depends on \(f\) and \(k\) and \(\mathop{\longrightarrow}\limits^{d}\) denotes convergence in distribution. The paper is hard to understand because it is not well organized. Concepts are used before their definition. However, the proofs of the mathematical results are rigorous.

Reviewer: Oscar Bustos (Córdoba)