**A normal limit law for a nonparametric estimator of the coverage of a random sample.**
*(English)*
Zbl 0599.62053

Consider a random sample of size n drawn from a multinomial population with a possibly countably infinite number of classes, and let C denote the sum of the probabilities of the classes observed in the sample. Then 1-C is the conditional probability that a new class is observed at the \((n+1)\)st stage, given the result of the random sample.
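The definition of C can be made concrete with a minimal sketch. The function name `coverage`, the class labels and the probabilities are hypothetical illustration values, not taken from the article, and the class probabilities are assumed known here, which is exactly what is not the case in practice.

```python
def coverage(sample, probs):
    """Sum of the probabilities of the classes that occur in the sample.

    `probs` maps each class label to its (assumed known) probability.
    """
    return sum(probs[label] for label in set(sample))


# Hypothetical four-class population and a sample of size n = 6.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
sample = ["a", "b", "a", "c", "a", "b"]

C = coverage(sample, probs)   # classes a, b, c were observed
prob_new_class = 1.0 - C      # chance that draw n+1 yields an unseen class
```

In this toy case only class "d" is unobserved, so 1-C equals its probability.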

The present article establishes "asymptotic" confidence intervals for C in terms of \(N_1/n\) and \(N_2/n\), where \(N_k\) denotes the number of classes observed exactly k times in the random sample \((k\geq 1)\). Specifically, the following result is proved. Let \(\{p_{im}\}\) be a double sequence, where, for fixed \(m\geq 1\), \(p_{im}\) represents the probability of the ith class in the multinomial population, and let \(\{n_m\}\) be a sequence of positive integers such that \(n_m\to \infty\) as \(m\to \infty\). If sampling is done along the sequence \(\{n_m\}\), and \(C_m\) and \(N_{km}\) are defined in the same way as C and \(N_k\), respectively, with reference to the random sample of size \(n_m\), then, under the conditions that \(E(N_{1m}/n_m)\to c_1\) \((0<c_1<1)\) and \(E(N_{2m}/n_m)\to c_2\geq 0\) as \(m\to \infty\), which imply equivalent conditions on \(\{p_{im}\}\), the sequence \[ n_m^{1/2}\left(C_m-1+\frac{N_{1m}}{n_m}\right)\left(\frac{N_{1m}}{n_m}+\frac{2N_{2m}}{n_m}-\frac{N_{1m}^2}{n_m^2}\right)^{-1/2} \] converges in distribution to a standard normal. The author demonstrates that the confidence intervals implied by this "asymptotic" result are not much wider than those developed under restrictive assumptions such as in the classical occupancy problem.
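As a rough illustration of how the limit law could be turned into an interval, the sketch below assumes the pivot is centred on the estimator \(1-N_1/n\) and uses the variance term \(N_1/n+2N_2/n-N_1^2/n^2\) from the display above. The function name `coverage_interval`, the default quantile `z=1.96` and the clipping to \([0,1]\) are assumptions made for this example, not details taken from the article.

```python
import math
from collections import Counter


def coverage_interval(sample, z=1.96):
    """Approximate confidence interval for the coverage C of a sample.

    Returns (estimate, lower, upper), where estimate = 1 - N_1/n and the
    half-width is z * sqrt((N_1/n + 2*N_2/n - (N_1/n)**2) / n).
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")

    # counts_of_counts[k] = N_k, the number of classes seen exactly k times.
    counts_of_counts = Counter(Counter(sample).values())
    n1 = counts_of_counts.get(1, 0)
    n2 = counts_of_counts.get(2, 0)

    estimate = 1.0 - n1 / n
    variance_term = n1 / n + 2 * n2 / n - (n1 / n) ** 2
    half_width = z * math.sqrt(max(variance_term, 0.0) / n)

    # Clipping to [0, 1] is a convenience added here (assumption).
    lower = max(0.0, estimate - half_width)
    upper = min(1.0, estimate + half_width)
    return estimate, lower, upper


# Hypothetical sample with n = 10, N_1 = 3 (c, d, e) and N_2 = 1 (b).
sample = ["a"] * 5 + ["b"] * 2 + ["c", "d", "e"]
estimate, lower, upper = coverage_interval(sample)
```

With so small a sample the interval is wide and the normal approximation is not to be trusted; the example only shows which counts enter the formula.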

### MSC:

62G05 | Nonparametric estimation |

60F05 | Central limit and other weak theorems |

62G15 | Nonparametric tolerance and confidence regions |