Density estimation for biased data. (English) Zbl 1091.62022

Summary: The concept of biased data is well known and its practical applications range from social sciences and biology to economics and quality control. Biased observations arise when a sampling procedure chooses an observation with a probability that depends on the value of the observation. This is an interesting sampling procedure because it favors some observations and neglects others. It is known that biasing does not change rates of nonparametric density estimation, but no results are available about sharp constants. This article presents asymptotic results on sharp minimax density estimation. In particular, a coefficient of difficulty is introduced that shows the relationship between sample sizes of direct and biased samples that yield the same accuracy of estimation. The notion of restricted local minimax, where a low-frequency part of the estimated density is known, is introduced; it sheds new light on the phenomenon of nonparametric superefficiency. Results of a numerical study are presented.
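
As an illustration of the sampling model described in the summary, the following minimal sketch assumes the standard weighted-distribution setup, which the summary itself does not spell out: a known biasing function \(w(x) > 0\), an underlying density \(f\), and biased observations drawn from \(g(x) = w(x) f(x) / \mu\) with \(\mu = \int w(x) f(x)\,dx\). Since \(1/\mu = E_g[1/w(Y)]\), \(\mu\) can be estimated from the biased sample alone, and each kernel term can be reweighted by \(1/w(Y_i)\). This is a plain weighted kernel estimator, not the sharp minimax estimator studied in the article. The length-biased Gamma example, the function names, and the bandwidth are hypothetical choices made only for the demonstration.

```python
import numpy as np


def length_bias(x):
    """Hypothetical biasing function w(x) = x (length-biased sampling)."""
    return x


def estimate_mu(y, w):
    """Estimate mu = E_f[w(X)] from biased data via 1/mu = E_g[1/w(Y)]."""
    return y.size / np.sum(1.0 / w(y))


def biased_kde(x_grid, y, w, bandwidth):
    """Gaussian kernel estimate of f from a biased sample y.

    Each observation is weighted by 1/w(y_i), and the weights are
    normalized to sum to one, so the estimate integrates to one.
    """
    inv_w = 1.0 / w(y)
    u = (x_grid[:, None] - y[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel @ inv_w / (bandwidth * inv_w.sum())


# Assumed example: f is Gamma(shape=3, scale=1) and w(x) = x, so the
# biased density g(x) = x f(x) / 3 is Gamma(shape=4, scale=1), mu = 3.
rng = np.random.default_rng(0)
y = rng.gamma(shape=4.0, scale=1.0, size=5000)

mu_hat = estimate_mu(y, length_bias)

x_grid = np.linspace(-5.0, 30.0, 701)
dx = x_grid[1] - x_grid[0]
f_hat = biased_kde(x_grid, y, length_bias, bandwidth=0.3)

# True Gamma(3, 1) density, used only to check the sketch.
f_true = np.where(x_grid > 0, 0.5 * x_grid**2 * np.exp(-x_grid), 0.0)

print("estimated mu:", mu_hat)
print("max abs error of f_hat:", np.max(np.abs(f_hat - f_true)))
```

A naive kernel estimate that ignores the weights would recover the biased density \(g\) rather than \(f\); the \(1/w\) reweighting is what undoes the preference the sampling procedure gives to some observations.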

MSC:

62G07 Density estimation
62G20 Asymptotic properties of nonparametric inference
62E20 Asymptotic distribution theory in statistics

References:

[1] Brown, L. D., Low, M. G. and Zhao, L. H. (1997). Superefficiency in nonparametric function estimation. Ann. Statist. 25 2607–2625. · Zbl 0895.62043 · doi:10.1214/aos/1030741087
[2] Buckland, S. T., Anderson, D. R., Burnham, K. P. and Laake, J. L. (1993). Distance Sampling: Estimating Abundance of Biological Populations. Chapman and Hall, London. · Zbl 1136.62085
[3] Cook, R. D. and Martin, F. B. (1974). A model for quadrat sampling with “visibility bias.” J. Amer. Statist. Assoc. 69 345–349.
[4] Cox, D. R. (1969). Some sampling problems in technology. In New Developments in Survey Sampling (N. L. Johnson and H. Smith, Jr., eds.) 506–527. Wiley, New York.
[5] Devroye, L. (1987). A Course in Density Estimation. Birkhäuser, Boston. · Zbl 0617.62043
[6] Devroye, L. and Györfi, L. (1985). Nonparametric Density Estimation: The \(L_1\) View. Wiley, New York. · Zbl 0546.62015
[7] Efromovich, S. (1985). Nonparametric estimation of a density with unknown smoothness. Theory Probab. Appl. 30 557–568. · Zbl 0593.62034 · doi:10.1137/1130067
[8] Efromovich, S. (1989). On sequential nonparametric estimation of a density. Theory Probab. Appl. 34 228–239. · Zbl 0716.62077 · doi:10.1137/1134019
[9] Efromovich, S. (1998). On global and pointwise adaptive estimation. Bernoulli 4 273–282. · Zbl 0908.62046 · doi:10.2307/3318752
[10] Efromovich, S. (1999). Nonparametric Curve Estimation: Methods, Theory and Applications. Springer, New York. · Zbl 0935.62039 · doi:10.1007/b97679
[11] Efromovich, S. (2000). On sharp adaptive estimation of multivariate curves. Math. Methods Statist. 9 117–139. · Zbl 1006.62033
[12] Efromovich, S. (2001). Density estimation under random censorship and order restrictions: From asymptotic to small samples. J. Amer. Statist. Assoc. 96 667–684. · Zbl 1017.62029 · doi:10.1198/016214501753168334
[13] Efromovich, S. (2004). Distribution estimation for biased data. J. Statist. Plann. Inference. · Zbl 1094.62046 · doi:10.1016/S0378-3758(03)00202-7
[14] Gill, R. D., Vardi, Y. and Wellner, J. A. (1988). Large sample theory of empirical distributions in biased sampling methods. Ann. Statist. 16 1069–1112. · Zbl 0668.62024 · doi:10.1214/aos/1176350948
[15] Golubev, G. K. (1991). LAN in problems of nonparametric estimation of functions and lower bounds for quadratic risks. Theory Probab. Appl. 36 152–157. · Zbl 0738.62043 · doi:10.1137/1136014
[16] Ibragimov, I. A. and Khasminskii, R. Z. (1981). Statistical Estimation: Asymptotic Theory. Springer, New York. · Zbl 0467.62026
[17] Kahane, J.-P. (1985). Some Random Series of Functions, 2nd ed. Cambridge Univ. Press. · Zbl 0571.60002
[18] Lee, J. and Berger, J. O. (2001). Semiparametric Bayesian analysis of selection models. J. Amer. Statist. Assoc. 96 1397–1409. · Zbl 1051.62029 · doi:10.1198/016214501753382318
[19] Patil, G. P. and Rao, C. R. (1977). The weighted distributions: A survey of their applications. In Applications of Statistics (P. R. Krishnaiah, ed.) 383–405. North-Holland, Amsterdam. · Zbl 0371.62034
[20] Pinsker, M. S. (1980). Optimal filtering of a square integrable signal in Gaussian noise. Problems Inform. Transmission 16 52–68. · Zbl 0452.94003
[21] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London. · Zbl 0617.62042
[22] Sun, J. and Woodroofe, M. B. (1997). Semiparametric estimates under biased sampling. Statist. Sinica 7 545–575. · Zbl 0876.62020
[23] Wu, C. O. (1997). A cross-validation bandwidth choice for kernel density estimates with selection biased data. J. Multivariate Anal. 61 38–60. · Zbl 0885.62050 · doi:10.1006/jmva.1997.1659
[24] Wu, C. O. and Mao, A. Q. (1996). Minimax kernels for density estimation with biased data. Ann. Inst. Statist. Math. 48 451–467. · Zbl 0926.62027 · doi:10.1007/BF00050848
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.