Weighted likelihood estimation under two-phase sampling. (English) Zbl 1347.62033

Summary: We develop asymptotic theory for weighted likelihood estimators (WLE) under two-phase stratified sampling without replacement. We also consider several variants of WLEs involving estimated weights and calibration. A set of empirical process tools is developed, including a Glivenko-Cantelli theorem, a theorem for rates of convergence of \(M\)-estimators, and a Donsker theorem for the inverse probability weighted empirical processes under two-phase sampling and sampling without replacement at the second phase. Using these general results, we derive asymptotic distributions of the WLE of a finite-dimensional parameter in a general semiparametric model where the nuisance parameter is estimable at either regular or nonregular rates. We illustrate these results and methods in the Cox model with right censoring and interval censoring. We compare the methods via their asymptotic variances under both sampling without replacement and the more usual (and easier to analyze) assumption of Bernoulli sampling at the second phase.
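
As a rough illustration of the weighting scheme the summary describes, the sketch below draws a phase-two sample without replacement within strata, assigns each sampled unit the inverse of its stratum sampling fraction, and maximizes a weighted log-likelihood. It is not the paper's semiparametric setting: the exponential model, the two strata, the sample sizes, and the function names `two_phase_weights` and `weighted_exponential_rate` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def two_phase_weights(strata, n_by_stratum, rng):
    """Sample n_j units without replacement from each stratum j.

    Returns one weight per phase-one unit: N_j / n_j if the unit was
    selected at phase two, and 0.0 otherwise.
    """
    members = defaultdict(list)
    for i, s in enumerate(strata):
        members[s].append(i)
    weights = [0.0] * len(strata)
    for s, idx in members.items():
        n_j = n_by_stratum[s]
        for i in rng.sample(idx, n_j):       # without replacement
            weights[i] = len(idx) / n_j      # inverse sampling fraction
    return weights


def weighted_exponential_rate(times, weights):
    """Maximize sum_i w_i * (log(rate) - rate * t_i) over rate > 0.

    The maximizer has the closed form sum(w) / sum(w * t).
    """
    return sum(weights) / sum(w * t for w, t in zip(weights, times))


rng = random.Random(0)
N = 2000  # phase-one sample size (assumed)

# Hypothetical data: exponential event times with true rate 1.5.
times = [rng.expovariate(1.5) for _ in range(N)]

# Phase-one stratum depends on the outcome, so an unweighted analysis
# of the phase-two sample would be biased.
strata = ["long" if t > 1.0 else "short" for t in times]

# Phase two: fixed sample sizes per stratum (assumed values).
n_by_stratum = {"short": 100, "long": 100}
weights = two_phase_weights(strata, n_by_stratum, rng)

# Weighted estimate uses only units with positive weight.
rate_wle = weighted_exponential_rate(times, weights)

# Naive estimate ignores the design and treats sampled units equally.
sampled = [t for t, w in zip(times, weights) if w > 0]
rate_naive = len(sampled) / sum(sampled)

print("weighted estimate:", rate_wle)
print("unweighted estimate:", rate_naive)
```

Because the within-stratum sample sizes are fixed, the selection indicators are dependent within a stratum; this dependence is what separates sampling without replacement from Bernoulli sampling. The weights in each stratum sum exactly to the stratum size, so the total weight equals the phase-one sample size.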


62E20 Asymptotic distribution theory in statistics
62G20 Asymptotic properties of nonparametric inference
62D05 Sampling theory, sample surveys
62N01 Censored data models



