Z-estimation and stratified samples: application to survival models. (English) Zbl 1333.62237

Summary: The infinite dimensional Z-estimation theorem offers a systematic approach to joint estimation of both Euclidean and non-Euclidean parameters in probability models for data. It is easily adapted for stratified sampling designs. This is important in applications to censored survival data because the inverse probability weights that modify the standard estimating equations often depend on the entire follow-up history. Since the weights are not predictable, they complicate the usual theory based on martingales. This paper considers joint estimation of regression coefficients and baseline hazard functions in the Cox proportional hazards and Lin-Ying additive hazards models. Weighted likelihood equations are used for the former and weighted estimating equations for the latter. Regression coefficients and baseline hazards may be combined to estimate individual survival probabilities. Efficiency is improved by calibrating or estimating the weights using information available for all subjects. Although inefficient in comparison with likelihood inference for incomplete data, which is often difficult to implement, the approach provides consistent estimates of desired population parameters even under model misspecification.
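As a rough illustration of how inverse probability weights modify a standard survival estimator, the sketch below computes a weighted Nelson-Aalen cumulative hazard from a stratified sample. It is not the paper's method: it covers only the special case with no regression coefficients, and the function names `sampling_weights` and `weighted_nelson_aalen`, as well as the toy data, are assumptions introduced here for illustration.

```python
import numpy as np

def sampling_weights(stratum, sampled):
    """Inverse sampling fractions N_j / n_j for sampled subjects, 0 otherwise.

    Hypothetical helper: N_j is the cohort size of stratum j and n_j the
    number of subjects sampled from it.
    """
    stratum = np.asarray(stratum)
    sampled = np.asarray(sampled, dtype=bool)
    w = np.zeros(len(stratum))
    for j in np.unique(stratum):
        in_j = stratum == j
        n_j = (in_j & sampled).sum()
        if n_j > 0:
            w[in_j & sampled] = in_j.sum() / n_j
    return w

def weighted_nelson_aalen(time, event, weight):
    """Weighted Nelson-Aalen estimate of the cumulative hazard.

    At each distinct event time t the increment is
    (weighted number of events at t) / (weighted number at risk at t).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    weight = np.asarray(weight, dtype=float)
    event_times = np.unique(time[event])
    increments = np.empty(len(event_times))
    for k, t in enumerate(event_times):
        d = weight[(time == t) & event].sum()  # weighted events at t
        y = weight[time >= t].sum()            # weighted risk set at t
        increments[k] = d / y
    return event_times, np.cumsum(increments)

# Toy cohort (made-up numbers): strata are defined by event status, so the
# weights depend on the whole follow-up history rather than being predictable.
time = [2, 3, 5, 4, 6, 7]
event = [1, 1, 0, 0, 0, 0]
sampled = [1, 1, 1, 0, 1, 0]  # all cases, half of the non-cases
w = sampling_weights(stratum=event, sampled=sampled)
times, cumhaz = weighted_nelson_aalen(time, event, w)
```

Unsampled subjects receive weight zero, so they drop out of both the event counts and the risk sets, while each sampled non-case stands in for two cohort members.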


62N02 Estimation in survival analysis and censored data
62D05 Sampling theory, sample surveys
62N01 Censored data models
62G05 Nonparametric estimation


invGauss; R; Survey

