Variable selection in discrete survival models including heterogeneity. (English) Zbl 1383.62212

Summary: Several variable selection procedures are available for continuous time-to-event data. However, if time is measured on a discrete scale, many ties occur and models for continuous time are inadequate. We propose penalized likelihood methods that perform efficient variable selection in discrete survival modeling while explicitly modeling the heterogeneity in the population. The method combines ridge- and lasso-type penalties that are tailored to the case of discrete survival. Its performance is studied in simulation studies and in an application to the birth of the first child.
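
As a rough illustration of the kind of objective the summary describes, the following minimal sketch expands discrete survival times into person-period binary records and evaluates a penalized negative log-likelihood for a logistic discrete hazard. It is not the authors' implementation. The function names, the censoring convention, the choice of a lasso penalty on covariate effects with a ridge penalty on differences of adjacent baseline coefficients, and the omission of the random effect that would capture heterogeneity are all assumptions made for this example.

```python
import math

def person_period(times, events, covariates):
    """Expand (time, event, x) records into one binary row per period at risk.

    Assumption: a subject censored at time t is treated as having
    survived periods 1..t (all responses 0).
    """
    rows = []
    for t, d, x in zip(times, events, covariates):
        for s in range(1, t + 1):
            # Response is 1 only in the last observed period, and only if the event occurred.
            y = 1 if (s == t and d == 1) else 0
            rows.append((s, list(x), y))
    return rows

def penalized_neg_loglik(rows, gamma0, beta, lam_lasso, lam_ridge):
    """Logistic discrete hazard: logit h(s | x) = gamma0[s - 1] + x' beta.

    Hypothetical penalty: L1 on beta (variable selection) plus a ridge
    penalty on differences of adjacent baseline coefficients (smoothness).
    """
    nll = 0.0
    for s, x, y in rows:
        eta = gamma0[s - 1] + sum(b * xi for b, xi in zip(beta, x))
        # Bernoulli negative log-likelihood with logit link.
        nll += math.log1p(math.exp(eta)) - y * eta
    lasso = lam_lasso * sum(abs(b) for b in beta)
    ridge = lam_ridge * sum(
        (gamma0[k] - gamma0[k - 1]) ** 2 for k in range(1, len(gamma0))
    )
    return nll + lasso + ridge

# Two subjects: an event in period 2, and a censoring after period 1.
rows = person_period([2, 1], [1, 0], [[1.0], [0.0]])
value = penalized_neg_loglik(rows, [0.0, 0.0], [0.0], 0.5, 1.0)
```

Because the expanded data have a binary response, the unpenalized part is an ordinary logistic likelihood, so the objective could in principle be handed to any optimizer that handles L1 terms.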


62N01 Censored data models
62J07 Ridge regression; shrinkage estimators (Lasso)
62P10 Applications of statistics to biology and medical sciences; meta analysis

