
Variable selection in discrete survival models including heterogeneity. (English) Zbl 1383.62212

Summary: Several variable selection procedures are available for continuous time-to-event data. However, if time is measured on a discrete scale, so that many ties occur, models for continuous time are inadequate. We propose penalized likelihood methods that perform efficient variable selection in discrete survival modeling while explicitly modeling the heterogeneity in the population. The method combines ridge-type and lasso-type penalties that are tailored to the case of discrete survival. Its performance is studied in simulations and in an application to the time until the birth of the first child.
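To illustrate the general idea behind the summary, the following is a minimal sketch and not the authors' implementation. It assumes a logistic discrete hazard, logit h(t | x) = baseline[t-1] + x'beta, fitted on person-period data, with a lasso penalty on the covariate effects and a ridge-type penalty on differences between adjacent baseline parameters. The function names, the data layout, and the exact form of both penalties are assumptions made for illustration. The random effect that captures heterogeneity is omitted to keep the sketch short.

```python
import math


def expand_person_period(records):
    """Turn (id, observed_time, event, covariates) tuples into one row per
    subject and period at risk (hypothetical layout, for illustration)."""
    rows = []
    for subject_id, time, event, covariates in records:
        for period in range(1, time + 1):
            rows.append({
                "id": subject_id,
                "period": period,
                # Response is 1 only in the last period of an uncensored subject.
                "y": int(event == 1 and period == time),
                "x": list(covariates),
            })
    return rows


def penalized_neg_loglik(rows, baseline, beta, lam_lasso, lam_ridge):
    """Negative Bernoulli log-likelihood of the logistic discrete hazard plus
    an assumed lasso penalty on beta and an assumed ridge-type penalty on
    differences of adjacent baseline parameters."""
    nll = 0.0
    for row in rows:
        eta = baseline[row["period"] - 1]
        eta += sum(b * x for b, x in zip(beta, row["x"]))
        # -log-likelihood of one binary response: log(1 + exp(eta)) - y * eta
        nll += math.log1p(math.exp(eta)) - row["y"] * eta
    lasso = lam_lasso * sum(abs(b) for b in beta)
    ridge = lam_ridge * sum(
        (baseline[t] - baseline[t - 1]) ** 2 for t in range(1, len(baseline))
    )
    return nll + lasso + ridge


# Two subjects: an event in period 3, and censoring after period 2.
records = [(1, 3, 1, [0.5]), (2, 2, 0, [1.0])]
rows = expand_person_period(records)
objective = penalized_neg_loglik(rows, [0.0, 0.0, 0.0], [0.0], 1.0, 1.0)
```

Because each subject contributes one binary response per period at risk, ties pose no difficulty: the objective is a penalized binary-regression criterion that could be passed to any suitable optimizer.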

MSC:

62N01 Censored data models
62J07 Ridge regression; shrinkage estimators (Lasso)
62P10 Applications of statistics to biology and medical sciences; meta analysis
