Boosting multi-state models. (English) Zbl 1356.65030

Summary: One important goal in multi-state modelling is to learn about conditional transition-type-specific hazard rate functions by estimating the effects of explanatory variables. This may be done with separate transition-type-specific models if the covariate effects are assumed to differ across transition-types. To investigate whether this assumption holds, or whether an effect is equal across several transition-types (a cross-transition-type effect), a combined model has to be applied, for instance by using a stratified partial likelihood formulation. Prior knowledge about the underlying covariate effect mechanisms is often sparse here, especially about which transition-type-specific or cross-transition-type effects are absent. As a consequence, data-driven variable selection is an important task: a large number of estimable effects has to be taken into account if all transition-types are modelled jointly. A related but subsequent task is model choice: is an effect satisfactorily estimated assuming linearity, or does its true underlying form deviate strongly from linearity? This article introduces component-wise Functional Gradient Descent Boosting (boosting, for short) for multi-state models, an approach that performs unsupervised variable selection and model choice simultaneously within a single estimation run. We demonstrate that the features and advantages of boosting that have been introduced and illustrated in classical regression scenarios carry over to multi-state models. As a consequence, boosting provides an effective means of answering questions about the absence and non-linearity of single transition-type-specific or cross-transition-type effects.
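
The variable selection property of component-wise boosting mentioned in the summary can be illustrated with a minimal sketch. The sketch makes several simplifying assumptions that are not taken from the article: it uses squared-error loss in place of a stratified Cox partial likelihood, only linear base-learners, and a noise-free toy data set. The function name `componentwise_l2_boost` and its parameters `n_steps` and `nu` are hypothetical. It is not the authors' implementation.

```python
def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise gradient boosting with squared-error loss (a sketch).

    X: list of rows (lists of floats); columns are assumed to be centred.
    y: list of responses.
    Returns (intercept, coef). Coefficients of columns that are never
    selected stay exactly 0.0, which is the built-in variable selection.
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    col_ss = [sum(v * v for v in col) for col in cols]  # sum of squares per column

    intercept = sum(y) / n          # offset: the mean response
    coef = [0.0] * p
    fit = [intercept] * n

    for _ in range(n_steps):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f: the residual.
        resid = [yi - fi for yi, fi in zip(y, fit)]

        # Fit every base-learner (one covariate each) to the residuals
        # and keep the one that reduces the residual sum of squares most.
        best_j, best_b, best_gain = None, 0.0, 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            b = sum(c * r for c, r in zip(cols[j], resid)) / col_ss[j]
            gain = b * b * col_ss[j]
            if gain > best_gain:
                best_j, best_b, best_gain = j, b, gain

        if best_j is None:          # no base-learner improves the fit
            break

        # Update only the selected component, shrunken by the step length nu.
        coef[best_j] += nu * best_b
        fit = [fi + nu * best_b * c for fi, c in zip(fit, cols[best_j])]

    return intercept, coef


# Toy data (assumed for illustration): three centred, mutually orthogonal
# covariates, of which only the first one influences the response.
x0 = [1, 1, 1, 1, -1, -1, -1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
X = [[float(a), float(b), float(c)] for a, b, c in zip(x0, x1, x2)]
y = [3.0 + 2.0 * a for a in x0]

intercept, coef = componentwise_l2_boost(X, y)
print(intercept, coef)
```

In this toy run only the first covariate is ever selected, so the other two coefficients remain exactly zero. In the multi-state setting, the residuals would be replaced by the negative gradient of the chosen loss, and the number of boosting steps would typically be chosen in a data-driven way, for example by cross-validation.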


65C60 Computational problems in statistics (MSC2010)
62-04 Software, source code, etc. for problems pertaining to statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis

