Discrete-time survival trees. (English) Zbl 1170.62074

Summary: Tree-based methods are frequently used in studies with censored survival time. Their structure and ease of interpretability make them useful to identify prognostic factors and to predict conditional survival probabilities given an individual’s covariates. The existing methods are tailor-made to deal with a survival time variable that is measured continuously. However, survival variables measured on a discrete scale are often encountered in practice.
The authors propose a new tree construction method specifically adapted to such discrete-time survival variables. The splitting procedure can be seen as an extension, to the case of right-censored data, of the entropy criterion for a categorical outcome. The selection of the final tree is made through a pruning algorithm combined with a bootstrap correction. The authors also present a simple way of potentially improving the predictive performance of a single tree through bagging. A simulation study shows that single trees and bagged trees perform well compared to a parametric model. A real data example investigating the usefulness of personality dimensions in predicting early onset of cigarette smoking is presented.
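To illustrate how an entropy-type criterion can carry over to right-censored discrete-time data, the following is a minimal sketch and not the authors' implementation. It assumes the standard discrete-time likelihood with one hazard per period, and it assumes that a subject censored at period t was at risk in periods 1 to t without experiencing the event. The function names `node_loglik` and `split_gain` and all arguments are hypothetical. With no censoring, the node log-likelihood equals n times the sum of p_k log p_k over the observed categories, which is the quantity behind the usual entropy criterion.

```python
import math

def node_loglik(times, events, n_periods):
    """Maximised discrete-time log-likelihood of one node.

    times  : observed period (1..n_periods) for each subject
    events : 1 if the event occurred in that period, 0 if censored
    Assumption: a censored subject counts as at risk, with no event,
    in every period up to and including its observed period.
    """
    ll = 0.0
    for j in range(1, n_periods + 1):
        at_risk = sum(1 for t in times if t >= j)
        failed = sum(1 for t, d in zip(times, events) if t == j and d == 1)
        survived = at_risk - failed
        # Binomial log-likelihood at the estimated hazard failed / at_risk;
        # terms with a zero count contribute nothing (0 * log 0 := 0).
        for count in (failed, survived):
            if count > 0:
                ll += count * math.log(count / at_risk)
    return ll

def split_gain(times, events, x, threshold, n_periods):
    """Log-likelihood improvement from splitting on x <= threshold."""
    left = [i for i, v in enumerate(x) if v <= threshold]
    right = [i for i, v in enumerate(x) if v > threshold]
    if not left or not right:
        return 0.0  # degenerate split, nothing gained

    def child(idx):
        return node_loglik([times[i] for i in idx],
                           [events[i] for i in idx], n_periods)

    return child(left) + child(right) - node_loglik(times, events, n_periods)
```

A tree builder under these assumptions would evaluate `split_gain` over candidate covariates and thresholds and keep the split with the largest gain. Pruning, the bootstrap correction, and bagging described in the summary are not shown.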


62N99 Survival analysis and censored data
62N01 Censored data models
65C60 Computational problems in statistics (MSC2010)


ElemStatLearn; R; Ox


[1] Breiman, Bagging predictors, Machine Learning 24 pp 123– (1996) · Zbl 0858.68080
[2] Breiman, Random forests, Machine Learning 45 pp 5– (2001) · Zbl 1007.68152
[3] Breiman, Classification and Regression Trees (1984)
[4] Ciampi, Stratification by stepwise regression, correspondence analysis and recursive partition: A comparison of three methods of analysis for survival data with covariates, Computational Statistics & Data Analysis 4 pp 185– (1986) · Zbl 0649.62106
[5] Cloninger, Neurogenetic adaptive mechanisms in alcoholism, Science 236 pp 410– (1987)
[6] Cox, Regression models and life tables, Journal of the Royal Statistical Society B 34 pp 187– (1972) · Zbl 0243.62041
[7] Davis, Exponential survival trees, Statistics in Medicine 8 pp 947– (1989)
[8] Doornik, Object-oriented Matrix Programming Using Ox (2002)
[9] Fan, Trees for correlated survival data by goodness of split, with applications to tooth prognosis, Journal of the American Statistical Association 101 pp 959– (2006) · Zbl 1120.62328
[10] Gao, Identification of prognostic factors with multivariate survival data, Computational Statistics & Data Analysis 45 pp 813– (2004) · Zbl 1429.62525
[11] Gordon, Tree-structured survival analysis, Cancer Treatment Reports 69 pp 1065– (1985)
[12] Hamza, An empirical comparison of ensemble methods based on classification trees, Journal of Statistical Computation and Simulation 75 pp 629– (2005) · Zbl 1075.62051
[13] Hastie, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2001) · Zbl 0973.62007
[14] Hothorn, Bagging survival trees, Statistics in Medicine 23 pp 77– (2004)
[15] Hothorn, Survival ensembles, Biostatistics 7 pp 355– (2006) · Zbl 1170.62385
[16] Jin, Alternative tree-structured survival analysis based on variance of survival time, Medical Decision Making 24 pp 670– (2004)
[17] Keles, Residual-based tree-structured survival analysis, Statistics in Medicine 21 pp 313– (2002)
[18] Ishwaran, Random survival forests, Annals of Applied Statistics 2 pp 841– (2008) · Zbl 1149.62331
[19] LeBlanc, Relative risk trees for censored survival data, Biometrics 48 pp 411– (1992)
[20] LeBlanc, Survival trees by goodness of split, Journal of the American Statistical Association 88 pp 457– (1993) · Zbl 0773.62071
[21] Mâsse, Behavior of boys in kindergarten and the onset of substance use during adolescence, Archives of General Psychiatry 54 pp 62– (1997)
[22] Molinaro, Tree-based multivariate regression and density estimation with right-censored data, Journal of Multivariate Analysis 90 pp 154– (2004) · Zbl 1048.62046
[23] Morgan, Problems in the analysis of survey data and a proposal, Journal of the American Statistical Association 58 pp 415– (1963) · Zbl 0114.10103
[24] Radespiel-Tröger, Comparison of tree-based methods for prognostic stratification of survival data, Artificial Intelligence in Medicine 28 pp 323– (2003)
[25] Radespiel-Tröger, Association between split selection instability and predictive error in survival trees, Methods of Information in Medicine 45 pp 548– (2006)
[26] R Development Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, http://www.R-project.org (2007)
[27] Segal, Regression trees for censored data, Biometrics 44 pp 35– (1988) · Zbl 0707.62224
[28] Segal, Tree-structured methods for longitudinal data, Journal of the American Statistical Association 87 pp 407– (1992)
[29] Singer, It’s about time: Using discrete-time survival analysis to study duration and the timing of events, Journal of Educational Statistics 18 pp 155– (1993)
[30] Singer, Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence (2003)
[31] Su, Multivariate survival trees: A maximum likelihood approach based on frailty models, Biometrics 60 pp 93– (2004) · Zbl 1130.62386
[32] Su, Tree-augmented Cox proportional hazards models, Biostatistics 6 pp 486– (2005) · Zbl 1071.62111
[33] Su, Maximum likelihood regression trees, Journal of Computational and Graphical Statistics 13 pp 586– (2004)
[34] Therneau, Martingale-based residuals for survival models, Biometrika 77 pp 147– (1990) · Zbl 0692.62082
[35] Zhang, Classification trees for multiple binary responses, Journal of the American Statistical Association 93 pp 180– (1998) · Zbl 0906.62130
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.