zbMATH — the first resource for mathematics

Rationale and applications of survival tree and survival ensemble methods. (English) Zbl 1323.62127
Summary: Classification and Regression Trees (CART) and their successors, bagging and random forests, are statistical learning tools that are receiving increasing attention. However, because of the way censored data are collected, standard CART algorithms do not transfer directly to survival analysis: for a right-censored case, the event time is known only to exceed the last observed follow-up time. Questions about the occurrence and timing of events arise throughout the psychological and behavioral sciences, especially in longitudinal studies. The predictive power and other key features of tree-based methods make them promising for studies in which the occurrence of an event is the outcome of interest. This article reviews existing tree algorithms designed specifically for censored responses, as well as recently developed survival ensemble methods, and introduces the available software. The merits and limitations of these methods are discussed through simulations and a practical example, and suggestions for practical use are provided.
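To make concrete how a tree can split on a censored response, here is a minimal Python sketch of one widely used criterion: score every candidate cutpoint of a single covariate with the standardized two-sample log-rank statistic and keep the largest. This is an illustration under stated assumptions, not code from the article or from any of the R packages in the reference list; the function names `logrank_z` and `best_split`, the `min_node` parameter, and the toy data are all hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def logrank_z(times: Sequence[float], events: Sequence[int],
              in_left: Sequence[bool]) -> float:
    """Standardized two-sample log-rank statistic for one candidate split.

    times:   observed follow-up time for each case
    events:  1 if the event was observed, 0 if right-censored
    in_left: True if the case falls in the (hypothetical) left child node
    Positive values mean the left node has more events than expected.
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Risk set: cases still under observation at time t
        # (censored cases stay at risk until their censoring time).
        n = sum(1 for ti in times if ti >= t)
        n_left = sum(1 for ti, g in zip(times, in_left) if ti >= t and g)
        # Events occurring exactly at t, overall and in the left node.
        d = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        d_left = sum(1 for ti, di, g in zip(times, events, in_left)
                     if ti == t and di == 1 and g)
        obs_minus_exp += d_left - d * n_left / n
        if n > 1:
            # Hypergeometric variance of d_left given the risk-set sizes.
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return obs_minus_exp / math.sqrt(variance) if variance > 0 else 0.0


def best_split(x: Sequence[float], times: Sequence[float],
               events: Sequence[int],
               min_node: int = 5) -> Optional[Tuple[float, float]]:
    """Return (cutpoint c, |z|) maximizing the log-rank statistic for x <= c."""
    best: Optional[Tuple[float, float]] = None
    for c in sorted(set(x))[:-1]:
        in_left = [xi <= c for xi in x]
        n_left = sum(in_left)
        if n_left < min_node or len(x) - n_left < min_node:
            continue  # skip splits that leave a child node too small
        z = abs(logrank_z(times, events, in_left))
        if best is None or z > best[1]:
            best = (c, z)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: cases with small x tend to fail early.
    x = list(range(1, 11))
    times = [2, 3, 1, 4, 5, 10, 12, 9, 11, 15]
    events = [1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
    print(best_split(x, times, events, min_node=2))
```

Censoring enters only through the risk sets: a censored case counts as "at risk" up to its censoring time but never as an event, which is what a squared-error CART criterion cannot express. Broadly, a full survival tree would repeat this search over all covariates, recurse into each child node, and then prune; survival ensembles such as random survival forests apply a similar procedure to bootstrap samples with randomly selected covariates.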
62P15 Applications of statistics to psychology
62P10 Applications of statistics to biology and medical sciences; meta analysis
68T05 Learning and adaptive systems in artificial intelligence
[1] Berk, R. A. (2008). Statistical learning from a regression perspective. New York, NY: Springer. · Zbl 1258.62047
[2] Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123-140. · Zbl 0858.68080
[3] Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32. · Zbl 1007.68152
[4] Breiman, L. (2002). Software for the masses. Department of Statistics, University of California, Berkeley. Retrieved from http://www.stat.berkeley.edu/~breiman/wald2002-3.pdf. Accessed 1 July 2014.
[5] Breiman, L. (2003a). How to use survival forests. Department of Statistics, University of California, Berkeley. Retrieved from http://www.stat.berkeley.edu/~breiman/SF_Manual.pdf. Accessed 1 July 2014.
[6] Breiman, L. (2003b). Manual: Setting up, using and understanding random forests V4.0. Retrieved from http://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf. Accessed 1 July 2014.
[7] Breiman, L., Friedman, J. H., Olshen, R., & Stone, C. J. (1984). Classification and regression trees. New York, NY: Chapman & Hall. · Zbl 0541.62042
[8] Butler, J., Gilpin, E., Gordon, L., & Olshen, R. (1989). Tree-structured survival analysis. II. Technical report, Department of Biostatistics, Stanford University.
[9] Ciampi, A., Thiffault, J., Nakache, J. P., & Asselain, B. (1986). Stratification by stepwise regression, correspondence analysis and recursive partition: A comparison of three methods of analysis for survival data with covariates. Computational Statistics & Data Analysis, 4, 185-204. · Zbl 0649.62106
[10] Cox, D. R. (1972). Regression models and life tables. Journal of the Royal Statistical Society Series B, 34(2), 187-220. · Zbl 0243.62041
[11] Cox, D. R., & Oakes, D. (1984). Analysis of survival data. London: Chapman & Hall.
[12] Davis, R., & Anderson, J. (1989). Exponential survival trees. Statistics in Medicine, 8, 947-961.
[13] DeWit, D. J., Adlaf, E. M., Offord, D. R., & Ogborne, A. C. (2000). Age at first alcohol use: A risk factor for the development of alcohol disorders. American Journal of Psychiatry, 157(5), 745-750.
[14] Gordon, L., & Olshen, R. A. (1985). Tree-structured survival analysis. Cancer Treatment Reports, 69, 1065-1069.
[15] Graf, E., Schmoor, C., Sauerbrei, W., & Schumacher, M. (1999). Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, 18, 2529-2545.
[16] Harrell, F., Califf, R., Pryor, D., Lee, K., & Rosati, R. (1982). Evaluating the yield of medical tests. Journal of the American Medical Association, 247, 2543-2546.
[17] Henning, K. R., & Frueh, B. C. (1996). Cognitive-behavioral treatment of incarcerated offenders: An evaluation of the Vermont Department of Corrections’ cognitive self-change program. Criminal Justice and Behavior, 23, 523-541.
[18] Hothorn, T., Bühlmann, P., Dudoit, S., Molinaro, A., & van der Laan, M. J. (2006a). Survival ensembles. Biostatistics, 7(3), 355-373. · Zbl 1170.62385
[19] Hothorn, T., Hornik, K., Strobl, C., & Zeileis, A. (2010). Package ‘party’: A laboratory for recursive part(y)itioning (R package Version 0.9-9997) [Computer software]. Retrieved from http://cran.r-project.org/web/packages/party/index.html. Accessed 15 Oct 2010.
[20] Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15, 651-674.
[21] Hothorn, T., Lausen, B., Benner, A., & Radespiel-Tröger, M. (2004). Bagging survival trees. Statistics in Medicine, 23, 77-91.
[22] Hothorn, T., & Zeileis, A. (2012). Package ‘partykit’: A Toolkit for Recursive Partytioning (R package Version 0.1-6) [Computer software]. Retrieved from http://cran.r-project.org/web/packages/partykit/index.html. Accessed 3 Sept 2013.
[23] Intrator, O., & Kooperberg, C. (1995). Trees and splines in survival analysis. Statistical Methods in Medical Research, 4(3), 237-261.
[24] Ishwaran, H., & Kogalur, U. B. (2010). Package ‘randomSurvivalForest’: Random survival forest (R package Version 3.6.3) [Computer software]. Retrieved from http://cran.r-project.org/web/packages/randomSurvivalForest/index.html. Accessed 15 Oct 2010.
[25] Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. The Annals of Applied Statistics, 2(3), 841-860. · Zbl 1149.62331
[26] Keleş, S., & Segal, M. R. (2002). Residual-based tree structured survival analysis. Statistics in Medicine, 21, 313-326.
[27] LeBlanc, M., & Crowley, J. (1992). Relative risk trees for censored survival data. Biometrics, 48, 411-425.
[28] LeBlanc, M., & Crowley, J. (1993). Survival trees by goodness of split. Journal of the American Statistical Association, 88, 457-467. · Zbl 0773.62071
[29] Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50, 163-170.
[30] Mertens, J. R., Kline-Simon, A. H., Delucchi, K. L., Moore, C., & Weisner, C. M. (2012). Ten-year stability of remission in private alcohol and drug outpatient treatment: Non-problem users versus abstainers. Drug and Alcohol Dependence, 125(1), 67-74.
[31] McArdle, J. J. (2011). Exploratory data mining using CART in the behavioral sciences. In H. Cooper, P. Camic, D. Long, A. T. Panter, D. Rindskopf, & K. Sher (Eds.), APA handbook of research methods in psychology. Washington, DC: The American Psychological Association.
[32] Molinaro, A. M., Dudoit, S., & van der Laan, M. J. (2004). Tree-based multivariate regression and density estimation with right-censored data. Journal of Multivariate Analysis, 90, 154-177. · Zbl 1048.62046
[33] Morgan, J. N., & Sonquist, J. A. (1963). Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58, 415-434. · Zbl 0114.10103
[34] Morita, J. G., Lee, T. W., & Mowday, R. T. (1993). The regression-analog to survival analysis: A selected application to turnover research. Academy of Management Journal, 36(6), 1430-1464.
[35] Peters, A., Hothorn, T., Ripley, B. D., Therneau, T., & Atkinson, B. (2009). Package ‘ipred’: Improved Predictors (R package Version 0.9-3) [Computer software]. Retrieved from http://cran.r-project.org/web/packages/ipred/index.html. Accessed 1 July 2014.
[36] Peto, R., & Peto, J. (1972). Asymptotically efficient rank invariant test procedures. Journal of the Royal Statistical Society Series A, 135, 185-207.
[37] Schemper, M., & Stare, J. (1996). Explained variation in survival analysis. Statistics in Medicine, 15, 1999-2012.
[38] Segal, M. R. (1988). Regression trees for censored data. Biometrics, 44, 35-47. · Zbl 0707.62224
[39] Schapire, R. E. (1999). A brief introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI 99) (pp. 1401-1405).
[40] Singer, J. D., & Willett, J. B. (1991). Modeling the days of our lives: Using survival analysis when designing and analyzing longitudinal studies of duration and the timing of events. Psychological Bulletin, 110(2), 268.
[41] Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis. New York, NY: Oxford.
[42] Stone, M. (1974). Choice and assessment of statistical predictions. Journal of the Royal Statistical Society Series B, 36, 111-133. · Zbl 0308.62063
[43] Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14, 323-348.
[44] Therneau, T. M., & Atkinson, B. (2010). Package ‘rpart’: Recursive partitioning (R package Version 3.1-48) [Computer software]. Retrieved from http://cran.r-project.org/web/packages/rpart/index.html. Accessed 15 Oct 2010.
[45] Therneau, T. M., Grambsch, P. M., & Fleming, T. R. (1990). Martingale-based residuals for survival models. Biometrika, 77(1), 147-160. · Zbl 0692.62082
[46] Zhang, H. P., & Singer, B. (1999). Recursive partitioning in the health sciences. New York, NY: Springer. · Zbl 0920.62135
[47] Zhou, Y., Kadlec, K. M., & McArdle, J. J. (2014). Predicting mortality from demographics and specific cognitive abilities in the Hawaii Family Study of Cognition. In J. J. McArdle & G. Ritschard (Eds.), Contemporary issues in exploratory data mining (pp. 429-449). New York, NY: Routledge.
[48] Zosuls, K. M., Ruble, D. N., Tamis-LeMonda, C. S., Shrout, P. E., Bornstein, M. H., & Greulich, F. K. (2009). The acquisition of gender labels in infancy: Implications for gender-typed play. Developmental Psychology, 45(3), 688.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.