Random survival forests. (English) Zbl 1149.62331

Summary: We introduce random survival forests, a random forests method for the analysis of right-censored survival data. We also introduce new survival splitting rules for growing survival trees and a new algorithm for imputing missing data. A conservation-of-events principle for survival forests is introduced and used to define ensemble mortality, a simple, interpretable measure of mortality that can be used as a predicted outcome. Several illustrative examples are given, including a case study of the prognostic implications of body mass for individuals with coronary artery disease. Computations for all examples were implemented using the freely available R package randomSurvivalForest.
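
The conservation-of-events principle mentioned in the summary can be illustrated on a single sample. The summary does not state the principle formally; the sketch below assumes the usual formulation for the Nelson-Aalen estimator, namely that the estimated cumulative hazard, summed over the observed times of all individuals, equals the total number of deaths. The function name `nelson_aalen` and the toy data are hypothetical, chosen only for illustration, and are not taken from the randomSurvivalForest package.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Return H(t), the Nelson-Aalen cumulative hazard estimate.

    times  -- observed follow-up times (event or censoring)
    events -- 1 if the observation is a death, 0 if right-censored
    """
    # Number of deaths at each distinct event time.
    deaths = Counter(t for t, d in zip(times, events) if d)

    steps = []  # (event time, cumulative hazard just after that time)
    cumulative = 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for s in times if s >= t)  # still under observation at t
        cumulative += deaths[t] / at_risk
        steps.append((t, cumulative))

    def H(t):
        # Step function: value of the last jump at or before t.
        value = 0.0
        for s, c in steps:
            if s > t:
                break
            value = c
        return value

    return H


# Hypothetical toy sample (illustration only): 8 individuals, 5 deaths.
times = [2, 3, 3, 5, 7, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]

H = nelson_aalen(times, events)
total_chf = sum(H(t) for t in times)  # cumulative hazard summed over all individuals
total_deaths = sum(events)

print(total_chf, total_deaths)  # the two agree up to floating-point rounding
```

Under this assumed formulation, each individual's cumulative hazard at their own observed time can be read as an expected number of deaths, which is what makes a mortality measure built from summed cumulative hazards interpretable.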


62P10 Applications of statistics to biology and medical sciences; meta analysis
62N01 Censored data models

