Tree-based multivariate regression and density estimation with right-censored data. (English) Zbl 1048.62046

Summary: We propose a unified strategy for estimator construction, selection, and performance assessment in the presence of censoring. This approach is entirely driven by the choice of a loss function for the full (uncensored) data structure and can be stated in terms of the following three main steps. (1) First, define the parameter of interest as the minimizer of the expected loss, or risk, for a full data loss function chosen to represent the desired measure of performance. Map the full data loss function into an observed (censored) data loss function having the same expected value and leading to an efficient estimator of this risk. (2) Next, construct candidate estimators based on the loss function for the observed data. (3) Then, apply cross-validation to estimate risk based on the observed data loss function and to select an optimal estimator among the candidates.
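Steps 1 and 3 can be illustrated with a minimal Python sketch. It assumes inverse probability of censoring weighting (IPCW) as the mapping from the full data loss to an observed data loss; the summary does not say which mapping the authors use, and plain IPCW preserves the expected value but is not claimed here to give the efficient risk estimator. The squared-error loss, the Kaplan-Meier estimate of the censoring distribution, and all function names are illustrative assumptions. For brevity the censoring weights are computed once on the full sample rather than within each training fold.

```python
from typing import Callable, List, Sequence


def km_censoring_left_limit(times: Sequence[float],
                            events: Sequence[int]) -> Callable[[float], float]:
    """Kaplan-Meier estimate of t -> P(C >= t), counting censorings (event == 0) as 'events'."""
    steps = []  # pairs (time s, estimated P(C > s))
    surv = 1.0
    for s in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= s)
        censored = sum(1 for t, d in zip(times, events) if t == s and d == 0)
        surv *= 1.0 - censored / at_risk
        steps.append((s, surv))

    def g_minus(t: float) -> float:
        value = 1.0
        for s, g in steps:
            if s >= t:
                break
            value = g
        return value

    return g_minus


def ipcw_weights(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Weight 1 / G(t_i-) for uncensored subjects and 0 for censored subjects."""
    g_minus = km_censoring_left_limit(times, events)
    return [1.0 / g_minus(t) if d == 1 else 0.0 for t, d in zip(times, events)]


def ipcw_risk(predictions: Sequence[float], times: Sequence[float],
              weights: Sequence[float]) -> float:
    """Observed data risk: mean of weight * (t - prediction)**2 over all subjects."""
    n = len(times)
    return sum(w * (t - p) ** 2 for p, t, w in zip(predictions, times, weights)) / n


def fit_constant(covariates, times, weights) -> Callable[[float], float]:
    """Hypothetical candidate estimator: the IPCW-weighted mean, ignoring covariates."""
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, times)) / total if total > 0 else 0.0
    return lambda covariate: mean


def cv_risk(fit, covariates, times, weights, n_folds: int = 2) -> float:
    """V-fold cross-validated IPCW risk of a fitting procedure.

    `fit(train_covariates, train_times, train_weights)` must return a function
    mapping a covariate value to a prediction.
    """
    n = len(times)
    total = 0.0
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        valid = [i for i in range(n) if i % n_folds == k]
        predictor = fit([covariates[i] for i in train],
                        [times[i] for i in train],
                        [weights[i] for i in train])
        total += sum(weights[i] * (times[i] - predictor(covariates[i])) ** 2
                     for i in valid)
    return total / n
```

Selecting among several candidate fitting procedures then amounts to calling `cv_risk` on each and keeping the one with the smallest value.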
A number of common estimation procedures follow this approach in the full data situation, but depart from it when faced with the obstacle of evaluating the loss function for censored observations. Here, we argue that one can, and should, also adhere to this estimation road map in censored data situations. Tree-based methods, where the candidate estimators in Step 2 are generated by recursive binary partitioning of a suitably defined covariate space, provide a striking example of the chasm between estimation procedures for full data and censored data (e.g., regression trees as in CART for uncensored data and adaptations to censored data). Common approaches for regression trees bypass the risk estimation problem for censored outcomes by altering the node splitting and tree pruning criteria in ways that are specific to right-censored data.
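To make the contrast concrete, the following Python sketch shows a single node split in which the usual CART squared-error criterion is kept and censoring enters only through observation weights. It assumes the weights are precomputed IPCW weights (zero for censored subjects); that choice and the function names are illustrative assumptions, not details given in the summary.

```python
from typing import Optional, Sequence, Tuple


def weighted_sse(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of squared errors around the weighted mean (0.0 if all weights are 0)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mean = sum(w * v for v, w in zip(values, weights)) / total
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights))


def best_split(covariate: Sequence[float], outcomes: Sequence[float],
               weights: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Find the split `covariate <= threshold` that most reduces the weighted SSE.

    Returns (threshold, reduction), or None when the covariate takes a single value.
    """
    parent = weighted_sse(outcomes, weights)
    best = None
    # Splitting at the largest value would leave the right node empty, so skip it.
    for c in sorted(set(covariate))[:-1]:
        left = [i for i, x in enumerate(covariate) if x <= c]
        right = [i for i, x in enumerate(covariate) if x > c]
        child = (weighted_sse([outcomes[i] for i in left], [weights[i] for i in left])
                 + weighted_sse([outcomes[i] for i in right], [weights[i] for i in right]))
        reduction = parent - child
        if best is None or reduction > best[1]:
            best = (c, reduction)
    return best
```

Applying `best_split` recursively to the resulting child nodes would generate a sequence of candidate partitions; pruning and the choice of tree size are outside this sketch.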
This article describes an application of our unified methodology to tree-based estimation with censored data. The approach encompasses univariate outcome prediction, multivariate outcome prediction, and density estimation, simply by defining a suitable loss function for each of these problems. The proposed method for tree-based estimation with censoring is evaluated using a simulation study and the analysis of CGH copy number and survival data from breast cancer patients.
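As an illustration of how one loss function per problem might look, the block below writes out standard choices for the three settings, together with an inverse-probability-of-censoring-weighted observed data loss. The notation is assumed here and does not come from the summary: \(X = (W, Z)\) is the full data with covariates \(W\) and outcome \(Z\), \(C\) is the censoring time, \(O = (W, \tilde{Z} = \min(Z, C), \Delta = I(Z \le C))\) is the observed data, \(\Omega(W)\) is a symmetric positive-definite weight matrix, and \(G(t \mid W) = P(C \ge t \mid W)\) is assumed positive.

```latex
% Univariate prediction: squared-error loss
L(X, \psi) = \bigl(Z - \psi(W)\bigr)^2

% Multivariate prediction: quadratic loss for a vector outcome Z
L(X, \psi) = \bigl(Z - \psi(W)\bigr)^{\top} \Omega(W) \bigl(Z - \psi(W)\bigr)

% Density estimation: negative log-density loss
L(X, \psi) = -\log \psi(Z \mid W)

% One possible observed data loss (IPCW), assuming C is independent of Z given W
L(O, \psi \mid G) = L(X, \psi) \, \frac{\Delta}{G(\tilde{Z} \mid W)},
\qquad
E\bigl[ L(O, \psi \mid G) \bigr] = E\bigl[ L(X, \psi) \bigr]
```

The equality of expectations follows because \(E[\Delta \mid X] = G(Z \mid W)\) under the stated independence assumption, so the weight \(\Delta / G(\tilde{Z} \mid W)\) has conditional mean one.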

MSC:

62G08 Nonparametric regression and quantile regression
62H99 Multivariate analysis
62N02 Estimation in survival analysis and censored data
62G07 Density estimation
62N01 Censored data models
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

LogicReg; R; rpart; COMODE
