
An integrative pathway-based clinical-genomic model for cancer survival prediction. (English) Zbl 1198.62158

Summary: Prediction models that use gene expression levels are now being proposed for personalized treatment of cancer, but building accurate models that are easy to interpret remains a challenge. We describe an integrative clinical-genomic approach that combines both genomic pathway and clinical information. First, we summarize information from genes in each pathway using Supervised Principal Components (SPCA) to obtain pathway-based genomic predictors. Next, we build a prediction model based on clinical variables and pathway-based genomic predictors using Random Survival Forests (RSF). Our rationale for this two-stage procedure is that the underlying disease process may be influenced by environmental exposure (measured by clinical variables) and perturbations in different pathways (measured by pathway-based genomic variables), as well as their interactions. Using two cancer microarray datasets, we show that the pathway-based clinical-genomic model outperforms gene-based clinical-genomic models, with improved prediction accuracy and interpretability.
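The first stage of the two-stage procedure described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that genes in each pathway are screened by a univariate Cox score statistic and that the first principal component of the retained genes serves as the pathway-level predictor. The function names (`cox_score_z`, `pathway_score`, `build_design`), the fixed screening threshold, and the synthetic data are all hypothetical; in practice the threshold would be tuned, for example by cross-validation.

```python
import numpy as np


def cox_score_z(x, time, event):
    """Univariate Cox score test statistic (at beta = 0) for one gene."""
    u, v = 0.0, 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]      # subjects still at risk at this event time
        xr = x[at_risk]
        u += x[i] - xr.mean()          # observed minus expected covariate value
        v += xr.var()                  # covariate variance within the risk set
    return u / np.sqrt(v) if v > 0 else 0.0


def pathway_score(X, time, event, threshold=1.0):
    """First principal component of the genes whose |z| passes the threshold.

    The threshold value is an assumption made for this sketch.
    """
    z = np.array([cox_score_z(X[:, j], time, event) for j in range(X.shape[1])])
    keep = np.abs(z) >= threshold
    if not keep.any():                 # fall back to the most strongly associated gene
        keep = np.abs(z) == np.abs(z).max()
    Xc = X[:, keep] - X[:, keep].mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0], keep


def build_design(expr, pathways, clinical, time, event, threshold=1.0):
    """Stack clinical variables next to one supervised PC score per pathway."""
    scores = [
        pathway_score(expr[:, idx], time, event, threshold)[0]
        for idx in pathways.values()
    ]
    return np.column_stack([clinical] + scores)


# Synthetic data for illustration only (not from the paper).
rng = np.random.default_rng(0)
n = 80
expr = rng.normal(size=(n, 12))                       # 12 hypothetical genes
pathways = {"pathway_A": list(range(0, 6)), "pathway_B": list(range(6, 12))}
clinical = rng.normal(size=(n, 2))                    # 2 hypothetical clinical variables
time = rng.exponential(scale=np.exp(-expr[:, 0]))     # gene 0 raises the hazard
event = rng.random(n) < 0.7                           # about 30% censoring

design = build_design(expr, pathways, clinical, time, event)
print(design.shape)  # (80, 4): 2 clinical columns + 2 pathway scores
```

The resulting matrix of clinical variables and pathway scores is the kind of input the second stage would pass to a random survival forest. Because the screening step uses the survival outcome, an honest accuracy estimate would require repeating it inside each cross-validation fold rather than once on the full data as done here.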

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92C50 Medical applications (general)
62H25 Factor analysis and principal components; correspondence analysis
