
Ensemble survival tree models to reveal pairwise interactions of variables with time-to-events outcomes in low-dimensional setting. (English) Zbl 1398.92010

Summary: Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understanding the development of common and complex diseases. To increase the power to detect such variable interactions associated with clinical time-to-event outcomes, we borrowed established concepts from random survival forest (RSF) models. We introduce a novel RSF-based pairwise interaction estimator and derive a randomization method with bootstrap confidence intervals for inferring interaction significance. Using various linear and nonlinear time-to-event survival models in simulation studies, we first show the efficiency of our approach: true pairwise interaction effects between variables are uncovered, even though they may not be accompanied by their corresponding main effects and may not be detected by the standard semi-parametric regression models and test statistics used in survival analysis. Moreover, using an RSF-based cross-validation scheme for generating prediction estimators, we show that informative predictors may be inferred. We applied our approach to an HIV cohort study that recorded key host gene polymorphisms and their association with change of HIV tropism or progression to AIDS. Altogether, this shows how linear or nonlinear pairwise statistical interactions of variables may be efficiently detected, with predictive value, in observational studies with time-to-event outcomes.
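As a toy illustration of an interaction that is not accompanied by a marginal main effect, and of a bootstrap confidence interval for an interaction estimate, the following Python sketch uses only the standard library. It is not the RSF-based estimator or the randomization procedure of the paper: the statistic is a hypothetical difference-in-differences of mean outcomes over the 2x2 table of two binary variables, censoring is ignored, and the interval is a plain percentile bootstrap. The function names `interaction_contrast`, `marginal_effect` and `percentile_bootstrap_ci`, and the simulated XOR-type outcome, are assumptions made for illustration.

```python
import random
from statistics import StatisticsError, fmean

# Hypothetical illustration only, NOT the paper's RSF-based estimator.
# Rows are (x1, x2, y) with binary x1, x2 and an outcome y that is treated
# as fully observed (censoring is ignored, a simplifying assumption).

def interaction_contrast(rows):
    """Difference-in-differences (m11 - m10) - (m01 - m00) of cell means."""
    cells = {(a, b): [] for a in (0, 1) for b in (0, 1)}
    for x1, x2, y in rows:
        cells[(x1, x2)].append(y)
    m = {key: fmean(ys) for key, ys in cells.items()}  # raises if a cell is empty
    return (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])

def marginal_effect(rows):
    """Mean outcome for x1 = 1 minus mean outcome for x1 = 0, ignoring x2."""
    return (fmean(y for x1, _, y in rows if x1 == 1)
            - fmean(y for x1, _, y in rows if x1 == 0))

def percentile_bootstrap_ci(rows, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat, resampling rows with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    reps = []
    while len(reps) < n_boot:
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        try:
            reps.append(stat(sample))
        except StatisticsError:
            continue  # a 2x2 cell was empty in this resample; draw again
    reps.sort()
    k = int(round(alpha / 2 * n_boot))
    return reps[k], reps[n_boot - 1 - k]

# Simulated XOR-type effect (assumption): y shifts by 3 only when exactly one
# of x1, x2 equals 1, so x1 has roughly no marginal effect but the
# interaction contrast is about -6.
rng = random.Random(1)
rows = []
for _ in range(400):
    x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
    rows.append((x1, x2, 10 + 3 * (x1 ^ x2) + rng.gauss(0, 1)))

marg = marginal_effect(rows)
est = interaction_contrast(rows)
lo, hi = percentile_bootstrap_ci(rows, interaction_contrast)
print(f"marginal x1 effect = {marg:.2f}")
print(f"interaction contrast = {est:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

In this deliberately simple, uncensored setting the marginal comparison of x1 is close to zero while the interval for the interaction contrast lies well below zero, mirroring the summary's point that an interaction need not come with main effects. An analysis of censored time-to-event outcomes would instead require a survival-aware statistic such as the RSF-based estimator introduced in the paper.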

MSC:

92B15 General biostatistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62N02 Estimation in survival analysis and censored data
62N03 Testing in survival analysis and censored data
62J05 Linear regression; mixed models
92D10 Genetics and epigenetics
