Identification of biomarker-by-treatment interactions in randomized clinical trials with survival outcomes and high-dimensional spaces. (English) Zbl 1369.62306

Summary: Stratified medicine seeks to identify biomarkers or parsimonious gene signatures that distinguish patients who will benefit most from a targeted treatment. We evaluated 12 approaches for high-dimensional Cox models in randomized clinical trials: penalization of the biomarker main effects and biomarker-by-treatment interactions (full-lasso, three kinds of adaptive lasso, ridge+lasso and group-lasso); dimensionality reduction of the main effect matrix via linear combinations (PCA+lasso or PLS+lasso, where PCA is principal components analysis and PLS is partial least squares); penalization of modified covariates or of the arm-specific biomarker effects (two-I model); gradient boosting; and a univariate approach with control of multiple testing. We compared these methods via simulations, evaluating their selection abilities in null and alternative scenarios. We varied the number of biomarkers, of nonnull main effects, and of true biomarker-by-treatment interactions. We also proposed a novel measure evaluating the interaction strength of the developed gene signatures. In the null scenarios, the group-lasso, two-I model, and gradient boosting performed poorly in the presence of nonnull main effects; in the alternative scenarios they performed well, also with high interaction strength. The adaptive lasso with grouped weights was too conservative. The modified covariates, PCA+lasso, PLS+lasso, and ridge+lasso performed moderately. The full-lasso and adaptive lassos performed well, with the exception of the full-lasso in the presence of only nonnull main effects. The univariate approach performed poorly in alternative scenarios. We also illustrate the methods using gene expression data from 614 breast cancer patients treated with adjuvant chemotherapy.
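As a rough illustration of how three of the parametrizations named in the summary differ, the sketch below builds the covariate matrices that a penalized Cox solver would receive. It is not the authors' code. The function names, the toy data, and the -1/+1 treatment coding are assumptions made for illustration, as is the reading of the two-I model as one block of biomarker columns per arm. The model fitting itself is not shown.

```python
from typing import List

Matrix = List[List[float]]


def full_interaction_design(biomarkers: Matrix, treatment: List[float]) -> Matrix:
    """Columns: treatment, p biomarker main effects, p biomarker-by-treatment products."""
    rows = []
    for x, t in zip(biomarkers, treatment):
        rows.append([t] + list(x) + [xj * t for xj in x])
    return rows


def modified_covariates_design(biomarkers: Matrix, treatment: List[float]) -> Matrix:
    """Columns: p modified covariates x_j * t / 2 (assumes t coded -1/+1, no main effects)."""
    return [[xj * t / 2 for xj in x] for x, t in zip(biomarkers, treatment)]


def arm_specific_design(biomarkers: Matrix, treatment: List[float]) -> Matrix:
    """Columns: p biomarker columns active in the control arm, then p in the experimental arm."""
    rows = []
    for x, t in zip(biomarkers, treatment):
        in_experimental = 1.0 if t > 0 else 0.0
        control_block = [xj * (1.0 - in_experimental) for xj in x]
        experimental_block = [xj * in_experimental for xj in x]
        rows.append(control_block + experimental_block)
    return rows


# Hypothetical toy data: 3 patients, 2 biomarkers, treatment coded -1 (control) / +1 (experimental).
biomarkers = [[0.5, -1.0], [2.0, 0.0], [-1.5, 1.0]]
treatment = [1.0, -1.0, 1.0]

full = full_interaction_design(biomarkers, treatment)
modified = modified_covariates_design(biomarkers, treatment)
two_arm = arm_specific_design(biomarkers, treatment)
```

With p biomarkers, the first matrix has 2p + 1 columns, the second has p, and the third has 2p. That is why the choice of parametrization changes which coefficients a lasso-type penalty can shrink to zero.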


62P10 Applications of statistics to biology and medical sciences; meta analysis
62N03 Testing in survival analysis and censored data
62J15 Paired and multiple comparisons; multiple testing

