\(L_{1}\) penalized estimation in the Cox proportional hazards model. (English) Zbl 1207.62185

Summary: This article presents a novel algorithm that efficiently computes \(L_{1}\) penalized (lasso) estimates of parameters in high-dimensional models. The lasso has the property that it simultaneously performs variable selection and shrinkage, which makes it very useful for finding interpretable prediction rules in high-dimensional data. The new algorithm is based on a combination of gradient ascent optimization with the Newton-Raphson algorithm. It is described for a general likelihood function and can be applied in generalized linear models and other models with an \(L_{1}\) penalty. The algorithm is demonstrated in the Cox proportional hazards model, predicting survival of breast cancer patients using gene expression data, and its performance is compared with competing approaches. An R package, penalized, that implements the method, is available on CRAN.
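The gradient ascent half of such an algorithm can be illustrated with a minimal sketch. It is not the article's implementation and not the code of the penalized package: it assumes a Gaussian least-squares log-likelihood \(l(\beta) = -\tfrac{1}{2}\|y - X\beta\|^{2}\) rather than the Cox partial likelihood, it omits the Newton-Raphson phase entirely, and the function name `lasso_gradient_ascent` and its parameters are hypothetical. Each step moves along a directional gradient of the penalized objective \(l(\beta) - \lambda \sum_j |\beta_j|\) and stops either at the exact line optimum or at the first point where a coefficient would cross zero.

```python
import math


def lasso_gradient_ascent(X, y, lam, beta=None, max_iter=1000, tol=1e-10):
    """Maximize -0.5*||y - X b||^2 - lam*sum|b_j| by directional gradient ascent.

    Illustrative sketch only (hypothetical helper, least-squares likelihood).
    X is a list of rows, y a list of responses, lam the L1 penalty weight.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    for _ in range(max_iter):
        # Gradient of the unpenalized log-likelihood: h = X^T (y - X beta).
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        h = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]

        # Directional gradient of the penalized objective. A coefficient at
        # zero only moves if the likelihood gradient exceeds the penalty.
        g = []
        for b, hj in zip(beta, h):
            if b != 0.0:
                g.append(hj - lam * math.copysign(1.0, b))
            elif hj > lam:
                g.append(hj - lam)
            elif hj < -lam:
                g.append(hj + lam)
            else:
                g.append(0.0)

        g_sq = sum(v * v for v in g)
        if g_sq <= tol:
            break  # no ascent direction left: optimality conditions hold

        # Exact line optimum for the quadratic likelihood along g.
        Xg = [sum(X[i][j] * g[j] for j in range(p)) for i in range(n)]
        curv = sum(v * v for v in Xg)
        t_opt = g_sq / curv if curv > 0 else math.inf

        # Largest step before some nonzero coefficient changes sign.
        t_edge, edge_j = math.inf, None
        for j in range(p):
            if beta[j] * g[j] < 0:
                t = -beta[j] / g[j]
                if t < t_edge:
                    t_edge, edge_j = t, j

        t = min(t_opt, t_edge)
        if math.isinf(t):
            break  # degenerate direction; stop rather than diverge
        beta = [b + t * v for b, v in zip(beta, g)]
        if edge_j is not None and t_edge <= t_opt:
            beta[edge_j] = 0.0  # land exactly on zero (variable deselected)
    return beta


# Orthonormal design: the lasso solution is the soft-thresholded response.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(lasso_gradient_ascent(identity, [3.0, -0.5, -2.0], lam=1.0))
```

With the identity design used above, the result is the response shrunk toward zero by \(\lambda\), with the middle coefficient set exactly to zero. This shows both shrinkage and variable selection in one fit. For a non-quadratic likelihood the line optimum would no longer be exact, and a second-order step on the active coefficients would be the natural place for Newton-Raphson to take over.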


62N02 Estimation in survival analysis and censored data
62P10 Applications of statistics to biology and medical sciences; meta analysis
65K10 Numerical optimization and variational techniques
62J12 Generalized linear models (logistic models)
92C50 Medical applications (general)
65C60 Computational problems in statistics (MSC2010)

