
Broken adaptive ridge regression for right-censored survival data. (English) Zbl 07473255

Summary: Broken adaptive ridge (BAR) is a computationally scalable surrogate to \(L_0\)-penalized regression. It iteratively performs reweighted \(L_2\)-penalized regressions and enjoys some appealing properties of both \(L_0\)- and \(L_2\)-penalized regressions while avoiding some of their limitations. In this paper, we extend the BAR method to the semi-parametric accelerated failure time (AFT) model for right-censored survival data. Specifically, we propose a censored BAR (CBAR) estimator by applying the BAR algorithm to Leurgans' synthetic data. We show that the resulting CBAR estimator is consistent for variable selection, possesses an oracle property for parameter estimation and enjoys a grouping property for highly correlated covariates. Both low- and high-dimensional covariates are considered. The effectiveness of our method is demonstrated in simulations and compared with some popular penalization methods. Real-data illustrations are provided using a diffuse large-B-cell lymphoma dataset and a glioblastoma multiforme dataset.
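The two ingredients named in the summary can be illustrated with a short Python/NumPy sketch. It is not the authors' code. The function names (`km_censoring_survival`, `leurgans_synthetic`, `bar`, `cbar`) are hypothetical, and several choices are our own assumptions: the censoring survival function \(G\) is estimated by Kaplan-Meier, responses (e.g. log survival times) are taken to be positive, the intercept is removed by centering, and the ridge initial value, the parameter `xi`, and the stopping and zero thresholds are illustrative. The synthetic response follows Leurgans' construction [27], \(Y_i^* = \int_0^{Y_i} \mathrm{d}t/\hat G(t)\). Under independent censoring, and with \(G\) known and positive on the support of \(T \ge 0\), this response has conditional mean \(E(T \mid X)\). The BAR step uses the reweighted ridge update of [10] as we understand it: starting from a ridge estimate, \(\hat\beta^{(k)} = \arg\min_\beta \{\|y - X\beta\|^2 + \lambda \sum_j \beta_j^2/(\hat\beta_j^{(k-1)})^2\}\). The sketch evaluates this in the algebraically equivalent form \(\Gamma(\Gamma X^\top X \Gamma + \lambda I)^{-1}\Gamma X^\top y\) with \(\Gamma = \operatorname{diag}(\hat\beta^{(k-1)})\), so coefficients that reach zero stay at zero.

```python
import numpy as np


def km_censoring_survival(y, delta):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    y     : observed responses min(T, C), assumed positive
    delta : 1 if the failure was observed, 0 if the response is censored
    Returns the sorted distinct times and G evaluated at each of them.
    """
    times = np.unique(y)
    g = np.empty(len(times))
    surv = 1.0
    for k, t in enumerate(times):
        at_risk = np.sum(y >= t)                  # risk set at time t
        n_cens = np.sum((y == t) & (delta == 0))  # censorings are the "events" for G
        surv *= 1.0 - n_cens / at_risk
        g[k] = surv
    return times, g


def leurgans_synthetic(y, delta):
    """Synthetic response y*_i = integral from 0 to y_i of dt / G_hat(t)."""
    times, g = km_censoring_survival(y, delta)
    # G_hat is a right-continuous step function: 1 on [0, t_1), g[0] on [t_1, t_2), ...
    left = np.concatenate(([0.0], times[:-1]))
    right = times
    level = np.concatenate(([1.0], g[:-1]))  # positive before the largest time
    # Length of the overlap of [0, y_i] with each interval [left_k, right_k).
    overlap = np.clip(np.minimum(y[:, None], right[None, :]) - left[None, :], 0.0, None)
    return overlap @ (1.0 / level)


def bar(X, y, lam, xi=1.0, max_iter=200, tol=1e-8, zero_tol=1e-8):
    """Broken adaptive ridge: iteratively reweighted ridge regressions (sketch)."""
    p = X.shape[1]
    xtx, xty = X.T @ X, X.T @ y
    beta = np.linalg.solve(xtx + xi * np.eye(p), xty)  # ridge initial estimate
    for _ in range(max_iter):
        gamma = np.diag(beta)
        # Equals (X'X + lam * diag(1 / beta**2))^{-1} X'y when no beta_j is zero.
        new = gamma @ np.linalg.solve(gamma @ xtx @ gamma + lam * np.eye(p), gamma @ xty)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    beta[np.abs(beta) < zero_tol] = 0.0  # set numerically vanished coefficients to zero
    return beta


def cbar(X, y, delta, lam, xi=1.0):
    """CBAR-style sketch: BAR applied to centered synthetic responses."""
    ystar = leurgans_synthetic(y, delta)
    Xc = X - X.mean(axis=0)  # centering removes the unpenalized intercept
    return bar(Xc, ystar - ystar.mean(), lam=lam, xi=xi)
```

In practice `lam` and `xi` would need tuning, for example by cross-validation or an information criterion. The sketch leaves that choice to the caller. A quick sanity check is the uncensored case: when every `delta` equals 1, \(\hat G \equiv 1\) and `leurgans_synthetic` returns the observed responses unchanged.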

MSC:

62-XX Statistics

Software:

glmnet

References:

[1] Akaike, H., A new look at the statistical model identification, IEEE Transactions on Automatic Control, 19, 716-723 (1974) · Zbl 0314.62039
[2] Box, JK; Paquet, N.; Adams, MN; Boucher, D.; Bolderson, E.; O'Byrne, KJ; Richard, DJ, Nucleophosmin: From structure and function to disease development, BMC Molecular Biology, 17, 19, 1-12 (2016)
[3] Breheny, P.; Huang, J., Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection, The Annals of Applied Statistics, 5, 1, 232-253 (2011) · Zbl 1220.62095
[4] Breiman, L., Heuristics of instability and stabilization in model selection, Annals of Statistics, 24, 2350-2383 (1996) · Zbl 0867.62055
[5] Buckley, J.; James, I., Linear regression with censored data, Biometrika, 66, 3, 429-436 (1979) · Zbl 0425.62051
[6] Cai, T.; Huang, J.; Tian, L., Regularized estimation for the accelerated failure time model, Biometrics, 65, 2, 394-404 (2009) · Zbl 1274.62736
[7] Chen, J.; Chen, Z., Extended Bayesian information criteria for model selection with large model spaces, Biometrika, 95, 759-771 (2008) · Zbl 1437.62415
[8] Cox, DR, Regression models and life-tables, Journal of the Royal Statistical Society: Series B (Methodological), 34, 2, 187-220 (1972) · Zbl 0243.62041
[9] Cui, H.; Li, R.; Zhong, W., Model-free feature screening for ultrahigh dimensional discriminant analysis, Journal of the American Statistical Association, 110, 510, 630-641 (2015) · Zbl 1373.62305
[10] Dai, L.; Chen, K.; Sun, Z.; Liu, Z.; Li, G., Broken adaptive ridge regression and its asymptotic properties, Journal of Multivariate Analysis, 168, 334-351 (2018) · Zbl 1401.62108
[11] Dai, L.; Chen, K.; Li, G., The broken adaptive ridge procedure and its applications, Statistica Sinica, 30, 2, 1069-1094 (2020) · Zbl 1439.62169
[12] Datta, S.; Le-Rademacher, J.; Datta, S., Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and lasso, Biometrics, 63, 1, 259-271 (2007)
[13] Eirín-López, JM; Frehlick, LJ; Ausió, J., Long-term evolution and functional diversification in the members of the nucleophosmin/nucleoplasmin family of nuclear chaperones, Genetics, 173, 4, 1835-1850 (2006)
[14] Fan, J.; Li, R., Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96, 456, 1348-1360 (2001) · Zbl 1073.62547
[15] Fan, J.; Li, R., Variable selection for Cox’s proportional hazards model and frailty model, Annals of Statistics, 30, 1, 74-99 (2002) · Zbl 1012.62106
[16] Fan, J.; Lv, J., Sure independence screening for ultrahigh dimensional feature space, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 5, 849-911 (2008) · Zbl 1411.62187
[17] Foster, D.; George, E., The risk inflation criterion for multiple regression, Annals of Statistics, 22, 1947-1975 (1994) · Zbl 0829.62066
[18] Friedman, J.; Hastie, T.; Tibshirani, R., Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, 33, 1, 1-22 (2010)
[19] Huang, J.; Ma, S., Variable selection in the accelerated failure time model via the bridge method, Lifetime Data Analysis, 16, 2, 176-195 (2010) · Zbl 1322.62189
[20] Huang, J.; Ma, S.; Xie, H., Regularized estimation in the accelerated failure time model with high-dimensional covariates, Biometrics, 62, 3, 813-820 (2006) · Zbl 1111.62090
[21] Johnson, BA, On lasso for censored data, Electronic Journal of Statistics, 3, 485-506 (2009) · Zbl 1326.62201
[22] Johnson, BA; Lin, DY; Zeng, D., Penalized estimating functions and variable selection in semiparametric regression models, Journal of the American Statistical Association, 103, 482, 672-680 (2008) · Zbl 1471.62330
[23] Johnson, KD; Lin, D.; Ungar, LH; Foster, D.; Stine, R., A risk ratio comparison of \(\ell_0\) and \(\ell_1\) penalized regression, arXiv:1510.06319 [math.ST] (2015)
[24] Kalbfleisch, JD; Prentice, RL, The statistical analysis of failure time data (2002), Hoboken: Wiley · Zbl 1012.62104
[25] Kawaguchi, ES; Suchard, MA; Liu, Z.; Li, G., A surrogate \(\ell_0\) sparse Cox’s regression with applications to sparse high-dimensional massive sample size time-to-event data, Statistics in Medicine, 39, 6, 675-686 (2020)
[26] Koul, H.; Susarla, V.; Van Ryzin, J., Regression analysis with randomly right-censored data, Annals of Statistics, 9, 6, 1276-1288 (1981) · Zbl 0477.62046
[27] Leurgans, S., Linear models, random censoring and synthetic data, Biometrika, 74, 2, 301-309 (1987) · Zbl 0649.62068
[28] Li, Y.; Dicker, L.; Zhao, SD, The Dantzig selector for censored linear regression models, Statistica Sinica, 24, 1, 251-268 (2014) · Zbl 1285.62075
[29] Liu, Y.; Chen, X.; Li, G., A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates, Statistical Methods in Medical Research, 29, 6, 1499-1513 (2020)
[30] Mallows, C., Some comments on \(C_p\), Technometrics, 15, 661-675 (1973) · Zbl 0269.62061
[31] Mummenhoff, J.; Houweling, AC; Peters, T.; Christoffels, VM; Rüther, U., Expression of Irx6 during mouse morphogenesis, Mechanisms of Development, 103, 1-2, 193-195 (2001)
[32] Nachmani, D.; Bothmer, AH; Grisendi, S.; Mele, A.; Pandolfi, PP, Germline NPM1 mutations lead to altered rRNA 2′-O-methylation and cause dyskeratosis congenita, Nature Genetics, 51, 10, 1518-1529 (2019)
[33] Nardi, Y.; Rinaldo, A., On the asymptotic properties of the group lasso estimator for linear models, Electronic Journal of Statistics, 2, 605-633 (2008) · Zbl 1320.62167
[34] Schwarz, G., Estimating the dimension of a model, Annals of Statistics, 6, 461-464 (1978) · Zbl 0379.62005
[35] Shen, X.; Pan, W.; Zhu, Y., Likelihood-based selection and sharp parameter estimation, Journal of the American Statistical Association, 107, 223-232 (2012) · Zbl 1261.62020
[36] Stute, W., Consistent estimation under random censorship when covariables are present, Journal of Multivariate Analysis, 45, 1, 89-103 (1993) · Zbl 0767.62036
[37] Tibshirani, R., Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological), 58, 1, 267-288 (1996) · Zbl 0850.62538
[38] Tibshirani, R., The lasso method for variable selection in the Cox model, Statistics in Medicine, 16, 4, 385-395 (1997)
[39] Wang, S.; Nan, B.; Zhu, J.; Beer, DG, Doubly penalized Buckley-James method for survival data with high-dimensional covariates, Biometrics, 64, 1, 132-140 (2008) · Zbl 1139.62063
[40] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68, 1, 49-67 (2006) · Zbl 1141.62030
[41] Zhang, CH, Nearly unbiased variable selection under minimax concave penalty, Annals of Statistics, 38, 2, 894-942 (2010) · Zbl 1183.62120
[42] Zhao, H.; Wu, Q.; Li, G.; Sun, J., Simultaneous estimation and variable selection for interval-censored data with broken adaptive ridge regression, Journal of the American Statistical Association, 115, 529, 204-216 (2019) · Zbl 1437.62283
[43] Zhou, M., Asymptotic normality of the synthetic data regression estimator for censored survival data, Annals of Statistics, 20, 2, 1002-1021 (1992) · Zbl 0748.62024
[44] Zhu, L.; Li, L.; Li, R.; Zhu, L., Model-free feature screening for ultrahigh dimensional data, Journal of the American Statistical Association, 106, 496, 1464-1475 (2011) · Zbl 1233.62195
[45] Zou, H., The adaptive lasso and its oracle properties, Journal of the American Statistical Association, 101, 476, 1418-1429 (2006) · Zbl 1171.62326
[46] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 2, 301-320 (2005) · Zbl 1069.62054