Regularization for Cox’s proportional hazards model with NP-dimensionality. (English) Zbl 1246.62202

Summary: High-throughput genetic sequencing arrays, with thousands of measurements per sample, and a large amount of related censored clinical data have increased the need for better measurement-specific model selection. We establish strong oracle properties of non-concave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox’s proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed and grasped. It is demonstrated that non-concave penalties lead to a significant reduction of the “irrepresentable condition” needed for LASSO model selection consistency. A large deviations result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the non-concave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples.
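
The contrast between the LASSO penalty and a folded-concave penalty such as SCAD, which the summary relies on, can be illustrated with a minimal sketch. It assumes the standard SCAD definition of Fan and Li with the customary default a = 3.7; neither the parameterization nor the function names below are taken from the summary, and this is not the coordinate-wise algorithm of the paper.

```python
def lasso_penalty(t, lam):
    """LASSO (L1) penalty: grows linearly in |t| without bound."""
    return lam * abs(t)


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (assumed standard form, a > 2).

    Equals the LASSO penalty near zero, bends over on (lam, a*lam],
    and is constant beyond a*lam.
    """
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|, for |t| > 0.

    It equals lam for small coefficients and drops to zero for large
    ones, so large coefficients are not shrunk.
    """
    t = abs(t)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)
```

For a small coefficient the two penalties coincide, while for a large coefficient the SCAD derivative is zero and the LASSO penalty keeps growing. That flattening is the property usually credited with removing the bias LASSO imposes on large coefficients.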


62P10 Applications of statistics to biology and medical sciences; meta analysis
62N02 Estimation in survival analysis and censored data
62N01 Censored data models
60G44 Martingales with continuous parameter
60F10 Large deviations
92C50 Medical applications (general)


