# zbMATH — the first resource for mathematics

Covariate assisted screening and estimation. (English) Zbl 1310.62085
Summary: Consider a linear model $$Y=X\beta +z$$, where $$X=X_{n,p}$$ and $$z\sim N(0,I_{n})$$. The vector $$\beta$$ is unknown but is sparse in the sense that most of its coordinates are $$0$$. The main interest is to separate its nonzero coordinates from the zero ones (i.e., variable selection). Motivated by examples in long-memory time series [J. Fan and Q. Yao, Nonlinear time series. Nonparametric and parametric methods. New York, NY: Springer (2003; Zbl 1014.62103)] and the change-point problem [P. K. Bhattacharya, IMS Lect. Notes, Monogr. Ser. 23, 28-56 (1994; Zbl 1157.62331)], we are primarily interested in the case where the Gram matrix $$G=X^\prime X$$ is nonsparse but sparsifiable by a finite-order linear filter. We focus on the regime where signals are both rare and weak, so that successful variable selection is very challenging but still possible.
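The change-point model gives a concrete instance of a nonsparse but sparsifiable Gram matrix: with a lower-triangular design of ones, $$G$$ is fully dense, yet a second-order difference filter annihilates all but one band of it. A minimal numerical sketch (variable names and sizes are illustrative, not taken from the paper):

```python
import numpy as np

# Toy change-point design: column j of X is 1 from row j onward, so the
# nonzero coordinates of beta mark the jump locations of the mean vector.
n, p = 20, 20
X = np.tril(np.ones((n, p)))          # X[i, j] = 1 iff i >= j
G = X.T @ X                           # Gram matrix G[j, k] = n - max(j, k)
                                      # (0-indexed): every entry is nonzero

# Second-order difference filter applied to the rows of G:
# (D G)[j, k] = G[j, k] - 2*G[j+1, k] + G[j+2, k]
D = np.zeros((p - 2, p))
for j in range(p - 2):
    D[j, j], D[j, j + 1], D[j, j + 2] = 1.0, -2.0, 1.0

DG = D @ G
print(np.count_nonzero(G))            # dense: all p*p entries nonzero
print(np.count_nonzero(DG))           # sparse: one nonzero per filtered row
```

Here filtering collapses the dense Toeplitz-like structure to a single off-diagonal band, which is exactly the kind of sparsification CASE exploits.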
We approach this problem by a new procedure called covariate assisted screening and estimation (CASE). CASE first applies a linear filter to reduce the original setting to a new regression model where the corresponding Gram (covariance) matrix is sparse. The new covariance matrix induces a sparse graph, which guides us to conduct multivariate screening without visiting all the submodels. By interacting with the signal sparsity, the graph enables us to decompose the original problem into many separated small-size subproblems (if only we knew where they are!). Linear filtering also induces a so-called problem of information leakage, which can be overcome by the newly introduced patching technique. Together, these give rise to CASE, a two-stage screen and clean [J. Fan and R. Song, Ann. Stat. 38, No. 6, 3567–3604 (2010; Zbl 1206.68157)] procedure, where we first identify candidates of these submodels by patching and screening, and then re-examine each candidate to remove false positives.
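CASE itself screens multivariately along the sparse graph and uses patching; as a much simpler illustration of the two-stage screen-and-clean template it builds on, here is a marginal screen followed by a refit-and-threshold cleaning step (all parameter and threshold choices below are illustrative, not the paper's):

```python
import numpy as np

# A minimal screen-and-clean skeleton: stage 1 keeps coordinates with a
# large marginal statistic; stage 2 refits on the survivors and thresholds
# the refitted coefficients to remove false positives.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p)) / np.sqrt(n)   # roughly unit-norm columns
beta = np.zeros(p)
beta[[5, 17, 40]] = 10.0                       # rare, sufficiently strong signals
y = X @ beta + rng.standard_normal(n)

# Stage 1 (screen): keep coordinates with large marginal correlation.
t = 2.0 * np.sqrt(np.log(p))                   # illustrative threshold
survivors = np.flatnonzero(np.abs(X.T @ y) > t)

# Stage 2 (clean): refit on the survivors, re-threshold to drop false positives.
coef, *_ = np.linalg.lstsq(X[:, survivors], y, rcond=None)
selected = survivors[np.abs(coef) > t]

print(sorted(selected.tolist()))
```

The cleaning stage is what keeps false positives under control: a coordinate that slips through the marginal screen by chance gets a near-zero refitted coefficient and is removed.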
For any variable selection procedure $$\hat{\beta}$$, we measure performance by the Hamming distance between the sign vectors of $$\hat{\beta}$$ and $$\beta$$, benchmarked against the minimax Hamming distance. We show that in a broad class of situations where the Gram matrix is nonsparse but sparsifiable, CASE achieves the optimal rate of convergence. The results are successfully applied to long-memory time series and the change-point model.
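The Hamming loss between sign vectors simply counts coordinates where the estimated and true signs disagree, so it penalizes both misses and false positives:

```python
import numpy as np

def hamming_sign_distance(beta_hat, beta):
    """Number of coordinates where sgn(beta_hat) and sgn(beta) disagree."""
    return int(np.sum(np.sign(beta_hat) != np.sign(beta)))

beta     = np.array([0.0, 2.0, 0.0, -1.5, 0.0])
beta_hat = np.array([0.0, 2.1, 0.3,  0.0, 0.0])  # one false positive, one miss
print(hamming_sign_distance(beta_hat, beta))      # -> 2
```

The minimax version takes the worst case of the expected loss over a class of sparse $$\beta$$ and the best case over procedures; the paper's claim is that CASE attains this benchmark up to the optimal rate.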

##### MSC:
- 62J05 Linear regression; mixed models
- 62J07 Ridge regression; shrinkage estimators (Lasso)
- 62C20 Minimax procedures in statistical decision theory
- 62F12 Asymptotic properties of parametric estimators
- 62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
##### Software:
glasso; ScreenClean
##### References:
- Andreou, E. and Ghysels, E. (2002). Detecting multiple breaks in financial market volatility dynamics. J. Appl. Econometrics 17 579-600.
- Bhattacharya, P. K. (1994). Some aspects of change-point analysis. In Change-Point Problems (South Hadley, MA, 1992). IMS Lecture Notes-Monograph Series 23 28-56. IMS, Hayward, CA. · Zbl 1157.62331 · doi:10.1214/lnms/1215463112
- Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by $$\ell_1$$ minimization. Ann. Statist. 37 2145-2177. · Zbl 1173.62053 · doi:10.1214/08-AOS653 · arXiv:0801.0345
- Chen, W. W., Hurvich, C. M. and Lu, Y. (2006). On the correlation matrix of the discrete Fourier transform and the fast solution of large Toeplitz systems for long-memory time series. J. Amer. Statist. Assoc. 101 812-822. · Zbl 1119.62358 · doi:10.1198/016214505000001069
- Donoho, D. L. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845-2862. · Zbl 1019.94503 · doi:10.1109/18.959265
- Donoho, D. and Jin, J. (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 105 14790-14795. · Zbl 1357.62212 · doi:10.1073/pnas.0807471105
- Donoho, D. L. and Stark, P. B. (1989). Uncertainty principles and signal recovery. SIAM J. Appl. Math. 49 906-931. · Zbl 0689.42001 · doi:10.1137/0149053
- Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 37-65. · doi:10.1111/j.1467-9868.2011.01005.x
- Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
- Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. Ann. Statist. 38 3567-3604. · Zbl 1206.68157 · doi:10.1214/10-AOS798 · arXiv:0903.5255
- Fan, J., Xue, L. and Zou, H. (2014). Strong oracle optimality of folded concave penalized estimation. Ann. Statist. 42 819-849. · Zbl 1305.62252 · doi:10.1214/13-AOS1198 · arXiv:1210.5992
- Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York. · Zbl 1014.62103
- Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432-441. · Zbl 1143.62076 · doi:10.1093/biostatistics/kxm045
- Genovese, C. R., Jin, J., Wasserman, L. and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learn. Res. 13 2107-2143. · Zbl 1435.62270
- Harchaoui, Z. and Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. J. Amer. Statist. Assoc. 105 1480-1493. · Zbl 1388.62211 · doi:10.1198/jasa.2010.tm09181
- Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine 2 e124.
- Ising, E. (1925). A contribution to the theory of ferromagnetism. Z. Phys. 31 253-258.
- Ji, P. and Jin, J. (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 73-103. · Zbl 1246.62160 · doi:10.1214/11-AOS947 · arXiv:1010.5028
- Jin, J., Zhang, C.-H. and Zhang, Q. (2014). Optimality of graphlet screening in high dimensional variable selection. J. Mach. Learn. Res. 15 2723-2772. · Zbl 1319.62139 · arXiv:1204.6452
- Ke, Z. T., Jin, J. and Fan, J. (2014). Supplement to "Covariate assisted screening and estimation." · Zbl 1310.62085 · doi:10.1214/14-AOS1243 · arXiv:1205.4645
- Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York. · Zbl 0916.62017
- Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281 · arXiv:math/0608017
- Moulines, E. and Soulier, P. (1999). Broadband log-periodogram regression of time series with long-range dependence. Ann. Statist. 27 1415-1439. · Zbl 0962.62085 · doi:10.1214/aos/1017938932
- Niu, Y. S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variations. Ann. Appl. Stat. 6 1306-1326. · Zbl 1401.92145 · doi:10.1214/12-AOAS539
- Olshen, A. B., Venkatraman, E. S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5 557-572. · Zbl 1155.62478 · doi:10.1093/biostatistics/kxh008
- Ray, B. K. and Tsay, R. S. (2000). Long-range dependence in daily stock volatilities. J. Bus. Econom. Statist. 18 254-262.
- Siegmund, D. O. (2011). Personal communication.
- Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879-898. · Zbl 1452.62515 · doi:10.1093/biomet/ass043
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538
- Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9 18-29. · Zbl 1274.62886 · doi:10.1093/biostatistics/kxm013
- Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178-2201. · Zbl 1173.62054 · doi:10.1214/08-AOS646 · arXiv:0704.1139
- Yao, Y.-C. and Au, S. T. (1989). Least-squares estimation of a step function. Sankhyā Ser. A 51 370-381. · Zbl 0711.62031
- Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120 · doi:10.1214/09-AOS729 · arXiv:1002.4734
- Zhang, N. R., Siegmund, D. O., Ji, H. and Li, J. Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631-645. · Zbl 1195.62168 · doi:10.1093/biomet/asq025
- Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
- Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735