zbMATH — the first resource for mathematics

Covariate assisted screening and estimation. (English) Zbl 1310.62085
Summary: Consider a linear model \(Y=X\beta +z\), where \(X=X_{n,p}\) and \(z\sim N(0,I_{n})\). The vector \(\beta\) is unknown but is sparse in the sense that most of its coordinates are \(0\). The main interest is to separate its nonzero coordinates from the zero ones (i.e., variable selection). Motivated by examples in long-memory time series [J. Fan and Q. Yao, Nonlinear time series. Nonparametric and parametric methods. New York, NY: Springer (2003; Zbl 1014.62103)] and the change-point problem [P. K. Bhattacharya, IMS Lect. Notes, Monogr. Ser. 23, 28-56 (1994; Zbl 1157.62331)], we are primarily interested in the case where the Gram matrix \(G=X^\prime X\) is nonsparse but sparsifiable by a finite-order linear filter. We focus on the regime where signals are both rare and weak, so that successful variable selection is challenging but still possible.
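To make "nonsparse but sparsifiable" concrete: in the change-point design the Gram matrix is fully dense, yet a first-order difference filter (a finite-order linear filter) sparsifies it. The following minimal numerical sketch is illustrative only (it is not from the paper; the size and the specific filter are arbitrary choices):

```python
import numpy as np

n = p = 8
# Change-point design: X[i, j] = 1 if j <= i (cumulative-sum design matrix)
X = np.tril(np.ones((n, p)))
G = X.T @ X                      # fully dense: G[j, k] = n - max(j, k)

# First-order difference filter D: (DY)_i = Y_i - Y_{i-1}
D = np.eye(n) - np.eye(n, k=-1)
Xf = D @ X                       # filtered design; here every row is an indicator
Gf = Xf.T @ Xf                   # the new Gram matrix is sparse (the identity)

print(np.count_nonzero(G), np.count_nonzero(Gf))  # 64 8
```

Every entry of \(G\) is nonzero, while the Gram matrix of the filtered design has only \(n\) nonzero entries, which is the sparsity that CASE exploits.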
We approach this problem by a new procedure called covariate assisted screening and estimation (CASE). CASE first uses linear filtering to reduce the original setting to a new regression model whose Gram (covariance) matrix is sparse. The new covariance matrix induces a sparse graph, which guides us to conduct multivariate screening without visiting all the submodels. In combination with the signal sparsity, the graph enables us to decompose the original problem into many separated small-size subproblems (if only we knew where they were!). Linear filtering also induces a so-called information leakage problem, which can be overcome by the newly introduced patching technique. Together, these give rise to CASE, which is a two-stage screen and clean [J. Fan and R. Song, Ann. Stat. 38, No. 6, 3567–3604 (2010; Zbl 1206.68157)] procedure, where we first identify candidates of these submodels by patching and screening, and then re-examine each candidate to remove false positives.
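The two-stage logic can be caricatured on a change-point toy model: filter to sparsify, screen by marginal scores, then clean by refitting the survivors. This is only a schematic sketch under simplifying assumptions (deterministic low-amplitude noise and ad hoc thresholds chosen for illustration), not the actual CASE algorithm with its graph-guided screening and patching:

```python
import numpy as np

n = p = 200
X = np.tril(np.ones((n, p)))                     # change-point design
beta = np.zeros(p)
beta[[40, 90, 150]] = 3.0                        # rare, moderately strong signals
z = 0.1 * np.sin(np.arange(n))                   # small deterministic "noise" for a reproducible demo
Y = X @ beta + z

# Stage 0: sparsify by first-order differencing (the linear filtering step)
D = np.eye(n) - np.eye(n, k=-1)
Yf, Xf = D @ Y, D @ X

# Stage 1 (screen): keep coordinates whose marginal score is large
scores = np.abs(Xf.T @ Yf)
cand = np.flatnonzero(scores > 1.0)              # illustrative threshold

# Stage 2 (clean): refit on the survivors, drop small refitted coefficients
coef, *_ = np.linalg.lstsq(Xf[:, cand], Yf, rcond=None)
bhat = np.zeros(p)
bhat[cand] = coef * (np.abs(coef) > 1.0)         # illustrative cleaning threshold

print(np.flatnonzero(bhat))                      # [ 40  90 150]
```

In this toy setting the filtered design is orthogonal, so both stages are trivial; the point of CASE is precisely to retain this divide-and-conquer structure when the filtered Gram matrix is sparse but not diagonal.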
For any variable selection procedure \(\hat{\beta}\), we measure performance by the minimax Hamming distance between the sign vectors of \(\hat{\beta}\) and \(\beta\). We show that in a broad class of situations where the Gram matrix is nonsparse but sparsifiable, CASE achieves the optimal rate of convergence. The results are successfully applied to long-memory time series and the change-point model.
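The Hamming distance between sign vectors simply counts the coordinates where the estimated and true signs disagree, so misses, false positives, and sign errors are all penalized alike. A small illustration (the example values are hypothetical):

```python
import numpy as np

def sign_hamming(beta_hat, beta):
    """Hamming distance between the sign vectors of two coefficient vectors."""
    return int(np.sum(np.sign(beta_hat) != np.sign(beta)))

beta     = np.array([0.0, 2.0, 0.0, -1.5, 0.0])
beta_hat = np.array([0.0, 2.2, 0.3,  0.0, 0.0])  # one false positive, one miss

print(sign_hamming(beta_hat, beta))  # 2
```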

62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
62C20 Minimax procedures in statistical decision theory
62F12 Asymptotic properties of parametric estimators
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
glasso; ScreenClean
Full Text: DOI Euclid arXiv
[1] Andreou, E. and Ghysels, E. (2002). Detecting multiple breaks in financial market volatility dynamics. J. Appl. Econometrics 17 579-600.
[2] Bhattacharya, P. K. (1994). Some aspects of change-point analysis. In Change-Point Problems (South Hadley, MA, 1992). Institute of Mathematical Statistics Lecture Notes-Monograph Series 23 28-56. IMS, Hayward, CA. · Zbl 1157.62331 · doi:10.1214/lnms/1215463112
[3] Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by \(\ell_1\) minimization. Ann. Statist. 37 2145-2177. · Zbl 1173.62053 · doi:10.1214/08-AOS653 · arxiv:0801.0345
[4] Chen, W. W., Hurvich, C. M. and Lu, Y. (2006). On the correlation matrix of the discrete Fourier transform and the fast solution of large Toeplitz systems for long-memory time series. J. Amer. Statist. Assoc. 101 812-822. · Zbl 1119.62358 · doi:10.1198/016214505000001069
[5] Donoho, D. L. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845-2862. · Zbl 1019.94503 · doi:10.1109/18.959265
[6] Donoho, D. and Jin, J. (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proc. Natl. Acad. Sci. USA 105 14790-14795. · Zbl 1357.62212 · doi:10.1073/pnas.0807471105
[7] Donoho, D. L. and Stark, P. B. (1989). Uncertainty principles and signal recovery. SIAM J. Appl. Math. 49 906-931. · Zbl 0689.42001 · doi:10.1137/0149053
[8] Fan, J., Guo, S. and Hao, N. (2012). Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 37-65. · doi:10.1111/j.1467-9868.2011.01005.x
[9] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547 · doi:10.1198/016214501753382273
[10] Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. Ann. Statist. 38 3567-3604. · Zbl 1206.68157 · doi:10.1214/10-AOS798 · arxiv:0903.5255
[11] Fan, J., Xue, L. and Zou, H. (2014). Strong oracle optimality of folded concave penalized estimation. Ann. Statist. 42 819-849. · Zbl 1305.62252 · doi:10.1214/13-AOS1198 · euclid:aos/1400592644 · arxiv:1210.5992
[12] Fan, J. and Yao, Q. (2003). Nonlinear Time Series : Nonparametric and Parametric Methods . Springer, New York. · Zbl 1014.62103
[13] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432-441. · Zbl 1143.62076 · doi:10.1093/biostatistics/kxm045
[14] Genovese, C. R., Jin, J., Wasserman, L. and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learn. Res. 13 2107-2143. · Zbl 1435.62270
[15] Harchaoui, Z. and Lévy-Leduc, C. (2010). Multiple change-point estimation with a total variation penalty. J. Amer. Statist. Assoc. 105 1480-1493. · Zbl 1388.62211 · doi:10.1198/jasa.2010.tm09181
[16] Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine 2 e124.
[17] Ising, E. (1925). A contribution to the theory of ferromagnetism. Z. Phys. 31 253-258.
[18] Ji, P. and Jin, J. (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 73-103. · Zbl 1246.62160 · doi:10.1214/11-AOS947 · euclid:aos/1331830775 · arxiv:1010.5028
[19] Jin, J., Zhang, C.-H. and Zhang, Q. (2014). Optimality of graphlet screening in high dimensional variable selection. J. Mach. Learn. Res. 15 2723-2772. · Zbl 1319.62139 · arxiv:1204.6452
[20] Ke, Z. T., Jin, J. and Fan, J. (2014). Supplement to “Covariate assisted screening and estimation.” · Zbl 1310.62085 · doi:10.1214/14-AOS1243 · euclid:aos/1413810726 · arxiv:1205.4645
[21] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation , 2nd ed. Springer, New York. · Zbl 0916.62017
[22] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281 · arxiv:math/0608017
[23] Moulines, E. and Soulier, P. (1999). Broadband log-periodogram regression of time series with long-range dependence. Ann. Statist. 27 1415-1439. · Zbl 0962.62085 · doi:10.1214/aos/1017938932
[24] Niu, Y. S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variations. Ann. Appl. Stat. 6 1306-1326. · Zbl 1401.92145 · doi:10.1214/12-AOAS539
[25] Olshen, A. B., Venkatraman, E. S., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5 557-572. · Zbl 1155.62478 · doi:10.1093/biostatistics/kxh008
[26] Ray, B. K. and Tsay, R. S. (2000). Long-range dependence in daily stock volatilities. J. Bus. Econom. Statist. 18 254-262.
[27] Siegmund, D. O. (2011). Personal communication.
[28] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879-898. · Zbl 1452.62515 · doi:10.1093/biomet/ass043
[29] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538
[30] Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9 18-29. · Zbl 1274.62886 · doi:10.1093/biostatistics/kxm013
[31] Wasserman, L. and Roeder, K. (2009). High-dimensional variable selection. Ann. Statist. 37 2178-2201. · Zbl 1173.62054 · doi:10.1214/08-AOS646 · arxiv:0704.1139
[32] Yao, Y.-C. and Au, S. T. (1989). Least-squares estimation of a step function. Sankhyā Ser. A 51 370-381. · Zbl 0711.62031
[33] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120 · doi:10.1214/09-AOS729 · arxiv:1002.4734
[34] Zhang, N. R., Siegmund, D. O., Ji, H. and Li, J. Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631-645. · Zbl 1195.62168 · doi:10.1093/biomet/asq025
[35] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008
[36] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735