
Lasso-type recovery of sparse representations for high-dimensional data. (English) Zbl 1155.62050

Summary: The Lasso is an attractive technique for regularization and variable selection for high-dimensional data, where the number of predictor variables \(p_n\) is potentially much larger than the number of samples \(n\). However, it was recently discovered that the sparsity pattern of the Lasso estimator can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable condition. The latter condition can easily be violated in the presence of highly correlated variables.
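To make the condition concrete: writing \(S\) for the set of truly nonzero coefficients and \(N\) for its complement, the (strong) irrepresentable condition of Zhao and Yu (2006) requires \(\|C_{NS} C_{SS}^{-1} \operatorname{sign}(\beta_S)\|_\infty < 1\) for the Gram matrix \(C = X^T X / n\). The following minimal sketch checks this quantity numerically on a toy design; the design construction and all constants are illustrative choices, not taken from the paper:

```python
# Minimal sketch: numerically checking the irrepresentable condition
# on a toy design. All constants are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 10, 3                  # samples, predictors, true sparsity

X = rng.standard_normal((n, p))
# Make the first inactive variable highly correlated with the active set;
# this is the kind of correlation that violates the condition.
X[:, s] = X[:, :s].sum(axis=1) + 0.1 * rng.standard_normal(n)

sign_beta = np.ones(s)                # signs of the true nonzero coefficients
C = X.T @ X / n                       # empirical Gram matrix
C_SS, C_NS = C[:s, :s], C[s:, :s]     # active/active and inactive/active blocks

# Irrepresentable condition: || C_NS C_SS^{-1} sign(beta_S) ||_inf < 1
irr = np.max(np.abs(C_NS @ np.linalg.solve(C_SS, sign_beta)))
print(f"irrepresentable statistic: {irr:.2f} "
      f"({'holds' if irr < 1 else 'violated'})")
```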
We examine the behavior of the Lasso estimator when the irrepresentable condition is relaxed. Even though the Lasso cannot recover the correct sparsity pattern, we show that the estimator is still consistent in the \(\ell_2\)-norm sense for fixed designs under conditions on (a) the number \(s_n\) of nonzero components of the vector \(\beta_n\) and (b) the minimal singular values of the design matrices induced by selecting small subsets of variables. Furthermore, a rate of convergence for the \(\ell_2\) error is obtained with an appropriate choice of the smoothing parameter; this rate is shown to be optimal under the condition of bounded maximal and minimal sparse eigenvalues. Our results imply that, with high probability, all important variables are selected, so that the set of selected variables is a meaningful reduction of the original set. Finally, our results are illustrated with the detection of closely adjacent frequencies, a problem encountered in astrophysics.
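As a rough illustration of the frequency-detection example, the sketch below fits the Lasso to sine features on a fine frequency grid around two closely adjacent signal frequencies. The grid, noise level, and regularization parameter are illustrative choices, not the authors' settings; the point is that the selected set is a small superset of the true frequencies:

```python
# Minimal sketch of the adjacent-frequency example (illustrative settings,
# not the authors' code): Lasso over a dictionary of sines on a fine grid.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 200
t = np.linspace(0, 10, n)

freqs = np.linspace(0.8, 1.2, 41)     # candidate frequencies, spacing 0.01
X = np.column_stack([np.sin(2 * np.pi * f * t) for f in freqs])

# Signal at two closely adjacent frequencies, plus noise.
y = (np.sin(2 * np.pi * 0.99 * t) + np.sin(2 * np.pi * 1.01 * t)
     + 0.1 * rng.standard_normal(n))

fit = Lasso(alpha=0.05, max_iter=50_000).fit(X, y)
selected = freqs[np.abs(fit.coef_) > 1e-8]
print("selected frequencies:", selected)   # a small set around 0.99 and 1.01
```

Exact support recovery fails here because adjacent sine columns are highly correlated, but all important frequencies appear in a strongly reduced candidate set, in line with the \(\ell_2\)-consistency message above.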

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62F12 Asymptotic properties of parametric estimators
62G08 Nonparametric regression and quantile regression
62F07 Statistical ranking and selection procedures
62J05 Linear regression; mixed models

References:

[1] Bickel, P., Ritov, Y. and Tsybakov, A. (2008). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[2] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674-1697. · Zbl 1209.62065 · doi:10.1214/009053606000001587
[3] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Statist. 1 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[4] Candes, E. and Tao, T. (2005a). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203-4215. · Zbl 1264.94121 · doi:10.1109/TIT.2005.858979
[5] Candes, E. and Tao, T. (2005b). The Dantzig selector: Statistical estimation when p is much larger than n . Ann. Statist. 35 2313-2351. · Zbl 1139.62019 · doi:10.1214/009053606000001523
[6] Cornish, N. and Crowder, J. (2005). LISA data analysis using Markov chain Monte Carlo methods. Phys. Rev. D 72 043005.
[7] Davidson, K. and Szarek, S. (2001). Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach Spaces 1 (W. B. Johnson and J. Lindenstrauss, eds.) 317-366. North-Holland, Amsterdam. · Zbl 1067.46008 · doi:10.1016/S1874-5849(01)80010-3
[8] Donoho, D. (2006). For most large underdetermined systems of linear equations, the minimal \(\ell_1\)-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797-829. · Zbl 1113.15004 · doi:10.1002/cpa.20132
[9] Donoho, D. and Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via \(\ell_1\)-minimization. Proc. Natl. Acad. Sci. USA 100 2197-2202. · Zbl 1064.94011 · doi:10.1073/pnas.0437847100
[10] Donoho, D., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6-18. · Zbl 1288.94017 · doi:10.1109/TIT.2005.860430
[11] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-451. · Zbl 1091.62054 · doi:10.1214/009053604000000067
[12] Fuchs, J. (2005). Recovery of exact sparse representations in the presence of bounded noise. IEEE Trans. Inform. Theory 51 3601-3608. · Zbl 1286.94031 · doi:10.1109/TIT.2005.855614
[13] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional predictor selection and the virtue of over-parametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[14] Gribonval, R. and Nielsen, M. (2003). Sparse representations in unions of bases. IEEE Trans. Inform. Theory 49 3320-3325. · Zbl 1286.94032 · doi:10.1109/TIT.2003.820031
[15] Hall, P., Reimann, J. and Rice, J. (2000). Nonparametric estimation of a periodic function. Biometrika 87 545-557. · Zbl 0956.62031 · doi:10.1093/biomet/87.3.527
[16] Hannan, E. and Quinn, B. (1989). The resolution of closely adjacent spectral lines. J. Time Ser. Anal. 10 13-31. · Zbl 0683.62051 · doi:10.1111/j.1467-9892.1989.tb00012.x
[17] Joshi, R., Crump, V. and Fischer, T. (1995). Image subband coding using arithmetic coded trellis coded quantization. IEEE Trans. Circuits Syst. Video Technol. 5 515-523.
[18] Knight, K. and Fu, W. (2000). Asymptotics for Lasso-type estimators. Ann. Statist. 28 1356-1378. · Zbl 1105.62357 · doi:10.1214/aos/1015957397
[19] LoPresto, S., Ramchandran, K. and Orchard, M. (1997). Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework. In Proc. Data Compression Conference 221-230.
[20] Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Machine Intelligence 11 674-693. · Zbl 0709.94650 · doi:10.1109/34.192463
[21] Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. Roy. Statist. Soc. Ser. B 70 53-71. · Zbl 1400.62276
[22] Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374-393. · Zbl 1452.62522
[23] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082 · doi:10.1214/009053606000000281
[24] Meinshausen, N., Rocha, G. and Yu, B. (2007). A tale of three cousins: Lasso, L2Boosting and Dantzig. Ann. Statist. 35 2373-2384. · doi:10.1214/009053607000000460
[25] Osborne, M., Presnell, B. and Turlach, B. (2000). On the Lasso and its dual. J. Comput. Graph. Statist. 9 319-337. · doi:10.2307/1390657
[26] Paul, D. (2007). Asymptotics of sample eigenstructure for a large-dimensional spiked covariance model. Statist. Sinica 17 1617-1642. · Zbl 1134.62029
[27] Pojmanski, G. (2002). The All Sky Automated Survey. Catalog of Variable Stars. I. 0h-6h Quarter of the Southern Hemisphere. Acta Astronomica 52 397-427.
[28] Scargle, J. (1982). Studies in astronomical time series analysis. II. Statistical aspects of spectral analysis of unevenly spaced data. Astrophysical J. 263 835-853.
[29] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[30] Tropp, J. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50 2231-2242. · Zbl 1288.94019 · doi:10.1109/TIT.2004.834793
[31] Tropp, J. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030-1051. · Zbl 1288.94025 · doi:10.1109/TIT.2005.864420
[32] Umstätter, R., Christensen, N., Hendry, M., Meyer, R., Simha, V., Veitch, J., Vigeland, S. and Woan, G. (2005). LISA source confusion: Identification and characterization of signals. Classical and Quantum Gravity 22 901.
[33] Valdés-Sosa, P., Sánchez-Bornot, J., Lage-Castellanos, A., Vega-Hernández, M., Bosch-Bayard, J., Melie-García, L. and Canales-Rodríguez, E. (2005). Estimating brain functional connectivity with sparse multivariate autoregression. Philos. Trans. Roy. Soc. B: Biological Sciences 360 969-981.
[34] van de Geer, S. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323 · doi:10.1214/009053607000000929
[35] Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Available at
[36] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49-67. · Zbl 1141.62030 · doi:10.1111/j.1467-9868.2005.00532.x
[38] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044 · doi:10.1214/07-AOS520
[39] Zhang, H. and Lu, W. (2007). Adaptive-Lasso for Cox’s proportional hazards model. Biometrika 94 691-703. · Zbl 1135.62083 · doi:10.1093/biomet/asm037
[40] Zhao, P. and Yu, B. (2007). Stagewise Lasso. J. Machine Learning Research 8 2701-2726. · Zbl 1222.68345
[41] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Machine Learning Research 7 2541-2563. · Zbl 1222.62008
[42] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418-1429. · Zbl 1171.62326 · doi:10.1198/016214506000000735