## The Dantzig selector: statistical estimation when $$p$$ is much larger than $$n$$. (With discussions and rejoinder).(English)Zbl 1139.62019

Summary: In many important statistical applications, the number of variables or parameters $$p$$ is much larger than the number of observations $$n$$. Suppose then that we have observations $$y=X\beta+z$$, where $$\beta\in\mathbb R^p$$ is a parameter vector of interest, $$X$$ is a data matrix with possibly far fewer rows than columns, $$n\ll p$$, and the $$z_i$$’s are i.i.d. $$N(0,\sigma^2)$$. Is it possible to estimate $$\beta$$ reliably based on the noisy data $$y$$? To estimate $$\beta$$, we introduce a new estimator – we call it the Dantzig selector – which is a solution to the $$\ell_1$$-regularization problem
$\min_{\widetilde{\beta}\in\mathbb R^p} \|\widetilde{\beta}\|_{\ell_1} \quad\text{subject to}\quad \|X^*r\|_{\ell_\infty}\leq (1+t^{-1}) \sqrt{2\log p}\cdot\sigma,$
where $$r$$ is the residual vector $$y-X\widetilde{\beta}$$ and $$t$$ is a positive scalar. We show that if $$X$$ obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector $$\beta$$ is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability,
$\|\widehat{\beta}-\beta\|_{\ell_2}^2\leq C^2\cdot 2\log p\cdot \Bigl(\sigma^2+ \sum_i \min(\beta_i^2,\sigma^2)\Bigr).$
Our results are nonasymptotic and we give values for the constant $$C$$. Even though $$n$$ may be much smaller than $$p$$, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information about which coordinates are nonzero, and which were above the noise level.
In multivariate regression and from a model selection viewpoint, our result says that it is possible nearly to select the best subset of variables by solving a very simple convex program, which, in fact, can easily be recast as a convenient linear program.

### MSC:

 62G08 Nonparametric regression and quantile regression 62G05 Nonparametric estimation 94A08 Image processing (compression, reconstruction, etc.) in information and communication theory 94A12 Signal theory (characterization, reconstruction, filtering, etc.) 90C05 Linear programming 62C05 General considerations in statistical decision theory

PDCO; lars
Full Text:

### References:

  Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans. Automatic Control 19 716-723. · Zbl 0314.62039  Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations (with discussion). J. Amer. Statist. Assoc. 96 939-967. · Zbl 1072.62561  Baraud, Y. (2000). Model selection for regression on a fixed design. Probab. Theory Related Fields 117 467-493. · Zbl 0997.62027  Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413. · Zbl 0946.62036  Barron, A. R. and Cover, T. M. (1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory 37 1034-1054. · Zbl 0743.62003  Birgé, L. and Massart, P. (1997). From model selection to adaptive estimation. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. L. Yang, eds.) 55-87. Springer, New York. · Zbl 0920.62042  Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. 3 203-268. · Zbl 1037.62001  Boyd, S. and Vandenberghe L. (2004). Convex Optimization . Cambridge Univ. Press. · Zbl 1058.90049  Candès, E. J. and Romberg, J. (2005). Practical signal recovery from random projections. In Computational Imaging III : Proc. SPIE International Symposium on Electronic Imaging 1 76-86. San Jose, CA.  Candès, E. J., Romberg, J. and Tao, T. (2006). Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 1207-1223. · Zbl 1098.94009  Candès, E. J., Romberg, J. and Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52 489-509. · Zbl 1231.94017  Candès, E. J., Rudelson, M., Vershynin, R. and Tao, T. (2005). Error correction via linear programming. In Proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS) 295-308. IEEE, Los Alamitos, CA.  Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203-4215. · Zbl 1264.94121  Candès, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406-5425. · Zbl 1309.94033  Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002  Daniel, B. L., Yen, Y. F., Glover, G. H. et al. (1998). Breast disease: Dynamic spiral MR imaging. Radiology 209 499-509.  Daubechies, I. (2005). Personal communication.  Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal $$\ell_1$$-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797-829. · Zbl 1113.15004  Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289-1306. · Zbl 1288.94016  Donoho, D. L. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845-2862. · Zbl 1019.94503  Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455. · Zbl 0815.62019  Donoho, D. L. and Johnstone, I. M. (1994). Ideal denoising in an orthonormal basis chosen from a library of bases. C. R. Acad. Sci. Paris Sér. I Math. 319 1317-1322. · Zbl 0819.94007  Donoho, D. L. and Johnstone, I. M. (1995). Empirical atomic decomposition. Unpublished manuscript.  Elad, M. and Bruckstein, A. M. (2002). A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. Inform. Theory 48 2558-2567. · Zbl 1062.15001  Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928-961. · Zbl 1092.62031  Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947-1975. · Zbl 0829.62066  Fuchs, J. (2004). On sparse representations in arbitrary redundant bases. IEEE Trans. Inform. Theory 50 1341-1344. · Zbl 1284.94018  Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078  Haupt, J. and Nowak, R. (2006). Signal reconstruction from noisy random projections. IEEE Trans. Inform. Theory 52 4036-4048. · Zbl 1323.94046  Kettenring, J., Lindsay, B. and Siegmund, D., eds. (2003). Statistics: Challenges and opportunities for the twenty-first century. NSF report. Available at www.pnl.gov/scales/docs/nsf_report.pdf.  Mallows, C. L. (1973). Some comments on $$C_P$$. Technometrics 15 661-675. · Zbl 0269.62061  Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM J. Comput. 24 227-234. · Zbl 0827.68054  Peters, D. C., Korosec, F. R., Grist, T. M., Block, W. F., Holden, J. E., Vigen, K. K. and Mistretta, C. A. (2000). Undersampled projection reconstruction applied to MR angiography. Magnetic Resonance in Medicine 43 91-101.  Rudin, L. I., Osher, S. and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D 60 259-268. · Zbl 0780.49028  Sardy, S., Bruce, A. G. and Tseng, P. (2000). Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Statist. 9 361-379.  Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464. · Zbl 0379.62005  Szarek, S. J. (1991). Condition numbers of random matrices. J. Complexity 7 131-149. · Zbl 0760.15018  Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.