Exponential screening and optimal rates of sparse estimation. (English) Zbl 1215.62043

Summary: In high-dimensional linear regression, the goal pursued here is to estimate an unknown regression function using linear combinations of a suitable set of covariates. A key assumption for the success of any statistical procedure in this setting is that the linear combination is sparse in some sense, for example, that it involves only a few covariates. We consider general, not necessarily linear, regression with Gaussian noise and study a related question: finding a linear combination of approximating functions that is simultaneously sparse and has small mean squared error (MSE).
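For orientation, the setup can be sketched as follows (the notation is the standard one in this literature and is my own choice; the paper's exact conventions may differ):

```latex
Y_i = \eta(x_i) + \xi_i, \qquad \xi_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,\sigma^2),
\quad i = 1,\dots,n,
\qquad
f_\theta = \sum_{j=1}^{M} \theta_j f_j,
\qquad
\mathrm{MSE}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(f_\theta(x_i)-\eta(x_i)\bigr)^2 ,
```

where \(\eta\) is the unknown regression function, \(f_1,\dots,f_M\) is the dictionary of approximating functions, and one seeks \(\theta \in \mathbb{R}^M\) making \(\mathrm{MSE}(\theta)\) small while keeping \(\|\theta\|_0\) (the number of nonzero entries) or \(\|\theta\|_1\) small.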
We introduce a new estimation procedure, called exponential screening, that exhibits remarkable adaptation properties. It adapts to the linear combination that optimally balances MSE and sparsity, whether the latter is measured by the number of nonzero entries in the combination (its \(\ell_0\) norm) or by the global weight of the combination (its \(\ell_1\) norm). The power of this adaptation result is illustrated by showing that exponential screening solves optimally and simultaneously all the problems of aggregation in Gaussian regression that have been discussed in the literature. Moreover, we show that the performance of the exponential screening estimator cannot be improved in a minimax sense, even when the optimal sparsity is known in advance. The theoretical and numerical superiority of exponential screening over state-of-the-art sparse procedures is also discussed.
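As a toy illustration of the exponential-weighting idea behind such screening procedures (not the paper's actual estimator: the aggregation temperature, the prior over sparsity patterns, and the function name below are my own simplified choices), one can average least-squares fits over all supports, weighting each by how well it fits and by a prior that penalizes larger supports:

```python
import itertools
import math

import numpy as np


def exponential_weight_aggregate(X, y, sigma2, beta=4.0):
    """Toy exponential-weights aggregate over all sparsity patterns.

    For each support S of columns of X, fit least squares restricted to S,
    weight the fitted coefficient vector by exp(-RSS_S / (beta * sigma2))
    times a prior that downweights larger supports, and return the weighted
    average. Cost is exponential in the number of columns M, so this is for
    illustration on tiny M only.
    """
    n, M = X.shape
    log_wts, thetas = [], []
    for k in range(M + 1):
        for S in itertools.combinations(range(M), k):
            theta = np.zeros(M)
            if S:
                cols = list(S)
                coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
                theta[cols] = coef
            rss = float(np.sum((y - X @ theta) ** 2))
            # Prior ~ 1 / (C(M, k) * e^k): a simple choice penalizing dense
            # supports; the paper's prior differs in its constants.
            log_prior = -(math.log(math.comb(M, k)) + k)
            log_wts.append(-rss / (beta * sigma2) + log_prior)
            thetas.append(theta)
    log_wts = np.array(log_wts)
    w = np.exp(log_wts - log_wts.max())  # stabilize before normalizing
    w /= w.sum()
    return np.array(thetas).T @ w        # weighted average of all fits


# Tiny demo: 2 of 5 coefficients are truly nonzero.
rng = np.random.default_rng(0)
n, M = 50, 5
theta_true = np.array([2.0, 0.0, -3.0, 0.0, 0.0])
X = rng.standard_normal((n, M))
y = X @ theta_true + 0.1 * rng.standard_normal(n)
theta_hat = exponential_weight_aggregate(X, y, sigma2=0.01)
```

Because the residual sums of squares enter through an exponential, supports missing a truly active covariate receive negligible weight, so the aggregate concentrates on sparse, well-fitting combinations without ever selecting a single model.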


62G08 Nonparametric regression and quantile regression
62J05 Linear regression; mixed models
62C20 Minimax procedures in statistical decision theory
62G20 Asymptotic properties of nonparametric inference
62H12 Estimation in multivariate analysis
65C60 Computational problems in statistics (MSC2010)
62G05 Nonparametric estimation
Full Text: DOI arXiv


[1] Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist. 34 584-653. · Zbl 1092.62005
[2] Alquier, P. and Lounici, K. (2010). PAC-Bayesian bounds for sparse regression estimation with exponential weights. HAL. · Zbl 1274.62463
[3] Baraniuk, R., Davenport, M., DeVore, R. and Wakin, M. (2008). A simple proof of the restricted isometry property for random matrices. Constr. Approx. 28 253-263. · Zbl 1177.15015
[4] Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans. Inform. Theory 39 930-945. · Zbl 0818.68126
[5] Bickel, P., Ritov, Y. and Tsybakov, A. (2008). Hierarchical selection of variables in sparse high-dimensional regression. Available at .
[6] Bickel, P. J. and Doksum, K. A. (2006). Mathematical Statistics: Basic Ideas and Selected Topics, Vol. 1 , 2nd ed. Prentice-Hall, Upper Saddle River, NJ. · Zbl 0403.62001
[7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022
[8] Bühlmann, P. and van de Geer, S. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Statist. 3 1360-1392. · Zbl 1327.62425
[9] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007a). Sparsity oracle inequalities for the Lasso. Electron. J. Statist. 1 169-194 (electronic). · Zbl 1146.62028
[10] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007b). Aggregation for Gaussian regression. Ann. Statist. 35 1674-1697. · Zbl 1209.62065
[11] Candes, E. (2008). The restricted isometry property and its implications for compressed sensing. C. R. Math. Acad. Sci. Paris 346 589-592. · Zbl 1153.94002
[12] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313-2351. · Zbl 1139.62019
[13] Dalalyan, A. and Tsybakov, A. (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Machine Learning 72 39-61.
[14] Dalalyan, A. and Tsybakov, A. B. (2010). Mirror averaging with sparsity priors. Available at . · Zbl 1243.62008
[15] Dalalyan, A. S. and Tsybakov, A. B. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory. Lecture Notes in Computer Science 4539 97-111. Springer, Berlin. · Zbl 1203.62063
[16] Dalalyan, A. S. and Tsybakov, A. B. (2009). Sparse regression learning by aggregation and Langevin Monte Carlo. Available at . · Zbl 1244.62054
[17] Donoho, D. L. and Johnstone, I. M. (1994a). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455. · Zbl 0815.62019
[18] Donoho, D. L. and Johnstone, I. M. (1994b). Minimax risk over \(\ell_p\)-balls for \(\ell_q\)-error. Probab. Theory Related Fields 99 277-303. · Zbl 0802.62006
[19] Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object. J. Roy. Statist. Soc. Ser. B 54 41-81. With discussion and a reply by the authors. · Zbl 0788.62103
[20] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348-1360. · Zbl 1073.62547
[21] Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947-1975. · Zbl 0829.62066
[22] George, E. I. (1986a). Combining minimax shrinkage estimators. J. Amer. Statist. Assoc. 81 437-445. · Zbl 0594.62061
[23] George, E. I. (1986b). Minimax multiple shrinkage estimation. Ann. Statist. 14 188-205. · Zbl 0602.62041
[24] Giraud, C. (2008). Mixing least-squares estimators when the variance is unknown. Bernoulli 14 1089-1107. · Zbl 1168.62327
[25] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction . Springer, New York. · Zbl 0973.62007
[26] Koh, K., Kim, S.-J. and Boyd, S. (2008). l1_ls: A Matlab solver for l1-regularized least squares problems. BETA version. Stanford Univ. Available at .
[27] Koltchinskii, V. (2010). Oracle inequalities in empirical risk minimization and sparse recovery problems. St. Flour Lecture Notes .
[28] Koltchinskii, V. (2009a). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 799-828. · Zbl 1452.62486
[29] Koltchinskii, V. (2009b). Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Statist. 45 7-57. · Zbl 1168.62044
[30] LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. and Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 396-404. Morgan Kaufmann, San Francisco.
[31] Leung, G. and Barron, A. R. (2006). Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 3396-3410. · Zbl 1309.94051
[32] Lounici, K. (2007). Generalized mirror averaging and D-convex aggregation. Math. Methods Statist. 16 246-259. · Zbl 1231.62046
[33] Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics (Saint-Flour, 1998). Lecture Notes in Math. 1738 85-277. Springer, Berlin. · Zbl 0998.62033
[34] Raskutti, G., Wainwright, M. J. and Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over \(\ell_q\)-balls. Available at . · Zbl 1365.62276
[35] Reynaud-Bouret, P. (2003). Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields 126 103-153. · Zbl 1019.62079
[36] Rigollet, P. (2009). Maximum likelihood aggregation and misspecified generalized linear models. Technical report. Available at .
[37] Rigollet, P. and Tsybakov, A. (2010). Exponential screening and optimal rates of sparse estimation. Available at . · Zbl 1215.62043
[38] Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods . Springer, New York. · Zbl 1096.62003
[39] Tsybakov, A. B. (2003). Optimal rates of aggregation. In COLT (B. Schölkopf and M. K. Warmuth, eds.). Lecture Notes in Computer Science 2777 303-313. Springer, Berlin. · Zbl 1208.62073
[40] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation . Springer, New York. · Zbl 1176.62032
[41] van de Geer, S. A. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614-645. · Zbl 1138.62323
[42] Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S. and Ma, Y. (2009). Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31 210-227.
[43] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894-942. · Zbl 1183.62120
[44] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567-1594. · Zbl 1142.62044
[45] Zhang, C.-H. and Melnik, O. (2009). Plus: Penalized linear unbiased selection. R package version 0.8. CRAN. Available at .
[46] Zhang, T. (2009). Some sharp performance bounds for least squares regression with L1 regularization. Ann. Statist. 37 2109-2144. · Zbl 1173.62029
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.