## Sparse estimation by exponential weighting (English) · Zbl 1331.62351

Summary: Consider a regression model with fixed design and Gaussian noise where the regression function can potentially be well approximated by a function that admits a sparse representation in a given dictionary. This paper resorts to exponential weights to exploit this underlying sparsity by implementing the principle of sparsity pattern aggregation. This model selection take on sparse estimation allows us to derive sparsity oracle inequalities in several popular frameworks, including ordinary sparsity, fused sparsity and group sparsity. One striking aspect of these theoretical results is that they hold under no condition on the dictionary. Moreover, we describe an efficient implementation of the sparsity pattern aggregation principle that compares favorably to state-of-the-art procedures on some basic numerical examples.
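The sparsity pattern aggregation idea can be illustrated with a minimal sketch: fit least squares on each small support ("pattern"), then average the fitted values with exponential weights that trade off fit against a prior favoring smaller patterns. This is an illustrative simplification, not the paper's procedure: the function name, the raw residual sum of squares in the weights (the actual estimator is based on an unbiased risk estimate), the temperature choice, and the form of the prior are all assumptions made here for concreteness.

```python
import numpy as np
from itertools import combinations

def sparsity_pattern_aggregate(X, y, sigma2, max_size=2):
    """Illustrative sketch (not the paper's exact procedure):
    exponential-weights average of least-squares fits over all
    sparsity patterns of size <= max_size."""
    n, p = X.shape
    beta = 4.0 * sigma2  # temperature parameter (assumed choice)
    patterns = [S for k in range(1, max_size + 1)
                for S in combinations(range(p), k)]
    fits, log_w = [], []
    for S in patterns:
        XS = X[:, S]
        coef, *_ = np.linalg.lstsq(XS, y, rcond=None)
        mu = XS @ coef                      # fitted values on pattern S
        rss = np.sum((y - mu) ** 2)         # raw fit criterion (simplification)
        # Prior decaying in |S|, penalizing larger patterns (assumed form).
        log_prior = len(S) * (np.log(len(S)) - np.log(2 * np.e * p))
        fits.append(mu)
        log_w.append(-rss / beta + log_prior)
    log_w = np.asarray(log_w)
    w = np.exp(log_w - log_w.max())         # stabilize before normalizing
    w /= w.sum()
    return np.sum(w[:, None] * np.array(fits), axis=0)
```

Enumerating all patterns is exponential in `p`; the paper's efficient implementation instead samples patterns (via Monte Carlo), which is what makes the principle practical beyond toy dimensions.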

### MSC:

- 62J07 Ridge regression; shrinkage estimators (Lasso)
- 62G08 Nonparametric regression and quantile regression
- 62F15 Bayesian inference

### Software:

glmnet
