## Sparse recovery in convex hulls via entropy penalization. (English) Zbl 1269.62039

Summary: Let $$(X,Y)$$ be a random couple in $$S\times T$$ with unknown distribution $$P$$ and let $$(X_1,Y_1),\dots,(X_n,Y_n)$$ be i.i.d. copies of $$(X,Y)$$. Denote by $$P_n$$ the empirical distribution of $$(X_1,Y_1),\dots,(X_n,Y_n)$$. Let $$h_1,\dots,h_N: S\mapsto[-1,1]$$ be a dictionary consisting of $$N$$ functions. For $$\lambda\in\mathbb{R}^N$$, denote $$f_\lambda:=\sum^N_{j=1}\lambda_jh_j$$. Let $$\ell: T\times\mathbb{R}\mapsto\mathbb{R}$$ be a given loss function and suppose it is convex with respect to the second variable. Let $$(\ell\bullet f)(x,y):=\ell(y;f(x))$$. Finally, let $$\Lambda\subset\mathbb{R}^N$$ be the simplex of all probability distributions on $$\{1,\dots,N\}$$. Consider the following penalized empirical risk minimization problem $\widehat\lambda^\varepsilon:=\mathop{\text{argmin}}_{\lambda\in\Lambda}\left[P_n(\ell\bullet f_\lambda)+\varepsilon\sum^N_{j=1}\lambda_j\log\lambda_j\right],$ along with its distribution-dependent version $\lambda^\varepsilon:=\mathop{\text{argmin}}_{\lambda\in\Lambda}\left[P(\ell\bullet f_\lambda)+\varepsilon\sum^N_{j=1}\lambda_j\log\lambda_j\right],$ where $$\varepsilon\geq 0$$ is a regularization parameter. It is proved that the “approximate sparsity” of $$\lambda^\varepsilon$$ implies the “approximate sparsity” of $$\widehat\lambda^\varepsilon$$, and the impact of “sparsity” on bounding the excess risk of the empirical solution is explored. Similar results are also discussed in the case of entropy-penalized density estimation.
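The entropy-penalized problem above can be solved numerically by exponentiated-gradient (mirror-descent) updates, since the entropic regularizer keeps the iterates inside the simplex $$\Lambda$$. The sketch below is illustrative only and is not from the paper: the squared loss, the random dictionary `H`, the sparse target `lam_true`, and all parameter values (`eps`, `eta`, the iteration count) are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 10                       # sample size and dictionary size (assumed)
H = rng.uniform(-1.0, 1.0, (n, N))   # H[i, j] = h_j(X_i), values in [-1, 1]
lam_true = np.zeros(N)
lam_true[:2] = 0.5                   # sparse target inside the simplex
y = H @ lam_true + 0.1 * rng.standard_normal(n)

eps = 0.01                           # regularization parameter epsilon

def objective(lam):
    """Empirical risk P_n(ell . f_lambda) plus entropy penalty (0 log 0 := 0)."""
    risk = np.mean((y - H @ lam) ** 2)
    entropy = np.sum(np.where(lam > 0, lam * np.log(lam), 0.0))
    return risk + eps * entropy

lam = np.full(N, 1.0 / N)            # start at the uniform distribution
eta = 0.5                            # step size (assumed)
for _ in range(2000):
    # gradient of the penalized empirical risk for squared loss
    grad = -2.0 * H.T @ (y - H @ lam) / n + eps * (np.log(lam) + 1.0)
    lam = lam * np.exp(-eta * grad)  # multiplicative (entropic) update
    lam /= lam.sum()                 # renormalize onto the simplex

print(lam.round(3))                  # most mass should sit on the first two atoms
```

The multiplicative update preserves positivity automatically, which is why mirror descent with the entropic mirror map is the natural solver for simplex-constrained problems of this form.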

### MSC:

- 62G07 Density estimation
- 62G30 Order statistics; empirical distribution functions
- 62B10 Statistical aspects of information-theoretic topics
- 62H99 Multivariate analysis
