High-dimensional generalized linear models and the lasso. (English) Zbl 1138.62323

Summary: We consider high-dimensional generalized linear models with Lipschitz loss functions, and prove a nonasymptotic oracle inequality for the empirical risk minimizer with lasso penalty. The penalty is based on the coefficients in the linear predictor, after normalization with the empirical norm. The examples include logistic regression, density estimation and classification with hinge loss. Least squares regression is also discussed.
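The summary describes an empirical risk minimizer with a Lipschitz loss and a lasso penalty whose coefficients are weighted by the empirical norms of the predictors. The sketch below illustrates that estimator for the logistic-regression example only. It rests on several assumptions that do not come from the summary. The solver is plain proximal gradient descent (ISTA), the helper names `weighted_lasso_logistic`, `empirical_norms` and `objective` are hypothetical, and the value λ = 0.05 is illustrative. The paper's oracle inequality concerns the minimizer itself, not any particular algorithm.

```python
import numpy as np


def logistic_loss(y, f):
    # log(1 + exp(-y f)) for labels y in {-1, +1}; this loss is
    # 1-Lipschitz as a function of the linear predictor f.
    return np.logaddexp(0.0, -y * f)


def empirical_norms(X):
    # Empirical norm ||psi_j||_n = sqrt((1/n) sum_i X[i, j]^2) of each column.
    return np.sqrt(np.mean(X ** 2, axis=0))


def objective(X, y, beta, lam):
    # Hypothetical helper: empirical risk plus the norm-weighted lasso penalty.
    sigma = empirical_norms(X)
    return np.mean(logistic_loss(y, X @ beta)) + lam * np.sum(sigma * np.abs(beta))


def weighted_lasso_logistic(X, y, lam, n_iter=5000):
    # Hypothetical helper: minimize objective(X, y, beta, lam) by proximal
    # gradient descent (ISTA); the solver is an assumption, not from the source.
    n, p = X.shape
    sigma = empirical_norms(X)
    # The Hessian of the mean logistic loss is X^T D X / n with D <= 1/4,
    # so ||X||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    lipschitz = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        f = X @ beta
        # d/df log(1 + exp(-y f)) = -y / (1 + exp(y f)), computed stably.
        grad_f = -y * np.exp(-np.logaddexp(0.0, y * f))
        grad = X.T @ grad_f / n
        z = beta - step * grad
        # Soft-thresholding: the proximal map of the weighted l1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam * sigma, 0.0)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]  # sparse truth, chosen for illustration
    prob = 1.0 / (1.0 + np.exp(-X @ beta_true))
    y = np.where(rng.random(n) < prob, 1.0, -1.0)
    beta_hat = weighted_lasso_logistic(X, y, lam=0.05)
    print("selected coefficients:", np.flatnonzero(beta_hat))
```

Weighting each |β_j| by the empirical norm of column j makes the penalty invariant to rescaling that column. If column j is multiplied by c, then β_j scales by 1/c and its weight scales by c, so their product is unchanged. Replacing the logistic loss with the hinge loss would need a subgradient or smoothing step, because the gradient step above assumes a smooth loss.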


62G08 Nonparametric regression and quantile regression
62J12 Generalized linear models (logistic models)
62G07 Density estimation




[1] Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495-500. · Zbl 1001.60021
[2] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2006). Aggregation and sparsity via ℓ1-penalized least squares. In Proceedings of 19th Annual Conference on Learning Theory, COLT 2006. Lecture Notes in Comput. Sci. 4005 379-391. Springer, Berlin. · Zbl 1143.62319
[3] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007a). Sparse density estimation with ℓ1 penalties. In Proceedings of 20th Annual Conference on Learning Theory, COLT 2007. Lecture Notes in Comput. Sci. 4539 530-543. Springer, Berlin. · Zbl 1146.62028
[4] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007b). Sparsity oracle inequalities for the Lasso. Electron. J. Statist. 1 169-194. · Zbl 1146.62028
[5] Dahinden, C., Parmigiani, G., Emerick, M. C. and Bühlmann, P. (2008). Penalized likelihood for sparse contingency tables with an application to full length cDNA libraries. BMC Bioinformatics.
[6] Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Trans. Inform. Theory 41 613-627. · Zbl 0820.62002
[7] Donoho, D. L. (2006a). For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 907-934. · Zbl 1105.90068
[8] Donoho, D. L. (2006b). For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797-829. · Zbl 1113.15004
[9] Greenshtein, E. (2006). Best subset selection, persistence in high dimensional statistical learning and optimization under ℓ1 constraint. Ann. Statist. 34 2367-2386. · Zbl 1106.62022
[10] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York. · Zbl 0973.62007
[11] Ledoux, M. (1996). Talagrand deviation inequalities for product measures. ESAIM Probab. Statist. 1 63-87. · Zbl 0869.60013
[12] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer, Berlin. · Zbl 0748.60004
[13] Loubes, J.-M. and van de Geer, S. (2002). Adaptive estimation in regression, using soft thresholding type penalties. Statist. Neerl. 56 453-478.
[14] Massart, P. (2000a). About the constants in Talagrand’s concentration inequalities for empirical processes. Ann. Probab. 28 863-884. · Zbl 1140.60310
[15] Massart, P. (2000b). Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. (6) 9 245-303. · Zbl 0986.62002
[16] Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. Research Report 131, ETH Zürich. J. Roy. Statist. Soc. Ser. B. · Zbl 1400.62276
[17] Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal. 52 374-393. · Zbl 1452.62522
[18] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[19] Meinshausen, N. and Yu, B. (2007). Lasso-type recovery of sparse representations for high-dimensional data. Technical Report 720, Dept. Statistics, UC Berkeley. · Zbl 1155.62050
[20] Rockafellar, R. T. (1970). Convex Analysis. Princeton Univ. Press. · Zbl 0193.18401
[21] Tarigan, B. and van de Geer, S. A. (2006). Classifiers of support vector machine type with ℓ1 complexity regularization. Bernoulli 12 1045-1076. · Zbl 1118.62067
[22] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[23] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135-166. · Zbl 1105.62353
[24] van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press. · Zbl 1179.62073
[25] van de Geer, S. (2003). Adaptive quantile regression. In Recent Advances and Trends in Nonparametric Statistics (M. G. Akritas and D. N. Politis, eds.) 235-250. North-Holland, Amsterdam.
[26] Zhang, C.-H. and Huang, J. (2006). Model-selection consistency of the Lasso in high-dimensional linear regression. Technical Report 2006-003, Dept. Statistics, Rutgers Univ.
[27] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541-2563. · Zbl 1222.62008