Quasi-likelihood and/or robust estimation in high dimensions. (English) Zbl 1331.62354

Summary: We consider the theory for the high-dimensional generalized linear model with the Lasso. After a short review of theoretical results in the literature, we present an extension of the oracle results to the case of quasi-likelihood loss. We prove bounds for the prediction error and the \(\ell_{1}\)-error. The results are derived under fourth moment conditions on the error distribution. The case of robust loss is also treated. Moreover, we show that under an irrepresentable condition, the \(\ell_{1}\)-penalized quasi-likelihood estimator has no false positives.
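
For orientation, an estimator of the kind described in the summary can be written in the following generic form. The notation is an assumption made for illustration and is not taken from the paper: \(\rho\) denotes the loss, \(x_{i}\in\mathbb{R}^{p}\) the covariates, \(Y_{i}\) the responses, \(\lambda>0\) the tuning parameter, and \(\beta^{0}\) the target parameter.

\[
\hat{\beta} \in \arg\min_{\beta\in\mathbb{R}^{p}} \Bigl\{ \frac{1}{n}\sum_{i=1}^{n} \rho\bigl(Y_{i},\, x_{i}^{T}\beta\bigr) + \lambda\,\|\beta\|_{1} \Bigr\}
\]

In this notation, \(\rho\) may be a negative quasi-log-likelihood or a robust loss, the \(\ell_{1}\)-error is \(\|\hat{\beta}-\beta^{0}\|_{1}\), and "no false positives" means \(\{j:\hat{\beta}_{j}\neq 0\}\subseteq\{j:\beta^{0}_{j}\neq 0\}\).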


62J07 Ridge regression; shrinkage estimators (Lasso)
62J12 Generalized linear models (logistic models)



