zbMATH — the first resource for mathematics

Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space. (English) Zbl 06870279
Summary: This paper considers the estimation of sparse additive quantile regression (SAQR) in high-dimensional settings. Given the nonsmooth nature of the quantile loss function and the nonparametric complexity of estimating the component functions, it is challenging to analyze the theoretical properties of ultrahigh-dimensional SAQR. We propose a regularized learning approach with a two-fold Lasso-type regularization in a reproducing kernel Hilbert space (RKHS) for SAQR. We establish nonasymptotic oracle inequalities for the excess risk of the proposed estimator without any coherence conditions. If additional assumptions, including an extension of the restricted eigenvalue condition, are satisfied, the proposed method enjoys sharp oracle rates without any light-tail requirement. In particular, the proposed estimator achieves the minimax lower bounds established for sparse additive mean regression. As a by-product, we also establish a concentration inequality for estimating the population mean when a general Lipschitz loss is involved. The practical effectiveness of the new method is demonstrated by competitive numerical results.
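The ingredients of the summary can be sketched numerically: the quantile (pinball) loss, and a penalized empirical risk whose two-fold penalty combines, for each additive component, an empirical norm term (inducing sparsity across components) and an RKHS norm term (controlling smoothness). The sketch below is illustrative only, under assumed choices (a Gaussian kernel, a kernel-expansion parameterization f_j = K_j α_j); it is not the authors' estimator or algorithm, merely an evaluation of a plausible objective of this form.

```python
import numpy as np

def pinball_loss(y, f, tau):
    """Pinball (check) loss rho_tau(u) = u * (tau - 1{u < 0}), with u = y - f."""
    u = np.asarray(y, dtype=float) - np.asarray(f, dtype=float)
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def gaussian_kernel(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel on a one-dimensional sample x."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def saqr_objective(y, X, alphas, tau, lam1, lam2, sigma=1.0):
    """Penalized empirical risk for an additive fit f = sum_j K_j alpha_j.

    lam1 weights the empirical norms ||f_j||_n (sparsity across components),
    lam2 weights the RKHS norms ||f_j||_H (smoothness within components) --
    a two-fold Lasso-type regularization of the kind the summary describes.
    """
    n, p = X.shape
    fit = np.zeros(n)
    penalty = 0.0
    for j in range(p):
        K = gaussian_kernel(X[:, j], sigma)
        f_j = K @ alphas[j]
        fit += f_j
        emp_norm = np.sqrt(np.mean(f_j ** 2))                     # ||f_j||_n
        rkhs_norm = np.sqrt(max(alphas[j] @ K @ alphas[j], 0.0))  # ||f_j||_H
        penalty += lam1 * emp_norm + lam2 * rkhs_norm
    risk = np.mean(pinball_loss(y, fit, tau))
    return risk + penalty
```

Minimizing such a nonsmooth objective in practice would require a dedicated solver (e.g., proximal or MM-type schemes as in the algorithmic references below); the snippet only evaluates the objective for given coefficients.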

62G20 Asymptotic properties of nonparametric inference
62G05 Nonparametric estimation
Software: AS 229; hgam
Full Text: DOI
[1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc.68 337-404. · Zbl 0037.20701
[2] Bach, F., Jenatton, R., Mairal, J. and Obozinski, G. (2012). Convex optimization with sparsity-inducing norms. In Optimization for Machine Learning. MIT Press, Cambridge, MA. · Zbl 1331.90050
[3] Bartlett, P. L., Bousquet, O. and Mendelson, S. (2005). Local Rademacher complexities. Ann. Statist.33 1497-1537. · Zbl 1083.62034
[4] Bartlett, P. L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res.3 463-482. · Zbl 1084.68549
[5] Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci.2 183-202. · Zbl 1175.94009
[6] Belloni, A. and Chernozhukov, V. (2011). \(ℓ_{1}\)-penalized quantile regression in high-dimensional sparse models. Ann. Statist.39 82-130. · Zbl 1209.62064
[7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist.37 1705-1732. · Zbl 1173.62022
[8] Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495-500. · Zbl 1001.60021
[9] Breheny, P. and Huang, J. (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat. Comput.25 173-187. · Zbl 1331.62359
[10] Buchinsky, M. (1994). Changes in the U.S. wage structure 1963-1987: Application of quantile regression. Econometrica 62 405-458. · Zbl 0800.90235
[11] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when \(p\) is much larger than \(n\). Ann. Statist.35 2313-2351. · Zbl 1139.62019
[12] Chatterjee, A. and Lahiri, S. N. (2011). Bootstrapping lasso estimators. J. Amer. Statist. Assoc.106 608-625. · Zbl 1232.62088
[13] Chatterjee, A. and Lahiri, S. N. (2013). Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap. Ann. Statist.41 1232-1259. · Zbl 1293.62153
[14] Christmann, A. and Zhou, D.-X. (2016). Learning rates for the risk of kernel-based quantile regression estimators in additive models. Anal. Appl. (Singap.) 14 449-477. · Zbl 1338.62077
[15] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc.96 1348-1360. · Zbl 1073.62547
[16] He, X. (2009). Modeling and inference by quantile regression. Technical report, Dept. Statistics, Univ. Illinois at Urbana-Champaign.
[17] He, X., Wang, L. and Hong, H. G. (2013). Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann. Statist.41 342-369. · Zbl 1295.62053
[18] Horowitz, J. L. and Lee, S. (2005). Nonparametric estimation of an additive quantile regression model. J. Amer. Statist. Assoc.100 1238-1249. · Zbl 1117.62355
[19] Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist.38 2282-2313. · Zbl 1202.62051
[20] Hunter, D. R. and Lange, K. (2000). Quantile regression via an MM algorithm. J. Comput. Graph. Statist.9 60-77.
[21] Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist.58 30-37.
[22] Kato, K. (2016). Group Lasso for high dimensional sparse quantile regression models. arXiv:1103.1458.
[23] Koenker, R. (2005). Quantile Regression. Econometric Society Monographs 38. Cambridge Univ. Press, Cambridge.
[24] Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica 46 33-50. · Zbl 0373.62038
[25] Koenker, R. W. and d’Orey, V. (1987). Algorithm AS 229: Computing regression quantiles. J. R. Stat. Soc. Ser. C. Appl. Stat.36 383-384.
[26] Koltchinskii, V. and Yuan, M. (2010). Sparsity in multiple kernel learning. Ann. Statist.38 3660-3695. · Zbl 1204.62086
[27] Li, Y., Liu, Y. and Zhu, J. (2007). Quantile regression in reproducing kernel Hilbert spaces. J. Amer. Statist. Assoc.102 255-268. · Zbl 1284.62405
[28] Lian, H. (2012). Semiparametric estimation of additive quantile regression models by two-fold penalty. J. Bus. Econom. Statist.30 337-350.
[29] Lin, Y. and Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. Ann. Statist.34 2272-2297. · Zbl 1106.62041
[30] Lv, J. and Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist.37 3498-3528. · Zbl 1369.62156
[31] Lv, S., He, X. and Wang, J. (2017). A unified penalized method for sparse additive quantile models: An RKHS approach. Ann. Inst. Statist. Math.69 897-923. · Zbl 1447.62044
[32] Lv, S., Lin, H., Lian, H. and Huang, J. (2018). Supplement to “Oracle inequalities for sparse additive quantile regression in reproducing kernel Hilbert space.” DOI:10.1214/17-AOS1567SUPP. · Zbl 06870279
[33] Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist.37 3779-3821. · Zbl 1360.62186
[34] Mendelson, S. (2002). Geometric parameters of kernel machines. In Computational Learning Theory (Sydney, 2002). Lecture Notes in Computer Science 2375 29-43. Springer, Berlin. · Zbl 1050.68070
[35] Pearce, N. D. and Wand, M. P. (2006). Penalized splines and reproducing kernel methods. Amer. Statist.60 233-240.
[36] Raskutti, G., Wainwright, M. J. and Yu, B. (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. J. Mach. Learn. Res.13 389-427. · Zbl 1283.62071
[37] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B. Stat. Methodol.71 1009-1030. · Zbl 1411.62107
[38] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist.39 731-771. · Zbl 1215.62043
[39] Schölkopf, B. and Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA.
[40] Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer, New York. · Zbl 1203.68171
[41] Steinwart, I. and Christmann, A. (2011). Estimating conditional quantiles with the help of the pinball loss. Bernoulli 17 211-225. · Zbl 1284.62235
[42] Suzuki, T. and Sugiyama, M. (2013). Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness. Ann. Statist.41 1381-1405. · Zbl 1273.62090
[43] Tarigan, B. and van de Geer, S. A. (2006). Classifiers of support vector machine type with \(l_{1}\) complexity regularization. Bernoulli 12 1045-1076. · Zbl 1118.62067
[44] The Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature 490 61-70.
[45] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol.58 267-288. · Zbl 0850.62538
[46] Tseng, P. and Yun, S. (2009). A coordinate gradient descent method for nonsmooth separable minimization. Math. Program.117 387-423. · Zbl 1166.90016
[47] van de Geer, S. (2002). Empirical Processes in M-Estimation. Cambridge Univ. Press, Cambridge. · Zbl 1030.62026
[48] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist.36 614-645. · Zbl 1138.62323
[49] Wang, L., Wu, Y. and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Amer. Statist. Assoc.107 214-222. · Zbl 1328.62468
[50] Wei, F., Huang, J. and Li, H. (2011). Variable selection and estimation in high-dimensional varying-coefficient models. Statist. Sinica 21 1515-1540. · Zbl 1225.62056
[51] Wu, Y. and Liu, Y. (2009). Variable selection in quantile regression. Statist. Sinica 19 801-817. · Zbl 1166.62012
[52] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol.68 49-67. · Zbl 1141.62030
[53] Zhang, X., Wu, Y., Wang, L. and Li, R. (2016). Variable selection for support vector machines in moderately high dimensions. J. R. Stat. Soc. Ser. B. Stat. Methodol.78 53-76. · Zbl 1411.62176
[54] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res.7 2541-2563. · Zbl 1222.62008
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.