
Optimal regression rates for SVMs using Gaussian kernels. (English) Zbl 1337.62073
Summary: Support vector machines (SVMs) using Gaussian kernels are one of the standard and state-of-the-art learning algorithms. In this work, we establish new oracle inequalities for such SVMs when applied to either least squares or conditional quantile regression. With the help of these oracle inequalities we then derive learning rates that are (essentially) minimax optimal under standard smoothness assumptions on the target function. We further utilize the oracle inequalities to show that these learning rates can be adaptively achieved by a simple data-dependent parameter selection method that splits the data set into a training and a validation set.
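The following is a minimal sketch of the kind of procedure described in the summary, not the paper's own implementation: a least squares SVM with Gaussian kernel (here realized as kernel ridge regression, solving \((K + n\lambda I)\alpha = y\)), with the regularization parameter and kernel width chosen by splitting the data into a training and a validation part. All names, grid values, and the synthetic data are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Z, gamma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / gamma^2)."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / gamma ** 2)

def fit_ls_svm(X, y, lam, gamma):
    """Least squares SVM fit: solve (K + n*lam*I) alpha = y on the training data."""
    n = len(X)
    K = gaussian_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda X_new: gaussian_kernel(X_new, X, gamma) @ alpha

def select_parameters(X, y, lams, gammas, val_frac=0.5, seed=0):
    """Data-dependent parameter selection: split the sample into a training and a
    validation part, then pick (lam, gamma) minimizing the empirical least squares
    risk on the validation part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(val_frac * len(X))
    val, tr = idx[:n_val], idx[n_val:]
    best = None
    for lam in lams:
        for gamma in gammas:
            f = fit_ls_svm(X[tr], y[tr], lam, gamma)
            risk = np.mean((f(X[val]) - y[val]) ** 2)
            if best is None or risk < best[0]:
                best = (risk, lam, gamma)
    return best[1], best[2]

# Illustrative usage on synthetic data (not from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)
lam, gamma = select_parameters(X, y, lams=[1e-4, 1e-3, 1e-2], gammas=[0.1, 0.5, 1.0])
f = fit_ls_svm(X, y, lam, gamma)
print("selected lambda, gamma:", lam, gamma)
```

For conditional quantile regression one would replace the squared loss in both the fit and the validation criterion by the pinball loss; the training/validation splitting scheme stays the same.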

MSC:
62G08 Nonparametric regression and quantile regression
62G05 Nonparametric estimation
68T05 Learning and adaptive systems in artificial intelligence