## Fast rates for support vector machines using Gaussian kernels.(English)Zbl 1127.68091

Summary: For binary classification we establish learning rates up to the order of $$n^{-1}$$ for support vector machines with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions on the considered distributions: Tsybakov’s noise assumption to establish a small estimation error, and a new geometric noise condition which is used to bound the approximation error. Unlike previously proposed concepts for bounding the approximation error, the geometric noise assumption does not employ any smoothness assumption.

### MSC:

 68T05 Learning and adaptive systems in artificial intelligence 41A46 Approximation by arbitrary nonlinear expressions; widths and entropy 62G20 Asymptotic properties of nonparametric inference 68T10 Pattern recognition, speech recognition
Full Text:

### References:

 [1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337–404. JSTOR: · Zbl 0037.20701 [2] Bartlett, P. L., Bousquet, O. and Mendelson, S. (2005). Local Rademacher complexities. Ann. Statist. 33 1497–1537. · Zbl 1083.62034 [3] Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2006). Convexity, classification and risk bounds. J. Amer. Statist. Assoc. 101 138–156. · Zbl 1118.62330 [4] Bartlett, P. L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 463–482. · Zbl 1084.68549 [5] Bennett, C. and Sharpley, R. (1988). Interpolation of Operators . Academic Press, Boston. · Zbl 0647.46057 [6] Berg, C., Christensen, J. P. R. and Ressel, P. (1984). Harmonic Analysis on Semigroups . Springer, New York. · Zbl 0619.43001 [7] Bergh, J. and Löfström, J. (1976). Interpolation Spaces. An Introduction . Springer, Berlin. · Zbl 0344.46071 [8] Birman, M. Š. and Solomjak, M. Z. (1967). Piecewise polynomial approximations of functions of the classes $$W^\alpha_p$$. Mat. Sb. (N.S.) 73 331–355. · Zbl 0173.16001 [9] Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495–500. · Zbl 1001.60021 [10] Carl, B. and Stephani, I. (1990). Entropy , Compactness and the Approximation of Operators . Cambridge Univ. Press. · Zbl 0705.47017 [11] Courant, R. and Hilbert, D. (1953). Methods of Mathematical Physics 1 . Interscience, New York. · Zbl 0051.28802 [12] Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines : And Other Kernel-Based Learning Methods . Cambridge Univ. Press. · Zbl 0994.68074 [13] Cucker, F. and Smale, S. (2002). On the mathematical foundations of learning. Bull. Amer. Math. Soc. (N.S.) 39 1–49 (electronic). · Zbl 0983.68162 [14] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition . Springer, New York. · Zbl 0853.68150 [15] Howse, J., Hush, D. and Scovel, C. (2002). Linking learning strategies and performance for support vector machines. Available at www.c3.lanl.gov/ml/pubs_ml.shtml. [16] Klein, T. (2002). Une inégalité de concentration à gauche pour les processus empiriques. C. R. Math. Acad. Sci. Paris 334 501–504. · Zbl 1003.60024 [17] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Isoperimetry and Processes . Springer, Berlin. · Zbl 0748.60004 [18] Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829. · Zbl 0961.62058 [19] Marron, J. S. (1983). Optimal rates on convergence to Bayes risk in nonparametric discrimination. Ann. Statist. 11 1142–1155. · Zbl 0554.62053 [20] Massart, P. (2000). About the constants in Talagrand’s concentration inequalities for empirical processes. Ann. Probab. 28 863–884. · Zbl 1140.60310 [21] Mendelson, S. (2002). Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 1977–1991. · Zbl 1061.68128 [22] O’Neil, R. (1963). Convolution operators and $$L(p,q)$$ spaces. Duke Math. J. 30 129–142. · Zbl 0178.47701 [23] Pietsch, A. (1980). Operator Ideals . North-Holland, Amsterdam. · Zbl 0434.47030 [24] Pietsch, A. (1987). Eigenvalues and $$s$$ -Numbers. Geest und Portig, Leipzig. · Zbl 0615.47019 [25] Rio, E. (2001). Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Related Fields 119 163–175. · Zbl 0976.60033 [26] Schölkopf, B. and Smola, A. (2002). Learning with Kernels : Support Vector Machines , Regularization , Optimization and Beyond . MIT Press, Cambridge, MA. [27] Smale, S. and Zhou, D.-X. (2003). Estimating the approximation error in learning theory. Anal. Appl. (Singap.) 1 17–41. · Zbl 1079.68089 [28] Steinwart, I. (2002). On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res. 2 67–93. · Zbl 1009.68143 [29] Steinwart, I. (2002). Support vector machines are universally consistent. J. Complexity 18 768–791. · Zbl 1030.68074 [30] Steinwart, I. (2004). Sparseness of support vector machines. J. Mach. Learn. Res. 4 1071–1105. · Zbl 1094.68082 [31] Steinwart, I. (2005). Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans. Inform. Theory 51 128–142. · Zbl 1304.62090 [32] Steinwart, I., Hush, D. and Scovel, C. (2006). An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Trans. Inform. Theory 52 4635–4643. · Zbl 1320.68148 [33] Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 28–76. · Zbl 0798.60051 [34] Triebel, H. (1978). Interpolation Theory , Function Spaces , Differential Operators . North-Holland, Amsterdam. · Zbl 0387.46033 [35] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166. · Zbl 1105.62353 [36] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes . With Applications to Statistics . Springer, New York. · Zbl 0862.60002 [37] Wu, Q. and Zhou, D.-X. (2003). Analysis of support vector machine classification. Technical report, City Univ. Hong Kong. [38] Yang, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45 2271–2284. · Zbl 0962.62026 [39] Zhang, T. (2004). Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32 56–85. · Zbl 1105.62323
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.