×

zbMATH — the first resource for mathematics

Multi-kernel regularized classifiers. (English) Zbl 1171.65043
Summary: A family of classification algorithms generated from Tikhonov regularization schemes are considered. They involve multi-kernel spaces and general convex loss functions. Our main purpose is to provide satisfactory estimates for the excess misclassification error of these multi-kernel regularized classifiers when the loss functions achieve the zero value. The error analysis consists of two parts: regularization error and sample error. Allowing multi-kernels in the algorithm improves the regularization error and approximation error, which is one advantage of the multi-kernel setting.
For a general loss function, we show how to bound the regularization error by the approximation in some weighted \(L^{q}\) spaces. For the sample error, we use a projection operator. The projection in connection with the decay of the regularization error enables us to improve convergence rates in the literature even for the one-kernel schemes and special loss functions: least-square loss and hinge loss for support vector machine soft margin classifiers. Existence of the optimization problem for the regularization scheme associated with multi-kernels is verified when the kernel functions are continuous with respect to the index set. Concrete examples, including Gaussian kernels with flexible variances and probability distributions with some noise conditions, are used to illustrate the general theory.

MSC:
65J20 Numerical solutions of ill-posed problems in abstract spaces; regularization
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Aronszajn, N., Theory of reproducing kernels, Trans. amer. math. soc., 68, 337-404, (1950) · Zbl 0037.20701
[2] Bartlett, P.L.; Bousquet, O.; Mendelson, S., Local Rademacher complexities, Ann. statist., 33, 1497-1537, (2005) · Zbl 1083.62034
[3] Bartlett, P.L.; Jordan, M.I.; McAuliffe, J.D., Convexity, classification, and risk bounds, J. amer. statist. assoc., 101, 138-156, (2006) · Zbl 1118.62330
[4] B. Blanchard, O. Bousquet, P. Massart, Statistical performance of support vector machines, preprint, 2003. · Zbl 1133.62044
[5] Blanchard, B.; Lugosi, G.; Vayatis, N., On the rate of convergence of regularized boosting classifiers, J. Mach. learning res., 4, 861-894, (2003) · Zbl 1083.68109
[6] Boucheron, S.; Bousquet, O.; Lugosi, G., Theory of classification: a survey of some recent advances, ESAIM: probab. statist., 9, 323-375, (2005) · Zbl 1136.62355
[7] Bousquet, O.; Elisseeff, A., Stability and generalization, J. Mach. learning res., 2, 499-526, (2002) · Zbl 1007.68083
[8] Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S., Choosing multiple parameters for support vector machines, Mach. learning, 46, 131-159, (2002) · Zbl 0998.68101
[9] Chen, D.R.; Wu, Q.; Ying, Y.; Zhou, D.X., Support vector machine soft margin classifiers: error analysis, J. Mach. learning res., 5, 1143-1175, (2004) · Zbl 1222.68167
[10] Cortes, C.; Vapnik, V., Support-vector networks, Mach. learning, 20, 273-297, (1995) · Zbl 0831.68098
[11] Cristianini, N.; Shawe-Taylor, J., An introduction to support vector machines, (2000), Cambridge University Press Cambridge, MA
[12] Cucker, F.; Smale, S., On the mathematical foundations of learning, Bull. amer. math. soc., 39, 1-49, (2001) · Zbl 0983.68162
[13] Cucker, F.; Smale, S., Best choices for regularization parameters in learning theory: on the bias – variance problem, Found. comput. math., 2, 413-428, (2002) · Zbl 1057.68085
[14] F. Cucker, D.X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge University Press, Cambridge, MA, in press. · Zbl 1274.41001
[15] Devroye, L.; Györfi, L.; Lugosi, G., A probabilistic theory of pattern recognition, (1997), Springer New York
[16] T. Evgeniou, M. Pontil, Regularized multi-task learning, Proceedings of the 17th SIGKDD Conference on Knowledge Discovery and Data Mining, 2004.
[17] Evgeniou, T.; Pontil, M.; Poggio, T., Regularization networks and support vector machines, Adv. comput. math., 13, 1-50, (2000) · Zbl 0939.68098
[18] Lanckriet, G.R.G.; Cristianini, N.; Bartlett, P.; El Ghaoui, L.; Jordan, M.I., Learning the kernel matrix with semidefinite programming, J. Mach. learning res., 5, 27-72, (2004) · Zbl 1222.68241
[19] Lee, W.S.; Bartlett, P.L.; Williamson, R.C., Efficient agnostic learning of neural networks with bounded Fan-in, IEEE trans. inform. theory, 42, 2118-2132, (1996) · Zbl 0874.68253
[20] Li, J.; Barron, A., Mixture density estimation, ()
[21] Lin, Y., Support vector machines and the Bayes rule in classification, Data mining knowledge discovery, 6, 259-275, (2002)
[22] Lugosi, G.; Vayatis, N., On the Bayes-risk consistency of regularized boosting methods, Ann. statist., 32, 30-55, (2004) · Zbl 1105.62319
[23] Massart, P., Some applications of concentration inequalities to statistics, Ann. fac. sci. Toulouse ser., 6, 9, 245-303, (2000) · Zbl 0986.62002
[24] Mendelsen, S., Improving the sample complexity using global data, IEEE trans. inform. theory, 48, 1977-1991, (2002) · Zbl 1061.68128
[25] Micchelli, C.A.; Pontil, M., Learning the kernel function via regularization, J. Mach. learning res., 6, 1099-1125, (2005) · Zbl 1222.68265
[26] Mukherjee, S.; Rifkin, R.; Poggio, T., Regression and classification with regularization, (), 107-124
[27] Rakhlin, A.; Panchenko, D.; Mukherjee, S., Risk bounds for mixture density estimation, ESAIM: probab. statist., 9, 220-229, (2005) · Zbl 1141.62024
[28] Shawe-Taylor, J.; Bartlett, P.L.; Williamson, R.C.; Anthony, M., Structural risk minimization over data-dependent hierarchies, IEEE trans. inform. theory, 44, 1926-1940, (1998) · Zbl 0935.68090
[29] Smale, S.; Zhou, D.X., Estimating the approximation error in learning theory, Anal. appl., 1, 17-41, (2003) · Zbl 1079.68089
[30] Smale, S.; Zhou, D.X., Shannon sampling and function reconstruction from point values, Bull. amer. math. soc., 41, 279-305, (2004) · Zbl 1107.94007
[31] S. Smale, D.X. Zhou, Learning theory estimates via integral operators and their applications, Constr. Approx., in press. · Zbl 1127.68088
[32] Steinwart, I., On the influence of the kernel on the consistency of support vector machines, J. Mach. learning res., 2, 67-93, (2001) · Zbl 1009.68143
[33] Steinwart, I., Support vector machines are universally consistent, J. complexity, 18, 768-791, (2002) · Zbl 1030.68074
[34] I. Steinwart, C. Scovel, Fast rates for support vector machines, in: Proceedings of the Conference on Learning Theory (COLT-2005), 2005, pp. 279-294. · Zbl 1137.68564
[35] Suykens, J.A.K.; Vandewalle, J., Least squares support vector machine classifiers, Neural process. lett., 9, 293-300, (1999) · Zbl 0958.93042
[36] Tsybakov, A.B., Optimal aggregation of classifiers in statistical learning, Ann. statist., 32, 135-166, (2004) · Zbl 1105.62353
[37] van der Vaart, A.W.; Wellner, J.A., Weak convergence and empirical processes, (1996), Springer New York · Zbl 0862.60002
[38] Vapnik, V., Statistical learning theory, (1998), Wiley New York · Zbl 0935.62007
[39] Wahba, G., Spline models for observational data, (1990), SIAM Philadelphia, PA · Zbl 0813.62001
[40] Wu, Q.; Zhou, D.X., SVM soft margin classifiers: linear programming versus quadratic programming, Neural comput., 17, 1160-1187, (2005) · Zbl 1108.90324
[41] Wu, Q.; Zhou, D.X., Analysis of support vector machine classification, J. comput. anal. appl., 8, 99-119, (2006) · Zbl 1098.68680
[42] Y. Ying, D.X. Zhou, Learnability of Gaussians with flexible variances, preprint, 2004. · Zbl 1222.68339
[43] Zhang, T., On the dual formulation of regularized linear systems with convex risks, Mach. learning, 46, 91-129, (2002) · Zbl 0998.68100
[44] Zhang, T., Statistical behavior and consistency of classification methods based on convex risk minimization, Ann. statist., 32, 56-85, (2004) · Zbl 1105.62323
[45] Zhou, D.X., The covering number in learning theory, J. complexity, 18, 739-767, (2002) · Zbl 1016.68044
[46] Zhou, D.X., Capacity of reproducing kernel spaces in learning theory, IEEE trans. inform. theory, 49, 1743-1752, (2003) · Zbl 1290.62033
[47] D.X. Zhou, Density problem and approximation error in learning theory, preprint, 2003.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.