Learning with sample dependent hypothesis spaces. (English) Zbl 1165.68388
Summary: Many learning algorithms use hypothesis spaces that are trained from samples, yet little theoretical work has been devoted to their study. In this paper we show that the mathematical analysis of such algorithms differs essentially from that of algorithms whose hypothesis spaces are independent of the sample or depend only on the sample size. The difficulty lies in the lack of a proper characterization of the approximation error. To overcome this difficulty, we propose using a larger function class (not necessarily a linear space) containing the union of all possible hypothesis spaces (which vary with the sample) to measure the approximation ability of the algorithm. We show how this idea yields error analysis for two particular classes of learning algorithms in kernel methods: learning the kernel via regularization, and coefficient-based regularization. We demonstrate the power of this approach by its wide applicability.
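Schematically, in notation the summary does not fix (so the symbols below are illustrative, not the paper's): write $f_{\mathbf z}$ for the output of the algorithm on the sample $\mathbf z$, $f_\rho$ for the target regression function, $\mathcal E$ and $\mathcal E_{\mathbf z}$ for the true and empirical errors, and let $\mathcal H$ be a fixed class with $\bigcup_{\mathbf z} \mathcal H_{\mathbf z} \subseteq \mathcal H$, with $f_{\mathcal H}$ a minimizer of $\mathcal E$ over $\mathcal H$ (assumed to exist). The excess error then splits exactly as

\[
\mathcal E(f_{\mathbf z}) - \mathcal E(f_\rho)
 = \underbrace{\bigl[\mathcal E(f_{\mathbf z}) - \mathcal E_{\mathbf z}(f_{\mathbf z})\bigr]
   + \bigl[\mathcal E_{\mathbf z}(f_{\mathcal H}) - \mathcal E(f_{\mathcal H})\bigr]}_{\text{sample error (capacity of the fixed class } \mathcal H\text{)}}
 + \underbrace{\bigl[\mathcal E_{\mathbf z}(f_{\mathbf z}) - \mathcal E_{\mathbf z}(f_{\mathcal H})\bigr]}_{\text{hypothesis error}}
 + \underbrace{\bigl[\mathcal E(f_{\mathcal H}) - \mathcal E(f_\rho)\bigr]}_{\text{approximation error}}
\]

The point of the proposal is that the last term is measured against a class that does not vary with the sample, so classical approximation theory applies to it. A minimal runnable sketch of one of the two algorithm classes mentioned, coefficient-based regularization, is given below; the Gaussian kernel, the $\ell^2$ coefficient penalty, and all function names are illustrative assumptions, not details taken from the paper:

    import numpy as np

    def gaussian_kernel(X1, X2, sigma=1.0):
        # Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def fit_coefficients(X, y, lam=1e-2, sigma=1.0):
        # Hypothesis space spanned by kernels centered at the sample points:
        #   f_z(x) = sum_j c_j K(x, x_j)  -- the space varies with the sample z.
        # Minimize (1/m) ||K c - y||^2 + lam ||c||^2; the l^2 penalty admits a
        # closed form via the normal equations.
        m = len(X)
        K = gaussian_kernel(X, X, sigma)
        c = np.linalg.solve(K.T @ K / m + lam * np.eye(m), K.T @ y / m)
        return c

    def predict(X_train, c, X_new, sigma=1.0):
        # Evaluate f_z at new points using the fitted coefficients.
        return gaussian_kernel(X_new, X_train, sigma) @ c

    # Toy usage: noisy samples of sin(pi x) on [-1, 1].
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (50, 1))
    y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)
    c = fit_coefficients(X, y)
    y_hat = predict(X, c, X)

Replacing the $\ell^2$ penalty by $\|c\|_1$ gives the sparsity-inducing variant in the spirit of references [27] and [28]; that version has no closed form and needs an LP/QP solver. The other algorithm class, learning the kernel via regularization (cf. [24], [25], [39]), would additionally optimize over the kernel parameter sigma or over convex combinations of candidate kernels.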

MSC:
68Q32 Computational learning theory
62J02 General nonlinear regression
68T05 Learning and adaptive systems in artificial intelligence
References:
[1] Vapnik, V., Statistical learning theory, (1998), John Wiley & Sons New York · Zbl 0935.62007
[2] Anthony, M.; Bartlett, P., Neural network learning: theoretical foundations, (1999), Cambridge University Press · Zbl 0968.68126
[3] Evgeniou, T.; Pontil, M.; Poggio, T., Regularization networks and support vector machines, Adv. comput. math., 13, 1-50, (2000) · Zbl 0939.68098
[4] Cristianini, N.; Shawe-Taylor, J., An introduction to support vector machines, (2000), Cambridge University Press
[5] Blanchard, G.; Lugosi, G.; Vayatis, N., On the rate of convergence of regularized boosting classifiers, J. machine learning res., 4, 861-894, (2003) · Zbl 1083.68109
[6] Bousquet, O.; Elisseeff, A., Stability and generalization, J. machine learning res., 2, 499-526, (2002) · Zbl 1007.68083
[7] Zhang, T., Leave-one-out bounds for kernel methods, Neural comp., 15, 1397-1437, (2003) · Zbl 1085.68144
[8] Chen, D.; Wu, Q.; Ying, Y.M.; Zhou, D.X., Support vector machine soft margin classifiers: error analysis, J. machine learning res., 5, 1143-1175, (2004) · Zbl 1222.68167
[9] Wu, Q.; Ying, Y.M.; Zhou, D.X., Learning rates of least-square regularized regression, Found. comput. math., 6, 171-192, (2006) · Zbl 1100.68100
[10] Smale, S.; Zhou, D.X., Learning theory estimates via integral operators and their approximations, Constr. approx., 26, 153-172, (2007) · Zbl 1127.68088
[11] Aronszajn, N., Theory of reproducing kernels, Trans. amer. math. soc., 68, 337-404, (1950) · Zbl 0037.20701
[12] Wahba, G., Spline models for observational data, (1990), SIAM · Zbl 0813.62001
[13] Schölkopf, B.; Smola, A.J., Learning with kernels, (2002), MIT Press
[14] C. Scovel, I. Steinwart, Fast rates for support vector machines, in: Proc. of the 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, pp. 279-294 · Zbl 1137.68564
[15] Vert, R.; Vert, J., Consistency and convergence rates of one-class SVMs and related algorithms, J. machine learning res., 7, 817-854, (2006) · Zbl 1222.68324
[16] D.H. Xiang, D.X. Zhou, Classification with Gaussians and convex loss, 2008, preprint · Zbl 1235.68207
[17] Cucker, F.; Smale, S., Best choices for regularization parameters in learning theory, Found. comput. math., 2, 413-428, (2002) · Zbl 1057.68085
[18] Wu, Q.; Zhou, D.X., Analysis of support vector machine classification, J. comp. anal. appl., 8, 99-119, (2006) · Zbl 1098.68680
[19] Cherkassky, V.; Mulier, F., Learning from data: concepts, theory, and methods, (1998), Wiley New York · Zbl 0960.62002
[20] Lugosi, G.; Nobel, A., Adaptive model selection using empirical complexities, Ann. stat., 27, 1830-1864, (1999) · Zbl 0962.62034
[21] Binev, P.; Cohen, A.; Dahmen, W.; DeVore, R.; Temlyakov, V., Universal algorithms for learning theory part I: piecewise constant functions, J. machine learning res., 6, 1297-1321, (2005) · Zbl 1191.62068
[22] V. Temlyakov, On universal estimators in learning theory, 2005, preprint · Zbl 1351.68238
[23] Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S., Choosing multiple parameters for support vector machines, Machine learning, 46, 131-159, (2002) · Zbl 0998.68101
[24] Lanckriet, G.R.G.; Cristianini, N.; Bartlett, P.; El Ghaoui, L.; Jordan, M.I., Learning the kernel matrix with semidefinite programming, J. machine learning res., 5, 27-72, (2004) · Zbl 1222.68241
[25] Micchelli, C.A.; Pontil, M., Learning the kernel function via regularization, J. machine learning res., 6, 1099-1125, (2005) · Zbl 1222.68265
[26] Hoerl, A.E.; Kennard, R.W., Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55-67, (1970) · Zbl 0202.17205
[27] D. Donoho, For most large underdetermined systems of equations, the minimal \(\ell^1\)-norm solution is also the sparsest near-solution, 2004, preprint
[28] Wu, Q.; Zhou, D.X., SVM soft margin classifiers: linear programming versus quadratic programming, Neural comp., 17, 1160-1187, (2005) · Zbl 1108.90324
[29] Vapnik, V., Estimation of dependences based on empirical data, (1982), Springer-Verlag New York · Zbl 0499.62005
[30] Cucker, F.; Smale, S., On the mathematical foundations of learning, Bull. amer. math. soc., 39, 1-49, (2001) · Zbl 0983.68162
[31] Wu, Q.; Ying, Y.; Zhou, D.X., Multi-kernel regularized classifiers, J. complexity, 23, 108-134, (2007) · Zbl 1171.65043
[32] Smale, S.; Zhou, D.X., Shannon sampling and function reconstruction from point values, Bull. amer. math. soc., 41, 279-305, (2004) · Zbl 1107.94007
[33] Zhou, D.X., Capacity of reproducing kernel spaces in learning theory, IEEE trans. inform. theory, 49, 1743-1752, (2003) · Zbl 1290.62033
[34] Zhou, D.X., The covering number in learning theory, J. complexity, 18, 739-767, (2002) · Zbl 1016.68044
[35] Smale, S.; Zhou, D.X., Estimating the approximation error in learning theory, Anal. appl., 1, 17-41, (2003) · Zbl 1079.68089
[36] Y. Lin, H.H. Zhang, Component selection and smoothing in smoothing spline analysis of variance models — COSSO, Institute of Statistics Mimeo Series 2556, NCSU, 2003
[37] M. Herbster, Relative loss bounds and polynomial-time predictions for the K-LMS-NET algorithm, in: Proc. 15th Int. Conf. Algorithmic Learning Theory, 2004 · Zbl 1110.68442
[38] C.A. Micchelli, M. Pontil, Q. Wu, D.X. Zhou, Error bounds for learning the kernel, 2005, preprint · Zbl 1392.68357
[39] Ying, Y.; Zhou, D.X., Learnability of gaussians with flexible variances, J. machine learning res., 8, 249-276, (2007) · Zbl 1222.68339
[40] Belkin, M.; Niyogi, P., Semi-supervised learning on Riemannian manifolds, Machine learning, 56, 209-239, (2004) · Zbl 1089.68086
[41] Bennett, K.P.; Demiriz, A., Semi-supervised support vector machines, (), 368-374
[42] S. Mukherjee, V. Vapnik, Support vector method for multivariate density estimation, CBCL/AI Memo, 1999
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.