Learning with sample dependent hypothesis spaces. (English) Zbl 1165.68388
Summary: Many learning algorithms use hypothesis spaces that are trained from samples, but little theoretical work has been devoted to the study of these algorithms. In this paper we show that the mathematical analysis for these algorithms is essentially different from that for algorithms whose hypothesis spaces are independent of the sample or depend only on the sample size. The difficulty lies in the lack of a proper characterization of the approximation error. To overcome this difficulty, we propose the idea of using a larger function class (not necessarily a linear space), containing the union of all possible hypothesis spaces (varying with the sample), to measure the approximation ability of the algorithm. We show how this idea provides error analysis for two particular classes of learning algorithms in kernel methods: learning the kernel via regularization, and coefficient-based regularization. We demonstrate the power of this approach through its wide applicability.
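A minimal sketch of the decomposition behind this idea, written in standard least-squares regularization notation; the symbols below ($z$, $\mathcal{H}_z$, $\mathcal{H}$, $\mathcal{E}$, $f_\rho$, $\Omega$, $\lambda$) are assumed here for illustration and are not taken from the record itself. The estimator is computed in a sample-dependent space $\mathcal{H}_z$:

\[
f_z \;=\; \arg\min_{f \in \mathcal{H}_z} \Big\{ \tfrac{1}{m}\sum_{i=1}^{m} \bigl(f(x_i) - y_i\bigr)^2 + \lambda\,\Omega(f) \Big\},
\qquad z = \{(x_i, y_i)\}_{i=1}^{m}.
\]

Since $\mathcal{H}_z$ varies with $z$, the approximation error cannot be defined relative to a single fixed hypothesis space. The idea described in the summary is to fix a larger class $\mathcal{H} \supseteq \bigcup_z \mathcal{H}_z$ (not necessarily a linear space) and measure approximation ability against it:

\[
\mathcal{D}(\lambda) \;=\; \inf_{f \in \mathcal{H}} \bigl\{ \mathcal{E}(f) - \mathcal{E}(f_\rho) + \lambda\,\Omega(f) \bigr\},
\qquad
\mathcal{E}(f_z) - \mathcal{E}(f_\rho) \;\le\; \text{(sample error over } \mathcal{H}\text{)} + \mathcal{D}(\lambda).
\]

For coefficient-based regularization, for instance, $\mathcal{H}_z = \{\, f = \sum_{i=1}^{m} c_i K(\cdot, x_i) : c \in \mathbb{R}^m \,\}$, so $\mathcal{H}$ can be taken to contain all finite kernel expansions.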
MSC:
68Q32 Computational learning theory
62J02 General nonlinear regression
68T05 Learning and adaptive systems