Concentration estimates for learning with \(\ell ^{1}\)-regularizer and data dependent hypothesis spaces. (English) Zbl 1221.68201

Summary: We consider the regression problem, learned via a regularization scheme in a data dependent hypothesis space with an \(\ell^1\)-regularizer. The data-dependent nature of the kernel-based hypothesis space provides flexibility for the learning algorithm. The regularization scheme differs essentially from the standard one in a reproducing kernel Hilbert space: the kernel is not necessarily symmetric or positive semi-definite, and the regularizer is the \(\ell^1\)-norm of the coefficient vector in a kernel expansion over the samples. These differences lead to additional difficulty in the error analysis. In this paper we apply concentration techniques with \(\ell^2\)-empirical covering numbers to improve the learning rates of the algorithm. The sparsity of the algorithm is studied on the basis of our error analysis. We also show that the unit ball of a function space arising in the error analysis, induced by the \(\ell^1\)-regularizer and a non-symmetric kernel, behaves well in terms of its \(\ell^2\)-empirical covering numbers.
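For concreteness, such coefficient-based schemes are usually written in the following form (the paper's exact normalization may differ): given a sample \(\mathbf z = \{(x_i, y_i)\}_{i=1}^m\), the estimator is

\[ f_{\mathbf z} = \sum_{i=1}^{m} \alpha_i^{\mathbf z}\, K(\cdot, x_i), \qquad (\alpha_i^{\mathbf z}) = \arg\min_{\alpha \in \mathbb{R}^m} \frac{1}{m} \sum_{j=1}^{m} \Big( \sum_{i=1}^{m} \alpha_i K(x_j, x_i) - y_j \Big)^2 + \lambda \sum_{i=1}^{m} |\alpha_i|, \]

where \(K\) need not be symmetric or positive semi-definite and \(\lambda > 0\) is the regularization parameter. The following minimal numerical sketch illustrates this scheme; the Gaussian kernel, the synthetic data, and the value of \(\lambda\) are assumptions chosen for illustration only, and scikit-learn's Lasso solver stands in for whichever optimizer one prefers for the \(\ell^1\) problem.

```python
# Sketch of coefficient-based l1-regularized kernel regression.
# Kernel, data, and lambda are illustrative assumptions, not from the paper.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
m = 200
x = rng.uniform(-1.0, 1.0, size=m)                    # sample inputs
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(m)  # noisy targets

def kernel(s, t, sigma=0.5):
    # Gaussian kernel; the scheme itself allows non-symmetric kernels.
    return np.exp(-(s - t) ** 2 / (2.0 * sigma ** 2))

# Design matrix K[j, i] = K(x_j, x_i): column i is the basis function
# K(., x_i) from the data-dependent dictionary, evaluated at the sample.
K = kernel(x[:, None], x[None, :])

# sklearn's Lasso minimizes (1/(2m))||y - K a||^2 + lam ||a||_1,
# which matches the scheme above up to a constant rescaling of lam.
lam = 1e-3
model = Lasso(alpha=lam, fit_intercept=False, max_iter=50_000)
model.fit(K, y)

alpha = model.coef_
print("nonzero coefficients:", np.count_nonzero(alpha), "of", m)

def f_z(s):
    # The learned estimator f_z(s) = sum_i alpha_i K(s, x_i).
    return kernel(s[:, None], x[None, :]) @ alpha
```

The sparsity reported at the end is the phenomenon the paper's error analysis quantifies: the \(\ell^1\)-regularizer drives most coefficients \(\alpha_i^{\mathbf z}\) to zero, so the learned expansion uses only a small subset of the samples.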

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62J02 General nonlinear regression
65C60 Computational problems in statistics (MSC2010)
