Fast and strong convergence of online learning algorithms. (English) Zbl 1433.68344
Summary: In this paper, we study an online learning algorithm without explicit regularization terms. The algorithm is essentially a stochastic gradient descent scheme in a reproducing kernel Hilbert space (RKHS). The polynomially decaying step sizes play the role of regularization and ensure the generalization ability of the algorithm. We develop a novel capacity-dependent analysis of the performance of the last iterate of the algorithm, which answers an open problem in learning theory. The contribution of this paper is twofold. First, our capacity-dependent analysis leads to sharp convergence rates in the standard mean square distance, improving the results in the literature. Second, we establish, for the first time, the strong convergence of the last iterate with polynomially decaying step sizes in the RKHS norm. We demonstrate that the theoretical analysis established in this paper fully exploits the fine structure of the underlying RKHS, and thus leads to sharp error estimates for the online learning algorithm.
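For context, the unregularized online scheme analyzed in this line of work (cf. [11], [32] below) can be sketched as follows, assuming the least squares loss: given samples \((x_t, y_t)\) and a Mercer kernel \(K\), the iterates are defined by \(f_1 = 0\) and \(f_{t+1} = f_t - \eta_t\,(f_t(x_t) - y_t)\,K(x_t, \cdot)\), where the step sizes decay polynomially, \(\eta_t = \eta_1 t^{-\theta}\) with \(\theta \in (0, 1)\). No regularization term appears in the update; the decay exponent \(\theta\) governs the implicit regularization.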
MSC:
68T05 Learning and adaptive systems in artificial intelligence
46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)
68W27 Online algorithms; streaming algorithms
Software:
gss; Pegasos
References:
[1] Aronszajn, N., Theory of reproducing kernels, Trans. Am. Math. Soc., 68, 3, 337-404 (1950) · Zbl 0037.20701
[2] Birman, M. S.; Solomjak, M. Z., Piecewise-polynomial approximations of functions of the classes \(W_p^{\alpha}\), Math. USSR-Sbornik, 2, 3, 295-317 (1967) · Zbl 0173.16001
[3] Braun, M. L.; Buhmann, J. M.; Müller, K.-R., On relevant dimensions in kernel feature spaces, J. Mach. Learn. Res., 9, Aug, 1875-1908 (2008) · Zbl 1225.68153
[4] Caponnetto, A.; De Vito, E., Optimal rates for the regularized least squares algorithm, Found. Comput. Math., 7, 3, 331-368 (2007) · Zbl 1129.68058
[5] Cucker, F.; Zhou, D. X., Learning Theory: An Approximation Theory Viewpoint (2007), Cambridge: Cambridge University Press, Cambridge
[6] Didas, S.; Setzer, S.; Steidl, G., Combined ℓ2 data and gradient fitting in conjunction with ℓ1 regularization, Adv. Comput. Math., 30, 1, 79-99 (2009) · Zbl 1163.65037
[7] Dieuleveut, A.; Bach, F., Nonparametric stochastic approximation with large step-sizes, Ann. Stat., 44, 4, 1363-1399 (2016) · Zbl 1346.60041
[8] Gu, C., Smoothing Spline ANOVA Models. Springer Series in Statistics (2002), New York: Springer, New York · Zbl 1051.62034
[9] Guo, Z. C.; Ying, Y.; Zhou, D. X., Online regularized learning with pairwise loss functions, Adv. Comput. Math., 43, 1, 127-150 (2017) · Zbl 1407.68394
[10] Kivinen, J.; Smola, A. J.; Williamson, R. C., Online learning with kernels, IEEE Trans. Signal Process., 52, 8, 2165-2176 (2004) · Zbl 1369.68281
[11] Lei, Y.; Shi, L.; Guo, Z. C., Convergence of unregularized online learning algorithms, J. Mach. Learn. Res., 18, 171, 1-33 (2018) · Zbl 06982927
[12] Lin, J., Rosasco, L.: Optimal learning for multi-pass stochastic gradient methods. In: Advances in Neural Information Processing Systems, pp 4556-4564 (2016)
[13] Mendelson, S.; Neeman, J., Regularization in kernel learning, Ann. Stat., 38, 1, 526-565 (2010) · Zbl 1191.68356
[14] Mikusiński, J., The Bochner Integral (1978), Basel: Birkhäuser, Basel · Zbl 0369.28010
[15] Nemirovski, A.; Juditsky, A.; Lan, G.; Shapiro, A., Robust stochastic approximation approach to stochastic programming, SIAM J. Optim., 19, 4, 1574-1609 (2009) · Zbl 1189.90109
[16] Pillaud-Vivien, L., Rudi, A., Bach, F.: Statistical optimality of stochastic gradient descent on hard learning problems through multiple passes. In: Advances in Neural Information Processing Systems, pp 8114-8124 (2018)
[17] Rakhlin, A., Shamir, O., Sridharan, K.: Making gradient descent optimal for strongly convex stochastic optimization. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp 449-456 (2012)
[18] Rosasco, L., Tacchetti, A., Villa, S.: Regularization by early stopping for online learning algorithms. arXiv:1405.0042 (2014)
[19] Shalev-Shwartz, S.; Singer, Y.; Srebro, N.; Cotter, A., Pegasos: primal estimated sub-gradient solver for SVM, Math. Program., 127, 1, 3-30 (2011) · Zbl 1211.90239
[20] Smale, S.; Zhou, D. X., Estimating the approximation error in learning theory, Anal. Appl., 1, 1, 17-41 (2003) · Zbl 1079.68089
[21] Smale, S.; Yao, Y., Online learning algorithms, Found. Comput. Math., 6, 2, 145-170 (2006) · Zbl 1119.68098
[22] Smale, S.; Zhou, D. X., Learning theory estimates via integral operators and their approximations, Constr. Approx., 26, 2, 153-172 (2007) · Zbl 1127.68088
[23] Steinwart, I.; Christmann, A., Support Vector Machines (2008), New York: Springer, New York
[24] Steinwart, I., Hush, D.R., Scovel, C.: Optimal rates for regularized least squares regression. In: Conference on Learning Theory (2009) · Zbl 1127.68090
[25] Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: International Conference on Machine Learning, pp 1139-1147 (2013)
[26] Tarres, P.; Yao, Y., Online learning as stochastic approximation of regularization paths: optimality and almost-sure convergence, IEEE Trans. Inf. Theory, 60, 9, 5716-5735 (2014) · Zbl 1360.62192
[27] Wahba, G., Spline Models for Observational Data (1990), Philadelphia: Society for Industrial and Applied Mathematics (SIAM), Philadelphia · Zbl 0813.62001
[28] Wendland, H., Scattered Data Approximation (2005), Cambridge: Cambridge University Press, Cambridge · Zbl 1075.65021
[29] Yao, Y., On complexity issues of online learning algorithms, IEEE Trans. Inf. Theory, 56, 12, 6470-6481 (2010) · Zbl 1368.68288
[30] Yao, Y.; Rosasco, L.; Caponnetto, A., On early stopping in gradient descent learning, Constr. Approx., 26, 2, 289-315 (2007) · Zbl 1125.62035
[31] Ying, Y., Convergence analysis of online algorithms, Adv. Comput. Math., 27, 3, 273-291 (2007) · Zbl 1129.68070
[32] Ying, Y.; Pontil, M., Online gradient descent learning algorithms, Found. Comput. Math., 8, 5, 561-596 (2008) · Zbl 1175.68211
[33] Ying, Y.; Zhou, D. X., Online regularized classification algorithms, IEEE Trans. Inf. Theory, 52, 11, 4775-4788 (2006) · Zbl 1323.68450
[34] Ying, Y.; Zhou, D. X., Online pairwise learning algorithms, Neural Comput., 28, 4, 743-777 (2016) · Zbl 07062540
[35] Ying, Y.; Zhou, D. X., Unregularized online learning algorithms with general loss functions, Appl. Comput. Harmon. Anal., 42, 2, 224-244 (2017) · Zbl 1382.68204
[36] Zhang, T.; Yu, B., Boosting with early stopping: convergence and consistency, Ann. Stat., 33, 4, 1538-1579 (2005) · Zbl 1078.62038
[37] Zhou, Dx, The covering number in learning theory, J. Complex., 18, 3, 739-767 (2002) · Zbl 1016.68044