zbMATH — the first resource for mathematics

On-line learning for very large data sets. (English) Zbl 1091.68063
The learning algorithms optimise an empirical cost function \(C_{n}(\theta)={1\over n}\sum_{i=1}^{n}L(z_{i},\theta)\). Each term measures the cost associated with running a model with parameter vector \(\theta\) on independently drawn examples \(z_{i}\). The authors consider two kinds of optimisation procedures: 1) batch gradient – parameter updates are performed on the basis of the gradient and Hessian information accumulated over the entire training set \(\theta(k)=\theta(k-1)-\Phi_{k}{\partial C_{n}\over\partial\theta}(\theta(k-1))\), and (2) on-line or stochastic gradient – parameter updates are performed on the basis of a single sample \(z_{t}\) picked randomly at each iteration \(\theta(t)=\theta(t-1)-{1\over t}\Phi_{t}{\partial L\over\partial\theta}(z_{t},\theta(t-1))\), where \(\Phi_{t}\) is an appropriately chosen positive definite symmetric matrix. Very often the examples \(z_{t}\) are chosen by cycling over a randomly permuted training set. Each cycle is called an epoch. This paper shows that performing a single epoch of a suitable on-line algorithm converges to the true solution of the learning problem asymptotically as fast as any other algorithm.

68Q32 Computational learning theory
93E35 Stochastic learning and adaptive control
Full Text: DOI
[1] Theory of Pattern Recognition. Nauka: Moscow, 1974 (in Russian). (German translation: Theorie der Zeichenerkennung. Akademie Verlag: Berlin, 1979).
[2] A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, vol. 5, Pittsburgh, Pennsylvania, USA, 1992; 144-152.
[3] Improving performance in neural networks using a boosting algorithm. In Advances in Neural Information Processing Systems 5, (eds). Morgan Kaufmann: San Mateo, CA, 1993; 42-49.
[4] Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, Bari, Italy, 1996; 148-156.
[5] Comparison of classifier methods: a case study in handwritten digit recognition. In Proceedings of the 13th International Conference on Pattern Recognition, Jerusalem, 1994.
[6] Stochastic approximations and efficient learning. In The Handbook of Brain Theory and Neural Networks (2nd edn), (ed.). The MIT Press: Cambridge, MA, 2002.
[7] Numerical Methods For Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Inc.: Englewood Cliffs, NJ, 1983.
[8] Adaptive Algorithms and Stochastic Approximations. Springer: Berlin, New York, 1990. · doi:10.1007/978-3-642-75894-2
[9] Online algorithms and stochastic approximations. In Online Learning and Neural Networks, (ed.). Cambridge University Press: Cambridge, UK, 1998.
[10] Efficient backprop. In Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science, vol. 1524. (eds). Springer: Berlin, 1998.
[11] Murata, Signal Processing 74 pp 3– (1999)
[12] Mann, Annals of Mathematical Statistics 14 pp 217– (1943)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.