zbMATH — the first resource for mathematics

A unified framework of online learning algorithms for training recurrent neural networks. (English) Zbl 07255166
Summary: We present a framework for compactly summarizing many recent results in efficient and/or biologically plausible online training of recurrent neural networks (RNNs). The framework organizes algorithms according to several criteria: (a) past- vs. future-facing, (b) tensor structure, (c) stochastic vs. deterministic, and (d) closed-form vs. numerical. These axes reveal latent conceptual connections among several recent advances in online learning. Furthermore, we provide novel mathematical intuitions for their degrees of success. Testing these algorithms on two parametric task families shows that performance clusters according to our criteria. Although a similar clustering is also observed for pairwise gradient alignment, alignment with exact methods does not explain ultimate performance. This suggests the need for better comparison metrics.
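The pairwise gradient alignment mentioned above is typically measured as the cosine similarity between the (flattened) parameter gradient produced by an approximate online method and the exact gradient from backpropagation through time. A minimal sketch of that metric, assuming NumPy arrays and illustrative function names not taken from the paper:

```python
import numpy as np

def gradient_alignment(g_approx, g_exact):
    """Cosine similarity between two flattened gradients.

    Returns 1.0 for perfectly aligned gradients, 0.0 for
    orthogonal ones, and -1.0 for opposing directions.
    """
    a = np.ravel(np.asarray(g_approx, dtype=float))
    b = np.ravel(np.asarray(g_exact, dtype=float))
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustration: a noisy estimate of an "exact" gradient stays
# partially aligned with it even when individual entries differ.
rng = np.random.default_rng(0)
g_exact = rng.standard_normal(100)
g_approx = g_exact + 0.5 * rng.standard_normal(100)
print(gradient_alignment(g_approx, g_exact))
```

As the summary notes, such alignment scores cluster similarly to task performance, yet high alignment with the exact gradient does not by itself predict which algorithm ultimately performs best.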
68T05 Learning and adaptive systems in artificial intelligence
Full Text: Link
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[2] Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, and Yoshua Bengio. End-to-end attention-based large vocabulary speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4945-4949. IEEE, 2016.
[3] Guillaume Bellec, Franz Scherr, Elias Hajek, Darjan Salaj, Robert Legenstein, and Wolfgang Maass. Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets. arXiv preprint arXiv:1901.09049, 2019.
[4] Frederik Benzing, Marcelo Matheus Gauy, Asier Mujika, Anders Martinsson, and Angelika Steger. Optimal Kronecker-sum approximation of real time recurrent learning. arXiv preprint arXiv:1902.03993, 2019.
[5] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[6] Kyunghyun Cho, Aaron Courville, and Yoshua Bengio. Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11):1875-1886, 2015.
[7] Tim Cooijmans and James Martens. On the variance of unbiased online recurrent optimization. arXiv preprint arXiv:1902.02405, 2019.
[8] Wojciech Marian Czarnecki, Grzegorz Swirszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, and Koray Kavukcuoglu. Understanding synthetic gradients and decoupled neural interfaces. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 904-912. JMLR.org, 2017.
[9] Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6):141-142, 2012.
[10] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
[11] Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471, 2016.
[12] Jordan Guerguiev, Timothy P Lillicrap, and Blake A Richards. Towards deep learning with segregated dendrites. eLife, 6:e22901, 2017.
[13] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
[14] Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, and Koray Kavukcuoglu. Decoupled neural interfaces using synthetic gradients. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1627-1635. JMLR.org, 2017.
[15] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[16] Timothy P Lillicrap and Adam Santoro. Backpropagation through time and the brain. Current Opinion in Neurobiology, 55:82-89, 2019. ISSN 0959-4388. doi: 10.1016/j.conb.2019.01.011. URL http://www.sciencedirect.com/science/article/pii/S0959438818302009. Machine Learning, Big Data, and Neuroscience.
[17] Timothy P Lillicrap, Daniel Cownden, Douglas B Tweed, and Colin J Akerman. Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7:13276, 2016.
[18] Mantas Lukoševičius and Herbert Jaeger. Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3):127-149, 2009.
[19] Minh-Thang Luong, Hieu Pham, and Christopher D Manning. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.
[20] Owen Marschall, Kyunghyun Cho, and Cristina Savin. Evaluating biological plausibility of learning algorithms the lazy way. 2019.
[21] Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association, 2010.
[22] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
[23] Asier Mujika, Florian Meier, and Angelika Steger. Approximating real-time recurrent learning with random Kronecker factors. In Advances in Neural Information Processing Systems, pages 6594-6603, 2018.
[24] James M Murray. Local online learning in recurrent networks with random feedback. eLife, 8:e43299, 2019.
[25] Alexander Ororbia, Ankur Mali, C Lee Giles, and Daniel Kifer. Online learning of recurrent neural architectures by locally aligning distributed representations. arXiv preprint arXiv:1810.07411, 2018.
[26] Alexander G Ororbia II, Patrick Haffner, David Reitter, and C Lee Giles. Learning to adapt by minimizing discrepancy. arXiv preprint arXiv:1711.11542, 2017.
[27] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pages 1310-1318, 2013.
[28] Silviu Pitis. Recurrent neural networks in tensorflow I. r2rt.com/recurrent-neural-networks-in-tensorflow-i, 2016. Accessed: 2018-11-13.
[29] Christopher Roth, Ingmar Kanitscheider, and Ila Fiete. Kernel RNN learning (keRNL). In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGfnoC5KQ.
[30] João Sacramento, Rui Ponte Costa, Yoshua Bengio, and Walter Senn. Dendritic cortical microcircuits approximate the backpropagation algorithm. In Advances in Neural Information Processing Systems, pages 8721-8732, 2018.
[31] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 2018.
[32] Corentin Tallec and Yann Ollivier. Unbiased online recurrent optimization. arXiv preprint arXiv:1702.05043, 2017.
[33] T. Tieleman and G. Hinton. Lecture 6.5 - RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
[34] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156-3164, 2015.
[35] Paul J Werbos et al. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550-1560, 1990.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.