adaQN
swMATH ID: 44504
Software Authors: Nitish Shirish Keskar, Albert S. Berahas

Description: adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs. Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, training RNNs is computationally difficult owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have a cheap per-iteration cost, or gain significant curvature information at a higher per-iteration cost. The former set includes diagonally scaled first-order methods such as ADAGRAD and ADAM, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present adaQN, a stochastic quasi-Newton algorithm for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method uses a novel L-BFGS scaling initialization scheme and is judicious in storing and retaining L-BFGS curvature pairs. We present numerical experiments on two language modeling tasks and show that adaQN is competitive with popular RNN training algorithms.

Homepage: https://arxiv.org/abs/1511.01169
Source Code: https://github.com/david-cortes/stochQN
Keywords: Machine Learning; arXiv_cs.LG; Optimization and Control; arXiv_math.OC; arXiv_stat.ML; Recurrent Neural Networks; RNNs
Related Software: L-BFGS; Saga; CIFAR; LIBSVM; AdaGrad; SONIA; Adam; CUTEst; LDGB; Finito; MNIST

Cited in: 3 Documents
Cited by 9 Authors: Berahas, Albert S. (2); Curtis, Frank E. (1); Eisen, Mark (1); Jahani, Majid (1); Mokhtari, Aryan (1); Ribeiro, Alejandro R. (1); Richtárik, Peter (1); Takáč, Martin (1); Zhou, Baoyu (1)
Cited in 3 Serials: Mathematical Programming. Series A. Series B (1); SIAM Journal on Optimization (1); Optimization Methods & Software (1)
Cited in 3 Fields:
Operations research, mathematical programming (90-XX): 3
Calculus of variations and optimal control; optimization (49-XX): 1
Numerical analysis (65-XX): 1
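The description mentions non-diagonal scaling through L-BFGS and the storage of "curvature pairs". As an illustration only, the Python sketch below shows the generic building blocks of limited-memory BFGS: a bounded memory of pairs (s, y), where s is a step and y the corresponding gradient difference, and the standard two-loop recursion that applies the implied inverse-Hessian approximation H to a gradient. adaQN's own scaling initialization and pair-retention rules are not spelled out in this entry, so the sketch substitutes the textbook choices (initial scaling gamma = s^T y / y^T y from the newest pair, and a simple positivity check on s^T y before storing a pair). The names `LBFGSMemory`, `add`, `apply` and `dot` are hypothetical and are not the API of the stochQN repository.

```python
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class LBFGSMemory:
    """Hypothetical bounded store of curvature pairs (s, y).

    Textbook L-BFGS sketch, not adaQN's exact rules: pairs are kept only
    if s^T y > eps, and the oldest pair is dropped once m pairs are stored.
    """

    def __init__(self, m: int = 10, eps: float = 1e-10) -> None:
        self.pairs: Deque[Tuple[Vector, Vector]] = deque(maxlen=m)
        self.eps = eps

    def add(self, s: Vector, y: Vector) -> bool:
        # Reject pairs with non-positive curvature so that the implied
        # inverse-Hessian approximation stays positive definite.
        if dot(s, y) <= self.eps:
            return False
        self.pairs.append((list(s), list(y)))
        return True

    def apply(self, g: Vector) -> Vector:
        """Return H g using the L-BFGS two-loop recursion."""
        q = list(g)
        alphas: List[float] = []
        # First loop: newest pair to oldest pair.
        for s, y in reversed(self.pairs):
            a = dot(s, q) / dot(y, s)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        # Initial scaling H0 = gamma * I (standard choice; adaQN's novel
        # initialization is not reproduced here).
        if self.pairs:
            s, y = self.pairs[-1]
            gamma = dot(s, y) / dot(y, y)
        else:
            gamma = 1.0
        r = [gamma * qi for qi in q]
        # Second loop: oldest pair to newest, reusing alphas in reverse.
        for (s, y), a in zip(self.pairs, reversed(alphas)):
            b = dot(y, r) / dot(y, s)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        return r
```

For the quadratic f(x) = x_1^2 + 2 x_2^2 (Hessian diag(2, 4)), storing the pairs s = (1, 0), y = (2, 0) and s = (0, 1), y = (0, 4) and calling `apply([2.0, 4.0])` yields `[1.0, 1.0]`, the exact Newton direction. A search direction for a minimizer would be `-H g`. The deque's `maxlen` keeps memory and per-iteration cost bounded by m pairs, which is the property that keeps L-BFGS-type methods cheap per iteration compared with full second-order approaches.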