swMATH ID: 44504
Software Authors: Nitish Shirish Keskar, Albert S. Berahas
Description: adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs. Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, the training of RNNs is a computationally difficult task owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have cheap per-iteration complexity, or attempt to gain significant curvature information at the cost of increased per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAGRAD and ADAM, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present adaQN, a stochastic quasi-Newton algorithm for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method uses a novel L-BFGS scaling initialization scheme and is judicious in storing and retaining L-BFGS curvature pairs. We present numerical experiments on two language modeling tasks and show that adaQN is competitive with popular RNN training algorithms.
Homepage: https://arxiv.org/abs/1511.01169
Source Code: https://github.com/david-cortes/stochQN
Keywords: Machine Learning; arXiv_cs.LG; Optimization and Control; arXiv_math.OC; arXiv_stat.ML; Recurrent Neural Networks; RNNs
Related Software: L-BFGS; Saga; CIFAR; LIBSVM; AdaGrad; SONIA; Adam; CUTEst; LDGB; Finito; MNIST
Cited in: 3 Documents
