## adaQN

swMATH ID: | 44504 |

Software Authors: | Nitish Shirish Keskar, Albert S. Berahas |

Description: | adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs. Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on several pattern recognition problems. However, training RNNs is computationally difficult owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have cheap per-iteration complexity, or attempt to gain significant curvature information at the expense of a higher per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAGRAD and ADAM, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present adaQN, a stochastic quasi-Newton algorithm for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method uses a novel L-BFGS scaling initialization scheme and is judicious in storing and retaining L-BFGS curvature pairs. We present numerical experiments on two language modeling tasks and show that adaQN is competitive with popular RNN training algorithms. |
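The description refers to non-diagonal scaling through a stochastic L-BFGS updating scheme and to careful storage of curvature pairs. As a rough illustration of that building block only, the Python sketch below shows the standard L-BFGS two-loop recursion over a bounded memory of pairs (s, y), with a common positive-curvature check before a pair is stored. It is not adaQN itself: the initial scaling γ = sᵀy / yᵀy and the acceptance test sᵀy > eps·‖s‖² are textbook choices assumed here, whereas adaQN uses its own initialization and pair-retention rules described in the paper. The class `LBFGSMemory` and its parameters `m` and `eps` are hypothetical names.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


class LBFGSMemory:
    """Hypothetical bounded store of L-BFGS curvature pairs (s, y).

    Sketch of standard L-BFGS, NOT adaQN's specific pair-acceptance
    or scaling-initialization rules.
    """

    def __init__(self, m=5, eps=1e-8):
        self.pairs = deque(maxlen=m)  # oldest pair is dropped automatically
        self.eps = eps

    def add(self, s, y):
        # Assumed curvature check: store the pair only if s^T y is
        # sufficiently positive, so the implicit inverse-Hessian
        # approximation stays positive definite.
        if dot(s, y) > self.eps * dot(s, s):
            self.pairs.append((list(s), list(y)))
            return True
        return False

    def direction(self, g):
        """Two-loop recursion: returns H_k g for the current L-BFGS matrix H_k."""
        q = list(g)
        alphas = []
        for s, y in reversed(self.pairs):  # newest pair to oldest
            a = dot(s, q) / dot(y, s)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        # Assumed initial scaling H_0 = gamma * I from the newest pair;
        # with no pairs stored, H_0 = I and the gradient is returned as is.
        if self.pairs:
            s, y = self.pairs[-1]
            gamma = dot(s, y) / dot(y, y)
        else:
            gamma = 1.0
        r = [gamma * qi for qi in q]
        for (s, y), a in zip(self.pairs, reversed(alphas)):  # oldest to newest
            b = dot(y, r) / dot(y, s)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        return r
```

In a stochastic setting, s would be a difference of iterates and y a difference of (sub)sampled gradients, and a step might take the form x ← x − lr · `direction(g)`. The fixed-size `deque` keeps the per-iteration cost linear in the number of parameters times `m`, which is the low-cost, non-diagonal trade-off the description contrasts with diagonal methods such as ADAGRAD and ADAM.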

Homepage: | https://arxiv.org/abs/1511.01169 |

Source Code: | https://github.com/david-cortes/stochQN |

Keywords: | Machine Learning; arXiv_cs.LG; Optimization and Control; arXiv_math.OC; arXiv_stat.ML; Recurrent Neural Networks; RNNs |

Related Software: | L-BFGS; Saga; CIFAR; LIBSVM; AdaGrad; SONIA; Adam; CUTEst; LDGB; Finito; MNIST |

Cited in: | 3 Documents |


### Cited by 9 Authors

2 | Berahas, Albert S. |

1 | Curtis, Frank E. |

1 | Eisen, Mark |

1 | Jahani, Majid |

1 | Mokhtari, Aryan |

1 | Ribeiro, Alejandro R. |

1 | Richtárik, Peter |

1 | Takáč, Martin |

1 | Zhou, Baoyu |

### Cited in 3 Serials

1 | Mathematical Programming. Series A. Series B |

1 | SIAM Journal on Optimization |

1 | Optimization Methods & Software |

### Cited in 3 Fields

3 | Operations research, mathematical programming (90-XX) |

1 | Calculus of variations and optimal control; optimization (49-XX) |

1 | Numerical analysis (65-XX) |