
Transformer-XL

swMATH ID: 36208
Software Authors: Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Description: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and 450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably coherent, novel text articles with thousands of tokens. Our code, pretrained models, and hyperparameters are available in both Tensorflow and PyTorch.
Homepage: https://arxiv.org/abs/1901.02860
Dependencies: Python
Keywords: Machine Learning; arXiv_cs.LG; Computation and Language; arXiv_cs.CL; arXiv_stat.ML; Transformer-XL; Tensorflow; PyTorch
Related Software: BERT; XLNet; GPT-3; Tensor2Tensor; TensorFlow; SentencePiece; ALBERT; RoBERTa; GloVe; word2vec; DARTS; cmix; TopicRNN; Penn Treebank; PyTorch; MADE; MaskGAN; GLUE; BART; AutoExtend
Referenced in: 4 Publications
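
The description above names two ingredients: a segment-level recurrence mechanism and a relative positional encoding scheme. The following is a minimal, hypothetical PyTorch-style sketch of the recurrence idea only, not the authors' implementation: hidden states cached from the previous segment are reused as extra attention context with gradients stopped, so information can flow beyond a fixed segment length. All names (SegmentRecurrentAttention, mem_len) are illustrative, and the relative positional encoding is omitted.

from typing import Optional

import torch
import torch.nn as nn


class SegmentRecurrentAttention(nn.Module):
    """Illustrative sketch of segment-level recurrence (not the authors' code)."""

    def __init__(self, d_model: int, n_heads: int, mem_len: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_len = mem_len

    def forward(self, x: torch.Tensor, mem: Optional[torch.Tensor] = None):
        # x:   current segment, shape (batch, seg_len, d_model)
        # mem: cached hidden states from the previous segment, or None
        # Prepend the cached memory (detached, so no gradient flows into it)
        # to extend the attention context beyond the current segment.
        context = x if mem is None else torch.cat([mem.detach(), x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        # Keep the most recent `mem_len` states as memory for the next segment.
        new_mem = context[:, -self.mem_len:].detach()
        return out, new_mem


# Usage: process consecutive segments while carrying the memory forward.
layer = SegmentRecurrentAttention(d_model=64, n_heads=4, mem_len=32)
mem = None
for segment in torch.randn(3, 2, 16, 64):  # three consecutive segments
    out, mem = layer(segment, mem)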

Standard Articles

1 Publication describing the Software:
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (2019)

Referencing Publications by Year