# Transformer-XL
- [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860)
- Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling
- enables learning dependencies beyond a fixed length without disrupting temporal coherence
- introduces a segment-level recurrence mechanism and a novel relative positional encoding scheme (see the sketch after this list)
- resolves the context fragmentation problem
- enwik8
- WikiText-103
- One [Billion Word](Billion%20Word.md)
- Penn Treebank
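
A minimal sketch of the segment-level recurrence idea, not the paper's implementation: the previous segment's hidden states are cached with a stop-gradient and prepended to the keys/values of the current segment, so attention can reach past the segment boundary. The class name `SegmentRecurrentAttention`, the use of `nn.MultiheadAttention`, and all hyperparameters are my own assumptions; the paper additionally uses relative positional encodings, causal masking, and per-layer memories, which are omitted here.

```python
import torch
import torch.nn as nn


class SegmentRecurrentAttention(nn.Module):
    """Sketch of segment-level recurrence: cache the previous segment's
    hidden states (detached) and let the current segment attend over
    [memory; current] as keys/values, while queries come only from the
    current segment."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # stand-in for the paper's attention with relative positional encoding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, memory=None):
        # x: (batch, seg_len, d_model); memory: cached states of the previous segment
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        # stop-gradient: backprop never crosses the segment boundary
        new_memory = x.detach()
        return out, new_memory


# usage: process a long sequence segment by segment, carrying the memory forward
layer = SegmentRecurrentAttention()
segments = torch.randn(4, 3, 16, 512)  # batch of 4, 3 segments of length 16
memory = None
for t in range(segments.size(1)):
    out, memory = layer(segments[:, t], memory)
```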