# Transformer-XL
- [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860)
- Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling
- enables learning dependencies beyond a fixed length without disrupting temporal coherence
- introduces a segment-level recurrence mechanism and a novel relative positional encoding scheme (see the sketch after this list)
- resolves the context fragmentation problem
- enwik8
- WikiText-103
- One [Billion Word](Billion%20Word.md)
- Penn Treebank
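
A minimal sketch of the segment-level recurrence idea, not the paper's implementation: the previous segment's hidden states are cached with a stop-gradient and prepended to the keys/values of the current segment, so attention can reach past the segment boundary. The class name `SegmentRecurrentAttention`, the use of `nn.MultiheadAttention`, and all hyperparameters are my own assumptions; the paper additionally uses relative positional encodings, causal masking, and per-layer memories, which are omitted here.

```python
import torch
import torch.nn as nn


class SegmentRecurrentAttention(nn.Module):
    """Sketch of segment-level recurrence: cache the previous segment's
    hidden states (detached) and let the current segment attend over
    [memory; current] as keys/values, while queries come only from the
    current segment."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # stand-in for the paper's attention with relative positional encoding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, memory=None):
        # x: (batch, seg_len, d_model); memory: cached states of the previous segment
        context = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        # stop-gradient: backprop never crosses the segment boundary
        new_memory = x.detach()
        return out, new_memory


# usage: process a long sequence segment by segment, carrying the memory forward
layer = SegmentRecurrentAttention()
segments = torch.randn(4, 3, 16, 512)  # batch of 4, 3 segments of length 16
memory = None
for t in range(segments.size(1)):
    out, memory = layer(segments[:, t], memory)
```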