# XLM-R - [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116)

- pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks
- Transformer-based masked language model trained on one hundred languages, using more than two terabytes of filtered [CommonCrawl](CommonCrawl.md) data (see the sketch after this list)
- significantly outperforms multilingual BERT, especially on low-resource languages
- analyzes the trade-offs between positive transfer and capacity dilution, and between the performance of high- and low-resource languages at scale
- demonstrates the possibility of multilingual modeling without sacrificing per-language performance
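
A minimal sketch of what "masked language model on one hundred languages" means in practice, using the Hugging Face `transformers` fill-mask pipeline; the `xlm-roberta-base` checkpoint name and the pipeline call are assumptions about the released tooling, not something stated in the note.

```python
# Sketch: query XLM-R as a masked language model via Hugging Face transformers.
# Assumes the "xlm-roberta-base" checkpoint; the paper's released weights may
# differ in size (base vs. large).
from transformers import pipeline

# XLM-R uses "<mask>" as its mask token, over a SentencePiece vocabulary
# shared across all pretraining languages.
fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

# The same checkpoint handles any of its pretraining languages without
# language-specific tokenizers or embeddings.
for text in [
    "The capital of France is <mask>.",
    "La capitale de la France est <mask>.",
]:
    for prediction in fill_mask(text, top_k=3):
        print(f"{text} -> {prediction['token_str']} ({prediction['score']:.3f})")
```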