# BART
- [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
- [[Denoising Autoencoder]]
- pretraining for sequence-to-sequence models
- trained by corrupting text with an arbitrary noising function and learning a model to reconstruct the original text (a quick mask-filling demo with a pretrained checkpoint appears after this list)
- generalizes BERT (due to the bidirectional encoder) and [[GPT]] (with the left-to-right decoder)
- best performance is found by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where each span of text is replaced with a single mask token (a minimal sketch of both transformations follows this list)
- With BERT, random tokens are replaced with masks, and the document is encoded bidirectionally. Missing tokens are predicted independently, so BERT cannot easily be used for generation.
- With [[GPT]], tokens are predicted auto-regressively (generation of a new token is conditioned on the prior tokens), meaning [[GPT]] can be used for generation.
- applies a noising scheme to the input document, corrupting it by replacing spans of text with mask symbols
- particularly effective when fine-tuned for text generation, but also works well for comprehension tasks
- matches the performance of [[RoBERTa]] with comparable training resources on:
  - [[GLUE]]
  - [[SQuAD]]
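
Below is a minimal sketch of the two noising transformations named above (sentence permutation and text infilling), assuming whitespace tokenization, a literal `<mask>` string, a roughly 30% masking budget, and Poisson(3) span lengths; it is an illustration of the idea, not the paper's exact implementation.

```python
import random

import numpy as np

MASK = "<mask>"  # placeholder mask string; the real model uses a special vocabulary token


def permute_sentences(sentences):
    """Sentence permutation: randomly shuffle the order of the original sentences."""
    shuffled = list(sentences)
    random.shuffle(shuffled)
    return shuffled


def text_infilling(tokens, mask_ratio=0.3, poisson_lambda=3.0):
    """Text infilling: replace spans of tokens, each with a single mask token.

    Span lengths are drawn from a Poisson distribution; a length-0 span amounts
    to inserting a mask token. Masking stops once roughly `mask_ratio` of the
    original tokens have been covered.
    """
    tokens = list(tokens)
    budget = int(len(tokens) * mask_ratio)
    covered = 0
    while covered < budget and tokens:
        span_len = int(np.random.poisson(poisson_lambda))
        start = random.randrange(len(tokens) + 1)  # allow insertion at the end
        end = min(start + span_len, len(tokens))
        tokens[start:end] = [MASK]                 # one mask replaces the whole span
        covered += max(span_len, 1)                # count at least 1 so the loop terminates
    return tokens
```

The corrupted token sequence is fed to the bidirectional encoder, and the left-to-right decoder is trained to regenerate the uncorrupted document.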
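
A quick way to see the denoising objective at work is to give the pretrained model a document with a span replaced by a single mask token and let it regenerate the full text. The sketch below uses the Hugging Face `transformers` packaging of the released checkpoint, not the paper's own training code.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Corrupted input: a span of text has been replaced with a single <mask> token.
text = "BART is a denoising autoencoder for <mask> sequence-to-sequence models."
inputs = tokenizer(text, return_tensors="pt")

# The left-to-right decoder regenerates the complete, uncorrupted sequence.
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```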