# XLNet

- [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237)
- models bidirectional contexts
- denoising-autoencoding-based pretraining like BERT achieves better performance than pretraining approaches based on [Autoregressive](Autoregressive.md) language modeling
- however, by relying on corrupting the input with masks, BERT neglects the dependency between the masked positions and suffers from a pretrain-finetune discrepancy
- XLNet is a generalized [autoregressive](Autoregressive.md) pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order (proposing a new objective called Permutation Language Modeling; see the sketch after this list), and (2) overcomes the limitations of BERT thanks to its [autoregressive](Autoregressive.md) formulation
- uses a permutation language modeling objective to combine the advantages of [Autoregressive](Autoregressive.md) and autoencoder methods
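
The permutation language modeling objective from the paper maximizes the expected autoregressive log-likelihood over factorization orders:

$$\max_\theta \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T} \left[ \sum_{t=1}^{T} \log p_\theta\!\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right) \right]$$

where $\mathcal{Z}_T$ is the set of all permutations of a length-$T$ index sequence. As a rough illustration, the snippet below sketches how a sampled factorization order turns into an attention mask in which each token can only attend to tokens that come earlier in that order. The function name `permutation_attention_mask` is just for illustration, and the sketch deliberately omits XLNet's two-stream self-attention and partial prediction.

```python
# Minimal sketch of the permutation language modeling idea (illustrative only;
# XLNet's actual implementation additionally uses two-stream self-attention
# and only predicts a subset of positions per sequence).
import numpy as np


def permutation_attention_mask(seq_len: int, rng: np.random.Generator):
    """Sample a factorization order z and build a boolean mask where position
    z_t may attend only to tokens appearing earlier in z (the token itself is
    excluded, as required when predicting x_{z_t})."""
    z = rng.permutation(seq_len)          # a random factorization order
    rank = np.empty(seq_len, dtype=int)   # rank[i] = position of token i in z
    rank[z] = np.arange(seq_len)
    # mask[i, j] = True  ->  token i is allowed to attend to token j
    mask = rank[None, :] < rank[:, None]
    return z, mask


rng = np.random.default_rng(0)
z, mask = permutation_attention_mask(5, rng)
print("factorization order:", z)
print(mask.astype(int))
```

Note that the token positions in the input sequence are left unchanged; only the attention mask (and hence the factorization order of the likelihood) is permuted, which is what lets each position see bidirectional context in expectation.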