# Listen Attend Spell
- [Listen, Attend and Spell](https://arxiv.org/abs/1508.01211)
- LAS
- learns to transcribe speech utterances to characters
- nlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly
- sequence-to-sequence framework
- trained end-to-end and has two main components: a listener (encoder) and a speller (decoder)
- listener is a pyramidal RNN encoder that accepts filter bank spectra as inputs, transforms the input sequence into a high level feature representation and reduces the number of timesteps that the decoder has to attend to.
- The speller is an [Attention](Attention.md)-based RNN decoder that attends to the high level [Features](Features.md) and spells out the transcript one character at a time
- The proposed system does not use the concepts of phonemes, nor does it rely on pronunciation dictionaries or HMMs
- bypass the [Conditional Independence](Conditional%20Independence.md) assumptions of [CTC](CTC.md), and show how they can learn an implicit language model that can generate multiple spelling variants given the same acoustics
- producing character sequences without making any independence assumptions between the characters is the key improvement of LAS over previous end-to-end [CTC](CTC.md) models
- used samples from the [Softmax](Softmax.md) classifier in the decoder as inputs to the next step prediction during training
- show how a language model trained on additional text can be used to rerank their top hypotheses
- [Google voice search task](Google%20voice%20search%20task.md)