## AudioLM
- Maps input audio into a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space.
- Trained on large corpora of raw audio waveforms, it learns to generate natural and coherent continuations given short prompts.
- Extends beyond speech: it generates coherent piano music continuations, despite being trained without any symbolic representation of music.
- Audio synthesis is challenging because high audio quality and long-term consistency must be maintained across multiple time scales simultaneously.
- AudioLM achieves this by combining recent advances in neural audio compression, self-supervised representation learning, and language modeling.
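The core pipeline above — discretize the waveform into tokens, then model continuations autoregressively — can be sketched in miniature. This is a toy stand-in, not AudioLM's actual method: it uses mu-law quantization in place of a learned neural codec's tokens, and a bigram count model in place of AudioLM's Transformer language models; the function and class names are illustrative.

```python
import numpy as np

def tokenize(waveform, n_tokens=256):
    """Map a waveform in [-1, 1] to discrete token ids via mu-law
    companding + uniform quantization (a crude stand-in for a learned
    neural audio codec's token sequence)."""
    mu = n_tokens - 1
    companded = np.sign(waveform) * np.log1p(mu * np.abs(waveform)) / np.log1p(mu)
    return ((companded + 1) / 2 * mu).astype(int)

class BigramLM:
    """Toy autoregressive model over token ids: counts bigrams during
    training, then extends a prompt greedily one token at a time
    (AudioLM uses Transformer decoders instead)."""
    def __init__(self, n_tokens=256):
        self.counts = np.ones((n_tokens, n_tokens))  # Laplace smoothing

    def fit(self, tokens):
        for a, b in zip(tokens[:-1], tokens[1:]):
            self.counts[a, b] += 1
        return self

    def continue_prompt(self, prompt, n_steps=50):
        out = list(prompt)
        for _ in range(n_steps):
            out.append(int(np.argmax(self.counts[out[-1]])))  # greedy next token
        return out

# "Train" on a 440 Hz sine wave, then continue a short prompt.
t = np.linspace(0, 1, 16000)
tokens = tokenize(np.sin(2 * np.pi * 440 * t))
lm = BigramLM().fit(tokens)
continuation = lm.continue_prompt(tokens[:100], n_steps=50)
```

The point of the sketch is the framing, not the model: once audio lives in a discrete token space, "generate a continuation" becomes ordinary next-token prediction, so the whole language-modeling toolbox applies.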