## Jukebox - generates music with singing in the raw audio domain - earlier models in the text-to-music genre generated music symbolically in the form of a pianoroll which specifies timing, pitch and velocity. - The challenging aspect is the non-symbolic approach where music is tried to be produced directly as a piece of audio - the space of raw audio is extremely high dimensional which makes the problem very challenging - the key issue is that modelling that raw audio produces long-range dependencies, making it computationally challenging to learn the high-level semantics of music. - hierarchical VQ-VAE architecture to compress audio into a discrete space [14], with a loss function designed to retain the most amount of information. - This model produces songs from very diferent genres such as rock, hip-hop and jazz.