Mimic3 - Never Complete Only Abandoned

Mimic3 is an [[AGPLv3]]-licensed text-to-speech program and library written in [[Python]].

- Website
- [Source](https://github.com/MycroftAI/mimic3)
- [Documentation](https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#command-line-interface)
- [Voices](https://github.com/MycroftAI/mimic3-voices)

> A fast local neural text to speech engine for Mycroft.

# Notability

See also: [[Text to Speech]]

Built for [[Mycroft AI]]. See also [[Personal Assistant]].

# Philosophy

- [Mimic 1](https://github.com/MycroftAI/mimic1) was based on [[Text to Speech#CMU Flite]].
- [Mimic 2](https://github.com/MycroftAI/mimic2) was based on the Tacotron architecture.
- The current Mimic 3 is based on VITS.

> Mimic 3 uses the [VITS](https://arxiv.org/abs/2106.06103), a "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech". VITS is a combination of the [GlowTTS duration predictor](https://arxiv.org/abs/2005.11129) and the [HiFi-GAN vocoder](https://arxiv.org/abs/2010.05646).
>
> Our implementation is heavily based on [Jaehyeon Kim's PyTorch model](https://github.com/jaywalnut310/vits), with the addition of [Onnx runtime](https://onnxruntime.ai/) export for speed.

\- https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#how-it-works

# OS Support

- [[Linux]]

Only distributed as a `.deb` package or as a *giant* set of `pip` dependencies.

# Features

Supports GPU synthesis via CUDA if the correct libraries are installed.

Supports voices that use the [[Text to Speech#Espeak]], [[Text to Speech#Gruut]], and [[Text to Speech#Epitran]] phonemizers.

## CLI

The command line is simple to use, but it doesn't seem to support streaming audio output. It can generate audio from a large file in under a minute on my desktop, but every invocation pays a fixed startup cost, so the project recommends running it as a background service. [[Piper TTS]], by contrast, can stream text to audio very rapidly.

## Favorite Voices

Lots of voices across different models, some of them quite good.
Unfortunately, the project only seems to distribute low-quality versions of them. Still, even some of those low-quality models sound very good.

- Alan Pope - https://github.com/MycroftAI/mimic3-voices/tree/master/voices/en_UK/apope_low/
    - `apope_low` - RP, masc, slow and deep but a bit monotone
- CMU Arctic - http://www.festvox.org/cmu_arctic/
    - `cmu-arctic_low 3 ksp` - ??, masc, unsure of the accent, but intelligible while being mostly flat
- LJ Speech Dataset - https://keithito.com/LJ-Speech-Dataset/
    - `ljspeech_low` - GA, femme, very clear, with a newscaster quality to it
- CSTR VCTK Corpus - https://datashare.ed.ac.uk/handle/10283/3443
    - `vctk_low p236` - RP, femme, a distinctive British girl
    - `vctk_low p274` - RP, masc, a distinctive young British man
    - `vctk_low p276` - RP, femme, a distinctive young British woman
    - `vctk_low p336` - RP, femme, a distinctive young British woman
    - `vctk_low p288` - RP?, femme, good emphasis, young British woman
    - (I left off at `p230` when the page stopped working.)

# Tips

## Usage

```sh
echo "Sphinx of black quartz, judge my vow!" | mimic3 --voice en_US/ljspeech_low
```

## Server

```sh
mimic3-server --voice en_US/ljspeech_low
```

## Speech Dispatcher

Create `/etc/speech-dispatcher/modules/mimic3-generic.conf`, based on the [Mimic3 example](https://github.com/MycroftAI/mimic3/blob/master/examples/speech-dispatcher/mimic3-generic.conf):

```sh
Debug 0
GenericExecuteSynth "printf %s \'$DATA\' | $HOME/.local/bin/mimic3 --remote --voice \'$VOICE\' --stdout | $PLAY_COMMAND"
GenericCmdDependency "$HOME/.local/bin/mimic3"
AddVoice "en" "MALE1" "en_UK/apope_low"
AddVoice "en" "FEMALE1" "en_US/ljspeech_low"
AddVoice "en" "MALE2" "en_US/cmu-arctic_low"
AddVoice "en" "MALE3" "en_US/hifi-tts_low"
AddVoice "en" "FEMALE2" "en_US/m-ailabs_low"
AddVoice "en" "FEMALE3" "en_US/vctk_low"
DefaultVoice "en_US/ljspeech_low"
```

Note that the names must be `MALE1`, `FEMALE1`, etc. because of a limitation in Speech Dispatcher: it only recognizes its fixed set of symbolic voice types (`MALE1` to `MALE3`, `FEMALE1` to `FEMALE3`, `CHILD_MALE`, `CHILD_FEMALE`).
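Depending on the Speech Dispatcher version and distribution, the new module may also need to be registered in `/etc/speech-dispatcher/speechd.conf` before it shows up. The helper below is a minimal sketch of doing that idempotently. The function name `register_mimic3_module` is hypothetical, and the exact `AddModule "mimic3-generic" "sd_generic" "mimic3-generic.conf"` line is an assumption based on Speech Dispatcher's usual `AddModule "name" "binary" "config"` form, so compare it with the commented examples in your own `speechd.conf`.

```sh
#!/bin/sh
# Hypothetical helper: append an AddModule line for mimic3-generic to speechd.conf.
# ASSUMPTION: the AddModule syntax below matches your Speech Dispatcher version.
register_mimic3_module() {
    conf=${1:-/etc/speech-dispatcher/speechd.conf}
    line='AddModule "mimic3-generic" "sd_generic" "mimic3-generic.conf"'

    # Already registered? Do nothing, so the helper is safe to re-run.
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        return 0
    fi

    # If the file is non-empty and lacks a trailing newline, add one first
    # so the new line is not glued onto the previous one.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi

    printf '%s\n' "$line" >> "$conf"
}
```

Because `/etc` is root-owned, define the function in a root shell (for example `sudo sh`) before calling `register_mimic3_module`. After restarting Speech Dispatcher, and with `mimic3-server` running, `spd-say -o mimic3-generic "Hello"` should exercise the module.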
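The `--remote` flag in the config above hands synthesis off to a running Mimic 3 web server, which avoids the per-invocation startup cost, so `mimic3-server` needs to be running in the background. To run it as a systemd user service, create `$HOME/.config/systemd/user/mimic3.service`:

```ini
[Unit]
Description=Run Mimic 3 web server
Documentation=https://github.com/MycroftAI/mimic3

[Service]
ExecStart=/path/to/mimic3-server

[Install]
WantedBy=default.target
```

Then run `systemctl --user daemon-reload` followed by `systemctl --user enable --now mimic3.service`.

# References

- https://askubuntu.com/questions/53896/natural-sounding-text-to-speech
- https://community.mycroft.ai/t/cannot-use-mimic3-from-speech-dispatcher/13198/3
- https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#speech-dispatcher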