Text to Speech - Never Complete Only Abandoned

See also: [[Personal Assistant]], [[Speech to Text]]

# Generators

Quality often has more to do with the models than with the software itself, but I still judge each project by its demos.

## Espeak

Including Espeak-NG.

- https://github.com/espeak-ng/espeak-ng/

## Piper

**Quality**: Good
**Install**: Painless

See also: [[Piper TTS]]

Used by [[Personal Assistant#Rhasspy]]

## Mimic 3

**Quality**: Good
**Install**: Meh

See also: [[Mimic3]]

## Festival

**Quality**: Medium

Quality varies a lot. Some voices are loud and others quiet. Some are distorted and others are grainy. Some are pretty okay.

- https://www.cstr.ed.ac.uk/projects/festival/
- https://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html

## Tortoise TTS

- https://nonint.com/static/tortoise_v2_examples.html
- https://github.com/neonbjb/tortoise-tts

Feels like a smaller project. The results seem pretty good, but the site feels kind of janky.

## RHVoice

- https://rhvoice.org/
- https://github.com/RHVoice/RHVoice

Documentation is spotty. Supposedly supports Linux but provides no info on it. Probably needs to be compiled from the C++ source.

## CMU Flite

> CMU Flite (festival-lite) is a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative text to speech synthesis engine to [Festival](http://festvox.org/festival) for voices built using the [FestVox](http://festvox.org) suite of voice building tools.

- http://www.festvox.org/flite/
- https://github.com/festvox/flite

## Larynx

Predecessor to [[#Piper]].

## Mary TTS

- ! Written in Java
- https://github.com/marytts/marytts
- https://marytts.github.io/

## Coqui TTS

- https://github.com/coqui-ai/TTS
- https://docs.coqui.ai/en/latest/

I'm not really clear on what it is capable of. Their demos are weird, and there are a lot of emojis and marketing speak.

## gTTS

- ! Sends text to Google
- https://github.com/pndurette/gTTS

## StyleTTS2

Primarily a research project, but seems to do a good job of reading text. It can also replicate the prosody of random speakers. Written in [[Python]].

- https://github.com/yl4579/StyleTTS2
- https://styletts2.github.io/

# Utilities

## SpeechD

- https://freebsoft.org/speechd

## Obsidian TTS

- https://github.com/joethei/obsidian-tts/issues/9

## Gruut

> A tokenizer, text cleaner, and [IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) phonemizer for several human languages that supports [SSML](https://github.com/rhasspy/gruut/#ssml).

- https://github.com/rhasspy/gruut/

## PyTTSx3

Seems to be a wrapper library for some of the others here.

- https://www.geeksforgeeks.org/python-text-to-speech-by-using-pyttsx3/
- https://pypi.org/project/pyttsx3/

# Other

## Bark

Takes prompts to generate audio. Not really TTS; more of a DALL-E/GPT-style generator for voice and audio. Results may deviate from the prompt, but they can also sound very lifelike.

- https://github.com/suno-ai/bark

# Subfolders

```dataview
LIST
FROM #foldernote
WHERE contains(file.folder, this.file.folder) AND file != this.file
SORT file.name ASC
```

# Notes in this Folder

```dataview
LIST
FROM -#foldernote
WHERE file.folder = this.file.folder AND database-plugin != "basic"
SORT file.name ASC
```

# References

- https://askubuntu.com/questions/53896/natural-sounding-text-to-speech