Speech to Text Apps

Thu, May 5, 2022

DeepSpeech: simpler to use, although inferior in quality.
Kaldi: supports hybrid NN-HMM and lattice-free MMI models. It is widely used in both research and production.
Lingvo is the open-source version of Google's speech recognition toolkit, with support mostly for end-to-end models.
ESPnet is good and well known for end-to-end models as well.
RASR + RETURNN are very good as well, both for end-to-end models and hybrid NN-HMM, but they are for non-commercial applications only (or you need a commercial licence) (disclaimer: I work at the university chair which develops these frameworks).
http://gkarsay.github.io/parlatype/
https://github.com/juanerasmoe/pmTrans
https://pythonbasics.org/transcribe-audio/
Wav2Letter, the tool by Facebook.
Silero Models (snakers4/silero-models): pre-trained speech-to-text, text-to-speech and text-enhancement models
Coqui: Coqui STT and TTS
voice2json: command-line tools for speech and intent recognition on Linux
Vosk: offline speech recognition API
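Judgements such as "inferior" or "very good" in the list above are usually backed by word error rate (WER): the word-level edit distance between a reference transcript and the tool's output, divided by the number of reference words. The following is a minimal sketch of that metric using only the Python standard library. The function name `word_error_rate` and the lowercase-and-split normalisation are assumptions made for illustration, not part of any toolkit listed here; real evaluations usually normalise punctuation and numbers as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Hypothetical helper for illustration; normalisation here is only
    lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

To compare two tools, transcribe the same recordings with each, compute the WER of each output against the same reference transcript, and prefer the lower value. Note that WER can exceed 1.0 when the hypothesis contains many extra words.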
Datasets
- English: TED-LIUM, LibriSpeech, etc.
- https://github.com/gooofy/zamia-speech
- https://commonvoice.mozilla.org/en/datasets
- https://www.openslr.org/resources.php
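As an example of working with one of these corpora: LibriSpeech, as distributed on OpenSLR, stores transcripts in per-chapter `*.trans.txt` files, one utterance per line, with the utterance ID followed by a space and the upper-case transcript; the matching audio is a `.flac` file named after the utterance ID in the same directory. The sketch below reads that layout with the standard library only. The function names are hypothetical, and the layout should be checked against the README shipped with the copy of the corpus you download.

```python
from pathlib import Path
from typing import Dict, Iterable


def parse_transcript_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map utterance IDs to transcripts from '<utt_id> <TEXT>' lines."""
    transcripts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Everything before the first space is the ID, the rest is the text.
        utt_id, _, text = line.partition(" ")
        transcripts[utt_id] = text
    return transcripts


def read_transcript_file(trans_file: str) -> Dict[str, str]:
    """Read one *.trans.txt file (the path is supplied by the caller)."""
    content = Path(trans_file).read_text(encoding="utf-8")
    return parse_transcript_lines(content.splitlines())
```

The resulting dictionary can be paired with the audio by looking for `<utt_id>.flac` next to the transcript file, which gives the (audio, reference text) pairs needed to train or evaluate a model.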