Mozilla Voice STT

Mozilla Voice STT (formerly the research project “Deep Speech”) is an open-source Speech-to-Text engine that aims to make speech recognition technology openly available to developers.

Supported by a community of like-minded developers, companies, and researchers, we have applied sophisticated machine learning techniques and a variety of innovations to build a deep learning-based STT engine that approaches human accuracy. Implemented with Google’s TensorFlow framework, it can run on anything from an offline Raspberry Pi 4 to a server-class machine, obviating the need to pay patent royalties or exorbitant fees for existing STT services.

Together with the growing Common Voice dataset, we believe this technology can and will enable a wave of innovative products and services, and that it should be available to everyone.

GitHub Repository

Getting Started

Mozilla Voice STT is a vibrant open-source tool. Building on this foundation means you benefit from ongoing developer support and continued performance optimizations. Its architecture allows for easy localization: adapting it to a particular language requires no advanced linguistic knowledge, only data. This is in stark contrast to more traditional STT architectures, where the need for advanced linguistic knowledge hinders adoption by under-resourced languages.

Mozilla Voice STT offers bindings for .NET, Python, JavaScript, and C, all available on GitHub. You can download our pre-trained models and start using Mozilla Voice STT in minutes. If you’d like to know more, you can find detailed release notes in the GitHub repo, and installation and usage instructions in our README.

If that doesn’t cover what you’re looking for, you can also use our discussion forum to engage with the rest of the community.
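
To give a feel for what "start using it in minutes" can look like with the Python package, here is a minimal sketch rather than official usage. It assumes the package is importable as `deepspeech` and exposes `Model`, `sampleRate()`, and `stt()`, as in the Deep Speech releases; the package name may differ in your release, so check the README. The helper names `read_wav_pcm16` and `transcribe`, and any file paths you pass in, are hypothetical.

```python
import array
import sys
import wave


def read_wav_pcm16(path):
    """Return (sample_rate, samples) for a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return sample_rate, samples


def transcribe(model_path, wav_path):
    """Run a pre-trained model over one WAV file and return the text."""
    # Assumption: the bindings are installed under the name `deepspeech`.
    # They are imported here so the WAV helper works without them.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    sample_rate, samples = read_wav_pcm16(wav_path)
    if sample_rate != model.sampleRate():
        raise ValueError(
            "audio is %d Hz but the model expects %d Hz"
            % (sample_rate, model.sampleRate())
        )
    # The engine takes the audio as a 1-D array of 16-bit samples.
    return model.stt(np.array(samples, dtype=np.int16))
```

Calling `transcribe` with the path to a downloaded pre-trained model file and a recording of your own should return the recognized text as a string. The sample-rate check is there because audio recorded at a different rate than the model was trained on tends to transcribe poorly.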

Voice STT News