Mozilla Releases DeepSpeech 0.6, a speech-to-text engine

On December 5, Mozilla released DeepSpeech 0.6, a speech-to-text engine. Quality has improved, and the engine now supports Windows.

DeepSpeech is an automatic speech recognition (ASR) engine developed by Mozilla, which aims to offer speech recognition technology and trained models to developers. It is a deep-learning-based engine with a simple API. A pre-trained American English model is available. The project was launched in 2017.

The latest version offers improved quality and a simplified API. It supports TensorFlow 1.14.0. With cuDNN RNN support added to the training graph, training performance is expected to improve substantially. TensorFlow Lite is now supported as well. In the API, function names are more uniform and unused parameters have been removed. The release also includes a simple wrapper header for use with the C API.

Thanks to changes to the data structure of the language model trie file, the file can now be memory-mapped when it is loaded. The language model has been updated and uncommon words have been removed: it now includes only the 500,000 most frequent words from the text it was trained on, and its size has been halved, from 1800 MB in version 0.5.1 to 900 MB in the latest version. The error rate on the LibriSpeech test set is reported to be as low as 7.5%.
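
To illustrate why memory mapping helps with large files, here is a minimal sketch using Python's standard mmap module. It is not DeepSpeech code: the file name, its contents, and the open_mapped helper are made up for the example, and the real trie format is not reproduced. The point is only that a mapped file does not have to be read into memory in full before it can be used.

```python
import mmap
import os
import tempfile


def open_mapped(path):
    """Map a file read-only; the OS loads pages only when they are accessed."""
    f = open(path, "rb")
    return f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Hypothetical stand-in for a large model file (not the real trie layout).
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as out:
    out.write(b"\x00" * 1_000_000 + b"TRIE")

f, mapped = open_mapped(path)
size = len(mapped)   # total mapped size in bytes
tail = mapped[-4:]   # slicing copies just these four bytes out of the mapping

print(size, tail)

mapped.close()
f.close()
```

Because the operating system pages data in on demand, opening the mapping takes roughly the same time regardless of file size, which is the kind of benefit the new trie layout is meant to enable.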

DeepSpeech 0.6 comes with bindings for .NET, Python, JavaScript, and C. Windows support was the most requested feature.

DeepSpeech is available on the project’s website.

DeepSpeech
https://github.com/mozilla/DeepSpeech