Jacek Jarmulak

Speech-to-Text Accuracy Benchmark - September 2020


[UPDATE: For results reported using a slightly different methodology, see our new blog post.]

This is a continuation of our June blog post, where we reported the previous speech-to-text accuracy results. We encourage you to read it first, as it provides the context needed to understand the significance of benchmarking for speech-to-text.

Apart from that background intro, the key differences from the previous post are:

  • We have improved our recognizer, and we are now essentially tied with Amazon.

  • We added another set of benchmark files - 20 files published by rev.ai. Please reference the data linked here when trying to reproduce this benchmark.

Here are the results.

Comparison to the June benchmark on 44 files.

Less than 3 months have passed since the previous test, so it is not surprising to see no improvement in the Google and Amazon recognizers.

The Voicegain recognizer is now very close to Amazon. On average it is better than Amazon on the first half of the data (up to the 50th percentile), and somewhat worse on the remainder.

The Microsoft recognizer has improved during this time period: on the 44 benchmark files it is now on average better than Google Enhanced (in the chart we retained the ordering from the June test). The single bad outlier in the Google Enhanced results does not alone account for Microsoft's better average WER on this data set.
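One simple way to check whether a single outlier explains a gap in average WER is to drop each recognizer's worst file and recompute the mean. The sketch below illustrates the idea; the function name and the WER values are hypothetical placeholders, not results from this benchmark.

```python
from statistics import mean


def mean_without_worst(wers):
    """Return the mean WER after dropping the single highest (worst) value."""
    if len(wers) < 2:
        raise ValueError("need at least two per-file WER values")
    return mean(sorted(wers)[:-1])


# Hypothetical per-file WER values in percent, for illustration only.
recognizer_a = [10.0, 13.0, 16.0, 60.0]  # contains one bad outlier
recognizer_b = [11.0, 12.0, 13.0, 16.0]

print("mean, all files:    ", mean(recognizer_a), mean(recognizer_b))
print("mean, worst dropped:", mean_without_worst(recognizer_a),
      mean_without_worst(recognizer_b))
```

If the ranking stays the same after the worst file is dropped from both sets, as it does for these made-up numbers, the outlier alone does not explain the difference in the averages.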

Google Standard is still very bad and we will likely stop reporting on it in detail in our future comparisons.

Results from the benchmark on 20 new files.

The audio from the 20-file rev.ai test is not as challenging as some of the files in the 44-file benchmark set. Consequently, the results are on average better, and the ranking of the recognizers also changes.

As you can see in this chart, on this data set the Voicegain recognizer is marginally better than Amazon. It has a lower WER on 11 out of 20 test files, and it beats Amazon in both the mean and median values. On this data set Google Enhanced beats Microsoft.

Combined results on 44+20 files

Finally, here are the combined results for all 64 benchmark files we tested.

What we would like to point out is that when comparing Voicegain to Amazon and Google Enhanced to Microsoft, one of each pair wins on average WER while the other has the better median WER. This highlights that the results vary a lot depending on which specific audio file is being compared.
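The mean and the median can rank two recognizers differently because the mean is pulled up by a few difficult files, while the median reflects the typical file. A minimal Python sketch of that effect follows; the `summarize` helper and the per-file WER values are hypothetical and chosen only to show the disagreement, not taken from the benchmark.

```python
from statistics import mean, median


def summarize(wers):
    """Return the mean and median of a list of per-file WER values."""
    return {"mean": mean(wers), "median": median(wers)}


# Hypothetical per-file WER values in percent, for illustration only.
recognizer_x = [5.0, 6.0, 7.0, 8.0, 30.0]     # good on most files, one hard file
recognizer_y = [8.0, 9.0, 10.0, 11.0, 12.0]   # consistent across files

print("X:", summarize(recognizer_x))
print("Y:", summarize(recognizer_y))
```

With these made-up numbers, X has the lower median while Y has the lower mean, so which recognizer "wins" depends on the statistic you report.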


These results highlight that you should choose a recognizer for a given application only after thorough testing. Recognizer performance varies a lot depending on the audio data and acoustic environment. Moreover, prices vary significantly. We encourage you to try the Voicegain Speech-to-Text engine; it might be a better fit for your application. Even if the accuracy is a couple of points behind the top players, you might still want to consider Voicegain because:

  • Our acoustic models can be customized to your specific speech audio, and this can reduce the word error rate below that of the best out-of-the-box options.

  • If the accuracy difference is small, Voicegain might still make sense given the lower price.

  • We are continuously training our recognizer and it is only a matter of time before we catch up.
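To run this kind of testing on your own audio, you need a way to score each recognizer's transcript against a reference transcript. The sketch below computes WER in the standard way, as word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name is our own, and the normalization here (lowercasing and whitespace splitting) is a simplifying assumption; it is not necessarily the normalization used to produce the numbers in this post.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one insertion against a 4-word reference.
print(word_error_rate("the quick brown fox", "the quick brown box jumps"))
```

Scoring every file this way for each recognizer gives you the per-file WER values from which the mean and median can be compared.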
