Voicegain introduces relative Speech-to-Text Accuracy SLA

Since June 2020, Voicegain has published benchmarks of its Speech-to-Text accuracy relative to the ASR/Speech-to-Text engines of big tech companies such as Amazon, Google, IBM and Microsoft.

The benchmark dataset for this comparison is published by an independent third party and includes a wide variety of audio: audiobooks, YouTube videos, podcasts, phone conversations, Zoom meetings and more.

Here are links to some of the benchmarks that we have published.

1.  Link to June 2020 Accuracy Benchmark

2.  Link to Sep 2020 Accuracy Benchmark

3.  Link to June 2021 Accuracy Benchmark

4. Link to Oct 2021 Accuracy Benchmark

5.  Link to June 2022 Accuracy Benchmark

Through this process, we have gained insights into what it takes to deliver high accuracy for a specific use case.


We are now introducing an industry-first relative Speech-to-Text accuracy SLA for our clients. "Relative" means that Voicegain’s accuracy, measured by Word Error Rate (WER), is compared with that of the big tech provider the client is evaluating us against. Voicegain will provide an SLA that its accuracy will be practically on par with that provider’s.

We follow a four-step process to calculate the relative accuracy SLA:

1.  Identify Client Benchmark Dataset

In partnership with the client, Voicegain selects a benchmark audio dataset that is representative of the actual data the client will process. Usually this is a randomized selection of client audio. We also recommend that clients retain their own independent benchmark dataset, which is not shared with Voicegain, to validate our results.

2.  Generate golden reference

Voicegain partners with industry-leading manual AI labeling companies to produce a human-generated transcript of this benchmark dataset that is 99% accurate. We refer to this as the golden reference.

3.  Run Relative Accuracy comparison

Voicegain provides scripts that enable clients to run a Word Error Rate (WER) comparison on this benchmark dataset between the Voicegain platform and whichever industry-leading ASR provider the client is comparing us to.

4.  Calculate KPIs for Relative Accuracy SLA

Currently Voicegain calculates the following two KPIs:

a. Median Word Error Rate: the median WER across all the audio files in the benchmark dataset, computed for each of the two ASRs.

b. Fourth Quartile Word Error Rate: the audio files in the benchmark dataset are sorted in increasing order of their WER with the Big Tech ASR; we then compute and compare the average WER over the fourth quartile of that ordering for both Voicegain and the Big Tech ASR.

We contractually guarantee that Voicegain’s accuracy on these two KPIs, relative to the other ASR, will be within a threshold that is acceptable to the client.
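The scripts that Voicegain provides are not reproduced here. Purely as an illustration, the sketch below shows one way per-file WER and the two KPIs could be computed in Python. It rests on several assumptions that are ours and not part of the SLA: transcripts are tokenized by lowercasing and splitting on whitespace (real scoring scripts usually also normalize punctuation and numbers), the fourth quartile is taken to be the 25% of files with the highest Big Tech WER, and the function names `word_error_rate` and `relative_accuracy_kpis` are hypothetical.

```python
from statistics import mean, median


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of words in the reference."""
    # Assumption: lowercase + whitespace tokenization, no other normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_accuracy_kpis(voicegain_wer, big_tech_wer):
    """Compute both KPIs from per-file WER lists given in the same file order."""
    if not voicegain_wer or len(voicegain_wer) != len(big_tech_wer):
        raise ValueError("need two non-empty WER lists of equal length")
    # Order the files by increasing Big Tech WER, then keep the last quarter.
    order = sorted(range(len(big_tech_wer)), key=lambda i: big_tech_wer[i])
    quartile_size = max(1, len(order) // 4)  # assumption: floor, at least one file
    fourth_quartile = order[-quartile_size:]
    return {
        "median_wer": {
            "voicegain": median(voicegain_wer),
            "big_tech": median(big_tech_wer),
        },
        "fourth_quartile_wer": {
            "voicegain": mean(voicegain_wer[i] for i in fourth_quartile),
            "big_tech": mean(big_tech_wer[i] for i in fourth_quartile),
        },
    }
```

In this sketch, `word_error_rate` would be called once per audio file and per ASR against the golden reference, and the two resulting lists passed to `relative_accuracy_kpis`. The gap between the two engines on each KPI can then be checked against whatever threshold the client and Voicegain have agreed on.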

How often is this accuracy SLA measured?

Voicegain measures this accuracy SLA twice in the first year of the contract and once a year from the second year onwards.

What happens if Voicegain fails to meet the SLA?

If Voicegain does not meet the terms of the relative accuracy SLA, we will train the underlying acoustic model until it does. We will bear the expenses associated with labeling and training. Voicegain guarantees that it will meet the accuracy SLA within 90 days of the date of measurement.

Take Voicegain for a test drive!

1. Click here for instructions to access our live demo site.

2. If you are building a cool voice app and you are looking to test our APIs, click here to sign up for a developer account and receive $50 in free credits.

3. If you want to bring Voicegain to your meetings as your own AI Transcription Assistant, click here.
