Build Voice AI apps with our Speech-to-Text APIs. Transcribe & analyze meetings, contact center calls, videos, podcasts and more. All built on our highly accurate and affordable deep-learning ASR. Train on your data to build custom models with very high accuracy.
APIs for Developers • Transcription for Business Users • Automation & Analytics for Call Centers
Voicegain’s deep learning ASR offers an unbeatable combination of accuracy, price and flexibility. Voicegain ASR can be deployed on-premise, in your VPC or invoked as a cloud service. We integrate out-of-the-box with leading contact center, video meeting and bot platforms.
Voicegain’s out-of-the-box accuracy, for both batch and streaming speech recognition, is on par with the very best. When you train on your own data, you can achieve accuracy in the high 90s.
Voicegain is priced 50%-75% lower than the large Cloud Speech-to-Text players. Our Edge pricing is also very affordable compared to competing options.
Access Voicegain on our multi-tenant Cloud. Or deploy it in your Datacenter or VPC. Use your existing audio infrastructure and integrate with a protocol of your choice.
Our ASR is built on the most recent advances in deep learning. We use end-to-end, transformer-based deep neural networks, trained on several tens of thousands of hours of diverse audio.
APIs to embed transcription into your app and build voice bots accessible over telephony. Deploy Voicegain on your infrastructure (VPC, Datacenter) or use our cloud service.
Transcribe meetings, webinars and events live using microphone or browser audio sharing. Or simply convert pre-recorded audio content into text. White-label or Source Code License of UI available.
Automate Quality Assurance and extract CX insights from voice interactions in your contact center. White-label or Source Code License of UI available.
In addition to the current support for English, Spanish, and German languages in its Speech-to-Text platform, Voicegain is releasing support for many new languages over the next couple of months.
You can access these languages right now from the Web Console, via our Transcribe App, or via the API.
Upon request we will make these languages available for your testing. Generally, they can be available within hours of receiving a request. Please contact us at support@voicegain.ai
The Alpha early access models differ from full-featured production models in the following ways:
As alpha models are being trained on additional data, their accuracy will improve. We are also working on punctuation, capitalization, and formatting of each of those models.
We will update this post as soon as these languages are available in the Alpha early access program.
[UPDATE 1/23/22: After training on additional data, the Voicegain recognizer now achieves an average WER of 11.89% (an improvement of 0.35%) and a median WER of 10.82% (an improvement of 0.21%) on this benchmark.
Voicegain is now better than Google Enhanced on 44 files (previously 39).
Voicegain is now the most accurate recognizer on 12 of the files (previously 10).
We have additional data that we will train on soon, after which we will publish a complete new set of results and comparisons.]
It has been over 4 months since we published our last speech recognition accuracy benchmark. Back then, the results were as follows (from most to least accurate): Amazon, with Microsoft a close 2nd; then Google Enhanced, with Voicegain a close 4th; and then, far behind, IBM Watson and Google Standard.
Since then, we have tweaked the architecture of our model and trained it on more data, which further increased its accuracy. Among the other recognizers, Microsoft improved the accuracy of its model the most, while the accuracy of the others stayed more or less the same.
We repeated the test using a methodology similar to before: we used 44 files from the Jason Kincaid data set and 20 files published by rev.ai, and removed all files on which the best recognizer could not achieve a Word Error Rate (WER) lower than 25%. Note: previously we used 20% as the threshold, but this time we decided to keep more low-accuracy files to illustrate how the recognizers differ on that type of file.
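The filtering rule above can be sketched in a few lines of Python. This is a minimal illustration, not Voicegain's benchmark tooling: it assumes WER is computed as (substitutions + deletions + insertions) divided by the number of reference words, that transcripts are normalized only by lowercasing and splitting on whitespace, and that results are stored in a hypothetical table mapping each file name to per-recognizer WERs. The function names `word_error_rate` and `filter_benchmark` are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: normalization is just lowercasing and whitespace splitting;
    a real benchmark would also normalize punctuation, numbers, etc.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the word-level edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


def filter_benchmark(wer_table, threshold=0.25):
    """Keep only files on which the best recognizer's WER is below `threshold`.

    wer_table (hypothetical layout): {file name: {recognizer name: WER fraction}}
    """
    return {name: scores for name, scores in wer_table.items()
            if min(scores.values()) < threshold}
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words, and `filter_benchmark` drops any file where every recognizer scores 25% WER or worse.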
Only three files were so difficult that none of the recognizers could achieve a WER below 25%. Two of the removed files were radio phone interviews with poor recording quality.
As you can see in the results chart above, Voicegain is now better than Google Enhanced on both average and median WER. Looking at individual files, the results also show that Voicegain is more accurate than Google in most cases:
Key observations about other results:
As you can see, the field is very close, and results vary from file to file (the average and median do not paint the whole picture). As always, we invite you to review our apps, sign up, and test our accuracy with your data.
When selecting speech recognition (ASR) software, there are factors to consider beyond out-of-the-box recognition accuracy, for example:
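To see why a single average or median can hide per-file differences, here is a minimal sketch of how the summary numbers quoted in this post (average WER, median WER, files better than a competitor, files where a recognizer is most accurate) could be derived from per-file results. The table layout and the `summarize` function are hypothetical assumptions for illustration; ties for lowest WER are counted as "most accurate" here, which may differ from how the published counts were made.

```python
from statistics import mean, median


def summarize(wer_table, recognizer, competitor):
    """Aggregate per-file WERs for one recognizer against a competitor.

    wer_table (hypothetical layout): {file name: {recognizer name: WER fraction}}
    """
    ours = [scores[recognizer] for scores in wer_table.values()]
    # Files where `recognizer` has a strictly lower WER than `competitor`.
    better = sum(1 for s in wer_table.values() if s[recognizer] < s[competitor])
    # Files where `recognizer` has the lowest WER of all (ties count as a win).
    best = sum(1 for s in wer_table.values() if s[recognizer] == min(s.values()))
    return {
        "average_wer": mean(ours),
        "median_wer": median(ours),
        "better_than_competitor": better,
        "most_accurate": best,
    }


if __name__ == "__main__":
    results = {
        "f1": {"X": 0.10, "Y": 0.12, "Z": 0.08},
        "f2": {"X": 0.20, "Y": 0.18, "Z": 0.25},
        "f3": {"X": 0.05, "Y": 0.09, "Z": 0.07},
    }
    print(summarize(results, "X", "Y"))
```

Because the median ignores how far outliers lie from the middle, one very hard file can move a recognizer's average WER noticeably while leaving its median unchanged, which is why per-file comparisons are worth looking at alongside both.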
1. Click here for instructions to access our live demo site.
2. If you are building a cool voice app and want to test our APIs, click here to sign up for a developer account and receive $50 in free credits.
3. If you want to bring Voicegain to your meetings as your own AI Transcription Assistant, click here.
Interested in customizing the ASR or deploying Voicegain on your infrastructure?