Today we are really excited to announce the launch of Voicegain Whisper, an optimized version of OpenAI's Whisper speech recognition/ASR model that can be accessed using Voicegain APIs. The same APIs currently process over 60 million minutes of audio every month for leading US enterprises including Samsung, Aetna, and several Fortune 100 companies. Generative AI developers now have access to a well-tested, accurate, affordable, and accessible transcription API. They can integrate the Voicegain Whisper APIs with LLMs like GPT-3.5 and GPT-4 (from OpenAI), PaLM 2 (from Google), Claude (from Anthropic), LLaMA 2 (open source from Meta), and their own private LLMs to power generative conversational AI apps. OpenAI has open-sourced several sizes of the Whisper model. With today's release, Voicegain supports Whisper-medium, Whisper-small, and Whisper-base. Voicegain now supports transcription in the 99 languages covered by Whisper.
There are four main reasons for developers to use Voicegain Whisper over other offerings:
1. Support for Private Cloud/On-Premise deployment (integrate with Private LLMs)
While developers can use Voicegain Whisper on our multi-tenant cloud offering, a big differentiator for Voicegain is our support for the Edge. The Voicegain platform has been architected and designed for single-tenant private cloud and datacenter deployment. In addition to the core deep-learning-based speech-to-text model, our platform includes our REST API services, logging and monitoring systems, auto-scaling, and offline task and queue management. Today the same APIs enable Voicegain to process over 60 million minutes a month. We can bring this practical real-world experience of running AI models at scale to our developer community.
Since the Voicegain platform is deployed on Kubernetes clusters, it is well suited for modern AI SaaS product companies and enterprises that want to integrate with their private LLMs.
2. Affordable pricing - 40% less expensive than OpenAI
At Voicegain, we have optimized Whisper for higher throughput. As a result, we are able to offer access to the Whisper model at a price that is 40% lower than what OpenAI offers.
3. Enhanced features for Contact Centers & Meetings
Voicegain also offers critical features for contact centers and meetings. Our APIs support two-channel stereo audio, which is common in contact center recording systems. Word-level timestamps, needed to map audio to text, are another important feature our API offers. Enhanced diarization, a feature our Voicegain models already provide and one that is required for contact center and meeting use cases, will soon be made available on Whisper as well.
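As an illustrative sketch of why word-level timestamps matter, the snippet below maps a time window in the audio back to the words spoken in it. The JSON field names here are assumptions for illustration only, not Voicegain's documented response schema:

```python
import json

# Hypothetical transcription response; the "words" structure is an
# assumption for illustration, not Voicegain's documented schema.
sample_response = json.dumps({
    "words": [
        {"word": "hello", "start": 0.42, "end": 0.78},
        {"word": "world", "start": 0.81, "end": 1.20},
    ]
})

def words_in_window(response_json, start_s, end_s):
    """Return the words spoken between start_s and end_s (seconds)."""
    words = json.loads(response_json)["words"]
    return [w["word"] for w in words
            if w["start"] >= start_s and w["end"] <= end_s]

print(words_in_window(sample_response, 0.0, 1.0))  # ['hello']
```

The same per-word offsets are what allow a UI to highlight the transcript in sync with audio playback.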
4. Premium Support and uptime SLAs
We also offer premium support and uptime SLAs for our multi-tenant cloud offering. These APIs today process over 60 million minutes of audio every month for our enterprise and startup customers.
About the OpenAI Whisper Model
OpenAI Whisper is an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model is based on an encoder-decoder transformer architecture and has shown significant performance improvements over previous models because it was trained on a variety of speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.
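For developers experimenting with the open-source model directly, the `openai-whisper` package returns a result dict whose "segments" list carries start and end times for each chunk of recognized text. A small helper (a sketch, using a hand-written sample in place of a real transcription result) can render that into a timestamped transcript:

```python
def format_segments(result):
    """Render Whisper-style segments as timestamped transcript lines."""
    lines = []
    for seg in result["segments"]:
        lines.append(
            f"[{seg['start']:06.2f} - {seg['end']:06.2f}] {seg['text'].strip()}"
        )
    return "\n".join(lines)

# Sample shaped like openai-whisper's transcribe() output (values invented).
sample = {
    "text": " Welcome to the meeting. Let's begin.",
    "segments": [
        {"start": 0.0, "end": 2.4, "text": " Welcome to the meeting."},
        {"start": 2.4, "end": 4.1, "text": " Let's begin."},
    ],
}

print(format_segments(sample))
# [000.00 - 002.40] Welcome to the meeting.
# [002.40 - 004.10] Let's begin.
```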
Getting Started with Voicegain
Any developer - whether you are a one-person startup or a large enterprise - can access the Voicegain Whisper model by signing up for a free developer account. We offer 15,000 minutes of free credits when you sign up today, which should be enough to build and test your app. Here is a link to get started on the Voicegain Console, our developer-focused web application. Here is also a link to our GitHub.
There are two ways to select Voicegain Whisper. The first is to configure the settings in the Voicegain Console, our developer-focused UI. The second is to configure Whisper as the model in the API settings. If you would like more information or have any questions, please drop us an email at firstname.lastname@example.org
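As a minimal sketch of the second approach, the request body below selects a Whisper model in the API settings. The endpoint, field names, and model identifier here are illustrative assumptions, not Voicegain's documented request schema; consult the Voicegain Console and API docs for the real one:

```python
import json

# Hypothetical API base URL and request schema for illustration only.
API_BASE = "https://api.voicegain.example/v1"

def build_transcription_request(audio_url, model="whisper-medium"):
    """Assemble a transcription request body selecting a Whisper model.
    All field names are assumptions, not Voicegain's documented schema."""
    return {
        "audio": {"source": {"url": audio_url}},
        "settings": {"asr": {"model": model}},
    }

body = build_transcription_request("https://example.com/call.wav")
print(json.dumps(body, indent=2))
```

You would then POST this body to the transcription endpoint with your account's API key in the authorization header.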