Our Blog

News, Insights, sample code & more!

ASR
Announcing the launch of Voicegain Whisper Speech Recognition for Gen AI developers

Today we are excited to announce the launch of Voicegain Whisper, an optimized version of Open AI's Whisper speech recognition (ASR) model that can be accessed through Voicegain APIs. The same APIs currently process over 60 million minutes of audio every month for leading enterprises in the US, including Samsung, Aetna and several Fortune 100 enterprises. Generative AI developers now have access to a well-tested, accurate, affordable and accessible transcription API. They can integrate Voicegain Whisper APIs with LLMs like GPT 3.5 and 4 (from Open AI), PaLM2 (from Google), Claude (from Anthropic), LLAMA 2 (open source from Meta), and their own private LLMs to power generative conversational AI apps. Open AI has open sourced several versions of the Whisper model. With today's release, Voicegain supports Whisper-medium, Whisper-small and Whisper-base. Voicegain now supports transcription in over 99 languages supported by Whisper.


There are four main reasons for developers to use Voicegain Whisper over other offerings:

1. Support for Private Cloud/On-Premise deployment (integrate with Private LLMs)

While developers can use Voicegain Whisper on our multi-tenant cloud offering, a big differentiator for Voicegain is our support for the Edge. The Voicegain platform has been architected and designed for single-tenant private cloud and datacenter deployment. In addition to the core deep-learning-based Speech-to-Text model, our platform includes our REST API services, logging and monitoring systems, auto-scaling, and offline task and queue management. Today the same APIs enable Voicegain to process over 60 million minutes a month. We can bring this practical real-world experience of running AI models at scale to our developer community.

Since the Voicegain platform is deployed on Kubernetes clusters, it is well suited for modern AI SaaS product companies and enterprises that want to integrate with their private LLMs.

2. Affordable pricing - 40% less expensive than Open AI 

At Voicegain, we have optimized Whisper for higher throughput. As a result, we are able to offer access to the Whisper model at a price that is 40% lower than what Open AI offers.

3. Enhanced features for Contact Centers & Meetings.

Voicegain also offers critical features for contact centers and meetings. Our APIs support two-channel stereo audio, which is common in contact center recording systems. Our API also offers word-level timestamps, which are needed to map audio to text. Enhanced diarization, a required feature for contact center and meeting use-cases that is already available with the Voicegain models, will soon be made available on Whisper.

4. Premium Support and uptime SLAs.

We also offer premium support and uptime SLAs for our multi-tenant cloud offering. These APIs today process over 60 million minutes of audio every month for our enterprise and startup customers.

About OpenAI-Whisper Model

OpenAI Whisper is an open-source automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model is based on an encoder-decoder transformer architecture and has shown significant performance improvement compared to previous models because it has been trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection.

Figure: OpenAI Whisper model encoder-decoder transformer architecture

Getting Started with Voicegain

Any developer - whether you are a one-person startup or a large enterprise - can access the Voicegain Whisper model by signing up for a free developer account. We offer 15,000 minutes of free credits when you sign up today. This should allow you to build and test your app. Here is a link to get started on Voicegain Console, our developer-focused web application. Here is also a link to our Github.

There are two ways to select Voicegain Whisper. The first is to configure the settings in Voicegain Console, our developer-focused UI. The second is to configure Whisper as the model in the API settings. If you would like more information or if you have any questions, please drop us an email at support@voicegain.ai.

Practical considerations for developers considering OpenAI's Whisper ASR
ASR

On March 1st 2023, Open AI announced that developers could access the Open AI Whisper Speech-to-Text model via easy-to-use REST APIs. OpenAI also released APIs to GPT3.5, the LLM behind the buzzy ChatGPT product. General availability of the next version of LLM - GPT 4 is expected in July 2023.

Since Open AI Whisper's initial release in October 2022, it has been a big draw for developers. A highly accurate open-source ASR is extremely compelling. OpenAI's Whisper has been trained on 680,000 hours of audio data which is much more than what most models are trained on. Here is a link to their github.

However the developer community looking to leverage Whisper faces three major limitations:

1. Infrastructure Costs: Running Whisper - especially the large and medium models - requires expensive memory-intensive GPU based compute options (see below).

2. In-house AI expertise: To use Open AI's Whisper model, a company has to invest in building an in-house ML engineering team that is able to operate, optimize and support Whisper in a production environment. While Whisper provides core features like  Speech-to-Text, language identification, punctuation and formatting, there are still some missing AI features like speaker diarization and PII redaction that would need to be developed. In addition, companies would need to put in place a real-time NOC for ongoing support. Even a small scale 2-3 person developer team could be expensive to hire and maintain - unless the call volumes justify such an investment. This in-house team also needs to take full responsibility for the Cloud infrastructure related tasks like auto-scaling and log monitoring to ensure uptime.

3. Lack of support for real-time: Whisper is a batch speech-to-text model. Developers who require streaming Speech-to-Text need to evaluate other ASR/STT options.

By taking over the responsibility of hosting this model and making it accessible via easy-to-use APIs, both Open AI and Voicegain address the first two limitations.

Aug 2023 Update: On Aug 5th 2023, Voicegain announced the release of Voicegain Whisper, an optimized version of Open AI's Whisper available through Voicegain APIs. Here is a link to the announcement. In addition to Voicegain Whisper, Voicegain also offers realtime/streaming Speech-to-Text and other features like two-channel/stereo support (required for call centers), speaker diarization and PII redaction. All of this is offered in Voicegain's PCI and SOC-2 compliant infrastructure.


This article highlights some of the key strengths and limitations of using Whisper - whether using Open AI's APIs, Voicegain APIs or hosting it on your own. 


Strengths

1. Accuracy

In our benchmark tests, OpenAI's Whisper models demonstrated high accuracy for a widely diverse range of audio datasets. Our ML engineers concluded that the Whisper models perform well on audio datasets ranging from meetings, podcasts, classroom lectures, YouTube videos and call center audio. We benchmarked Whisper-base, Whisper-small and Whisper-medium against some of the best ASR/Speech-to-Text engines in the market.

The median Word Error Rate (WER) for Whisper-medium was 11.46% for meeting audio and 17.7% for call center audio. This was indeed lower than the WERs of STT offerings of other large players like Microsoft Azure and Google. We did find that AWS Transcribe had a WER that is competitive with Whisper. 
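For readers who want to reproduce WER figures like these on their own audio, here is a minimal sketch of the standard WER calculation: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The text normalization used here (lowercasing and whitespace splitting) is an assumption for illustration; the post does not describe the normalization used in the benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("the quick brown fox", "the quick fox"))  # 0.25
```

A median over many files can then be taken with `statistics.median` from the standard library.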

Here is an interesting observation - it is possible to exceed Whisper's recognition accuracy, however it would take building custom models. Custom models are models that are trained on our client's specific audio data. In fact for call center audio, our ML Engineers were able to demonstrate that our call-center specific Speech-to-text models were either equal to or even better than some of the Whisper models. This makes intuitive sense because call center audio is not readily available on the internet for Open AI to get access to.

Please contact us via email (support@voicegain.ai) if you would like to review and validate/test these accuracy benchmarks.

2. Affordable relative to the Big players, but not the least expensive Whisper API in the market

Whisper's pricing at $0.006/min ($0.36/hour) is much lower than the Speech-to-Text offerings of some of the other larger cloud players. This translates to a 75% discount to Google Speech-to-Text and AWS Transcribe (based on pricing as of the date of this post).

Aug 2023 Update: At the launch of Voicegain Whisper, Voicegain announced a list price of about $0.0037/min ($0.225/hour). This price is 37.5% lower than Open AI's price, which was made possible by optimizing the throughput of Whisper. To test it out, please sign up for a free developer account. Instructions are provided here.
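The price comparison can be checked with a few lines of arithmetic. The sketch below uses only the two list prices quoted in this post ($0.006/min for Open AI and $0.225/hour for Voicegain Whisper); the function names are illustrative.

```python
OPENAI_PRICE_PER_MIN = 0.006       # USD per minute, as quoted in this post
VOICEGAIN_PRICE_PER_HOUR = 0.225   # USD per hour, as quoted in this post


def per_minute(price_per_hour: float) -> float:
    """Convert an hourly price to a per-minute price."""
    return price_per_hour / 60


def discount_percent(base_price: float, discounted_price: float) -> float:
    """Percentage by which discounted_price is lower than base_price."""
    return (base_price - discounted_price) / base_price * 100


voicegain_per_min = per_minute(VOICEGAIN_PRICE_PER_HOUR)
print(f"Voicegain Whisper: ${voicegain_per_min:.5f}/min")
print(f"Discount vs Open AI: {discount_percent(OPENAI_PRICE_PER_MIN, voicegain_per_min):.1f}%")
```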

3. Whisper API + ChatGPT API, built to be used together

It was also significant that Open AI announced the release of the ChatGPT APIs together with the Whisper APIs. Developers can combine the power of Whisper Speech-to-Text models with the GPT 3.5 and GPT 4.0 LLMs (the underlying models that ChatGPT uses) to power very interesting conversational AI apps. However, here is an important consideration: using the Whisper API with LLMs like ChatGPT works only as long as the app uses batch/pre-recorded audio (e.g. analyzing recordings of call center conversations for QA or compliance, or transcribing and mining Zoom meetings to recollect context). Developers looking to build Voice Bots or Speech IVRs would need a good real-time Speech-to-Text model.

Limitations

1. Does not support Streaming/Real-time

As stated above, Open AI's Whisper does not support apps that require real-time/streaming transcription. This could be relevant to a wide variety of AI apps that target call center, education, legal and meeting use-cases. If you are looking for a streaming Speech-to-Text API provider, please feel free to contact us at the email address provided below.

2. Infrastructure Costs of running Whisper

The throughput of Whisper models - both for the medium and large models - is relatively low. At Voicegain, our ML engineers have tested the throughput of Whisper models on several popular NVIDIA GPU-based compute instances available in public clouds (AWS, GCP, Microsoft Azure and Oracle Cloud). We also have real-life experience because we process over 10 million hours of audio annually. As a result, we have a strong understanding of what it takes to run a model like OpenAI's Whisper in a production environment.

We have found that the infrastructure cost of running Whisper-medium in a cloud environment is in the range of $0.07 - $0.10/hour. You can contact us via email to get the in-depth assumptions and backup behind our cost model. An important factor to note is that in a single-tenant production environment the compute infrastructure cannot be run at a very high utilization. The peak throughput required to support real-life traffic can be several times (2-3x) the average throughput. Net-net, we determined that while developers would not have to pay for software licensing, the cloud infrastructure costs would still remain substantial.

In addition to this infrastructure cost, the larger expense of running Whisper on the Edge (On-Premise + Private Cloud) is that it would require a dedicated back-end Engineering & DevOps team that can chop the audio recording into segments that can be submitted to Whisper and perform the queue management. This team would also need to oversee all info-sec and compliance needs (e.g. running vulnerability scans, intrusion detection, etc.).
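To give a feel for the segmenting and queueing work mentioned above, here is a minimal sketch that splits a recording of known duration into fixed-length segments and places them on a work queue. The 30-second default reflects the window length that the Whisper model processes at a time; the function name and the use of an in-process `queue.Queue` are assumptions for illustration, and a production system would typically also cut on silence and use a durable queue.

```python
from queue import Queue


def segment_bounds(total_seconds: float, segment_seconds: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds covering the whole recording."""
    if total_seconds < 0 or segment_seconds <= 0:
        raise ValueError("duration must be >= 0 and segment length must be > 0")
    bounds = []
    start = 0.0
    while start < total_seconds:
        # The last segment may be shorter than segment_seconds.
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


# Enqueue the segments of a 70-second recording for transcription workers.
jobs: Queue = Queue()
for bounds in segment_bounds(70.0):
    jobs.put(bounds)
print(jobs.qsize())  # 3
```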

Price per channel makes it expensive for Call Center & Meeting use-cases

As of the publication of this post, Whisper does not have a multi-channel audio API. So if your application involves audio with speakers recorded on separate channels, then Whisper's effective price per minute = number of channels × $0.006. For both meeting and call center use-cases, this pricing can become prohibitive.
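The per-channel effect is easy to quantify. The sketch below applies the formula above, assuming each channel is submitted as a separate audio file billed at the $0.006/min price quoted in this post; the function name is illustrative.

```python
def effective_price_per_minute(num_channels: int, price_per_channel_minute: float = 0.006) -> float:
    """Price per minute of conversation when every channel is billed separately."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    return num_channels * price_per_channel_minute


# A two-channel (stereo) call center recording costs twice the mono price.
print(f"${effective_price_per_minute(2):.3f} per minute of conversation")
```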

3. Missing Key Features  - Diarization, Time-Stamps, PII Redaction

This release of Whisper is missing some key features that developers would need. The three important features we noticed are Diarization (speaker separation), Time-stamps and PII Redaction. 

Coming Soon - Voicegain Whisper APIs

Voicegain is working on releasing a Voicegain-Whisper Model over its APIs. With this developers can get benefits of Voicegain PCI/SOC-2 compliant infrastructure and advanced features like diarization, PII redaction, PCI compliance and time-stamps. To join the waitlist, please email us at sales@voicegain.ai

About Voicegain

At Voicegain, we build deep-learning-based Speech-to-Text/ASR models that match or exceed the accuracy of STT models from the large players. For over 4 years now, startup and enterprise customers have used our APIs to build and launch successful products that process over 600 million minutes annually. We focus on developers that need high accuracy (achieved by training custom acoustic models) and deployment in private infrastructure at an affordable price. We provide an accuracy SLA where we guarantee that a custom model that is trained on your data will be as accurate as, if not more accurate than, the most popular options including Open AI's Whisper.

We also have models that are trained specifically on call center audio. While Open AI, with Whisper, is a worthy competitor (and of course a much larger company with 100x our resources), as developers we welcome the innovation that Open AI is unleashing in this market. By adding ChatGPT APIs to our Speech-to-Text, we are planning to broaden our API offerings to the developer community.

To sign up for a developer account on Voicegain with free credits, click here.

Voicegain Transcribe as an On-Premise or Private Cloud based Meeting AI Assistant
Edge, Transcription, Announcement, Enterprise

Like Voicegain Transcribe, there are other cloud-based Meeting AI and AI note-taking solutions that work with video meeting platforms like Zoom and Microsoft Teams. However they do not meet the requirements of privacy-sensitive enterprise customers in financial services, healthcare, manufacturing and high-tech and other industry verticals. Data privacy and control issues would mean that these customers would want to deploy an AI based meeting assistant in their private infrastructure behind their corporate firewall.

Voicegain Transcribe, built for the Edge - On-Prem or Private Cloud

Voicegain Transcribe has been designed and developed for the On-Prem Datacenter or Virtual Private Cloud use-case. Voicegain has already deployed this at a large global Fortune 50 company, making it one of the first truly On-premise/private-cloud AI Meeting Assistant solutions in the market.

The key features of Voicegain Transcribe are:

  1. Integrates with Zoom Local Recordings - In addition to data privacy and control this ensures 100% accurate speaker labels
  2. Custom AI Models - both Speech-to-Text and NLU models that summarize the meeting & extract key items  - are trained on customer data and are deployed behind the enterprise firewall
  3. Integrates with Enterprise SSO and email systems for signup
  4. Integrates with Local Storage & Databases

1. Zoom Local Recordings for 100% accurate Speaker labels

Zoom Local Recordings are recordings of your meetings that are saved in your computer's hard disk on your file-system and not on Zoom's cloud. This feature ensures that confidential and privacy-sensitive recorded audio and video content is stored within the enterprise and is not accessible to Zoom.

Voicegain offers a Windows desktop app (an app for Mac OS is on the roadmap) that accesses these Zoom recordings and submits them for transcription and NLU.


The other major advantage of Zoom Local Recordings is that Zoom supports recording of a separate audio track for each participant. This feature is not available in its Cloud recording as of yet (as of Feb 2023). Voicegain Transcribe with Zoom Local Recordings can hence assign speaker labels with 100% accuracy. 

There are vendors that offer Meeting Assistants that join from the Cloud and record. However, when this solution is picked, the Meeting Assistant has access only to a blended/merged mono audio file which includes the audio of all the participants. So the Meeting AI solution has to "diarize" the meeting audio, which is an inherently difficult problem to solve. Even state-of-the-art diarization/speaker separation models are only 83-85% accurate.

2. Customizable AI Models

For any Meeting AI solution to extract meaningful insights, the accuracy of the underlying transcription is extremely important. If the Speech-to-Text is not accurate, then even the best NLU algorithm or the largest language model cannot deliver valuable and accurate analytics.

Voicegain can train the underlying Speech-to-Text to help accurately transcribe different accents, customer-specific words and the specific acoustic environment.

3. Enterprise SSO and email systems

Voicegain integrates with Enterprise SSO solutions using SAML. Voicegain also integrates with internal email systems to simplify user management tasks like sign-up, password reset and changes, adds and deletes.

4. Enterprise Storage & Database

All the meeting audio, transcripts and NLU-based analytics are stored in enterprise-controlled NoSQL and SQL databases. Enterprises can either use in-house staff to maintain/administer these databases and storage, or they can use a managed database option like MongoDB Atlas or Managed PostgreSQL from a cloud provider like Azure, AWS or GCP.

If you are looking for a Meeting AI solution that can be deployed fully behind your corporate firewall or in your own Private Cloud infrastructure, then Voicegain Transcribe is the perfect fit for your needs.

Have questions? We would love to hear from you. Send us an email at sales@voicegain.ai or support@voicegain.ai and we will be happy to offer more details.

Announcing Voicegain Zoom Meeting Assistant for Local Recordings
Announcement, Enterprise, Insights

We are really excited to announce the launch of Zoom Meeting Assistant for Local Recordings. This is immediately available to all users of Voicegain Transcribe that have a Windows device. The Zoom Meeting Assistant can be installed on computers that have Windows 10 or Windows 11 as the OS.

What are local recordings? Zoom offers two ways to record a meeting - 1) Cloud Recording: Zoom users may save the recording of the meeting on Zoom's Cloud. 2) Local Recording - The meeting recording is saved locally on the Zoom user's computer. These recordings are saved in the default Zoom folder on the file system. Zoom processes the recording and makes it available in this folder a few minutes after the meeting is complete.

Below is a screenshot of how a Zoom user can initiate a local recording.

Zoom Local Recording

Advantages of Zoom Local Recordings

There are four big benefits of using Local Recordings

  1. Data Privacy: The single biggest reason to use Local Recordings is Data Privacy. Many Zoom customers are enterprise customers in regulated industries like financial services, telecommunications, health care and government. Enterprise info-sec policies would prevent the use of a 3rd-party cloud like Zoom to store sensitive data like meeting transcripts. Even for enterprise customers in un-regulated industries, a significant proportion of meeting content is confidential and proprietary in nature. In addition, many countries have strict data residency requirements. In all the above scenarios Local Recordings is the ideal solution. Since the Voicegain Transcribe app can be deployed in the datacenter or Virtual Private Cloud, it can be a solution that sits "behind the corporate firewall".
  2. Separate Audio track for Speakers: Zoom Local Recording supports a separate audio track for each meeting participant. This has multiple benefits including more accurate transcription and automated speaker labeling. To enable this, all you need to do is to open your Zoom App on your desktop and click on Settings --> Recordings. Under Local Recordings, as shown below, make sure you check the "Record a separate audio file of each participant.." (Screenshot below)
  3. Works with Free Zoom Accounts: Zoom provides the local recording feature for all Zoom users - even those in the free tier. This makes sense because with local recording, the user is responsible for incurring costs of storage. Since Voicegain Transcribe has a forever free plan, users now have a free Zoom meeting transcription solution that is accurate and private. The cloud recording feature on Zoom is only available to paid users.
  4. Control of recordings: Most users of Zoom - and especially enterprise users within an IT organization - would prefer to retain access to meeting recordings independent of Zoom. The Local Recording feature allows users to do that.
Enabling multi-track recording on Zoom

Getting Started

To use Voicegain Zoom Meeting Assistant, there are just two requirements

1. Users should first sign up for a Voicegain Transcribe account. Voicegain offers a free plan forever (up to 2 hours of transcription per month) and users can sign up using this link. You can learn more about Voicegain Transcribe here.

2. They should have a computer with Windows 10 or 11 as the OS.

This Windows App can be downloaded from the "Apps" page on Voicegain Transcribe. Once the app is installed, users will be able to access it on their Windows Taskbar (or Tray). All they need to do is to log into the Voicegain Transcribe App from the Meeting Assistant by entering their Transcribe user-id and password.

Once the Meeting Assistant App is logged into Voicegain Transcribe, it does two things

1. It constantly scans the Zoom folder for any new local recordings of Meetings. As soon as it finds such a recording, it submits/uploads it to Voicegain Transcribe for transcription, summarization and extraction of Key Items (Actions, Issues, Sales Blockers, Questions, Risks etc.)

2. It can also join any Zoom Meeting as the user's AI Assistant. This feature works whether the user is the Host of the Zoom Meeting or just a Participant. By joining the meeting, the Meeting Assistant is able to collect information on all the participants in the meeting.
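The folder-scanning behavior described in the first item can be pictured with a short sketch. This is not the Meeting Assistant's actual implementation; it is a minimal illustration, assuming that recordings appear as .m4a or .mp4 files under the Zoom folder. The function name `find_new_recordings` is hypothetical, and the upload step is left out because the post does not describe the API call.

```python
from pathlib import Path

# Assumption: file types that count as recordings in the Zoom folder.
RECORDING_SUFFIXES = {".m4a", ".mp4"}


def find_new_recordings(folder: Path, seen: set[Path]) -> list[Path]:
    """Return recordings under folder that have not been returned before."""
    new_files = sorted(
        path
        for path in folder.rglob("*")
        if path.is_file()
        and path.suffix.lower() in RECORDING_SUFFIXES
        and path not in seen
    )
    # Remember the files so the next scan does not report them again.
    seen.update(new_files)
    return new_files
```

A caller would invoke this function on a timer (for example, once a minute) and hand each returned path to its own upload routine.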

Exciting features on our roadmap

While the current Meeting Assistant App works only for Windows users, Voicegain has native apps for Mac, Android and iPhone as part of its product roadmap.

Getting in touch

Send us an email at support@voicegain.ai if you have any questions.

Speech-to-Text Accuracy Benchmark - December 2022
Benchmark

It has been another 6 months since we published our last speech recognition accuracy benchmark. Back then, the results were as follows (from most accurate to the least): Microsoft, then Amazon closely followed by Voicegain, then new Google latest_long and Google Enhanced last.

While the order has remained the same as the last benchmark, three companies - Amazon, Voicegain and Microsoft showed significant improvement.

Since the last benchmark, at Voicegain we invested in more training, mainly on lectures conducted over Zoom and in a live setting. Training on this type of data resulted in a further increase in the accuracy of our model. We are actually in the middle of a further round of training with a focus on call center conversations.

As far as the other recognizers are concerned:

  • Microsoft and Amazon both improved by about the same amount.
  • Google recognizers did not improve. Actually, the WER numbers for them are worse than in June.


Methodology

We have repeated the test using similar methodology as before: used 44 files from the Jason Kincaid data set and 20 files published by rev.ai and removed all files where none of the recognizers could achieve a Word Error Rate (WER) lower than 25%.

This time again only one file was that difficult. It was a bad quality phone interview (Byron Smith Interview 111416 - YouTube) with WER of 25.48%
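The exclusion rule can be written down in a few lines. The sketch below keeps a file only if at least one recognizer achieves a WER below 25%; the file names and WER values in the example are hypothetical and are not the benchmark data.

```python
def keep_benchmark_files(
    wer_by_file: dict[str, dict[str, float]], threshold: float = 25.0
) -> dict[str, dict[str, float]]:
    """Drop files where no recognizer achieves a WER (in %) below threshold."""
    return {
        name: wers
        for name, wers in wer_by_file.items()
        if min(wers.values()) < threshold
    }


# Hypothetical per-file WER values (percent) for illustration only.
example = {
    "lecture.mp3": {"recognizer_a": 8.1, "recognizer_b": 9.4},
    "noisy_phone_interview.mp3": {"recognizer_a": 25.48, "recognizer_b": 31.0},
}
print(list(keep_benchmark_files(example)))  # ['lecture.mp3']
```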

We publish this methodology because we want any third party - any ASR vendor, developer or analyst - to be able to reproduce these results.

The Results

You can see box-plots with the results above. The chart also reports the average and median Word Error Rate (WER)

Only 3 recognizers have improved in the last 6 months.

  • Amazon by 0.68% in the median and 0.40% in the average
  • Voicegain by 0.47% in the median and 0.45% in the average
  • Microsoft by 0.33% in the median and 0.25% in the average

Detailed data from this benchmark indicates that Amazon is better than Voicegain on audio files with WER below the median and worse on audio files with WER above the median. Otherwise, AWS and Voicegain are very closely matched. However, we have also run a client-specific benchmark where it was the other way around: Amazon was slightly better than Voicegain on audio files with WER above the median, but Voicegain was better on audio files with WER below the median. Net-net, it really depends on the type of audio files, but overall, our results indicate that Voicegain is very close to AWS.

Best Recognizer

Let's look at the number of files on which each recognizer was the best one.

  • Microsoft was best on 36 out of the 63 files
  • Amazon was best on 15 files.
  • Voicegain was best on 9 audio files
  • Google latest-long was best on just 1 file
  • Google Video Enhanced was best on 2 files - these were the 2 easiest files - Google got 0.82% and 1.52% WER on them - one was Sherlock Holmes from Librivox and the other The Art of War by  Sun Tzu, also a Librivox audiobook.
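A tally like the one above can be produced from per-file WER results with a few lines of code. This is a minimal sketch; the recognizer names and WER values are hypothetical, and ties are resolved in favor of the recognizer listed first, which is an assumption rather than the rule used in the benchmark.

```python
from collections import Counter


def count_best_recognizer(wer_by_file: dict[str, dict[str, float]]) -> Counter:
    """Count, per recognizer, the files on which it has the lowest WER."""
    wins: Counter = Counter()
    for wers in wer_by_file.values():
        # min() returns the first recognizer with the lowest WER on a tie.
        best = min(wers, key=wers.get)
        wins[best] += 1
    return wins


# Hypothetical WER values (percent) for illustration only.
results = {
    "file_1": {"recognizer_a": 5.2, "recognizer_b": 6.0},
    "file_2": {"recognizer_a": 12.4, "recognizer_b": 11.9},
    "file_3": {"recognizer_a": 3.3, "recognizer_b": 4.1},
}
print(count_best_recognizer(results).most_common())
```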

Improvements over time

We now have done the same benchmark 5 times so we can draw charts showing how each of the recognizers has improved over the last 2 years and 3 months. (Note for Google the latest 2 results are from latest-long model, other Google results are from video enhanced.)

You can clearly see that Voicegain and Amazon started quite a bit behind Google and Microsoft but have since caught up.

Google seems to have the longest development cycles, with very little improvement from Sept. 2021 until about half a year ago. Microsoft, on the other hand, releases an improved recognizer every 6 months. Our improved releases are even more frequent than that.

As you can see, the field is very close and you get different results on different files (the average and median do not paint the whole picture). As always, we invite you to review our apps, sign-up and test our accuracy with your data.

Out-of-the-box accuracy is not everything

When you have to select speech recognition/ASR software, there are other factors beyond out-of-the-box recognition accuracy. These factors are, for example:

  • Ability to customize the Acoustic Model - The Voicegain model may be trained on your audio data. We have several blog posts describing both research and real use-case model customization. The improvements can vary from several percent in more generic cases to over 50% in some specific cases, in particular for voicebots.
  • Ease of integration - Many Speech-to-Text providers offer limited APIs especially for developers building applications that require interfacing with  telephony or on-premise contact center platforms.
  • Price - Voicegain is 60%-75% less expensive compared to other Speech-to-Text/ASR software providers while offering almost comparable accuracy. This makes it affordable to transcribe and analyze speech in large volumes.
  • Support for On-Premise/Edge Deployment - The cloud Speech-to-Text service providers offer limited support to deploy their speech-to-text software in client data-centers or on the private clouds of other providers. On the other hand, Voicegain can be installed on any Kubernetes cluster - whether managed by a large cloud provider or by the client.

Take Voicegain for a test drive!

1. Click here for instructions to access our live demo site.

2. If you are building a cool voice app and you are looking to test our APIs, click here to sign up for a developer account  and receive $50 in free credits

3. If you want to take Voicegain as your own AI Transcription Assistant to meetings, click here.

Why Conversational Voice AI should be on the Edge?
Edge

Enterprises are increasingly looking to mine the treasure trove of insights from voice conversations using AI. These conversations take place daily on video meeting platforms like Zoom, Google Meet and Microsoft Teams and over telephony in the contact center (which take place on CCaaS or on-premise contact center telephony platforms).

What is Voice AI?

Voice AI or Conversational AI refers to converting the audio from these conversations into text using Speech recognition/ASR technology and mining the transcribed text for analytics and insights using NLU. In addition to this, AI can be used to detect sentiment, energy and emotion in both the audio and text. The insights from NLU include extraction of key items from meetings. This includes semantically matching phrases associated with things like action items, issues, sales blockers, agenda, etc.

Over the last few years, the conversational AI space has seen many players launch highly successful products and scale their businesses. However most of these popular Voice AI options available in the market are multi-tenant SaaS offerings. They are deployed in a large public cloud provider like Amazon, Google or Microsoft. At first glance, this makes sense. Most enterprise software apps that automate  workflows in functional areas like Sales and Marketing(CRM), HR, Finance/Accounting or Customer service are architected as multi-tenant SaaS offerings. The move to Cloud has been a secular trend for business applications and hence Voice AI has followed this path.

However, at Voicegain we firmly believe that a different approach is required for a large segment of the market. We propose that an Edge architecture using a single-tenant model is the way to go for Voice AI apps.

Why does the Edge make sense for Conversational AI?

By Edge, we mean the following

1) The AI models for Speech Recognition/Speech-to-Text and NLU run on the customer's single tenant infrastructure – whether it is bare-metal in a datacenter or on a dedicated VPC with a cloud provider.

2) The Conversational AI app, which is usually a browser-based application that uses these AI models, is also completely deployed behind the firewall.

We believe that the advantages of an Edge/On-Prem architecture for Conversational/Voice AI are driven by the following four factors:

1.    Privacy, Confidentiality and Data Residency requirements

Very often, conversations in meetings and call centers are sensitive from a business perspective. Enterprise customers in many verticals (Financial Services, Health Care, Defense, etc.) are not comfortable storing the recordings and transcripts of these conversations on the SaaS vendor's cloud infrastructure. Think about highly proprietary information like product strategy, status of key deals, bugs and vulnerabilities in software, or even a sensitive financial discussion prior to the release of earnings for a public company. Many countries also impose strict data residency requirements from a legal/compliance standpoint. This makes the Edge (On-Premises/VPC) architecture very compelling.

2. Accuracy/Model Customization

Unlike pure workflow-based SaaS applications, Voice AI apps include deep-learning-based AI models: Speech-to-Text and NLU. To extract the right analytics, it is critical that these AI models, especially the acoustic models in the speech-recognition/speech-to-text engine, are trained on client-specific audio data. This is because each customer use case has unique audio characteristics which limit the accuracy of an out-of-the-box multi-tenant model. These unique audio characteristics relate to:

1. Industry jargon: acronyms and technical terms

2. Unique accents

3. Names of brands, products, and people

4. Acoustic environment and other audio conditions

However, most AI SaaS vendors today use a single model to serve all their customers. This results in sub-optimal speech recognition/transcription, which in turn results in sub-optimal NLU.
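To make "sub-optimal transcription" concrete, one common way to quantify recognition accuracy is word error rate (WER): the number of word-level substitutions, deletions and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of words in the reference. The sketch below is a minimal, generic implementation and is not Voicegain's own evaluation tooling; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference,
    divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative (made-up) example: a generic model splits a drug name
# into three common words, giving 1 substitution + 2 insertions.
reference = "the patient was prescribed metoprolol for hypertension"
hypothesis = "the patient was prescribed metal pro lol for hypertension"
wer = word_error_rate(reference, hypothesis)  # 3 errors / 7 reference words
```

Measuring WER on a sample of a customer's own audio, before and after model customization, is one way to check whether jargon, accents and names are actually being recognized better.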

3. Latency (for real-time Voice AI apps)

For real-time Voice AI apps, for example in the Call Center, there is an architectural advantage to having the AI models in the same LAN as the audio sources, because the audio does not have to make a round trip over the public internet.

4. Affordability

For many enterprises, SaaS Conversational AI apps are inexpensive to get started but they get very expensive at scale.
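The "expensive at scale" point can be illustrated with simple break-even arithmetic. The sketch below assumes a cost model that is not stated in this post: a SaaS offering billed purely per minute, and an Edge deployment with a fixed monthly infrastructure cost plus a lower per-minute cost. All prices are hypothetical placeholders, not Voicegain's or any other vendor's actual pricing.

```python
def monthly_cost_saas(minutes: float, price_per_minute: float) -> float:
    """Pure usage-based pricing: cost grows linearly from zero."""
    return minutes * price_per_minute


def monthly_cost_edge(minutes: float, fixed_monthly: float,
                      price_per_minute: float) -> float:
    """Fixed infrastructure cost plus a (lower) per-minute cost."""
    return fixed_monthly + minutes * price_per_minute


def break_even_minutes(saas_price: float, edge_fixed: float,
                       edge_price: float) -> float:
    """Monthly volume above which the Edge model becomes cheaper."""
    if saas_price <= edge_price:
        raise ValueError("Edge never breaks even if its per-minute cost "
                         "is not lower than the SaaS price")
    return edge_fixed / (saas_price - edge_price)


# Hypothetical numbers, chosen only to show the shape of the curve.
SAAS_PRICE = 0.02      # dollars per minute (assumption)
EDGE_FIXED = 5000.0    # dollars per month (assumption)
EDGE_PRICE = 0.005     # dollars per minute (assumption)

threshold = break_even_minutes(SAAS_PRICE, EDGE_FIXED, EDGE_PRICE)
```

Below the threshold the pay-as-you-go model is cheaper, which is why it is attractive for getting started; above it, every additional minute widens the gap in favor of the fixed-cost deployment.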

Voicegain’s Edge Offering

Voicegain offers an Edge deployment where both the core platform and a web app like Voicegain Transcribe can operate completely on our clients' infrastructure. Both can be placed "behind an enterprise firewall".

Most importantly, Voicegain offers a training toolkit and pipeline for customers to build and train the custom acoustic models that power these Voice AI apps.

Have a question? Or just want to talk?

If you have any questions or would like to discuss this in more detail, please contact our support team by email (support@voicegain.ai).

Transcription
Zoom Meeting Transcription & Notes with Transcribe, an AI Meeting Assistant

As we announced earlier, Voicegain Transcribe is an AI-based Meeting Assistant that you can take with you to all your work meetings. So irrespective of the meeting platform (Zoom, Microsoft Teams, Webex or Google Meet), Voicegain Transcribe has a way to support you.

We now have some exciting news for users who regularly host Zoom meetings. Voicegain Transcribe users who are on Windows now have a free, easy and convenient way to access all the transcripts and notes from their Zoom meetings. Transcribe users can now download a new client app that we have developed, Voicegain Zoom Meeting Assistant for Local Recordings, onto their device.

With this client app, any Local Recording of a Zoom meeting (explained below) will be automatically submitted to Voicegain Transcribe. Voicegain's highly accurate AI models then process the recording to generate both the transcript (Speech-to-Text) and the minutes of the meeting and the topics discussed (NLU).

As always, you get started with a free plan that does not expire. So you can get started today without having to set up your payment information.

What is Zoom Local Recording?

Zoom provides two options to record meetings on its platform - 1) Local Recording and 2) Cloud Recording.

Zoom Local Recording is a recording of the meeting that is saved on the hard disk of the user's device. There are two distinct benefits of using Zoom Local Recording:

  1. Free: Zoom offers the Local Recording feature even on free Zoom accounts, so you can try this feature even if you are on an unpaid Zoom account.
  2. Privacy & Control: The audio content of your meeting could contain sensitive and confidential information. With a local recording, the audio is not shared with Zoom.

Zoom Cloud Recording is when the recording of the meeting is stored on your Zoom Cloud account on Zoom's servers. Currently Voicegain does not directly integrate with Zoom Cloud Recording (however it is on our roadmap). In the interim, a user may download the Cloud Recording and upload it to Voicegain Transcribe in order to transcribe and analyze recordings saved in the cloud.

How does it work?

  1. Sign up for a free account with Voicegain Transcribe on our sign-up page. Pick the first option.
  2. On the left menu, click on Apps. You will see the page shown below.
Zoom Meeting Assistant Download page

  3. Refer to our knowledge-base article for the steps to follow after you download the Meeting Assistant.

Recording of individual speaker audio tracks

Zoom allows you to record individual speaker audio tracks separately as independent audio files. The screenshot above shows how to enable this feature on Zoom.

Voicegain Zoom Meeting Assistant for Local Recording supports uploading these independent audio files to Voicegain Transcribe so that you can get accurate per-speaker transcripts.
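For readers curious how a client app of this kind might detect recordings to submit, here is a minimal sketch of the folder-scanning step only. It is not the Meeting Assistant's actual implementation. The file extensions, the folder layout and the function name `find_new_recordings` are assumptions made for illustration, and the upload step is deliberately left out because it depends on the Voicegain API.

```python
from pathlib import Path

# Assumption: local recordings are stored as .m4a (audio) or .mp4 (video)
# files somewhere under a single recordings folder, with per-speaker audio
# tracks in subfolders.
AUDIO_EXTENSIONS = {".m4a", ".mp4"}


def find_new_recordings(root: Path, already_submitted: set[Path]) -> list[Path]:
    """Return recording files under `root` that have not been submitted yet.

    The folder is searched recursively so that per-speaker audio files
    kept in subfolders are picked up as well.
    """
    candidates = (
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )
    return sorted(path for path in candidates if path not in already_submitted)
```

A real client would call a function like this periodically (or react to file-system events), upload each returned file, and then add it to the `already_submitted` set so that it is not sent twice.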

Support for On-Premise/VPC and white-label UI

The entire Voicegain platform including the Voicegain Transcribe App and the AI models can be deployed On-Premise (or in VPC) giving an enterprise a fully secure meeting transcription and analytics offering.

Have a question?

If you have any questions, please sign up today and contact our support team using the App.
