Why should Conversational Voice AI be on the Edge?

Enterprises are increasingly looking to mine the insights held in voice conversations using AI. These conversations take place daily on video meeting platforms like Zoom, Google Meet and Microsoft Teams, and over telephony in the contact center, whether on CCaaS or on-premises contact center platforms.

Voice AI refers to converting the audio from these conversations into text using speech recognition (ASR) technology, then mining the transcribed text for analytics and insights using natural language understanding (NLU). Enterprises want to extract key topics and action items from meetings, identify sales blockers and opportunities for coaching salespeople, and gauge customer sentiment from call center interactions.

Over the last few years, the conversational AI space has seen dozens of players launch successful products and scale their businesses. However, most of the popular Voice AI options on the market are multi-tenant SaaS offerings: the vendors have built web applications and deployed them at a large public cloud provider like Amazon, Google or Microsoft. At first glance, this makes sense. Most enterprise software that automates business workflows in functional areas like Sales and Marketing (CRM), HR, Finance/Accounting or Customer Service has been architected as a multi-tenant SaaS offering. The move to the cloud has been a long-running trend for business applications, and Voice AI has followed the same path.

At Voicegain, however, we firmly believe that an Edge architecture using a single-tenant model is the way to go for Voice AI apps.

Why does the Edge make sense for Conversational AI?

By Edge, we mean that the Speech Recognition/Speech-to-Text and NLU processing takes place on the customer's single-tenant infrastructure, whether that is bare metal in a datacenter or a dedicated VPC with a cloud provider.

We believe the case for an Edge/On-Prem architecture for Conversational/Voice AI is driven by two big factors:

1.    Privacy and Data Residency

Very often, conversations in meetings and call centers are sensitive from a business perspective, and most enterprises are not comfortable storing recordings of them on a public cloud. Think about a difficult conversation between a manager and their direct report, or a confidential financial discussion ahead of an earnings release at a public company. In addition, many countries have strict data residency requirements from a legal and compliance standpoint. This makes the Edge (On-Premises/VPC) architecture very compelling.


2.    Accuracy/Model Customization

Unlike pure workflow-based SaaS applications, Voice AI apps include deep-learning-based AI models: Speech-to-Text and NLU. To extract the right analytics, it is critical that these models, especially the acoustic models in the speech-to-text engine, are trained on client-specific audio data. This is because each customer use case has unique audio characteristics that limit the accuracy of an out-of-the-box multi-tenant model. These characteristics relate to:

1.    Industry jargon – acronyms, technical terms

2.    Unique accents

3.    Names of brands, products, and people

4.    Acoustic environment and other characteristics of the audio

However, most AI SaaS vendors today use a single model to serve all their customers. This results in sub-optimal speech recognition and transcription, which in turn results in sub-optimal NLU.
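To make the accuracy argument concrete: transcription quality is commonly measured as word error rate (WER), the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of words in the reference. The Python sketch below is only an illustration, not part of any Voicegain tooling. The function name word_error_rate and the sample transcripts are assumptions made up for this example. It shows how a single piece of jargon that a generic model spells out letter by letter can dominate the error rate of a short utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic dynamic-programming table,
    # kept one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for illustration only.
reference = "the patient needs an ekg today"
generic_output = "the patient needs an e k g today"   # jargon spelled out
custom_output = "the patient needs an ekg today"      # jargon recognized

print(word_error_rate(reference, generic_output))  # 0.5
print(word_error_rate(reference, custom_output))   # 0.0
```

In this made-up case one misrecognized term accounts for a 50% word error rate, and any NLU step that looks for the term "ekg" in the transcript would miss it entirely.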

Voicegain’s Edge Offering

Voicegain offers an Edge deployment for both its core platform (STT and NLU APIs) and the Voicegain Transcribe app. Both can run entirely on our clients' infrastructure, fully disconnected from the internet and behind an enterprise firewall.

Most importantly, Voicegain offers a training toolkit and pipeline that customers can use to build and train the custom acoustic models that power these Voice AI apps. Training on the customer's own audio is what lets these apps reach higher accuracy than enterprises get from licensing multi-tenant SaaS apps.

Have a question? Or just want to talk?

If you have any questions or would like to discuss this in more detail, please contact our support team by email (support@voicegain.ai).
