4 ways to integrate FreeSWITCH with Voicegain Speech-to-Text

FreeSWITCH is a very capable telephony platform suitable for building various telephony applications. Some of those applications rely on speech-to-text conversion, for example: ACDs (automatic call distribution), IVRs, Voice-Bots, Real-Time Agent Assist, real-time conference call transcription, call monitoring, etc.

The Voicegain Speech-to-Text platform can be used with FreeSWITCH in a variety of ways.

1. mod_unimrcp for IVRs

The Voicegain STT platform has supported MRCP (Media Resource Control Protocol) for a long time. MRCP is a communication protocol designed to connect telephony-based IVRs and Voice Bots with speech recognizers (ASR) and speech synthesizers (TTS). Our ASR can be accessed using MRCP, and we support both grammar-based recognition (e.g. GRXML) and large-vocabulary transcription.
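To make grammar-based recognition more concrete, here is a minimal sketch in Python of generating a small GRXML (W3C SRGS XML) grammar that accepts one phrase from a fixed list. The function name build_grxml and the rule id are illustrative assumptions, not part of any Voicegain or FreeSWITCH API, and a real deployment may need additional grammar features (e.g. semantic tags).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SRGS 1.0 specification.
SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_grxml(rule_id, phrases, lang="en-US"):
    """Return a minimal GRXML grammar that matches any one of `phrases`.

    Hypothetical helper for illustration only.
    """
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "mode": "voice",
            "root": rule_id,
            f"{{{XML_NS}}}lang": lang,
        },
    )
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_grxml("yesno", ["yes", "no"]))
```

A grammar like this constrains the recognizer to a small set of expected answers, which is the typical IVR case; large-vocabulary transcription needs no grammar at all.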

FreeSWITCH can interact with MRCP-based recognizers using the included mod_unimrcp module. Voicegain STT has been tested with mod_unimrcp and interfaces with it without problems. You can learn more about using Voicegain STT via mod_unimrcp in this blog post.

Voicegain supports MRCP both in the Cloud and on the Edge (on-prem). We will soon release an open-source recognizer plugin for the UniMRCP server that will give you even more options for deploying FreeSWITCH with Voicegain and MRCP.

2. Bridge into Voicegain Telephony Bot API

Voicegain provides a Telephony Bot API, which is a callback API similar in style to Twilio TwiML. You can place a call to a Voicegain endpoint using either a phone number obtained from Voicegain or a SIP endpoint unique to your Voicegain application. When a call arrives, you get a web callback, and the response you provide determines the actions that the Voicegain platform will perform, e.g. play a prompt, recognize speech, detect DTMF, etc.

You can learn more about this API from the following blog posts:

If you have a FreeSWITCH application and would like to recognize speech, you can bridge into a Voicegain SIP endpoint and, in a callback, specify a prompt and the type of speech capture (grammar-based or large-vocabulary). Once the recognition finishes you will get a callback, and then you can either issue a disconnect command, which transfers the call flow back to your FreeSWITCH app, or continue with additional questions and recognitions on the Voicegain platform as needed.

Below is an example of a simple interaction with 4 participants:

FreeSWITCH
Your control logic for FS application, e.g., a Lua script
Webservice that will handle callbacks from the Voicegain Telephony Bot API. It has to be able to maintain session data.
Voicegain Telephony Bot API platform
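
The callback webservice in this interaction can be sketched roughly as follows. This is only an illustration of the session-keeping logic in Python: the function name, the session store, and every field name in the requests and responses ("action", "prompt", "capture", "transcript") are hypothetical placeholders and do not reflect the actual Telephony Bot API schema.

```python
# In-memory session store; a real webservice would likely use
# something persistent and would expose this logic over HTTP.
SESSIONS = {}


def handle_callback(session_id, event):
    """Decide the next action for one callback.

    All field names used here are hypothetical, for illustration only.
    """
    state = SESSIONS.setdefault(session_id, {"step": "start", "answers": []})

    if state["step"] == "start":
        # First callback for this call: ask a question and capture speech.
        state["step"] = "asked"
        return {
            "action": "ask",
            "prompt": "How can I help you?",
            "capture": "large-vocabulary",
        }

    if state["step"] == "asked":
        # Recognition finished: store the result, then hand the call back.
        state["answers"].append(event.get("transcript", ""))
        state["step"] = "done"
        return {"action": "disconnect"}

    # Any later callback: nothing left to do for this session.
    return {"action": "disconnect"}


if __name__ == "__main__":
    print(handle_callback("call-1", {}))
    print(handle_callback("call-1", {"transcript": "check my balance"}))
```

Returning a disconnect action in the final step corresponds to handing the call flow back to the FreeSWITCH application; returning another question instead would keep the dialog on the Voicegain platform.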


3. mod_voicegain for using Voicegain ASR from FS apps/scripts

mod_voicegain is not yet generally available - please contact us if you are interested in testing.

mod_voicegain will give you capabilities similar to using mod_unimrcp with Voicegain but without the overhead of the MRCP protocol - mod_voicegain talks directly to Voicegain ASR.

mod_voicegain taps into the FreeSWITCH inbound audio stream and sends the audio data to Voicegain ASR in the Cloud or on the Edge. Voicegain ASR processes the audio according to the invocation parameters specified in the data argument. It then communicates the result of transcription or recognition in an Event.

mod_voicegain installs on FreeSWITCH as an app and can be invoked as such, e.g.:

‍

or from a Lua script:

‍

Results will always be returned as a FreeSWITCH event, but it is also possible to get the results in a callback to the URL specified in callback.uri.

The FreeSWITCH event will be of custom type (Event-Name: CUSTOM) and Event-Subclass will be "voicegain_asr_update". The relevant payload will be in the "ASR-Response" field formatted as JSON.
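
As a rough sketch of how an application might pick out this event, the Python below parses a plain-text event (one "Name: value" header per line) and decodes the "ASR-Response" field. It assumes the header values are URL-encoded, as in the plain event format of the FreeSWITCH event socket; the function names and the "text" field in the sample payload are hypothetical, since the actual JSON structure is not described here.

```python
import json
from urllib.parse import quote, unquote


def parse_event(text):
    """Parse 'Name: value' header lines into a dict, URL-decoding the values."""
    headers = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(": ")
        if sep:
            headers[name] = unquote(value)
    return headers


def extract_asr_response(headers):
    """Return the decoded ASR-Response JSON, or None for unrelated events."""
    if headers.get("Event-Name") != "CUSTOM":
        return None
    if headers.get("Event-Subclass") != "voicegain_asr_update":
        return None
    raw = headers.get("ASR-Response")
    return json.loads(raw) if raw else None


if __name__ == "__main__":
    # Sample event built by hand; the payload shape is an assumption.
    sample = (
        "Event-Name: CUSTOM\n"
        "Event-Subclass: voicegain_asr_update\n"
        "ASR-Response: " + quote(json.dumps({"text": "hello"})) + "\n"
    )
    print(extract_asr_response(parse_event(sample)))
```

Filtering on both Event-Name and Event-Subclass keeps the handler from trying to decode unrelated custom events.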

You can read more about mod_voicegain in this Knowledge Base article.

4. mod_vg_tap for real-time transcription

mod_vg_tap has been developed with applications like Real-Time Agent Assist in mind. These apps need access to the audio stream from a FreeSWITCH call but do not otherwise need to interact with FreeSWITCH (unlike IVR and Voice-Bots).

mod_vg_tap installs as an app and has simple commands to start/stop streaming to the Voicegain Speech-to-Text engine.

The start command can specify the following destinations:

websocket URL(s) - returned from a POST command that starts a new speech-to-text session
socket IP:port for socket communication - this is only supported for Voicegain deployed on the Edge (on-prem)
(on the roadmap) - complete JSON body to start a new speech-to-text session and start streaming to it

The transcription results are generally not returned to a FreeSWITCH app but are delivered to the destination specified when starting the speech-to-text session - the results can be delivered via websocket, polling, or callback.

If you want more information about any of these methods of integrating Voicegain with FreeSWITCH, please email us at support@voicegain.ai.
