Easy How-To: Build a Voicebot using Voicegain, RASA, and AWS Lambda
Updated: Apr 2
Voice Bot Setup
A previous blog post described a voice bot built with Twilio, Voicegain, RASA, and AWS Lambda. Twilio handled telephony (phone numbers, SIP trunking, and TwiML for call control), Voicegain provided speech recognition, and AWS Lambda coordinated the actions. That setup works, but it is involved. Because Lambda is stateless and keeps no memory between function calls, speech recognition results had to be passed through S3, which can occasionally delay requests and responses.
Simple Inbuilt CPaaS Option
Voicegain now integrates with Amazon Chime Voice Connector to offer pay-as-you-go SIP trunking directly from the Voicegain web console. You can also purchase phone numbers there and receive inbound calls on them. Support for making outbound Speech IVR calls is in the works.
Of course, we continue to support developers who use Twilio and SignalWire through a simple SIP INVITE; a separate blog post describes how.
How does it work?
AWS Lambda function - a single Node.js function with an API Gateway trigger (simple HTTP API type).
Voicegain Telephony Bot API - the Telephony Bot API works with web callbacks. For Twilio and SignalWire developers, working with it is similar to working with Twilio TwiML and SignalWire LaML, respectively.
RASA - dialog logic is provided by a RASA NLU dialog server, which is accessible through its RestInput API.
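As a rough illustration of the RASA side, the sketch below posts one caller utterance to RASA's REST channel (the RestInput channel, served at `/webhooks/rest/webhook`), which takes a JSON body with `sender` and `message` fields and replies with a list of bot messages. It uses only the Python standard library, so it could run inside Lambda without extra packages. The server address `RASA_URL` and the helper names are hypothetical, and the REST channel is assumed to be enabled in the RASA server's `credentials.yml`.

```python
import json
import urllib.request

# Hypothetical address of the RASA server; assumes the REST channel
# is enabled (a `rest:` entry in RASA's credentials.yml).
RASA_URL = "http://rasa.example.com:5005/webhooks/rest/webhook"


def build_rasa_request(sender_id: str, text: str) -> bytes:
    """Encode the JSON body expected by RASA's REST channel."""
    return json.dumps({"sender": sender_id, "message": text}).encode("utf-8")


def extract_bot_text(rasa_reply: list) -> str:
    """Join the text parts of RASA's reply (a list of message dicts).

    Non-text messages (images, buttons, custom payloads) are skipped,
    since only text can be turned into an audio prompt by TTS.
    """
    return " ".join(m["text"] for m in rasa_reply if "text" in m)


def ask_rasa(sender_id: str, text: str, url: str = RASA_URL,
             timeout: float = 5.0) -> str:
    """Send one caller utterance to RASA and return the bot's reply text."""
    req = urllib.request.Request(
        url,
        data=build_rasa_request(sender_id, text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_bot_text(json.loads(resp.read().decode("utf-8")))
```

One design option is to pass the telephony session ID as `sender`: RASA keeps a conversation tracker per sender ID, so the dialog state lives in RASA and the Lambda function can stay stateless between callbacks.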
The sequence diagram is provided below. The flow is simple:
1. The caller dials a phone number provided by Voicegain (powered by Amazon Chime Voice Connector).
2. The Voicegain Telephony Bot API calls a callback function on AWS Lambda.
3. The Lambda function sends "Hi" to RASA, and RASA responds with the initial dialog prompt.
4. The Lambda function responds to the Voicegain callback with the prompt received from RASA and tells Voicegain Speech-to-Text to capture the caller's response.
5. Voicegain uses TTS to generate an audio prompt from the text of the RASA question and plays it to the caller over the phone.
6. The caller hears the prompt and says something in response.
7. Voicegain ASR transcribes the speech to text and makes a callback to the Lambda function with the transcription result.
8. The Lambda function invokes RASA and passes it the text of the response.
9. RASA processes the answer and generates the next question in the dialog.
10. The next turn continues the same way as in step 4.
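The turn-taking loop above can be sketched as a single stateless handler. This is a minimal sketch, not the sample code from our GitHub: the callback fields (`sessionId`, `transcript`) and the response fields (`prompt`, `captureSpeech`) are hypothetical placeholders, not the actual Voicegain Telephony Bot API schema, and the RASA call is passed in as a function (`ask_rasa`) so the turn logic can be exercised without a network. Only the API Gateway HTTP API event handling (`body`, `isBase64Encoded`, and the `statusCode`/`headers`/`body` response) follows AWS's documented payload format.

```python
import base64
import json
from typing import Callable

# A function (sender_id, text) -> reply text that forwards one utterance
# to RASA, e.g. through its REST channel.
AskRasa = Callable[[str, str], str]


def parse_body(event: dict) -> dict:
    """Decode the JSON body of an API Gateway HTTP API (payload v2.0) event."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def handle_callback(event: dict, ask_rasa: AskRasa) -> dict:
    """Run one dialog turn: Voicegain callback in, next prompt out.

    HYPOTHETICAL callback fields (check the Telephony Bot API docs):
      sessionId  - identifies the call; reused as the RASA sender ID
      transcript - the caller's recognized speech; absent on the first callback
    """
    callback = parse_body(event)
    session_id = callback["sessionId"]

    # Step 3: the first callback carries no transcript, so greet RASA with "Hi".
    # Steps 7-8: later callbacks forward the transcribed caller response.
    user_text = callback["transcript"] if "transcript" in callback else "Hi"
    prompt = ask_rasa(session_id, user_text)

    # Step 4: return the prompt to be spoken via TTS and request speech input.
    # HYPOTHETICAL response shape, not the real Voicegain schema.
    reply = {"sessionId": session_id, "prompt": prompt, "captureSpeech": True}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(reply),
    }
```

In a deployed function, the Lambda entry point `lambda_handler(event, context)` would simply call `handle_callback(event, ask_rasa)` with a real RASA client, and the placeholder field names would be replaced with those defined by the Telephony Bot API.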
Sample code for the Lambda function (in Python and Node.js versions) is available on our GitHub.