Voice command applications made easier
Updated: Dec 22, 2020
New continuous recognition option
In the latest Voicegain release (1.16.0), we added a new option to our /asr/recognize/async API for ASR/speech-to-text. It is called continuousRecognition and, when enabled, it modifies the default behavior of grammar-based recognition.
Normally, when the /asr/recognize/async API is used, the recognizer returns when the grammar is matched and the complete timeout expires. That means that only a single recognition is possible in one /asr/recognize/async API request. If a no-match or no-input is detected, the recognition will terminate.
However, some use cases require the recognizer to keep listening, for example to ignore all no-matches until a match is found. This is what the continuousRecognition option is for.
With continuousRecognition you have fine control over which of the four events (no-input, no-match, match, and error) will be returned in a callback and which event, if any, will terminate recognition. If you do not set any event to terminate recognition, the recognition session can be stopped by closing the audio stream or by returning stop:true from the callback.
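The interplay between callbacks and termination described above can be modeled with a small local simulation. This is only a sketch of the semantics as stated in this post, not Voicegain's implementation: run_session and its parameters are hypothetical names, the callback response shape ({"stop": True}) is an assumption based on the stop:true remark, and it assumes that suppressing a callback for an event does not prevent that event from terminating the session.

```python
EVENTS = {"no-input", "no-match", "match", "error"}


def run_session(events, stop_on, no_callback_for, callback):
    """Simulate one continuous recognition session (hypothetical model).

    events: recognition events in the order they occur; reaching the end
            of the sequence models the audio stream being closed.
    Returns the list of events that were delivered to the callback.
    """
    unknown = (set(stop_on) | set(no_callback_for)) - EVENTS
    if unknown:
        raise ValueError(f"unknown event(s): {sorted(unknown)}")

    delivered = []
    for event in events:
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        stop_requested = False
        if event not in no_callback_for:
            delivered.append(event)
            # Assumed response shape: a dict that may contain "stop": True.
            response = callback(event) or {}
            stop_requested = response.get("stop") is True
        # Terminate on a configured stop event or on an explicit stop request.
        if event in stop_on or stop_requested:
            break
    return delivered


# No-matches and no-inputs are ignored silently; the first match ends the session.
delivered = run_session(
    ["no-match", "no-input", "no-match", "match", "match"],
    stop_on={"match", "error"},
    no_callback_for={"no-input", "no-match"},
    callback=lambda event: {},
)
print(delivered)
```

With an empty stop_on set, the same function shows the other two ways to end a session: the event sequence running out (audio stream closed) or the callback returning {"stop": True}.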
What is it good for?
An example might be a use case where a voicemail is being played to a caller and during the playback we want to interpret caller commands like: stop, next, previous, save, delete. With normal recognition, we would encounter situations where what was said is not understood. Stopping recognition on no-match would not make much sense because either: (1) re-prompting would mess up the flow of the call, or (2) restarting recognition might introduce a gap in recognition that may result in missing part of what the caller said.
In a scenario like this it is best to ignore the no-match and continue to listen. The caller will notice that there was no response to what they said and will naturally repeat it.
The settings for continuous recognition that would work in this case would be:
stopOn : match, error
noCallbackFor : no-input, no-match
Notes: (1) in this case we suggest setting a very long noinputTimeout so that no no-inputs are generated internally; (2) the application could also decide to accept no-match callbacks, which could then be tracked and acted upon if they become too numerous.
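As a rough illustration, the settings above could be assembled like this before being placed in an /asr/recognize/async request. The field names stopOn, noCallbackFor, continuousRecognition, and noinputTimeout come from this post; how they nest, where they sit in the request body, and the timeout value and its unit (milliseconds here) are assumptions to be checked against the API reference. The helper name voicemail_settings is hypothetical.

```python
import json

ALLOWED_EVENTS = {"no-input", "no-match", "match", "error"}


def voicemail_settings(noinput_timeout_ms=600_000):
    """Build the continuous recognition settings for the voicemail use case.

    The nesting and the timeout unit are assumptions, not the confirmed
    request schema.
    """
    settings = {
        "continuousRecognition": {
            # Terminate recognition only on a match or an error.
            "stopOn": ["match", "error"],
            # Do not invoke the callback for no-input or no-match.
            "noCallbackFor": ["no-input", "no-match"],
        },
        # Arbitrary "very long" value so that no-inputs are effectively never generated.
        "noinputTimeout": noinput_timeout_ms,
    }
    # Guard against typos in event names.
    for key in ("stopOn", "noCallbackFor"):
        bad = set(settings["continuousRecognition"][key]) - ALLOWED_EVENTS
        if bad:
            raise ValueError(f"{key} contains unknown event(s): {sorted(bad)}")
    return settings


print(json.dumps(voicemail_settings(), indent=2))
```

To act on no-match callbacks as mentioned in note (2), "no-match" would simply be left out of noCallbackFor and the application would count those callbacks itself.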
Where can it be used?
Continuous Recognition is supported in:
the core speech API - /asr/recognize/async
Voicegain integration for Twilio Media Streams - either TwiML <Stream> or <Connect><Stream> in Twilio Programmable Voice
It is not yet supported in the Voicegain Telephony Bot APIs.