Forking Media Streams from Contact Center Platforms for Realtime Transcription


Voicegain Real-Time Transcription and Speech-Analytics APIs can access streaming audio data in real time from IP telephony / Unified Communications (UC) systems (e.g., from Avaya, Cisco, or Genesys) using three approaches:

  • SIPREC
  • SIP INVITE
  • Programmable integration (using APIs)

Each of these approaches is described in detail below.

Use Cases

The use cases for the Realtime Transcription and Speech Analytics APIs are as follows:

  1. Realtime Agent Assist in Contact Centers for customer service
  2. Realtime assist for salespeople (SDRs, Sales Engineers, AEs) during phone calls and meetings
  3. Realtime Insights from internal meetings

The Transcription APIs convert audio into text in real time. The Speech Analytics APIs derive analytics from both the text (NLU intents, sentiment, entities, and keywords) and the audio (tone, silence, overtalk, etc.).
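
To make two of the audio metrics concrete, here is a generic sketch (not Voicegain's implementation) assuming each channel, agent and caller, has already been split into speech segments by some voice-activity detector. Overtalk is then the time where the two channels' segments intersect, and silence is the call time covered by neither channel. The function names and the (start, end)-in-seconds segment format are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # hypothetical format: (start_sec, end_sec)


def merge(segments: List[Segment]) -> List[Segment]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overtalk_seconds(agent: List[Segment], caller: List[Segment]) -> float:
    """Total time during which both parties speak at once."""
    a, c = merge(agent), merge(caller)
    i = j = 0
    overlap = 0.0
    while i < len(a) and j < len(c):
        lo = max(a[i][0], c[j][0])
        hi = min(a[i][1], c[j][1])
        if hi > lo:
            overlap += hi - lo
        # Advance whichever segment ends first; it cannot overlap anything later.
        if a[i][1] < c[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def silence_seconds(agent: List[Segment], caller: List[Segment],
                    call_duration: float) -> float:
    """Total time during which neither party speaks."""
    speech = merge(agent + caller)
    return call_duration - sum(end - start for start, end in speech)


if __name__ == "__main__":
    agent = [(0.0, 5.0), (10.0, 12.0)]
    caller = [(4.0, 8.0), (11.0, 15.0)]
    print(overtalk_seconds(agent, caller), silence_seconds(agent, caller, 20.0))
```

For the sample call, the parties overlap during 4 to 5 s and 11 to 12 s, and speech covers 0 to 8 s and 10 to 15 s of the 20-second call.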

SIPREC

SIPREC is usually used for call recording, but the standard essentially provides a real-time audio stream of the telephone call, which makes it suitable for applications that have to work in real time.

The Voicegain SIPREC interface has been tested with the following platforms:

  • Avaya Enterprise SBC - also supports Avaya AES/TSAPI integration for more call metadata
  • Broadsoft SIPREC sipua
  • Cisco built-in bridge (BIB) - built-in bridge functionality is available on some of Cisco's third-generation VoIP phones and is supported by Cisco UCM version 6.0 and higher.
  • Cisco Unified Border Element (CUBE)
  • Metaswitch SIPREC sipua - the minimum Metaswitch version that supports SIPREC is 9.0.10
  • Oracle SBC SIPREC - Selective Call Recording SIPREC
  • Twilio TwiML <Siprec>

In addition to the audio, Voicegain can capture relevant call metadata (metadata capture capabilities may differ depending on the client platform).
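
As an illustration of what that metadata looks like on the wire: in standard SIPREC (RFC 7866), the recording client sends an XML document of type `application/rs-metadata+xml` (format defined in RFC 7865) alongside the SDP. The sketch below pulls out each participant's address of record and the labels of the streams that participant sends; a stream label matches an SDP `a=label` line, which is how a channel of audio is attributed to a speaker. It is a minimal sketch under assumptions: it handles a single complete metadata snapshot, ignores vendor extension elements, and the function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# XML namespace for SIPREC recording metadata (RFC 7865)
NS = {"rs": "urn:ietf:params:xml:ns:recording:1"}


def parse_recording_metadata(xml_text: str) -> Dict[str, dict]:
    """Map participants to their AORs and to the labels of streams they send."""
    root = ET.fromstring(xml_text)

    aors: Dict[str, Optional[str]] = {}
    for p in root.findall("rs:participant", NS):
        name_id = p.find("rs:nameID", NS)
        aors[p.get("participant_id")] = name_id.get("aor") if name_id is not None else None

    # stream_id -> label; the label corresponds to an SDP "a=label" attribute
    labels: Dict[str, Optional[str]] = {}
    for s in root.findall("rs:stream", NS):
        label = s.find("rs:label", NS)
        labels[s.get("stream_id")] = label.text if label is not None else None

    sends: Dict[str, List[Optional[str]]] = {}
    for assoc in root.findall("rs:participantstreamassoc", NS):
        sends[assoc.get("participant_id")] = [
            labels.get(el.text) for el in assoc.findall("rs:send", NS)
        ]
    return {"aors": aors, "sends": sends}


# Hypothetical sample document with two participants and two streams
SAMPLE = """<recording xmlns="urn:ietf:params:xml:ns:recording:1">
  <datamode>complete</datamode>
  <participant participant_id="p1"><nameID aor="sip:alice@example.com"/></participant>
  <participant participant_id="p2"><nameID aor="sip:bob@example.com"/></participant>
  <stream stream_id="s1" session_id="x"><label>1</label></stream>
  <stream stream_id="s2" session_id="x"><label>2</label></stream>
  <participantstreamassoc participant_id="p1"><send>s1</send><recv>s2</recv></participantstreamassoc>
  <participantstreamassoc participant_id="p2"><send>s2</send><recv>s1</recv></participantstreamassoc>
</recording>"""

if __name__ == "__main__":
    print(parse_recording_metadata(SAMPLE))
```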

The Voicegain platform can be configured to launch transcription and speech analytics automatically as soon as a new SIPREC session is established.

SIPREC support is available both in the Cloud and the Edge (OnPrem) deployments of the Voicegain Platform.

SIPREC is an Enterprise feature of the Voicegain platform and is not included in the base package. Please contact us or submit a Zendesk ticket for more information about SIPREC or if you would like to use it with your existing Voicegain account.

SIP INVITE

Certain platforms, such as Genesys, do not support SIPREC. Instead, they may offer the ability to send separate- or combined-channel audio streams to a destination negotiated using a SIP INVITE. The Genesys platform, for example, supports streaming the inbound and outbound RTP media to two separate SIP endpoints.

The Voicegain platform lets you define SIP addresses that accept such SIP INVITEs. The SIP INVITE may carry custom SIP headers that allow the separate streams to be tied to the same session and that pass along any additional metadata. Once the SIP connection is established, Voicegain makes an HTTP callback to a specified endpoint to acknowledge the connection and pass along all of the connection data.
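
Session tie-up can be pictured with a small sketch: two forked INVITEs arrive, one per direction as in the Genesys case, and are paired once both legs carrying the same session identifier have been seen. The header names `X-Session-Id` and `X-Leg` and the values `inbound`/`outbound` are hypothetical; the real names depend on how the sending platform is configured. Raw SIP text is parsed here only to keep the example self-contained; in practice a SIP stack would hand you the parsed headers.

```python
from typing import Dict, Optional

# Hypothetical custom header names (lower-cased for lookup)
SESSION_HEADER = "x-session-id"
LEG_HEADER = "x-leg"


def parse_sip_headers(message: str) -> Dict[str, str]:
    """Return the headers of a raw SIP request, keyed by lower-cased name.

    Repeated headers (e.g. several Via lines) keep only the last value,
    which is enough for this sketch.
    """
    head = message.split("\r\n\r\n", 1)[0]
    headers: Dict[str, str] = {}
    last: Optional[str] = None
    for line in head.split("\r\n")[1:]:  # skip the request line
        if line[:1] in (" ", "\t") and last is not None:
            # Folded continuation of the previous header value (RFC 3261 LWS)
            headers[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue
        last = name.strip().lower()
        headers[last] = value.strip()
    return headers


class LegPairer:
    """Groups forked inbound/outbound legs that share a session id."""

    def __init__(self) -> None:
        self.pending: Dict[str, Dict[str, Dict[str, str]]] = {}

    def add_invite(self, message: str) -> Optional[Dict[str, Dict[str, str]]]:
        """Record one leg; return both legs' headers once the pair is complete."""
        headers = parse_sip_headers(message)
        session = headers.get(SESSION_HEADER)
        leg = headers.get(LEG_HEADER)
        if session is None or leg is None:
            return None
        legs = self.pending.setdefault(session, {})
        legs[leg] = headers
        if "inbound" in legs and "outbound" in legs:
            return self.pending.pop(session)
        return None
```

Keeping unmatched legs in `pending` means that the first leg's audio can be buffered or transcribed on its own until its partner arrives; a production version would also expire entries whose partner never shows up.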

Programmable Integration

Some UC platforms, in particular newer versions, provide additional ways to access the real-time audio stream. In many of them, this capability was added specifically to simplify integration with cloud Speech-to-Text services.

Examples of that type of integration are:

  • Use Avaya DMCC (which is part of Avaya Aura® Application Enablement (AE) Services) to open RTP streams with the content of the call
  • Use Extended Media Forking (XMF) provided by Cisco Unified Communications Gateway Services
  • Five9 VoiceStream

The Voicegain platform supports multiple protocols for flexible programmable integration:

  • WebSocket - binary audio data can be sent over a WebSocket. Besides raw binary data, the message protocols that Twilio and SignalWire use for audio streaming over WebSocket are also supported. (If required, we can easily add support for additional message protocols.)
  • gRPC - binary audio data may also be sent using the gRPC protocol. Note that this capability is currently in beta.
  • plain RTP - Voicegain also accepts plain RTP. The IP/port/encoding negotiation, however, has to be done using our HTTP API. We do not support RTCP or RTSP. The HTTP API is very simple, and some of our customers have already used it to integrate plain RTP streaming via XMF in the Cisco UC environment.
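
Once the HTTP negotiation has produced a destination, what goes on the wire is ordinary RTP. The sketch packetizes 8-bit G.711 audio into RTP packets with the RFC 3550 fixed 12-byte header and sends them over UDP at roughly real-time pace. It is a minimal sketch, not Voicegain's client: the function names, the default SSRC, and the 20 ms packet size are illustrative assumptions, and the host and port are placeholders for whatever the HTTP API returns.

```python
import socket
import struct
import time
from typing import Iterator

PT_PCMU = 0  # RFC 3551 static payload type for G.711 uLaw
PT_PCMA = 8  # RFC 3551 static payload type for G.711 aLaw


def rtp_packets(audio: bytes, payload_type: int = PT_PCMU,
                ssrc: int = 0x1234ABCD, samples_per_packet: int = 160,
                seq: int = 0, timestamp: int = 0) -> Iterator[bytes]:
    """Split 8-bit G.711 audio into RTP packets.

    160 samples per packet is 20 ms at 8 kHz. With G.711 one byte is one
    sample, so the RTP timestamp advances by the number of bytes sent.
    """
    for offset in range(0, len(audio), samples_per_packet):
        chunk = audio[offset:offset + samples_per_packet]
        # First byte 0x80: V=2, P=0, X=0, CC=0. Second byte: M=0 plus 7-bit PT.
        header = struct.pack("!BBHII", 0x80, payload_type & 0x7F,
                             seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
        yield header + chunk
        seq += 1
        timestamp += len(chunk)


def send_rtp(audio: bytes, host: str, port: int) -> None:
    """Send packets over UDP; host and port come from the HTTP negotiation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for packet in rtp_packets(audio):
            sock.sendto(packet, (host, port))
            time.sleep(0.02)  # pace at about one 20 ms packet per 20 ms
```

A real sender would usually randomize the initial sequence number, timestamp, and SSRC, as RFC 3550 recommends, and pace against a clock rather than a fixed sleep to avoid drift.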

All of these protocols support μ-law (uLaw), A-law (aLaw), and 16-bit linear PCM encoding at either an 8 kHz or a 16 kHz sample rate.
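
The relationship between these encodings can be shown in a few lines. G.711 μ-law and A-law carry one 8-bit code per sample, which a receiver expands to 16-bit linear PCM. The sketch follows the widely used reference expansion (as in the classic Sun `g711.c`); it is written by hand because Python's `audioop` module, which offered `ulaw2lin` and `alaw2lin`, was removed in Python 3.13. The function names are hypothetical.

```python
import struct


def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit sample."""
    code = ~code & 0xFF  # mu-law codes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84  # remove the bias
    return -sample if sign else sample


def alaw_to_linear(code: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit sample."""
    code ^= 0x55  # A-law codes have their even bits inverted
    exponent = (code >> 4) & 0x07
    sample = (code & 0x0F) << 4
    if exponent == 0:
        sample += 8
    else:
        sample = (sample + 0x108) << (exponent - 1)
    return sample if code & 0x80 else -sample  # in A-law, a set sign bit is positive


def g711_to_pcm16(data: bytes, law: str = "ulaw") -> bytes:
    """Convert G.711 bytes to little-endian 16-bit linear PCM."""
    expand = ulaw_to_linear if law == "ulaw" else alaw_to_linear
    samples = [expand(b) for b in data]
    return struct.pack("<%dh" % len(samples), *samples)
```

The arithmetic also shows the bandwidth trade-off: at 8 kHz either G.711 encoding is 8,000 bytes per second of audio (64 kbit/s), while 16-bit linear PCM at 16 kHz is 32,000 bytes per second (256 kbit/s), four times as much.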

Interested in Voicegain? Take us for a test drive!

1. Click here for instructions to access our live demo site.

2. If you are building a cool voice app and want to test our APIs, click here to sign up for a developer account and receive $50 in free credits.

3. If you want to take Voicegain as your own AI Transcription Assistant to meetings, click here.
