Jacek Jarmulak

Python SDK Available


As of yesterday, programming in Python against the Voicegain Speech-to-Text (STT) API got even easier with the release of the official voicegain-speech package to the Python Package Index (PyPI).


The SDK package is available at: https://pypi.org/project/voicegain-speech/

The SDK source code is available at: https://github.com/voicegain/python-sdk


This package wraps the Voicegain Speech-to-Text Web API. A preview of the API spec can be found at: https://www.voicegain.ai/api

Full API spec documentation is available at: https://portal.voicegain.ai/api-documentation
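Because the package wraps a plain HTTP/JSON Web API, it can help to see roughly what a raw call looks like underneath. The sketch below only builds (and does not send) a request using Python's standard `urllib.request`. The base URL `https://api.voicegain.ai/v1` and the bearer-token (JWT) `Authorization` header are assumptions to verify against the API documentation. `build_request` is a hypothetical helper and is not part of the SDK.

```python
import json
import urllib.request

# Assumed base URL; verify against the Voicegain API documentation.
BASE_URL = "https://api.voicegain.ai/v1"


def build_request(path: str, body: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to an assumed Voicegain endpoint.

    `path` is an API path such as "/asr/transcribe"; `token` is assumed to be
    a JWT sent as a bearer token.
    """
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_request("/asr/transcribe", {"audio": {}}, token="YOUR_JWT")
print(req.full_url)      # https://api.voicegain.ai/v1/asr/transcribe
print(req.get_method())  # POST
# Sending it would be: urllib.request.urlopen(req)
```

In practice the SDK handles this plumbing. The sketch is only meant to show the shape of what it wraps.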


The core APIs are for Speech-to-Text, either transcription or recognition (further described below). Other available APIs include:

  • RTC Callback APIs, which, in addition to speech-to-text, allow control of an RTC session (e.g., a telephone call).

  • Websocket APIs for managing broadcast websockets used in real-time transcription.

  • Language Model creation and manipulation APIs.

  • Data upload APIs that help in certain STT use scenarios.

  • Training Set APIs - for use in preparing data for acoustic model training.

  • GREG APIs - for working with GREG, our ASR and grammar tuning tool.


Transcribe API

/asr/transcribe

The Transcribe API lets you submit audio and receive a word-for-word transcription from the STT engine. This API uses our large-vocabulary language model and supports long-form audio in async mode.

The API can be used, for example, to transcribe audio such as podcasts, voicemails, and call recordings. In real-time streaming mode it can be used to build voice bots (your application will have to provide the NLU capabilities needed to determine intent from the transcribed text).
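To make the async Transcribe flow concrete, the sketch below assembles a request body that asks for the complete transcript of an audio file referenced by URL. Every field name in it (`sessions`, `asyncMode`, `content.full`, `audio.source.fromUrl`) is an assumption loosely modeled on the API spec and should be checked against the documentation before use. `transcribe_body` is a hypothetical helper.

```python
import json


def transcribe_body(audio_url: str, formats: tuple = ("transcript",)) -> dict:
    """Hypothetical body for an async /asr/transcribe request.

    All field names are assumptions; check them against the Voicegain API spec.
    """
    return {
        "sessions": [
            {
                # Assumed name/value for off-line (non-real-time) async processing.
                "asyncMode": "OFF-LINE",
                # Result formats to include in the final result (assumed field).
                "content": {"full": list(formats)},
            }
        ],
        # Audio that the service fetches from a URL (assumed field names).
        "audio": {"source": {"fromUrl": {"url": audio_url}}},
    }


body = transcribe_body("https://example.com/voicemail.wav")
print(json.dumps(body, indent=2))
```

The resulting dict would then be serialized as the JSON body of a POST to the transcribe endpoint.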

The result of transcription can be returned in four formats:

  • Transcript - Contains the complete text of the transcription.

  • Words - Each intermediate result contains only the new words (with timing and confidence values) recognized since the previous intermediate result. The final result contains the complete transcription.

  • Word-Tree - Contains a tree of all feasible alternatives. Use this when integrating with natural-language (NL) post-processing to determine the final utterance and its meaning.

  • Captions - Intermediate results are suitable for use as captions (this feature is in beta).
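The Words format implies a simple client-side rule. Intermediate results carry only the words added since the previous one, so the client appends them. The final result carries the complete transcription, so it replaces whatever was accumulated. The sketch below applies that rule. The message shape it uses (a dict with `final` and `words`, where each word has `utterance`, `start`, `duration`, and `confidence`) is hypothetical and is not the documented schema.

```python
from dataclasses import dataclass, field


@dataclass
class WordsAccumulator:
    """Builds a running transcript from hypothetical 'Words' result messages."""

    words: list = field(default_factory=list)

    def update(self, message: dict) -> str:
        if message.get("final"):
            # The final result holds the complete transcription: replace.
            self.words = list(message["words"])
        else:
            # Intermediate results hold only the words new since the last one: append.
            self.words.extend(message["words"])
        return self.text()

    def text(self) -> str:
        return " ".join(w["utterance"] for w in self.words)


acc = WordsAccumulator()
acc.update({"final": False, "words": [
    {"utterance": "hello", "start": 0, "duration": 400, "confidence": 0.93}]})
acc.update({"final": False, "words": [
    {"utterance": "world", "start": 450, "duration": 380, "confidence": 0.88}]})
print(acc.text())  # hello world
```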


Recognize API

/asr/recognize

Use this API when you want to constrain STT recognition results to a speech grammar submitted along with the audio (the grammar is used in place of the large-vocabulary language model).

Providing grammars is an extra step compared to the Transcribe API, but grammars can simplify application development because the semantic meaning can be extracted along with the text.
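Here is a small standard JSGF grammar for a yes/no prompt. Each alternative carries a tag (`{YES}` or `{NO}`), which lets a semantic value come back together with the recognized text. The request body around the grammar is hypothetical: the `settings.asr.grammars` path, the `type` and `grammar` keys, the inline base64 audio, and the `recognize_body` helper are all assumptions. Consult the API spec for the actual schema and for how tag values are returned.

```python
import base64

# Standard JSGF syntax; the {...} tags carry the semantic value of each alternative.
YES_NO_JSGF = """#JSGF V1.0;
grammar yesno;
public <answer> = (yes | yeah | sure) {YES} | (no | nope) {NO};
"""


def recognize_body(audio_bytes: bytes, grammar: str) -> dict:
    """Hypothetical body for /asr/recognize with an inline JSGF grammar.

    Field names are assumptions; check them against the Voicegain API spec.
    """
    return {
        # Audio sent inline as base64 text (assumed field names).
        "audio": {"source": {"inline": {"data": base64.b64encode(audio_bytes).decode("ascii")}}},
        # Grammar used in place of the large-vocabulary language model.
        "settings": {"asr": {"grammars": [{"type": "JSGF", "grammar": grammar}]}},
    }


body = recognize_body(b"\x00\x01", YES_NO_JSGF)
print(body["settings"]["asr"]["grammars"][0]["type"])  # JSGF
```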


Another advantage of using grammars is that they can ignore words in the utterance that are outside the grammar and still deliver a recognition result, although with lower confidence.
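The toy sketch below illustrates the idea of ignoring out-of-grammar words. It is not Voicegain's algorithm, which works on audio rather than text. It accepts a grammar phrase if the phrase's words occur in order within the utterance, skips any extra words, and lowers the confidence by a fixed factor per skipped word. The function name, the `penalty` parameter, and the scoring rule are all hypothetical.

```python
def match_with_garbage(utterance: str, phrases: dict, penalty: float = 0.8):
    """Toy illustration only: return (phrase, tag, confidence) for the best
    grammar phrase whose words appear in order in the utterance, or None.

    Each utterance word outside the matched phrase multiplies confidence by `penalty`.
    """
    words = utterance.lower().split()
    best = None
    for phrase, tag in phrases.items():
        target = phrase.split()
        it = iter(words)
        # Ordered-subsequence test: each `t in it` consumes the iterator up to t.
        if all(t in it for t in target):
            skipped = len(words) - len(target)
            confidence = penalty ** skipped
            if best is None or confidence > best[2]:
                best = (phrase, tag, confidence)
    return best


grammar = {"yes": "YES", "no": "NO"}
print(match_with_garbage("um yes please", grammar)[:2])  # ('yes', 'YES')
```

Here "um" and "please" are ignored, so the result is still YES, but with a confidence below that of a clean "yes".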

Voicegain supports grammars in the JSGF and GRXML formats, both standards that enterprises have used in IVRs since the early 2000s. The Recognize API supports only short-form audio: no more than 60 seconds.
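Because of the 60-second limit, a client may want to check audio length before calling the Recognize API. The sketch below does this for WAV data only, using the standard-library `wave` module to read the duration from the file header. The helper names are hypothetical, and other audio formats would need a different way to measure duration.

```python
import io
import wave

# Limit stated for the Recognize API: short-form audio of at most 60 seconds.
RECOGNIZE_MAX_SECONDS = 60


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of in-memory WAV data, computed from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def check_recognize_audio(wav_bytes: bytes) -> float:
    """Return the duration, or raise ValueError if it is too long for /asr/recognize."""
    duration = wav_duration_seconds(wav_bytes)
    if duration > RECOGNIZE_MAX_SECONDS:
        raise ValueError(
            f"{duration:.1f} s of audio exceeds the {RECOGNIZE_MAX_SECONDS} s Recognize limit"
        )
    return duration


def make_silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Create mono 16-bit silent WAV data, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(check_recognize_audio(make_silent_wav(2)))  # 2.0
```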
