Transcription for a Live Streamed Event: An Example

The video below shows an example of Voicegain Live Transcribe providing live transcription for a video-streamed event.

Here are some details about this particular setup:

  • the video part is streamed using BoxCast

  • the audio for transcription is tapped live at the source on site

  • audio is streamed to Voicegain Cloud for processing using a small Java client running on a Raspberry Pi computer

  • the audio client was downloaded pre-configured from the Voicegain portal and reads audio directly from a USB audio device plugged into the Raspberry Pi

  • speech is transcribed in the Cloud using Voicegain semi-real-time mode, which delivers results in about 30 seconds (the real-time mode delivers results with less than 1 second of delay)

  • the transcription output passes through a delay component that lets us dial in the precise delay needed to match the delay of the streaming video; in this case the delay was 35.5 seconds

  • the transcribed words are sent to a Web Client over a WebSocket, with each word sent after the configured delay

  • each word is displayed in a shade of gray that reflects its confidence, and the spacing between displayed words is proportional to the gap between the spoken words

  • the Acoustic Model used here has been custom trained with more than 200 additional hours of audio from this particular speaker

  • custom training data consisted simply of previously transcribed speeches by the speaker that were readily available on the website

  • we are also using a custom Language Model (on top of the base NLM) that was created from a user-provided corpus
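The delay component described above can be sketched as follows. This is a minimal illustration, not Voicegain's implementation: the `Word` and `DelayLine` names are hypothetical, and the sketch assumes each recognized word carries a start time measured from the beginning of the audio stream. Each word is scheduled for release at the wall-clock time it was spoken plus the configured delay, which is why a 35.5-second setting has to exceed the roughly 30-second latency of semi-real-time mode; a word that arrives after its release time is simply sent on the next poll.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    start: float       # seconds from the start of the audio stream (assumption)
    end: float         # seconds from the start of the audio stream (assumption)


class DelayLine:
    """Hypothetical delay component: releases each word `delay_s` seconds
    after it was spoken, measured on the wall clock."""

    def __init__(self, stream_start: float, delay_s: float):
        self.stream_start = stream_start  # wall-clock time when audio capture began
        self.delay_s = delay_s
        self._heap = []                   # entries: (release_time, sequence_no, word)
        self._seq = itertools.count()     # tie-breaker so Word objects are never compared

    def push(self, word: Word) -> None:
        # Release time = moment the word was spoken + configured delay.
        release_at = self.stream_start + word.start + self.delay_s
        heapq.heappush(self._heap, (release_at, next(self._seq), word))

    def pop_ready(self, now: float) -> list:
        """Return, in release order, every word whose release time has passed."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


line = DelayLine(stream_start=1000.0, delay_s=35.5)
line.push(Word("good", 0.97, 10.0, 10.3))
line.push(Word("morning", 0.82, 10.4, 10.9))
print([w.text for w in line.pop_ready(1045.4)])  # []
print([w.text for w in line.pop_ready(1045.5)])  # ['good']
```

A sender loop would call `pop_ready(time.time())` periodically and forward each returned word to the WebSocket. A heap is used instead of a plain FIFO so that release order stays correct even if a batch of results were to arrive out of time order.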
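The confidence-shaded, time-spaced display can be approximated as below. This is a hypothetical sketch rather than the actual Web Client code: the linear mapping from confidence to gray (1.0 renders black, 0.0 renders `#c0c0c0`), the 40 px per second spacing scale, the 4 px minimum gap, and the `gray_for` and `render` names are all assumptions chosen for illustration.

```python
import html
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    start: float       # seconds from the start of the audio stream (assumption)
    end: float         # seconds from the start of the audio stream (assumption)


def gray_for(confidence: float, lightest: int = 192) -> str:
    """Map confidence linearly to a CSS gray: 1.0 -> black, 0.0 -> `lightest`."""
    c = min(max(confidence, 0.0), 1.0)  # clamp out-of-range scores
    level = round(lightest * (1.0 - c))
    return f"#{level:02x}{level:02x}{level:02x}"


def render(words: list, px_per_second: float = 40.0, min_gap_px: float = 4.0) -> str:
    """Render words as HTML spans; each left margin grows with the pause before the word."""
    parts = []
    prev_end = None
    for w in words:
        # Pause between the end of the previous word and the start of this one.
        pause = 0.0 if prev_end is None else max(w.start - prev_end, 0.0)
        margin = min_gap_px + pause * px_per_second
        parts.append(
            f'<span style="color:{gray_for(w.confidence)};margin-left:{margin:.0f}px">'
            f"{html.escape(w.text)}</span>"
        )
        prev_end = w.end
    return "".join(parts)


print(render([Word("good", 0.97, 10.0, 10.3), Word("morning", 0.5, 10.4, 10.9)]))
```

Capping the lightest shade at `#c0c0c0` keeps low-confidence words readable on a white background instead of fading them out entirely.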

