Real-Time Transcription for the Deaf (a Use Case)
Updated: Dec 21, 2020
Countryside Bible Church (CBC) has been using the Voicegain platform for real-time transcription since September 2018, when our platform was still in alpha.
How it Started
In August 2018, staff at CBC approached one of our employees to ask about software that would allow a deaf person to follow sermons live through transcription. One CBC member is both hearing and vision impaired and cannot easily follow sign language; however, she can read large type on a computer screen from close range.
At that time Voicegain had just started alpha testing the platform, so his answer was that he did know of such software: Voicegain. Our testing was then focused on IVR use cases, so we still needed a few weeks to polish the transcription APIs and to develop a web app that could consume the transcript stream (via websocket) and present it as scrolling text in a browser.
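The browser app only has to turn a stream of transcript messages into a few large, readable lines. The following is a minimal sketch, assuming (hypothetically) that each websocket message has already been parsed into a text fragment plus an is_final flag, and that interim fragments are revised until a final one arrives. The actual Voicegain message format is not described here, so the class and its names are illustrative only, and the websocket handling itself is left out so that the display logic stands alone.

```python
import textwrap


class ScrollingTranscript:
    """Hypothetical display buffer: keeps finalized text, shows the latest
    interim hypothesis after it, and exposes only the last few wrapped lines
    so they fit a large-font screen."""

    def __init__(self, width: int = 40, max_lines: int = 5) -> None:
        self.width = width          # characters per line at the chosen font size
        self.max_lines = max_lines  # lines visible on screen at once
        self.final_text = ""
        self.partial_text = ""

    def on_message(self, text: str, is_final: bool) -> None:
        # Assumed message shape: interim results replace each other,
        # a final result is appended permanently and clears the interim text.
        if is_final:
            self.final_text = (self.final_text + " " + text).strip()
            self.partial_text = ""
        else:
            self.partial_text = text

    def visible_lines(self) -> list:
        full = (self.final_text + " " + self.partial_text).strip()
        lines = textwrap.wrap(full, width=self.width)
        return lines[-self.max_lines:]  # older lines scroll off the top


t = ScrollingTranscript(width=20, max_lines=2)
t.on_message("in the beginning", is_final=False)
t.on_message("in the beginning God created", is_final=True)
t.on_message("the heavens", is_final=False)
print(t.visible_lines())
```

Keeping only the last few wrapped lines suits a reader who follows large type from close range; in practice `width` and `max_lines` would be tuned to the font size and screen.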
To improve recognition accuracy, we used about 200 hours of previously transcribed CBC sermons to adapt our acoustic DNN model. We also created a CBC-specific language model by adding a corpus of text from several Bible translations, various transcribed sermons, a list of CBC staff names, etc.
For the input audio, we initially streamed over standard RTP using the ffmpeg tool. We had some reliability issues with raw RTP, so we later switched to a custom Java client that sends the audio using a proprietary protocol. The client runs as a daemon on a small Raspberry Pi device.
The CBC audio-visual team has been running real-time transcription on our platform since September 2018, nearly every Sunday. You can see an example of the transcription in action in the video below.
Our current plan for the transcription service is to integrate it into the CBC website and make it available together with the streamed video. This will allow hearing-impaired members to follow the services at home via streaming. For now, the transcript will be presented as an embedded web page element below the embedded video.
Because the streamed video is delayed by more than 30 seconds relative to real time, a slower but more accurate transcript can still keep pace with it. We will therefore feed the audio simultaneously to two ASR engines: one optimized for real-time response and one optimized for accuracy. This is easy because the Voicegain Web API provides methods for attaching two ASR sessions to a single audio stream. Each session can in turn feed its own websocket stream. By connecting to the appropriate websocket stream, the web UI can display either the real-time or the delayed transcript.
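Voicegain performs this attachment on the server side through its Web API, whose calls are not reproduced here. The underlying idea is a simple fan-out: every incoming audio chunk is copied to each attached session. The sketch is a generic, hypothetical Python illustration with a stand-in session that only counts bytes; the 640-byte chunk used in the demo is an assumed 20 ms of 16 kHz, 16-bit mono audio (16000 × 2 × 0.02 = 640).

```python
from __future__ import annotations

import queue
import threading
from typing import Optional

END_OF_STREAM = None  # sentinel telling a session that the audio has ended


class AudioFanOut:
    """Copies every audio chunk from one source into one queue per session,
    so each recognizer sees exactly the same audio."""

    def __init__(self, session_names: list[str]) -> None:
        self.queues: dict[str, queue.Queue[Optional[bytes]]] = {
            name: queue.Queue() for name in session_names
        }

    def push(self, chunk: bytes) -> None:
        for q in self.queues.values():
            q.put(chunk)

    def close(self) -> None:
        for q in self.queues.values():
            q.put(END_OF_STREAM)


def fake_asr_session(name: str, q: queue.Queue, results: dict) -> None:
    """Hypothetical stand-in for an ASR session: it only counts the bytes it
    receives. A real session would decode audio and publish transcript text."""
    received = 0
    while True:
        chunk = q.get()
        if chunk is END_OF_STREAM:
            break
        received += len(chunk)
    results[name] = received


def run_demo(chunks: list[bytes]) -> dict[str, int]:
    fan = AudioFanOut(["realtime", "accurate"])
    results: dict[str, int] = {}
    workers = [
        threading.Thread(target=fake_asr_session, args=(name, q, results))
        for name, q in fan.queues.items()
    ]
    for w in workers:
        w.start()
    for chunk in chunks:
        fan.push(chunk)
    fan.close()
    for w in workers:
        w.join()
    return results


print(run_demo([b"\x00" * 640] * 10))
```

Each session gets its own unbounded queue, so a slower accuracy-optimized consumer does not stall the real-time one; the trade-off is that its backlog grows in memory while it catches up.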
Example transcribed sermons
The audio is copyright of Countryside Bible Church and the transcripts are copyright of Voicegain.
God's Plan for Human History (Part 2)
Tom Pennington | Daniel 2 | 2018-11-04 PM
55 minutes 13 seconds, 7475 words
Accuracy: 1.08% character error rate
Note: the Voicegain output has been formatted to match the Transcript; normally it also includes timing information. This output was obtained on 4/30/19 from the real-time recognizer, which has slightly lower accuracy than the offline recognizer.
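Character error rate is conventionally defined as CER = (S + D + I) / N, where S, D and I are the character substitutions, deletions and insertions needed to turn the reference into the recognizer output, and N is the number of characters in the reference. A CER of 1.08% therefore corresponds to roughly one character edit per 93 reference characters (1 / 0.0108 ≈ 92.6). The sketch computes CER with a Levenshtein edit distance; it compares raw strings, and since the text does not say how case, punctuation or whitespace were normalized before scoring, any such normalization is left to the caller as an assumption.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: the minimum number of substitutions, deletions
    and insertions that turn ref into hyp, using a rolling DP row."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (S + D + I) / N, with N = number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


print(character_error_rate("kitten", "sitting"))  # 3 edits / 6 chars = 0.5
```

The pure-Python loop runs in O(len(ref) × len(hyp)) time, so for a sermon-length transcript a compiled implementation would be considerably faster.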