Pay-as-you-go, usage-based pricing with no commitments. $50 in credits is provided on signup, and no credit card is required to start today. Rate limits apply; custom rate limits are available with revenue commitments. Please contact us for details.
1. Platform usage is measured and billed per second but displayed in hours on the invoice.
2. Each STT API request is subject to a minimum billing of 6 seconds, with 1-second increments after that. For example, a real-time request of 4 seconds is billed as 6 seconds, or $0.0012 ($0.00020 × 6), and a real-time request of 7 seconds is billed $0.0014 ($0.00020 × 7).
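The rounding rule above can be sketched as a small helper. The $0.00020 per-second rate is taken from the example; the function names are our own illustration, not part of the Voicegain API:

```python
import math

def billed_seconds(duration_seconds: float, minimum: int = 6) -> int:
    """Apply the minimum-billing and 1-second-increment rule."""
    rounded = math.ceil(duration_seconds)  # bill in whole-second increments
    return max(rounded, minimum)           # enforce the 6-second minimum

def request_cost(duration_seconds: float, rate_per_second: float = 0.00020) -> float:
    """Cost of a single STT request at the example per-second rate."""
    return billed_seconds(duration_seconds) * rate_per_second

# A 4-second request is billed as 6 seconds; a 7-second request as 7.
print(billed_seconds(4))  # 6
print(billed_seconds(7))  # 7
```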
3. A custom Speech-to-Text model is built by training our standard model with additional client data (using transfer learning). The pricing above is for inference and applies to both batch and real-time. Please contact us for the NRE (non-recurring engineering) training costs of a custom model.
4. The Real-time STT with MRCP or Telephony Bot API price covers use of our Speech-to-Text/ASR as part of an MRCP or Telephony Bot API session, and applies to the entire duration of that session. It does not include 100% whole-call recording of sessions.
5. Rate limits apply for pay-as-you-go usage. We offer higher rate limits and lower pricing with volume and term commitments. Please contact us to get the details.
Deploy Voicegain on your own infrastructure. A free 30-day trial is provided. Pricing has a fixed annual support cost and a variable per-port or per-hour cost. A minimum purchase of ports applies.
1. Voicegain Edge refers to our platform deployed on client infrastructure (bare metal or VPC). Voicegain runs on a Kubernetes cluster. We prefer NVIDIA GPUs for apps that require high concurrency; CPUs are supported for low-concurrency apps. Orchestration of the cluster is done from the Voicegain cloud.
2. The client incurs the infrastructure costs and is responsible for monitoring the Kubernetes cluster. For VPC deployments we recommend the cloud provider's managed Kubernetes; for datacenter deployments, please contact us for support options.
3. A "Port" for Batch Speech-to-Text is defined as throughput: 25 ports allow you to transcribe 25 hours of offline audio per hour. For Real-time, a port is a concurrent session: e.g., 25 ports means a maximum of 25 concurrent real-time STT sessions during a month.
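Under that throughput definition, the wall-clock time to work through a batch backlog is simply audio hours divided by ports. The helper below is our own illustration of the arithmetic, not a Voicegain API:

```python
def batch_wall_clock_hours(audio_hours: float, ports: int) -> float:
    """Estimate wall-clock processing time for a batch backlog.

    Each batch port transcribes 1 hour of audio per wall-clock hour,
    so throughput scales linearly with the number of ports.
    """
    if ports <= 0:
        raise ValueError("ports must be a positive integer")
    return audio_hours / ports

# 100 hours of recordings on a 25-port license finish in ~4 hours.
print(batch_wall_clock_hours(100, 25))  # 4.0
```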
4. For usage-based licensing, each request is subject to a minimum billing of 6 seconds, with 1-second increments after that. E.g., a real-time request of 4 seconds is billed as 6 seconds, or $0.0012 ($0.00020 × 6), and a real-time request of 7 seconds is billed as 7 seconds.
5. Voicegain offers discounts for volume and term commitments. Please contact us to receive custom pricing.
You can stream audio to the Voicegain transcription API from any computer, but sometimes it is handy to have a dedicated, inexpensive device just for this task. Below we relay the experience of one of our customers in using a Raspberry Pi to stream audio for real-time transcription. It replaced a Mac Mini initially used for that purpose. Using the Pi had two benefits: a) obviously the cost, and b) it is less likely than a Mac Mini to be "hijacked" for other purposes.
Voicegain Audio Streaming Daemon requires very little in the way of computing resources, so even a Raspberry Pi Zero is sufficient; however, we recommend the Raspberry Pi 3 B+, mainly because it has an on-board 1 Gbps wired Ethernet port. WiFi connections are more likely to have problems when streaming over the UDP protocol.
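To illustrate why a stable link matters, here is a minimal sketch of streaming raw audio chunks over UDP in Python. The host, port, and chunk size are hypothetical placeholders; this is not Voicegain's actual daemon or wire protocol:

```python
import socket

def stream_audio_udp(pcm_bytes: bytes, host: str, port: int,
                     chunk_size: int = 1024) -> int:
    """Send raw PCM audio over UDP in fixed-size datagrams.

    UDP has no retransmission, so any packet lost on a flaky link
    (e.g. WiFi) is simply missing audio -- hence the preference for
    wired Ethernet. Returns the total number of bytes sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for offset in range(0, len(pcm_bytes), chunk_size):
            sent += sock.sendto(pcm_bytes[offset:offset + chunk_size],
                                (host, port))
    return sent
```

A real setup would read continuously from the ALSA capture device rather than from an in-memory byte buffer.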
Here is a list of all the hardware used in the project, with Amazon prices as of July 2019:
All the components added up to a total of $101.97. A mini monitor and a mini keyboard were included because they make it more convenient to control the device while it is in the audio rack. For example, the ALSA audio mixer can easily be adjusted this way while monitoring the audio level via headphones.
Raspberry Pi running AudioDaemon
The device runs standard Raspbian, which can easily be installed from an image using e.g. balenaEtcher. After the base install, the following was needed to get things running:
Here are some lessons learned from using this setup over the past 6 months:
You can find the complete code (minus the RASA logic - you will have to supply your own) at our GitHub repository.
The setup allows you to call a phone number and then interact with a Voicebot that uses RASA as the dialog logic engine.
November 2021 Update: We do not recommend S3 and AWS Lambda for a production setup. A more up-to-date review of the various options for building a Voice Bot is described here. You should consider replacing the functionality of S3 and AWS Lambda with a web server that can maintain state, like Node.js or Python Flask.
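To illustrate the "maintain state" point, a minimal in-memory session store might look like the sketch below. It uses only the Python standard library; the class and method names are our own, and a real server would wrap this in Flask or Node.js request handlers:

```python
import threading

class SessionStore:
    """Thread-safe in-memory dialog state, keyed by call/session id.

    Because the web server process stays up across turns of the same
    call, state can live in memory instead of being persisted to S3
    between stateless Lambda invocations.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def update(self, session_id: str, **fields) -> dict:
        """Merge new fields into the session state and return a copy."""
        with self._lock:
            state = self._sessions.setdefault(session_id, {})
            state.update(fields)
            return dict(state)

    def get(self, session_id: str) -> dict:
        with self._lock:
            return dict(self._sessions.get(session_id, {}))

    def end(self, session_id: str) -> None:
        """Discard state when the call ends."""
        with self._lock:
            self._sessions.pop(session_id, None)
```

For example, each dialog turn could call `store.update("call-123", last_intent="greet")` and read the accumulated state back on the next turn.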
The sequence diagram is provided below. Basically, the sequence of operations is as follows: