API | Speech-to-Text Platform

Check out our blog for insights, benchmarks, sample code, and more

Voicegain Blog

ASR, Contact Center

Challenges in redaction of PII, PCI and PHI information in Call Center recordings for compliance requirements

Arun Santhebennur

•

2 mins

min read

This article highlights the technical challenges in redaction of PII, PCI and PHI information in call center recordings for compliance requirements. It is focused on CIOs, CISOs and VP Info-Secs of enterprises and BPOs that are responsible for compliant recording and storage in their Call Centers. This is a big area of focus in regulated industries like telecom, financial services, health care and government deal as these call centers deal with a lot of Personally Identifiable Information (PII) and Personal Health Information (PHI). In addition if a call center is processing payments, it needs to adhere to standards of PCI-DSS.

How does Redaction work?

Redaction of Call Center recordings involves 3 main steps, 1) Transcription 2) Named-Entity Recognition of PII/PHI/PCI entities and 3) Redaction (in both Audio & Text) of these entities. In order to be compliant with standards like PCI and HIPAA, it is important that before storing the audio data and text transcript long-term, all such PII information is masked in text and removed in audio prior to storage.

Key Technical Challenges in Redaction

1. Simplistic Algorithms designed for mono channels will not work

Most modern call center recordings are 2-channel or stereo. A simplistic algorithm designed for mono-channel recording will not work. For example, establishing that a credit card is being requested by the Agent can span multiple turns in the conversations. The NER algorithm while establishing context has to consider both channels. Also establishing where the context starts and ends is an important criterion.

2. PII information is provided in imprecise ways

In real-world conversations, customers are not very precise while sharing PII information. For example when they share their credit card number they can make mistakes while reading out the 15 or 16 digits. The Agent may not hear certain digits and ask the user to confirm or repeat certain digits. So when you are designing the algorithm to identify the PII entities it needs to be ablet to correct for all of this.

3. Speech Recognition Errors

Transcription accuracy especially in telephone conversations, which encode the audio in 8kHz, may result in missed or additional digits. Having a simple rule related to digit length - say 15 for American Express and 16 for Visa will not work. Also sometimes digits may get recognized as words based on accents. For example "eight" may get recognized as "ate" if spoken with an accent.

4. Context Windows

As shared in the first point above, the problem becomes one of establishing a context window where the PII information is shared. In the first place, it needs to be long enough to even recognize that PII information is being requested. However extending the context window too much could start to introduce false positives. There may be other important numbers - for example say a tracking number or confirmation number that is also spoken not too far from the PII information.

Net-net designing an algorithm that is able to look across two channels for stereo recordings, account for speech recognition errors and perform accurate PII entity recognition over turns of a conversation with a well-balanced context window is the key to successful PII redaction.

Achieved 95% Redaction Accuracy for Sutherland Global

We partnered with Sutherland Global, a Top 10 BPO, to build a compliant recording for their large install base of Fortune 500 companies. The Voicegain platform - which performs both transcription and PII compliant redaction- is deployed in their VPC. We tuned our algorithm over several months to get it to pass stringent test criteria.

Today our PII Redaction has achieved an accuracy of over 95%.

Get in touch

If you are looking to build a PII/PCI/PHI compliant recording solution, please give us a shout. We would love to share our experiences. Email us at sales@voicegain.ai

‍

Benchmark

2025 Speech-to-Text Accuracy Benchmark for 8 kHz Call Center Audio Files

Arun Santhebennur

•

min read

Voicegain is releasing the results of its 2025 STT accuracy benchmark on an internally curated dataset of forty(40) call center audio files. This benchmark compares the accuracy of Voicegain's in-house STT models with that of the big cloud providers and also Voicegain's implementation of OpenAI's Whisper.

In the years past, we had published benchmarks that compared the accuracy of our in-house STT models against those of the big cloud providers. Here is the accuracy benchmark release in 2022 and the first release in 2021 and our second release in 2021. However the datasets we compared our STT models was a publicly available benchmark dataset that was on Medium and it included a wide variety of audio files - drawn from meetings, podcasts and telephony conversations.

Since 2023, Voicegain has focused on training and improving the accuracy of its in house Speech-to-Text AI models call center audio data. The benchmark we are releasing today is based on a Voicegain curated dataset of 40 audio files. These 40 files are from 8 different customers and from different industry verticals. For example two calls are consumer technology products, two are health insurance and one each in telecom, retail, manufacturing and consumer services. We did this to track how well the underlying acoustic models are trained on a variety of call center interactions.

Why a separate benchmark for Call Center Audio Data ?

In general Call Center audio data has the following characteristics

Narrowband: Most telephony systems used in call center encode the audio in a limited bandwidth 8 kHz format. Unless AI models are trained on such audio, the recognition accuracy can be limited.
Noisy data: There is significant background noise and over-talk in call center audio recordings.
Accents: Call Center agents work in different international locations. Even the end customers in the US have different accents. So the STT engine needs to be tuned to different accents.

Results of our Benchmark:

How was the accuracy of the engines calculated? We first created a golden transcript (human labeled) for each of the 40 files and calculated the Word Error Rate (WER) of each of the Speech-to-Text AI models that are included in the benchmark. The accuracy that is shown below is 1 - WER in percentage terms.

Accuracy Benchmark of different STT engines on Curate 8 kHz call center benchmark

Most Accurate - Amazon AWS came out on top with an accuracy of 87.67%

Least Accurate - Google Video was the least trained acoustic model on our 8 kHz audio dataset. The accuracy was 68.38%

Most Accurate Voicegain Model - Voicegain-Whisper-Large-V3 is the most accurate model that Voicegain provides. Its accuracy was 86.17%

Accuracy of our inhouse Voicegain Omega Model - 85.09%. While this is slightly lower than Whisper-Large and AWS, it has two big advantages. The model is optimized for on-premise/pvt cloud deployment and it can further be trained on client audio data to get an accuracy that is higher.

Custom Acoustic Model Training

One very important consideration for prospective customers is that while this benchmark is on the 40 files in this curated list, the actual results for their use-case may vary. The accuracy numbers shown above can be considered as a good starting point. With custom acoustic model training, the actual accuracy for a production use-case can be much higher.

Private Cloud/On-Premise Deployment

There is also another important consideration for customers that want to deploy a Speech-to-Text model in their VPC or Datacenter. In addition to accuracy, the actual size of the model is very important. It is in this context that Voicegain Omega shines.

Additional Result of our Streaming Speech-to-Text

We also found that Voicegain Kappa - our Streaming STT engine has an accuracy that is very close to the accuracy of Voicegain Omega. The accuracy of Voicegain Kappa is less than 1% lower than Voicegain Omega.

Reproducing this Benchmark

If you are an enterprise that would like to reproduce this benchmark, please contact us over email (support@voicegain.ai). Please use your business email and share your full contact details. We would first need to qualify you, sign an NDA and then we can share the PII-redacted version of these audio call recordings.

‍

Casey

AI Voice Agent Platform

Transcribe

Voicegain API

Key APIs

Signup for access to all APIs