How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise

How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise

Optimizing only for Automatic Speech Recognition (ASR) and Word Error Rate (WER) is insufficient for modern, interactive voice agents. Robust evaluation must measure end-to-end task success, barge-in behavior and latency, and hallucination-under-noise—alongside ASR, safety, and instruction following. VoiceBench offers a multi-facet speech-interaction benchmark across general knowledge, instruction following, safety, and robustness to speaker/environment/content variations, but…

Read More
Building a Speech Enhancement and Automatic Speech Recognition (ASR) Pipeline in Python Using SpeechBrain

Building a Speech Enhancement and Automatic Speech Recognition (ASR) Pipeline in Python Using SpeechBrain

In this tutorial, we walk through an advanced yet practical workflow using SpeechBrain. We start by generating our own clean speech samples with gTTS, deliberately adding noise to simulate real-world scenarios, and then applying SpeechBrain’s MetricGAN+ model to enhance the audio. Once the audio is denoised, we run automatic speech recognition with a language model–rescored…

Read More
What is OLMoASR and How Does It Compare to OpenAI’s Whisper in Speech Recognition?

What is OLMoASR and How Does It Compare to OpenAI’s Whisper in Speech Recognition?

The Allen Institute for AI (AI2) has released OLMoASR, a suite of open automatic speech recognition (ASR) models that rival closed-source systems such as OpenAI’s Whisper. Beyond just releasing model weights, AI2 has published training data identifiers, filtering steps, training recipes, and benchmark scripts—an unusually transparent move in the ASR space. This makes OLMoASR one…

Read More
Rime Introduces Arcana and Rimecaster (Open Source): Practical Voice AI Tools Built on Real-World Speech

Rime Introduces Arcana and Rimecaster (Open Source): Practical Voice AI Tools Built on Real-World Speech

The field of Voice AI is evolving toward more representative and adaptable systems. While many existing models have been trained on carefully curated, studio-recorded audio, Rime is pursuing a different direction: building foundational voice models that reflect how people actually speak. Its two latest releases, Arcana and Rimecaster, are designed to offer practical tools for…

Read More
OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds

OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More OpenAI’s voice AI models have gotten it into trouble before with actor Scarlett Johansson, but that isn’t stopping the company from continuing to advance its offerings in this category. Today, the ChatGPT maker has unveiled three,…

Read More