Recognition

How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise

Optimizing only for Automatic Speech Recognition (ASR) and Word Error Rate (WER) is insufficient for modern, interactive voice agents. Robust evaluation must measure end-to-end task success, barge-in behavior and latency, and hallucination-under-noise—alongside ASR, safety, and instruction following. VoiceBench offers a multi-facet speech-interaction benchmark across general knowledge, instruction following, safety, and robustness to speaker/environment/content variations, but…

Building a Speech Enhancement and Automatic Speech Recognition (ASR) Pipeline in Python Using SpeechBrain

ellonjohns | 1 month ago | 13 mins

In this tutorial, we walk through an advanced yet practical workflow using SpeechBrain. We start by generating our own clean speech samples with gTTS, deliberately adding noise to simulate real-world scenarios, and then applying SpeechBrain’s MetricGAN+ model to enhance the audio. Once the audio is denoised, we run automatic speech recognition with a language model–rescored…

What is OLMoASR and How Does It Compare to OpenAI’s Whisper in Speech Recognition?

ellonjohns | 2 months ago | 10 mins

The Allen Institute for AI (AI2) has released OLMoASR, a suite of open automatic speech recognition (ASR) models that rival closed-source systems such as OpenAI’s Whisper. Beyond just releasing model weights, AI2 has published training data identifiers, filtering steps, training recipes, and benchmark scripts—an unusually transparent move in the ASR space. This makes OLMoASR one…

Ultimate Guide to AI Voice Recognition

ellonjohns | 10 months ago | 12 mins

Introduction: What is AI Voice Recognition? AI voice recognition is a technology that allows computers and devices to understand and respond to human speech. Imagine talking to your phone or a smart speaker and having it understand what you say and follow your commands. This technology makes that possible. It’s like having a conversation with a…
