
Speech

How to Evaluate Voice Agents in 2025: Beyond Automatic Speech Recognition (ASR) and Word Error Rate (WER) to Task Success, Barge-In, and Hallucination-Under-Noise
Optimizing only for Automatic Speech Recognition (ASR) and Word Error Rate (WER) is insufficient for modern, interactive voice agents. Robust evaluation must measure end-to-end task success, barge-in behavior and latency, and hallucination-under-noise—alongside ASR, safety, and instruction following. VoiceBench offers a multi-facet speech-interaction benchmark across general knowledge, instruction following, safety, and robustness to speaker/environment/content variations, but…

Building a Speech Enhancement and Automatic Speech Recognition (ASR) Pipeline in Python Using SpeechBrain
In this tutorial, we walk through an advanced yet practical workflow using SpeechBrain. We start by generating our own clean speech samples with gTTS, deliberately adding noise to simulate real-world scenarios, and then applying SpeechBrain’s MetricGAN+ model to enhance the audio. Once the audio is denoised, we run automatic speech recognition with a language model–rescored…

What is OLMoASR and How Does It Compare to OpenAI’s Whisper in Speech Recognition?
The Allen Institute for AI (AI2) has released OLMoASR, a suite of open automatic speech recognition (ASR) models that rival closed-source systems such as OpenAI’s Whisper. Beyond just releasing model weights, AI2 has published training data identifiers, filtering steps, training recipes, and benchmark scripts—an unusually transparent move in the ASR space. This makes OLMoASR one…

Prediction Pulse: Kalshi and Polymarket face competition from Prophet Arena, plus Jerome Powell’s speech
Prediction markets had themselves another lively week. Kalshi rolled out shiny new NFL contracts even as lawyers kept filing lawsuits faster than you can say “parlay.” This time it was a second tribal challenge, aimed at keeping Kalshi off tribal lands. But the real twist came from the NCAA, which finally peeked its head above…

Why a new anti-revenge porn law has free speech experts alarmed | TechCrunch
Privacy and digital rights advocates are raising alarms over a law that many would expect them to cheer: a federal crackdown on revenge porn and AI-generated deepfakes. The newly signed Take It Down Act makes it illegal to publish nonconsensual explicit images — real or AI-generated — and gives platforms just 48 hours to comply…

Rime Introduces Arcana and Rimecaster (Open Source): Practical Voice AI Tools Built on Real-World Speech
The field of Voice AI is evolving toward more representative and adaptable systems. While many existing models have been trained on carefully curated, studio-recorded audio, Rime is pursuing a different direction: building foundational voice models that reflect how people actually speak. Its two latest releases, Arcana and Rimecaster, are designed to offer practical tools for…

OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
OpenAI’s voice AI models have gotten it into trouble before with actor Scarlett Johansson, but that isn’t stopping the company from continuing to advance its offerings in this category. Today, the ChatGPT maker has unveiled three,…

The Trump administration is coming for student protesters
The Trump administration is embarking on a massive university speech crackdown, starting with Columbia University, where it’s demanding external control of entire departments and punishment for student activists. Its first test case, Mahmoud Khalil, a graduate student with a green card, offers a hint of what’s to come: a state of intentional chaos that undermines…