asr multilingual benchmarking voice-agents code-switching

Code-switched speech breaks ASR pipelines predictably

ElevenLabs Scribe V2, Gemini 3 Flash, and AssemblyAI Universal 3-Pro handle bilingual switching best; transcription errors propagate directly into downstream task failure, measurable via Answer Error Rate.

June 10, 2026

Why it matters

Enterprise voice agents serving bilingual customers fail silently on code-switched input: tickets get misrouted and policy questions get wrong answers. Benchmark data lets you pick ASR systems that preserve semantic meaning, not just word-level accuracy.

Implementation verdict

Replaces guesswork about which ASR system handles code-switching. Requires testing your own language pairs against this benchmark, which covers Spanish-English, French-English, Canadian French-English, and German-English. Ready now: use AU-Harness to evaluate models. Scribe V2 and Gemini 3 Flash are your safest bets, with AssemblyAI Universal 3-Pro also among the top models. Whisper, when called without an explicit language parameter, defaults to translating code-switched audio into English rather than transcribing it.
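One way to catch the silent-translation failure in your own test set is to check whether the non-English words you know are in the audio survive in the ASR output. The sketch below is a rough heuristic, not part of the benchmark or of AU-Harness: the function names, the hand-labeled list of embedded-language terms, and the sample utterances are all hypothetical and exist only for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def embedded_token_retention(embedded_terms: list[str], hypothesis: str) -> float:
    """Fraction of known embedded-language terms that appear in the ASR output.

    embedded_terms is an assumed, hand-labeled list of the non-English words
    spoken in the clip. A value near 0.0 suggests the system translated the
    utterance instead of transcribing it.
    """
    if not embedded_terms:
        return 1.0  # nothing to lose, so nothing was lost
    hyp_tokens = set(tokenize(hypothesis))
    kept = sum(1 for term in embedded_terms if term.lower() in hyp_tokens)
    return kept / len(embedded_terms)


# Hypothetical Spanish-English utterance and two possible ASR outputs.
spanish_terms = ["quiero", "cancelar", "porque"]
faithful = "Quiero cancelar my subscription porque it is too expensive"
translated = "I want to cancel my subscription because it is too expensive"

print(embedded_token_retention(spanish_terms, faithful))
print(embedded_token_retention(spanish_terms, translated))
```

Exact token matching is deliberately strict here, so a misspelled but untranslated word also counts as lost. Treat a low score as a prompt to inspect the transcript, not as proof of translation.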

Sources

1. ElevenLabs Scribe V2, Gemini 3 Flash, and Assembly AI Universal 3-Pro surface as the top models across metrics for the task
2. transcription errors propagate forward into every downstream component
3. When called without an explicit language parameter on code-switched audio, Whisper defaults to translating into English rather than transcribing
4. We report three metrics: Word Error Rate (WER), Semantic Word Error Rate (SWER), and Answer Error Rate (AER)
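Two of the metrics named in source 4 can be sketched in a few lines. WER below is the standard word-level edit distance divided by the reference length. The AER function rests on an assumption: it treats AER as the share of downstream questions answered incorrectly when the agent works from the ASR transcript, which may differ from the benchmark's exact definition. SWER is left out because the brief does not say how it is computed. All function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def answer_error_rate(expected: list[str], predicted: list[str]) -> float:
    """Assumed definition: fraction of downstream answers that do not match."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("at least one answer is required")
    wrong = sum(
        1
        for exp, pred in zip(expected, predicted)
        if exp.strip().lower() != pred.strip().lower()
    )
    return wrong / len(expected)


# Hypothetical example: a translated transcript against a code-switched reference.
print(word_error_rate("quiero cancelar my subscription",
                      "i want to cancel my subscription"))
print(answer_error_rate(["refund", "cancel"], ["refund", "upgrade"]))
```

Because WER divides by the reference length, insertions can push it above 1.0, which is worth remembering when a system translates a short utterance into a longer English sentence.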

Dev Signal
