Two new speech-to-text models: Mini for batch work at $0.003/min with diarization, and Realtime for live agents at sub-200ms latency. An open-weights option is available.
July 3, 2026
Summary
The release replaces chunked-audio adapters with a native streaming architecture, so real-time voice agents no longer depend on workarounds built around offline models. Diarization and context biasing cut post-processing work for meeting transcription and domain-specific vocabularies.
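To see why chunked-audio adapters struggle with a sub-200ms target, here is a rough sketch of the buffering arithmetic. It assumes a simple pipeline where audio is collected into fixed-size chunks and each chunk is then transcribed; the function name, the chunk sizes, and the 80 ms inference time are illustrative assumptions, not figures from the release.

```python
TARGET_MS = 200  # "sub-200ms" latency figure quoted in the brief


def chunked_worst_case_latency_ms(chunk_ms, inference_ms):
    """Worst-case delay for audio at the start of a chunk.

    That audio waits for the whole chunk to fill, then for the model
    to process it, so the chunk length is a hard floor on latency.
    """
    return chunk_ms + inference_ms


# Hypothetical chunk sizes and a hypothetical 80 ms inference time.
for chunk_ms in (100, 500, 1000):
    latency = chunked_worst_case_latency_ms(chunk_ms, inference_ms=80)
    print(chunk_ms, latency, latency < TARGET_MS)
```

Under these assumptions only the 100 ms chunk stays below the target, and chunks that short give an offline model very little context to work with, which is the gap a native streaming design is meant to close.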
Implementation verdict
Realtime is a candidate replacement for Deepgram Nova and AssemblyAI in voice agents; Mini undercuts them on price relative to accuracy for batch work. Access requires an API key or a weight download from Hugging Face. It is ready now, with a playground in Mistral Studio for immediate testing. Worth trialing if you're building voice UX or call center automation.
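For a quick feel of what the quoted Mini price means at volume, here is a minimal cost sketch. It assumes billing is strictly linear at $0.003 per audio minute, with no minimums, tiers, or per-request rounding, none of which the brief confirms; the function name and the call-center workload are hypothetical.

```python
from decimal import Decimal

# Per-minute batch price quoted in the brief for the Mini model.
MINI_PRICE_PER_MIN = Decimal("0.003")


def batch_cost_usd(audio_minutes, price_per_min=MINI_PRICE_PER_MIN):
    """Return the cost in USD for a number of audio minutes.

    Assumes linear per-minute billing (an assumption, not a published rule).
    """
    cost = Decimal(str(audio_minutes)) * price_per_min
    return cost.quantize(Decimal("0.01"))


# Hypothetical workload: 500 calls a day, 6 minutes each, over 30 days.
monthly_minutes = 500 * 6 * 30
print(monthly_minutes, batch_cost_usd(monthly_minutes))
```

Swapping in a competitor's per-minute rate as `price_per_min` gives a like-for-like comparison before committing to a trial.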
Sources
Dev Signal