Realtime voice agents skip the speech-to-text → LLM → text-to-speech pipeline. A single model handles bidirectional audio, with server-side voice activity detection (VAD) and tool calls mid-conversation.
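The following is a minimal TypeScript sketch of a client-side event handler that shows how the single-session model works. Everything in it is hypothetical and is not the AI SDK or Gateway API. That includes the event names (`speech_started`, `audio_delta`, `tool_call` and the rest), the `getWeather` tool, and `handleEvent`. The sketch shows one connection carrying server-side VAD signals, model audio, and mid-conversation tool calls, with the client answering each tool call on the same connection.

```typescript
// Hypothetical event shapes for one bidirectional realtime session.
// Illustrative only: these are NOT the AI SDK's or AI Gateway's actual types.
type RealtimeEvent =
  | { type: "speech_started" } // server-side VAD: user began talking
  | { type: "speech_stopped" } // server-side VAD: user finished a turn
  | { type: "audio_delta"; pcm: Uint8Array } // chunk of model speech to play
  | { type: "tool_call"; id: string; name: string; args: Record<string, unknown> }
  | { type: "response_done" }; // model finished its spoken reply

type ClientMessage = { type: "tool_result"; id: string; output: string };
type Tool = (args: Record<string, unknown>) => string;

// Hypothetical local tool the model may invoke mid-conversation.
const tools: Partial<Record<string, Tool>> = {
  getWeather: (args) => `Sunny in ${String(args["city"])}`,
};

interface SessionState {
  playing: boolean; // is assistant audio currently playing?
  receivedBytes: number; // total assistant audio received so far
  outbox: ClientMessage[]; // messages to send back on the same connection
}

function handleEvent(state: SessionState, event: RealtimeEvent): void {
  switch (event.type) {
    case "speech_started":
      // Barge-in: the server heard the user, so stop local playback.
      state.playing = false;
      break;
    case "speech_stopped":
      // Turn detection happens server-side; the client just waits for audio.
      break;
    case "audio_delta":
      state.playing = true;
      state.receivedBytes += event.pcm.length;
      break;
    case "tool_call": {
      // Run the tool locally and queue its result so the model can continue its reply.
      const tool = tools[event.name];
      const output = tool ? tool(event.args) : `unknown tool: ${event.name}`;
      state.outbox.push({ type: "tool_result", id: event.id, output });
      break;
    }
    case "response_done":
      state.playing = false;
      break;
  }
}

// Replay a mocked session: one user turn, one tool call, then two audio chunks.
const state: SessionState = { playing: false, receivedBytes: 0, outbox: [] };
const mock: RealtimeEvent[] = [
  { type: "speech_started" },
  { type: "speech_stopped" },
  { type: "tool_call", id: "call_1", name: "getWeather", args: { city: "Berlin" } },
  { type: "audio_delta", pcm: new Uint8Array(320) },
  { type: "audio_delta", pcm: new Uint8Array(320) },
  { type: "response_done" },
];
for (const event of mock) handleEvent(state, event);
console.log(state.receivedBytes, state.outbox);
```

Stopping playback on `speech_started` is one common way to handle barge-in. Because the server runs VAD, the client never has to decide when a user turn ends.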
Summary
Developers building voice interfaces can now route audio through the same AI Gateway infrastructure they use for text and images, with unified routing, spend controls, and observability. Because one model handles the whole conversation, there is no latency overhead from chaining three separate models on every turn.
Implementation verdict
Ready now in the AI SDK 7 beta. It replaces custom WebSocket plumbing and three-model chains with a `useRealtime` hook and token-based auth. It requires microphone access in the browser and is production-ready for OpenAI GPT-4 Realtime and xAI Grok TTS. Worth trying if you already use AI Gateway.
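Token-based auth for browser clients usually follows a familiar pattern. A server holding the long-lived key issues a short-lived credential, and only that credential reaches the browser. The sketch below shows the general idea with an HMAC-signed expiring token built on Node's `node:crypto`. The `mintToken`/`verifyToken` helpers, the token format, and `SECRET` are hypothetical. They are not how AI Gateway or the AI SDK actually mint or check tokens.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server-only secret; in practice load it from the environment
// and never ship it to the browser.
const SECRET = "replace-with-a-long-random-server-secret";

interface TokenClaims {
  sub: string; // user or session id
  exp: number; // expiry, Unix seconds
}

// HMAC-SHA256 signature over the encoded payload.
function sign(payload: string): string {
  return createHmac("sha256", SECRET).update(payload).digest("base64url");
}

// Server side: mint a short-lived token the browser can present when opening a session.
function mintToken(sub: string, ttlSeconds: number, now = Math.floor(Date.now() / 1000)): string {
  const claims: TokenClaims = { sub, exp: now + ttlSeconds };
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${sign(payload)}`;
}

// Check signature and expiry; returns the claims, or null if invalid or expired.
function verifyToken(token: string, now = Math.floor(Date.now() / 1000)): TokenClaims | null {
  const [payload, sig] = token.split(".");
  if (!payload || !sig) return null;
  const expected = Buffer.from(sign(payload));
  const actual = Buffer.from(sig);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as TokenClaims;
  return claims.exp > now ? claims : null;
}
```

A short TTL limits the damage if a token leaks from the browser, while the long-lived secret never leaves the server.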
Sources
Dev Signal