realtime-voice ai-sdk speech-to-text vercel-gateway webrtc

AI Gateway adds realtime voice and speech support

Realtime voice agents skip the speech-to-text→LLM→text-to-speech pipeline: a single model handles bidirectional audio, with server-side voice activity detection (VAD) and mid-conversation tool calls.

Summary

Developers building voice interfaces can now route audio through the same Gateway infrastructure they already use for text and images, with unified routing, spend controls, and observability. Using one realtime model also removes the latency of chaining three separate models on every conversational turn.
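The difference between the chained pipeline and a single realtime model can be sketched in TypeScript. This is a minimal illustration, assuming hypothetical `SpeechToText`, `LanguageModel`, `TextToSpeech`, and `RealtimeModel` interfaces with stub implementations. None of them are AI SDK or AI Gateway APIs.

```typescript
// Sketch only: every type and function below is a hypothetical stand-in,
// not an AI SDK or AI Gateway API.

type AudioChunk = { pcm16: Int16Array };

// Chained pipeline: three separate models that hand text to each other.
interface SpeechToText { transcribe(audio: AudioChunk): Promise<string>; }
interface LanguageModel { reply(text: string): Promise<string>; }
interface TextToSpeech { synthesize(text: string): Promise<AudioChunk>; }

// Realtime model: one model hears audio and produces audio directly.
interface RealtimeModel { respond(audio: AudioChunk): Promise<AudioChunk>; }

async function chainedTurn(
  audio: AudioChunk,
  stt: SpeechToText,
  llm: LanguageModel,
  tts: TextToSpeech,
): Promise<AudioChunk> {
  // Each stage waits for the previous one to finish, so a turn's latency
  // is roughly the sum of all three model calls.
  const transcript = await stt.transcribe(audio);
  const replyText = await llm.reply(transcript);
  return tts.synthesize(replyText);
}

async function realtimeTurn(audio: AudioChunk, model: RealtimeModel): Promise<AudioChunk> {
  // One call: audio in, audio out, with no intermediate transcript.
  return model.respond(audio);
}

// Stub models that record which stages ran, to make the difference visible.
const calls: string[] = [];
const silence: AudioChunk = { pcm16: new Int16Array(160) };
const stt: SpeechToText = { transcribe: async () => { calls.push("stt"); return "hello"; } };
const llm: LanguageModel = { reply: async (t) => { calls.push("llm"); return `echo: ${t}`; } };
const tts: TextToSpeech = { synthesize: async () => { calls.push("tts"); return silence; } };
const rt: RealtimeModel = { respond: async () => { calls.push("realtime"); return silence; } };
```

Running `chainedTurn(silence, stt, llm, tts)` records three sequential stages, while `realtimeTurn(silence, rt)` records one. The stubs only show the shape of each turn, not real timings.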

Why it matters

Implementation verdict

Available now, in beta, in AI SDK 7. Replaces custom WebSocket code and three-model chains with the `useRealtime` hook and token-based auth. Requires microphone access in the browser. Launches with OpenAI GPT-4 Realtime and xAI Grok TTS. Worth trying if you already use AI Gateway.

Sources

1. Audio launches with models from OpenAI and xAI
2. These capabilities are in beta and available in AI SDK 7
3. What sets it apart from chaining models together is that a single realtime model hears audio and produces audio directly, instead of running a speech-to-text, then language model, then text-to-speech pipeline
4. `turnDetection: { type: 'server-vad' }` lets the server decide when the user has stopped speaking, and lets the user talk over the model to cut a reply short (barge-in), with no client-side silence timers
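The server-VAD behavior in source 4 can be sketched from the client's side. The `turnDetection: { type: 'server-vad' }` shape is quoted from the source. The `SessionConfig` type, the `ServerEvent` names, and the `Player` queue are hypothetical, and the brief does not say which AI SDK call accepts this config.

```typescript
// Sketch only: `turnDetection: { type: 'server-vad' }` comes from the brief;
// SessionConfig, the event names, and Player are hypothetical.

type TurnDetection = { type: "server-vad" };
interface SessionConfig { turnDetection: TurnDetection; }

const config: SessionConfig = { turnDetection: { type: "server-vad" } };

// Hypothetical server events. With server-side VAD, the client never
// decides on its own when the user's turn has ended.
type ServerEvent =
  | { kind: "user-speech-started" }
  | { kind: "user-speech-stopped" }
  | { kind: "assistant-audio"; chunk: string };

// Hypothetical playback queue for the assistant's reply audio.
class Player {
  queued: string[] = [];
  enqueue(chunk: string): void { this.queued.push(chunk); }
  stop(): void { this.queued = []; } // drop any reply audio not yet played
}

function handleEvent(event: ServerEvent, player: Player): void {
  switch (event.kind) {
    case "assistant-audio":
      player.enqueue(event.chunk);
      break;
    case "user-speech-started":
      // Barge-in: the server heard the user talking over the reply,
      // so cut local playback short; no client-side silence timer is involved.
      player.stop();
      break;
    case "user-speech-stopped":
      // The server has decided the turn ended; the client just waits for audio.
      break;
  }
}
```

The design point is that the client only reacts to server decisions: it queues reply audio and clears the queue on barge-in, while end-of-turn detection stays on the server.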

Dev Signal