realtime-voice ai-gateway transcription websocket vercel-ai

AI Gateway adds realtime voice and transcription support

Voice agents, text-to-speech, and transcription are now available in AI Gateway via AI SDK 7, with the same observability and spend controls as text models.

Summary

A single realtime model handles audio in and audio out, so voice interactions no longer require chaining separate models. That reduces both implementation complexity and response latency for conversational agents. Observability and spend controls apply uniformly across modalities.

Why it matters

Implementation verdict

Replaces custom WebSocket plumbing and separate text-to-speech (TTS) and speech-to-text (STT) service integrations. Requires AI SDK 7 and a token route for client-side security. Beta status means API stability isn't guaranteed. Worth prototyping now if you need low-latency voice; the playground lets you test without code first.
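The token route mentioned above follows a general pattern: the server holds a long-lived secret and hands the browser a short-lived credential. The sketch below illustrates that pattern only, using Node's built-in crypto module. It is not the AI SDK 7 or AI Gateway API. Every name in it (`mintToken`, `verifyToken`, `handleTokenRequest`, `TOKEN_SIGNING_SECRET`) is hypothetical, and the token format is an assumption made for illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server-side signing secret; it never leaves the server.
const SERVER_SECRET = process.env.TOKEN_SIGNING_SECRET ?? "dev-only-secret";

interface TokenClaims {
  sub: string; // user or session id
  exp: number; // expiry, in seconds since the Unix epoch
}

// HMAC-SHA256 over the encoded payload, returned as base64url text.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Issue a token of the form "<base64url claims>.<base64url signature>".
export function mintToken(
  userId: string,
  ttlSeconds: number,
  nowMs: number = Date.now(),
  secret: string = SERVER_SECRET,
): string {
  const claims: TokenClaims = {
    sub: userId,
    exp: Math.floor(nowMs / 1000) + ttlSeconds,
  };
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Return the claims if the signature matches and the token has not expired.
export function verifyToken(
  token: string,
  nowMs: number = Date.now(),
  secret: string = SERVER_SECRET,
): TokenClaims | null {
  const parts = token.split(".");
  if (parts.length !== 2 || !parts[0] || !parts[1]) return null;
  const [payload, signature] = parts;
  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(signature);
  // Constant-time comparison; lengths must match before calling it.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  const claims = JSON.parse(
    Buffer.from(payload, "base64url").toString("utf8"),
  ) as TokenClaims;
  return claims.exp * 1000 > nowMs ? claims : null;
}

// Framework-agnostic route body: authenticated users get a 60-second token.
export function handleTokenRequest(
  userId: string | null,
): { status: number; body: string } {
  if (!userId) {
    return { status: 401, body: JSON.stringify({ error: "unauthenticated" }) };
  }
  return { status: 200, body: JSON.stringify({ token: mintToken(userId, 60) }) };
}
```

In a real integration, the token would be issued by whatever mechanism the AI SDK and AI Gateway documentation specify; the short lifetime and server-only secret are the parts of the pattern that carry over.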

Sources

1. Realtime support, a single model takes audio in and audio out, so a user can talk and hear a reply back in near real time
2. same observability, spend controls, and bring-your-own-key support as text, image, and video models in AI Gateway, with no markup or platform fees
3. available via AI SDK 7
4. The useRealtime hook handles microphone capture and playback

Dev Signal
