Voice agents, text-to-speech, and transcription now available in AI Gateway with the same observability and spend controls as text models, via AI SDK 7.
Summary
A single realtime model handles audio in and out with low-latency responses, so voice interactions no longer require chaining separate models. That cuts both implementation complexity and latency for conversational agents. Observability and spend controls apply uniformly across modalities.
Implementation verdict
Replaces custom WebSocket plumbing and separate TTS/STT service integrations. Requires AI SDK 7 and a token route for client-side security. The feature is in beta, so API stability isn't guaranteed. Worth prototyping now if you need low-latency voice; the playground lets you test without code first.
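The brief does not say how the token route works, so the following is only a minimal sketch of the general pattern, assuming the goal is to keep the long-lived API key on the server and hand the browser a short-lived credential instead. Every name here (`VOICE_TOKEN_SECRET`, `mintToken`, `verifyToken`, `handleTokenRequest`) is hypothetical, and the HMAC-signed token stands in for whatever credential the real route returns; none of it is the AI Gateway or AI SDK 7 API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server-side secret; it never leaves the server.
// This is an illustration of the pattern, not the AI Gateway API.
const SERVER_SECRET = process.env.VOICE_TOKEN_SECRET ?? "dev-only-secret";

interface TokenPayload {
  sub: string; // user or session identifier
  exp: number; // expiry, in milliseconds since the Unix epoch
}

// HMAC-SHA256 signature over the encoded payload.
function sign(body: string): string {
  return createHmac("sha256", SERVER_SECRET).update(body).digest("base64url");
}

// Mint a short-lived token of the form "<payload>.<signature>".
function mintToken(sub: string, ttlMs: number, now: number = Date.now()): string {
  const payload: TokenPayload = { sub, exp: now + ttlMs };
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${sign(body)}`;
}

// Return the payload if the signature matches and the token has not expired,
// otherwise null.
function verifyToken(token: string, now: number = Date.now()): TokenPayload | null {
  const [body, signature] = token.split(".");
  if (!body || !signature) return null;
  const expected = Buffer.from(sign(body));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as TokenPayload;
  return payload.exp > now ? payload : null;
}

// What a token route might send back to an already-authenticated client
// (hypothetical response shape, 60-second lifetime).
function handleTokenRequest(userId: string): { token: string; expiresInMs: number } {
  const expiresInMs = 60_000;
  return { token: mintToken(userId, expiresInMs), expiresInMs };
}
```

The client would call the route just before opening a voice session and present the token in place of the API key. A short lifetime limits the damage if a token leaks from the browser.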
Sources
Dev Signal