speech-to-speech real-time-translation gemini-api multilingual audio-models

Gemini 3.5 Live Translate ships speech-to-speech translation

Streaming speech-to-speech model auto-detects 70+ languages, generates continuous translated audio with under 5 seconds of latency via the Gemini Live API, and handles noisy input without manual language configuration.

Summary

Eliminates the turn-by-turn translation bottleneck for real-time multilingual voice apps. Developers can build dubbing and simultaneous multi-language translation without managing complex media streaming infrastructure; platform partners (Agora, LiveKit, Pipecat) handle that layer.
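To make the difference between continuous and turn-by-turn translation concrete, here is a minimal sketch in plain Python asyncio. It does not call the Gemini Live API: `fake_translate`, `send_audio`, `receive_audio`, and `run_session` are hypothetical stand-ins, and the string chunks stand in for audio frames. The point is only that sending and receiving run concurrently, so translated output starts arriving before the speaker has finished.

```python
import asyncio


async def fake_translate(src: asyncio.Queue, dst: asyncio.Queue) -> None:
    """Hypothetical stand-in for the model: tags each chunk as translated."""
    while (chunk := await src.get()) is not None:
        await dst.put(f"translated({chunk})")
    await dst.put(None)  # propagate end-of-stream to the receiver


async def send_audio(chunks, src: asyncio.Queue, events: list) -> None:
    """Push source chunks as they are 'captured', without waiting for replies."""
    for chunk in chunks:
        events.append(("sent", chunk))
        await src.put(chunk)
        await asyncio.sleep(0.001)  # simulated real-time capture pacing
    await src.put(None)  # signal end of the source stream


async def receive_audio(dst: asyncio.Queue, events: list) -> None:
    """Consume translated chunks as soon as they are available."""
    while (out := await dst.get()) is not None:
        events.append(("received", out))


async def run_session(chunks) -> list:
    src, dst, events = asyncio.Queue(), asyncio.Queue(), []
    # Sender, translator, and receiver all run concurrently.
    await asyncio.gather(
        send_audio(chunks, src, events),
        fake_translate(src, dst),
        receive_audio(dst, events),
    )
    return events


events = asyncio.run(run_session(["chunk0", "chunk1", "chunk2"]))
for event in events:
    print(event)
```

In a real integration the two queues would be replaced by whatever session object the chosen SDK or platform partner exposes; the concurrent send/receive structure is the part this sketch is meant to illustrate.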

Why it matters

Implementation verdict

Replaces the previous Google Translate limit of 5 languages and English-only routing. Requires Gemini Live API integration (public preview for developers) or app-level integration via Google Translate SDK. Worth trying now if you are building voice features: early partners (Grab, CJ ENM) report low latency and high quality. Google Meet support is in private preview; the mobile rollout is already live.
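As a rough sanity check on the jump from 5 to 70+ languages, the number of possible language pairs can be worked out directly. This is a sketch under an assumption: it treats the "2000+ language combinations" figure quoted in the sources as a count of unordered pairs of distinct languages, which the sources do not state explicitly. The function names are illustrative only.

```python
from math import comb


def unordered_pairs(n_languages: int) -> int:
    """Number of distinct language pairs, ignoring direction (A<->B counted once)."""
    return comb(n_languages, 2)


def directed_pairs(n_languages: int) -> int:
    """Number of source->target pairs, counting each direction separately."""
    return n_languages * (n_languages - 1)


print(unordered_pairs(5))    # 10
print(unordered_pairs(70))   # 2415
print(directed_pairs(70))    # 4830
```

With exactly 70 languages, the unordered count of 2415 is consistent with the "2000+" figure. Since the model is described as supporting 70+ languages, these numbers are lower bounds.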

Sources

1. Gemini 3.5 Live Translate, our latest audio model for live speech-to-speech translation
2. automatically detects 70+ languages and generates smooth, natural-sounding translated speech that preserves the speakers' intonation, pacing and pitch
3. stays just a few seconds behind the speaker throughout the session
4. Offering 70+ languages, an improvement from the previous limit of just five languages
5. Enabling conversations across over 2000+ language combinations in one meeting

Dev Signal
