Google DeepMind has released Gemini 3.5 Live Translate, an audio model designed to deliver near real-time speech-to-speech translation across more than 70 languages. The model is rolling out on three fronts at once: to developers in public preview, to select enterprise customers through a private preview in Google Meet, and to consumers through the Google Translate app on Android and iOS.
How the Model Works
Unlike traditional turn-based translation systems, which wait for a speaker to finish before producing output, Gemini 3.5 Live Translate generates translated speech continuously as audio streams in. The system has to weigh two competing goals: waiting for more context, which improves accuracy, and translating immediately, which keeps the output synchronized with the speaker. According to DeepMind, the model typically stays only a few seconds behind the live speaker while preserving the original speaker’s intonation, pacing, and pitch.
The model also handles multilingual inputs automatically, without requiring manual language configuration, and is designed to remain functional in noisy or unpredictable environments.
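The context-versus-latency trade-off described above can be illustrated with a toy fixed-lag policy, loosely in the spirit of "wait-k" schemes from the simultaneous translation literature. This is a minimal sketch, not DeepMind's method: the announcement does not describe the model's internals, and the names `stream_with_lag` and `toy_translate`, the chunking into words, and the parameter `k` are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def toy_translate(chunk: str, context: List[str]) -> Tuple[str, int]:
    """Hypothetical stand-in for a translator: uppercases the chunk and
    reports how many source chunks were available as context."""
    return chunk.upper(), len(context)


def stream_with_lag(
    chunks: Iterable[str],
    k: int,
    translate: Callable[[str, List[str]], Tuple[str, int]],
) -> Iterator[Tuple[str, int]]:
    """Translate chunk i only after k chunks have arrived, so the output
    trails the input by k - 1 chunks. Larger k gives more context per
    output but more delay; k = 1 translates each chunk immediately."""
    if k < 1:
        raise ValueError("k must be at least 1")
    received: List[str] = []
    emitted = 0
    for chunk in chunks:
        received.append(chunk)
        # Hold output until k chunks of context are buffered.
        if len(received) >= k:
            yield translate(received[emitted], list(received))
            emitted += 1
    # The source has ended: flush the chunks still waiting in the buffer.
    while emitted < len(received):
        yield translate(received[emitted], list(received))
        emitted += 1


words = ["where", "is", "the", "pickup", "point"]
eager = list(stream_with_lag(words, 1, toy_translate))
patient = list(stream_with_lag(words, 3, toy_translate))
print(eager[0])    # ('WHERE', 1): translated with one chunk of context
print(patient[0])  # ('WHERE', 3): translated later, with three chunks of context
```

With `k = 1` the first word is translated as soon as it arrives but with almost no context; with `k = 3` the same word is translated two chunks later with more of the sentence available. A production system would presumably adapt this lag dynamically rather than fix it in advance.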
Deployment Scope
- Developers: Access via the Gemini Live API and Google AI Studio in public preview. Integration partners including Agora, Fishjam, LiveKit, and Pipecat are building voice translation applications on top of the API.
- Enterprise: Google Meet will adopt Gemini 3.5 Live Translate, expanding supported languages from five to more than 70 and enabling more than 2,000 language pairs per meeting. Previously, translation was limited to pairings that included English. Private preview begins this month for select Google Workspace customers.
- Consumers: The Google Translate app on Android and iOS is receiving the model globally. Android users are also getting a new listening mode that streams translated audio through the phone’s earpiece without requiring headphones.
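As a quick check on the Google Meet figures above, the "more than 2,000" number is consistent with counting unordered pairs among 70 languages. The sketch below works through that arithmetic. How the figure was actually derived is an assumption, as is the premise that English was one of the five previously supported languages; the announcement states neither.

```python
from math import comb


def unordered_pairs(n_languages: int) -> int:
    """Number of distinct language pairs, ignoring direction (A-B equals B-A)."""
    return comb(n_languages, 2)


def directed_pairs(n_languages: int) -> int:
    """Number of source-to-target directions (A-to-B differs from B-to-A)."""
    return n_languages * (n_languages - 1)


def pairs_through_hub(n_languages: int) -> int:
    """Pairs available when every pairing must include one hub language,
    assuming the hub (here, English) is itself one of the n languages."""
    return n_languages - 1


print(unordered_pairs(70))   # 2415
print(directed_pairs(70))    # 4830
print(pairs_through_hub(5))  # 4
```

Because the article says "more than 70" languages, 2,415 is a lower bound under this reading; counting each translation direction separately would give at least 4,830.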
Ride-hailing platform Grab, whose drivers and riders make over 10 million voice calls per month, is among the early partners testing the model for multilingual communication at pickups.
SynthID Watermarking
All audio output generated by Gemini 3.5 Live Translate is watermarked using Google’s SynthID technology. The watermark is embedded imperceptibly into the audio waveform and is intended to keep AI-generated content detectable, which DeepMind frames as a measure to help prevent misinformation. A model card covering safety and responsibility details has been published alongside the release.
The broader rollout of the Google Meet integration for enterprise customers is scheduled for later in 2026.
