Google DeepMind has released Gemini 3.1 Flash Live, its highest-quality real-time audio and voice model to date. The release targets three distinct audiences: developers accessing the model via the Gemini Live API in Google AI Studio, enterprises using Gemini Enterprise for Customer Experience, and general users through Search Live and Gemini Live.

Benchmark Performance

Google is backing the release with two third-party benchmark results. On ComplexFuncBench Audio, which tests multi-step function calling under various constraints, the model scores 90.8 percent. On Scale AI's Audio MultiChallenge, which evaluates complex instruction following and long-horizon reasoning amid real-world interruptions, it scores 36.1 percent with "thinking" mode enabled. Google reports both figures as improvements over its previous generation of audio models.

Tonal and Conversational Improvements

The model has enhanced tonal understanding, specifically better recognition of acoustic signals such as pitch and pace. In enterprise deployments, it is described as better able to adjust its responses dynamically when users express frustration or confusion. In consumer applications, Gemini Live running on 3.1 Flash Live can reportedly follow a conversation thread for twice as long as the prior model, which helps during extended brainstorming sessions.

Global Expansion and Multilingual Support

The 3.1 Flash Live model underpins a global rollout of Search Live, extending real-time multimodal search to more than 200 countries and territories. The model is described as inherently multilingual, enabling users to interact in their preferred language without mode switching.

SynthID Watermarking on All Audio Output

From a security and integrity standpoint, the most notable feature is that all audio generated by 3.1 Flash Live is watermarked with SynthID. The watermark is imperceptible to listeners and is embedded directly into the audio at generation time, allowing downstream detection of AI-generated content. Google frames this as a measure to help curb the spread of misinformation. Security teams assessing AI-generated audio in threat intelligence work, voice phishing, or synthetic voice fraud should note that SynthID detection tooling can, in principle, identify content produced by this model.

Early enterprise adopters cited in the announcement include Verizon, LiveKit, and The Home Depot, all of which provided positive feedback on conversational quality improvements.

Gemini 3.1 Flash Live is available in preview for developers via Google AI Studio starting today. A model card covering safety and responsibility details has been published alongside the release.