Google DeepMind has released two new generative media models targeting developer pipelines that require speed, cost efficiency, and multimodal capability: Nano Banana 2 Lite for image generation and Gemini Omni Flash for video generation and editing.

Nano Banana 2 Lite

Nano Banana 2 Lite (model identifier gemini-3.1-flash-lite-image) is positioned as the fastest and most cost-efficient model in the Nano Banana image family. Key performance characteristics include:

  • Latency: Text-to-image generation in approximately 4 seconds, suited to interactive prototyping and high-volume pipelines.
  • Cost: Priced at $0.034 per 1,000 images, making it viable for high-throughput workloads.
  • Capabilities retained: The speed optimizations preserve prompt adherence, character consistency, and legible in-image text rendering.
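The quoted price and latency are easy to turn into a quick capacity estimate. The sketch below is a hypothetical helper (`estimate_batch` is not part of any Google SDK). It takes the announcement's figures at face value and assumes a flat per-image price and a constant ~4 s per request, with requests spread evenly across parallel lanes. Real latency and billing may differ.

```python
# Figures quoted in the announcement for Nano Banana 2 Lite (taken at face value).
PRICE_PER_1000_IMAGES_USD = 0.034
APPROX_SECONDS_PER_IMAGE = 4.0


def estimate_batch(num_images: int, concurrency: int = 1) -> tuple[float, float]:
    """Return (cost_usd, wall_clock_seconds) for a batch of text-to-image calls.

    Hypothetical helper: assumes a flat per-image price and that requests run
    in `concurrency` parallel lanes of ~4 s each (a simplifying assumption).
    """
    if num_images < 0 or concurrency < 1:
        raise ValueError("num_images must be >= 0 and concurrency >= 1")
    cost = num_images * PRICE_PER_1000_IMAGES_USD / 1000
    rounds = -(-num_images // concurrency)  # ceiling division: sequential rounds per lane
    return cost, rounds * APPROX_SECONDS_PER_IMAGE


cost, seconds = estimate_batch(50_000, concurrency=10)
print(f"${cost:.2f}, ~{seconds / 60:.0f} min")  # prints: $1.70, ~333 min
```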

DeepMind recommends Nano Banana 2 Lite as a direct replacement for developers currently using the first-generation Nano Banana model (gemini-2.5-flash-image), describing it as a drop-in swap with improvements across latency, quality, and cost.

The broader Nano Banana family now spans four tiers: Nano Banana 2 Lite for near-real-time volume work, Nano Banana 2 as a generalist option that balances quality and cost, Nano Banana Pro for complex professional tasks where accuracy outweighs speed, and the original Nano Banana, now designated as legacy.
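A pipeline serving several workloads might route requests across these tiers. The sketch below encodes the tier descriptions as a hypothetical heuristic (`choose_tier` and `migrate_legacy` are illustrative names, not an official API). It includes only the two model identifiers given in the announcement, because the identifiers for Nano Banana 2 and Nano Banana Pro are not stated here.

```python
from enum import Enum


class ImageTier(Enum):
    """The four Nano Banana tiers as described in the announcement."""
    LITE = "Nano Banana 2 Lite"  # near-real-time, high-volume work
    STANDARD = "Nano Banana 2"   # generalist: balances quality and cost
    PRO = "Nano Banana Pro"      # complex professional tasks, accuracy over speed
    LEGACY = "Nano Banana"       # original model, now legacy


# Only the two identifiers named in the announcement are listed.
MODEL_IDS = {
    ImageTier.LITE: "gemini-3.1-flash-lite-image",
    ImageTier.LEGACY: "gemini-2.5-flash-image",
}


def choose_tier(needs_high_accuracy: bool, interactive_or_high_volume: bool) -> ImageTier:
    """Hypothetical routing heuristic derived from the tier descriptions."""
    if needs_high_accuracy:
        return ImageTier.PRO
    if interactive_or_high_volume:
        return ImageTier.LITE
    return ImageTier.STANDARD


def migrate_legacy(model_id: str) -> str:
    """Swap the legacy identifier for its recommended drop-in replacement."""
    if model_id == MODEL_IDS[ImageTier.LEGACY]:
        return MODEL_IDS[ImageTier.LITE]
    return model_id


print(choose_tier(needs_high_accuracy=False, interactive_or_high_volume=True).value)
print(migrate_legacy("gemini-2.5-flash-image"))
```

Keeping the routing rule in one function means a team can swap in its own criteria (budget caps, latency SLOs) without touching call sites.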

Alongside developer access via Google AI Studio and the Gemini API, the model is also rolling out to Google's consumer products, including AI Mode in Search, the Gemini app, NotebookLM, Google Photos, and Google Ads.

Gemini Omni Flash

Gemini Omni Flash (model identifier gemini-omni-flash-preview) is now available to developers through the Gemini API and Google AI Studio following its introduction at Google I/O. The model combines Gemini’s multimodal reasoning with video generation and editing, accepting text, image, and video inputs. Notable capabilities include:

  • Conversational video editing: Videos can be refined and modified using natural language instructions across multiple turns.
  • Multimodal referencing: Inputs from images, text, and video can be combined to maintain scene consistency.
  • Text and action synchronization: Text and graphics can be connected to on-screen actions through prompting.

The model is priced at $0.10 per second of video output. Current limitations include a maximum generation length of 10 seconds and no API support for audio reference uploads or scene extension. Video references of up to 3 seconds are accepted by the API schema but are not yet processed correctly by the model. Character consistency across scene changes and panning movements is also limited.
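Several of these limitations only surface at request time, so a client might check requests before sending them. The sketch below is a hypothetical pre-flight check: `VideoRequest`, `check_request`, and `estimate_cost` are illustrative names, not part of the Gemini API. The rules mirror the limitations and output-only pricing described above. Treating references longer than 3 seconds as schema violations is an assumption drawn from the stated 3-second acceptance limit.

```python
from dataclasses import dataclass, field

# Price and limits stated in the announcement for gemini-omni-flash-preview.
PRICE_PER_OUTPUT_SECOND_USD = 0.10
MAX_OUTPUT_SECONDS = 10.0
MAX_VIDEO_REFERENCE_SECONDS = 3.0


@dataclass
class VideoRequest:
    """Hypothetical client-side description of a generation request."""
    output_seconds: float
    video_reference_seconds: list[float] = field(default_factory=list)
    has_audio_reference: bool = False
    wants_scene_extension: bool = False


def check_request(req: VideoRequest) -> list[str]:
    """Return problems found against the documented limitations (empty list = OK)."""
    problems = []
    if not 0 < req.output_seconds <= MAX_OUTPUT_SECONDS:
        problems.append(f"output length must be in (0, {MAX_OUTPUT_SECONDS:g}] seconds")
    if req.has_audio_reference:
        problems.append("audio reference uploads are not supported in the API")
    if req.wants_scene_extension:
        problems.append("scene extension is not supported in the API")
    if any(s > MAX_VIDEO_REFERENCE_SECONDS for s in req.video_reference_seconds):
        # Assumption: references over 3 s fall outside what the schema accepts.
        problems.append("video references longer than 3 seconds exceed the schema limit")
    elif req.video_reference_seconds:
        # Accepted by the schema, but the model does not yet process them correctly.
        problems.append("video references are accepted but not yet processed correctly")
    return problems


def estimate_cost(req: VideoRequest) -> float:
    """Output-only pricing: $0.10 per second of generated video."""
    return req.output_seconds * PRICE_PER_OUTPUT_SECOND_USD


req = VideoRequest(output_seconds=8)
print(check_request(req), f"${estimate_cost(req):.2f}")  # prints: [] $0.80
```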

Both models are available now via Google AI Studio, the Gemini API, and the Gemini Enterprise Agent Platform.