Google Gemini Omni launch at I/O 2026

Strategic Overview

01.
At Google I/O 2026 on May 19, Google unveiled Gemini Omni, a unified model that turns text, image, audio, video, and hand-drawn sketches into a single cohesive video output, with multi-turn conversational editing that preserves scene, character, and physics consistency.
02.
Gemini Omni Flash is the first model in the Omni family, rolling out to Google AI Plus, Pro, and Ultra subscribers via the Gemini app and Google Flow, and free to YouTube Shorts Remix and YouTube Create users 18 and over, with developer and enterprise API access promised in the coming weeks.
03.
All Omni outputs ship with SynthID watermarking and C2PA Content Credentials; Google says more than 100 billion images and videos have been watermarked with SynthID to date.
04.
Alongside Omni, Google made Gemini 3.5 Flash the new default model for the Gemini app and AI Mode in Search globally, and showcased the Spark personal agent, expanded Antigravity, and agentic Search features powered by 3.5 Flash.

One model, three stacks: the world-model bet behind Omni

The most distinctive thing about Gemini Omni is not what it generates but what it claims to model. According to DeepMind, Omni is built by fusing three previously separate lineages — the Gemini reasoning stack, the Veo video backbone, and the Genie world-simulation layer — into a single unified architecture ^[1]. The pitch is that the model does not draw frames the way diffusion video models traditionally do; it reasons about a scene, then synthesizes the next state in a way that is supposed to respect physical laws and continuity across edits. Decrypt summarized the DeepMind framing bluntly: Omni is a world model AI that can understand and simulate the world ^[2]. EfficientlyConnected's Paul Nashawaty went further, arguing that a native world model changes the operational surface area of enterprise AI applications, not just the creative output ^[3].

This matters because it lines up with DeepMind's longer-running AGI thesis. Demis Hassabis used the keynote to position Omni as a step toward general intelligence, and the supporting blog post leans on the same vocabulary ^[4]. The technical bet is that grounding a generative model in something resembling a physics simulator is what eventually lets agents act in the real world rather than just talk about it. Whether that bet pays off cannot be tested today, but it explains why Google chose to launch Omni as a sibling architecture to Gemini 3.5 rather than as a standalone video product: in the company's telling, this is the same intelligence stack that will eventually drive Spark, Antigravity, and agentic Search.

Follow the money: I/O 2026 was a token-economics pitch wearing a video demo

Strip away the cinematic reel and I/O 2026 was an unusually direct sales pitch to CIOs. Sundar Pichai openly named the pain — token budgets are exhausting faster than enterprises can plan around them — and made 3.5 Flash the answer. Google says 3.5 Flash is four times faster on output tokens per second than its predecessor, with an optimized variant that is twelve times faster at equivalent quality, and that shifting roughly 80 percent of typical enterprise workloads to it could yield up to a billion dollars in annual savings ^[4]. TechCrunch's read of the launch was that the company is explicitly betting its next AI wave on agents rather than chatbots, with 3.5 Pro as orchestrator and a fleet of cheaper 3.5 Flash sub-agents underneath ^[5].

The scale behind that pitch is jarring. Google now processes about 3.2 quadrillion tokens per month, a sevenfold year-over-year jump, with 19 billion tokens per minute flowing through its APIs, while the Gemini app has roughly doubled to 900 million monthly active users ^[4]. Against that, the company is projecting 2026 capital expenditure of 180 to 190 billion dollars for AI infrastructure, roughly six times the 2022 figure ^[4]. Skeptics inside the developer community have already started picking that math apart, noting that GCP revenue is an order of magnitude smaller than the capex line, but the strategic message is consistent: Google is trying to make per-token cost the axis of competition while everyone else is still arguing about per-clip quality. Omni's free placement inside YouTube Shorts Remix and YouTube Create ^[1] is the consumer half of the same maneuver: flood the funnel where OpenAI just retreated.
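
To put the quoted scale figures on a common footing, here is a minimal back-of-envelope sketch. It uses only the numbers cited above; the 30-day month, the reading of the per-minute API figure as a subset of the monthly total, and all function names are assumptions made for illustration, not anything Google has published. Because the source multiples are themselves approximate ("sevenfold", "roughly six times"), the outputs should be read as rough orders of magnitude.

```python
# Figures quoted in the text; the constant and function names are hypothetical.
TOTAL_TOKENS_PER_MONTH = 3.2e15       # "about 3.2 quadrillion tokens per month"
API_TOKENS_PER_MINUTE = 19e9          # "19 billion tokens per minute" via APIs
YOY_MULTIPLE = 7                      # "sevenfold year-over-year jump"
CAPEX_2026_RANGE = (180e9, 190e9)     # projected 2026 capex, in dollars
CAPEX_MULTIPLE_VS_2022 = 6            # "roughly six times the 2022 figure"

# Assumption: a 30-day month, so 43,200 minutes.
MINUTES_PER_MONTH = 60 * 24 * 30


def api_tokens_per_month() -> float:
    """Scale the per-minute API figure up to a 30-day month."""
    return API_TOKENS_PER_MINUTE * MINUTES_PER_MONTH


def api_share_of_total() -> float:
    """Share of the monthly total that the API figure would represent,
    assuming the API traffic is a subset of that total."""
    return api_tokens_per_month() / TOTAL_TOKENS_PER_MONTH


def implied_prior_year_tokens_per_month() -> float:
    """Monthly volume one year earlier implied by the sevenfold jump."""
    return TOTAL_TOKENS_PER_MONTH / YOY_MULTIPLE


def implied_2022_capex() -> tuple[float, float]:
    """2022 capex range implied by dividing the 2026 range by six."""
    low, high = CAPEX_2026_RANGE
    return (low / CAPEX_MULTIPLE_VS_2022, high / CAPEX_MULTIPLE_VS_2022)


if __name__ == "__main__":
    print(f"API tokens per month:      {api_tokens_per_month():.3e}")
    print(f"API share of total:        {api_share_of_total():.1%}")
    print(f"Implied prior-year volume: {implied_prior_year_tokens_per_month():.3e}")
    low, high = implied_2022_capex()
    print(f"Implied 2022 capex:        ${low / 1e9:.1f}B to ${high / 1e9:.1f}B")
```

Under these assumptions, the API figure works out to roughly 0.82 quadrillion tokens per month, or about a quarter of the stated total, which would leave the rest to first-party surfaces such as Search and the Gemini app. The capex multiple implies a 2022 base of roughly 30 to 32 billion dollars.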

The 4-turn ceiling and the contrarian read on stack integration

The most useful counterweight to the launch hype comes from independent reviewers who actually tried multi-turn editing. The JXP review team reports that in testing, four turns is the reliable ceiling for Omni and that turn five is where motion drift and character inconsistency begin ^[6]. That matters because conversational editing is the entire product thesis. If the consistency window collapses at five edits, the workflow looks less like a director iterating with a model and more like a series of single-shot regenerations dressed up as a conversation. DigitalApplied's broader comparison reaches a related verdict: Veo 3 remains state-of-the-art on raw fidelity, and Omni differentiates on iteration rather than per-clip quality ^[7]. Coverage in the analytics community has also noted that ByteDance's Seedance 2.0 still leads Omni Flash on some raw generation benchmarks ^[8].

Community reaction sharpened the tension in a useful way. The r/singularity audience treated the early demos as a clear step-change in output quality, while r/ArtificialInteligence reframed I/O 2026 as a vertical-integration play — six layers, from silicon up through a physics-aware media model, all owned by Google. The most-quoted contrarian voice pushed back hard on that thesis, arguing that Nvidia still beats TPU on silicon, CoreWeave beats GCP on cost, Claude beats Gemini on language, and Seedance beats Veo on video, so buyers should assemble best-of-breed instead of betting the workflow on one stack. That is the real fault line for the next twelve months: if Omni's editing window stretches and the agent stack genuinely interoperates, the integration story wins; if the four-turn ceiling holds and 3.5 Flash's price-performance lead narrows, the best-of-breed camp gets its proof point.

Historical Context

2025-05

Google launched Veo 3 at I/O 2025, the direct predecessor whose video backbone is now consolidated into Omni.

2025-09-30

OpenAI released Sora 2, intensifying competition in AI video and pushing Google to consolidate its own stack.

2025-10-15

Google shipped Veo 3.1 with scene extension, keeping a dedicated cinematic offering on Vertex AI even as Omni's design was finalized.

2026-04-26

OpenAI discontinued the Sora consumer app, opening a consumer-creator gap that Omni Flash on YouTube Shorts and Create is positioned to fill.

2026-05-19

Demis Hassabis introduced Gemini Omni at I/O 2026 as the first unified text, image, audio, and video architecture from a top-tier AI lab.

Power Map

Key Players

Google DeepMind

Built Omni by fusing the Gemini reasoning stack, the Veo video backbone, and the Genie world-simulation layer into one unified architecture, positioning the model as DeepMind's most explicit world-model bet to date.

Demis Hassabis

DeepMind CEO who unveiled Omni at the keynote, framing it as a step toward AGI and a model that can understand and simulate the physical world.

Sundar Pichai

Google CEO who framed I/O 2026 as the agentic Gemini era and openly pitched 3.5 Flash to CIOs as the answer to enterprise token-budget exhaustion.

YouTube (Shorts Remix and Create)

Consumer distribution surface for Omni Flash; free access through Shorts Remix and the Create app puts conversational video editing in front of the largest creator base in the world.

OpenAI

Primary competitor whose Sora consumer app was discontinued on April 26, 2026, with the Sora API sunset slated for September 24, 2026, creating the consumer-creator opening Omni Flash is engineered to fill.

ByteDance (Seedance 2.0)

Competing video model that still leads on raw generation quality in some early benchmarks against Omni Flash, keeping pressure on Google's fidelity claims.

Fact Check

8 cited

THE SIGNAL.

Analysts

"Pitched 3.5 Flash as a model that outperforms Google's prior frontier model on nearly all benchmarks while staying very fast, anchoring the agent-era pricing story."

Koray Kavukcuoglu

CTO, Google DeepMind

"Describes the new product topology as 3.5 Pro acting as orchestrator and planner while 3.5 Flash runs as the various sub-agents underneath it."

Tulsee Doshi

Senior Director, Head of Product, Gemini

"Argues the real story is token economics: a native world model changes the operational surface area of enterprise AI applications, not just the demo reel."

Paul Nashawaty

Practice Leader and Lead Principal Analyst, ECI

"Conclude that Veo 3 remains state-of-the-art on raw fidelity while Omni Flash differentiates on iteration and conversational editing rather than per-clip quality."

DigitalApplied analysts

Independent AI video analysis team, DigitalApplied

"Find in hands-on testing that four turns is the reliable ceiling for Omni's multi-turn editing, with motion drift and character inconsistency beginning at turn five."

JXP review team

Independent reviewers, jxp.com

"Frame Gemini Omni as a world model AI that can understand and simulate the world, aligning with DeepMind's AGI-adjacent positioning."

Decrypt staff

Editorial team, Decrypt

The Crowd

"I uploaded a screenshot of Google Maps to Gemini Omni with a route drawn on it. Then I prompted it to create a first person view of someone driving a taxi cab along the route in the reference image. Pretty close to the real thing."

@chrisfirst2313

"2) Combine ingredients to create custom videos with Gemini Omni Bring any idea to life using any combination of text, images, and video inputs. No special equipment, editing tools, or tech jargon required. Gemini Omni is available to all Google AI Plus, Pro and Ultra"

@GeminiApp68

"GOOGLE JUST SHIPPED GEMINI OMNI AT I/O 2026 LAST WEEK AND THE ENTIRE VIDEO EDITING INDUSTRY HAS 12 MONTHS BEFORE ITS BUSINESS MODEL COLLAPSES. THIS IS NOT FUTURE TECH. THIS IS LIVE TODAY. CONVERSATIONAL VIDEO EDITING THROUGH NATURAL LANGUAGE WITH PHYSICS-AWARE COMPOSITING. MOST"

@lagerskoy71

"Google omni is underrated"

@u/Independent-Wind44622200

Broadcast

What is Gemini Omni?

Introducing Gemini Omni: Create Anything from Anything

Gemini Omni is Totally Wild (Google's New Video Model)