TECH

NVIDIA Cosmos 3 Open Omnimodel for Physical AI

Strategic Overview

01. NVIDIA launched Cosmos 3 at GTC Taipei and Computex 2026, billing it as the first fully open omnimodel for Physical AI that natively understands and generates language, images, video, ambient sound, and action trajectories inside a single architecture.
02. The model ships in two open-weight variants, Cosmos 3 Super (64B: 32B reasoner + 32B generator) for datacenter use and Cosmos 3 Nano (16B: 8B + 8B) for workstation deployment, with weights and code released on Hugging Face and GitHub under the commercial-friendly OpenMDW-1.1 license.
03. Architecturally, Cosmos 3 uses a mixture-of-transformers (MoT) design that pairs a reasoning transformer with an expert generation transformer, letting the model first understand object interactions and spatial-temporal relationships before emitting video frames or action sequences.
04. NVIDIA claims Cosmos 3 collapses Physical AI training and evaluation cycles from months to days, and at launch it tops the open-model leaderboards on Artificial Analysis (T2I and I2V), Physics-IQ, RoboArena, R-Bench, PAI-Bench, RoboLab, VANTAGE-Bench, and TAR.
05. The release lands alongside a founding Cosmos Coalition (Agile Robots, Black Forest Labs, Generalist, LTX, Runway, Skild AI) and adopter wins spanning humanoid robotics (Doosan, LG, Samsung, 1X, Agility, Unitree), autonomy (Li Auto, DeepRoute, VinFast, Uber), and Taiwan-anchored manufacturing (Foxconn, TSMC, Pegatron, Compal, Delta, Inventec).

One model, two transformers: how Cosmos 3 collapses perception and generation

The defining mechanism in Cosmos 3 is a mixture-of-transformers (MoT) layout that pairs a reasoning transformer with an expert generation transformer inside a single forward pass. The reasoner first parses a scene — object interactions, motion, spatial-temporal relationships — and only then hands a structured representation to the generator, which emits the next video frames, audio, or robot action trajectory ^[1]. NVIDIA frames this as the difference between a model that 'understands what matters' and a model that merely paints plausible pixels, a distinction Rev Lebaredian stresses when he calls Cosmos 3 a 'physically accurate simulation' that predicts what happens next and generates actions ^[10].
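The reason-then-generate hand-off described above can be illustrated with a toy sketch. This is not NVIDIA's code or API: `SceneRepresentation`, `reason`, `generate`, and `forward` are hypothetical names, and simple Python functions stand in for the two transformers. The only point is the contract, in which the generator consumes the reasoner's structured output rather than the raw observation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneRepresentation:
    """Hypothetical structured hand-off from the reasoner to the generator."""
    objects: List[str]
    relations: List[Tuple[str, str, str]]  # (subject, relation, object)


def reason(observation: dict) -> SceneRepresentation:
    """Stand-in for the reasoning transformer: parse the scene first."""
    objects = sorted(observation["objects"])
    relations = [(a, "near", b) for a, b in observation.get("near", [])]
    return SceneRepresentation(objects=objects, relations=relations)


def generate(scene: SceneRepresentation, horizon: int) -> List[str]:
    """Stand-in for the generation transformer: emit an action trajectory
    conditioned only on the reasoner's output, not on the raw observation."""
    # Prefer objects that appear as the target of a relation; fall back to all objects.
    targets = [obj for (_, _, obj) in scene.relations] or scene.objects
    if not targets:
        return []
    return [f"step {t}: move_toward({targets[t % len(targets)]})" for t in range(horizon)]


def forward(observation: dict, horizon: int = 3) -> List[str]:
    """Single entry point: reason, then generate."""
    return generate(reason(observation), horizon)


if __name__ == "__main__":
    obs = {"objects": ["gripper", "cup"], "near": [("gripper", "cup")]}
    for action in forward(obs):
        print(action)
```

In a real model the hand-off would be learned latent state rather than a Python dataclass. The sketch only shows why a single entry point removes the application-level glue between a "seeing" model and an "acting" model.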

In the Cosmos 2 generation, perception (Reason) and generation (Predict, Transfer) lived in separate models that had to be stitched together by application code. Cosmos 3 collapses that pipeline into one omnimodel that natively spans text, image, video, ambient sound, and action ^[2]. The practical payoff is fewer integration seams between the model that 'sees' and the model that 'acts,' which is what robotics teams need when they ask a foundation model to close the loop from camera input to motor command. NVIDIA argues this is what cuts training and evaluation cycles 'from months to days' for downstream Physical AI teams ^[1].

Open weights, NVIDIA hardware: the strategic geometry of the launch

Cosmos 3 is open, but it is opinionated about where it runs. NVIDIA shipped Super and Nano under the OpenMDW-1.1 license with commercial use permitted ^[14], posted the Cosmos3-Super weights on Hugging Face ^[5] alongside the Cosmos3-Nano variant ^[6], and released reference code via the NVIDIA/Cosmos GitHub repo ^[9]. Yet the inference profile is tuned for Hopper/Blackwell datacenter GPUs for Super and for a single RTX PRO 6000 workstation for Nano ^[4]. The openness is real; the optimal hardware path runs straight back through NVIDIA's order book.

The ecosystem move reinforces that geometry. NVIDIA announced a Cosmos Coalition with Agile Robots, Black Forest Labs, Generalist, LTX, Runway, and Skild AI as founding partners ^[12], then bundled the launch with adopter lists that read like a Taiwan supply-chain roll call — TSMC, Foxconn, Pegatron, Compal, Delta, and Inventec, anchored at GTC Taipei ^[13]. The strategic read is open weights as a developer-acquisition channel for a closed hardware platform: free the model so every robotics and AV team in the world picks NVIDIA when they pick GPUs.

The synthetic-data flywheel and the numbers it has produced so far

The use case NVIDIA wants developers to internalize is synthetic data generation. Real robot and AV training data is rare, expensive, and dangerous to collect — collisions, edge-case weather, factory defects — so Cosmos 3's promise is to synthesize that data at scale and feed it back into the policies it also trains ^[12]. Self-driving teams using the platform reported running more than 300,000 renders/simulations per day ^[10], and AV adopters Li Auto, DeepRoute.ai, VinFast, and Uber are already named users ^[10].
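To put the reported daily volume in perspective, the figure can be converted to a sustained rate. The sketch below assumes the 300,000 jobs are spread evenly over 24 hours, which the source does not state; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

REPORTED_JOBS_PER_DAY = 300_000  # "more than 300,000 renders/simulations per day"


def per_second(jobs_per_day: float) -> float:
    """Average sustained rate, assuming an even spread across the day."""
    return jobs_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = per_second(REPORTED_JOBS_PER_DAY)
    print(f"{rate:.2f} jobs/second, {rate * 60:.1f} jobs/minute")
```

Under that even-spread assumption the reported floor works out to roughly three and a half renders or simulations every second, around the clock.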

The early manufacturing numbers, while still NVIDIA-sourced, point in the same direction. Pegatron reported a 67% reduction in training and deployment time, Delta Electronics a 17% lift in detection rate, Inventec a 30% drop in defect-data collection effort, and Foxconn a 3% improvement in first-pass yield ^[10]. Across the wider open-model field, the Cosmos 3 research page also reports Cosmos 3 topping Artificial Analysis (T2I + I2V), Physics-IQ, RoboArena, R-Bench, PAI-Bench, RoboLab, VANTAGE-Bench, and TAR at launch ^[7]. The composite story is a synthetic-data flywheel that NVIDIA can point at every step of a robotics or AV training loop and credibly claim a quantitative gain.

The reality gap: hardware floor, sim-to-real, and the practitioner pushback

The first reality check is hardware. Only Cosmos 3 Nano is explicitly tuned for workstation-grade compute like the RTX PRO 6000, while Super expects datacenter Hopper or Blackwell GPUs ^[4]. That places Super beyond reach for the typical individual developer and pushes most of the field toward Nano or hosted inference, which keeps the action inside an NVIDIA-shaped envelope.
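The hardware split above can be made concrete with weights-only arithmetic. The parameter counts come from the article; the storage precisions and the 96 GB memory figure for a single RTX PRO 6000-class card are assumptions added for illustration, not figures from the source. The estimate ignores activations, caches, and runtime overhead, so it is a lower bound on real memory use.

```python
PARAMS_BILLIONS = {"Cosmos 3 Super": 64, "Cosmos 3 Nano": 16}  # from the article

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # assumption: common storage precisions
WORKSTATION_VRAM_GB = 96  # assumption: one RTX PRO 6000-class card


def weight_footprint_gb(params_billions: float, bytes_per_param: int) -> float:
    """Weights only: (params_billions * 1e9 params) * bytes / (1e9 bytes per GB)."""
    return params_billions * bytes_per_param


def fits_on_card(params_billions: float, precision: str,
                 vram_gb: float = WORKSTATION_VRAM_GB) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_footprint_gb(params_billions, BYTES_PER_PARAM[precision]) <= vram_gb


if __name__ == "__main__":
    for name, size in PARAMS_BILLIONS.items():
        for precision, nbytes in BYTES_PER_PARAM.items():
            gb = weight_footprint_gb(size, nbytes)
            verdict = "fits" if fits_on_card(size, precision) else "does not fit"
            print(f"{name} @ {precision}: {gb:.0f} GB of weights, {verdict} in {WORKSTATION_VRAM_GB} GB")
```

At 16-bit precision the 64B Super weights alone come to 128 GB, more than the assumed single-card budget, while the 16B Nano weights come to 32 GB. Lower-precision storage narrows the gap on paper, but the memory left over for activations and long video contexts shrinks accordingly.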

The second reality check is methodological. An independent review cautions that Physical AI benchmarks lack the third-party reproduction infrastructure that LLM benchmarks have grown over the past three years, advising readers to treat headline numbers 'as strong directional signals, not as guarantees' and noting that 'the sim-to-real gap remains the dominant unsolved problem in humanoid robotics' ^[15]. Cosmos 3 may dominate every leaderboard NVIDIA cites, but the leaderboards themselves are young, single-vendor, and rarely reproduced. There is also a downstream safety question that the openness debate has not yet resolved: an omnimodel that can synthesize rare and dangerous scenarios on demand is genuinely useful for AV training and equally useful for actors NVIDIA does not control, and the commercial-use terms of the OpenMDW-1.1 license do not by themselves answer that. The launch is directionally important, but its headline claims are not yet independently validated.

Historical Context

2025-01-06

NVIDIA announced the original Cosmos World Foundation Model platform at CES 2025 (arXiv:2501.03575), making Cosmos-Predict1 and Cosmos-Transfer1 openly available to Physical AI developers.

2025

The Cosmos 2 / 2.5 line followed the original Cosmos platform but kept perception and generation as separate models and remained limited to text, image, and video modalities; Cosmos 3 unifies perception and generation in one model and adds audio and action.

2026-05-31

Cosmos 3 launched at the GTC Taipei keynote, unifying language, image, video, audio, and action inside a single omnimodel architecture for the first time in the Cosmos line.

2026-06-01

Cosmos 3 received its broader Computex reveal alongside NVIDIA's chip-fab tooling and the Isaac GR00T humanoid platform, tying the model into NVIDIA's full Physical AI stack.

Power Map

Key Players

Subject

NVIDIA Cosmos 3 Open Omnimodel for Physical AI

NVIDIA

Creator and frontier-model sponsor; bundles Cosmos 3 with the Isaac GR00T humanoid platform, NIM microservices, and chip-fab tooling to extend its Physical AI stack from datacenter GPUs down to factory floors.

Cosmos Coalition (Agile Robots, Black Forest Labs, Generalist, LTX, Runway, Skild AI)

Founding partner coalition that will jointly advance open world models, giving NVIDIA an ecosystem lever as it pushes open weights into a space the largest closed labs have so far kept proprietary.

Agile Robots

Early-access partner using Cosmos 3 as a neural simulator to train robotic policies across its 20,000+ deployed robotic solutions worldwide.

Robotics adopters (Doosan, LG, Samsung, Skild AI, 1X, Agility, FieldAI, Unitree, Universal Robots, NEURA)

Build robot policies on Cosmos and partner with NVIDIA on the Isaac GR00T humanoid reference platform announced alongside Cosmos 3.

Autonomous-vehicle adopters (Li Auto, DeepRoute.ai, VinFast, Uber)

Use Cosmos for AV synthetic data and scenario simulation; self-driving teams reported running 300,000+ daily renders/simulations on the platform.

Vision-AI and manufacturing adopters (Foxconn, TSMC, Pegatron, Compal, Delta, Inventec)

Use Cosmos for industrial visual inspection and smart-space agents; reported gains include Pegatron's -67% training time, Delta's +17% detection rate, and Foxconn's +3% first-pass yield.

THE SIGNAL.

Analysts

"Frames Cosmos 3 as the inflection point for Physical AI, arguing that multimodal reasoning and world models are converging into a single moment for embodied intelligence."

Jensen Huang

Founder and CEO, NVIDIA

"Positions the Cosmos 3 family as a generational leap in the developer toolchain for robots, autonomous vehicles, and vision AI that must perceive, reason, plan, and act in the physical world."

Jensen Huang

Founder and CEO, NVIDIA

"Describes Cosmos 3 as a physically grounded simulator that both predicts next-step dynamics and emits actions, distinguishing it from generic video models that only render plausible pixels."

Rev Lebaredian

Vice President of Physical AI Simulation, NVIDIA

"Warns that benchmark wins lack the third-party reproduction infrastructure of LLM benchmarks and stresses that sim-to-real transfer remains the dominant unsolved problem in humanoid robotics."

buildfastwithai independent reviewer

Independent reviewer, buildfastwithai.com

The Crowd

"Introducing Cosmos 3: Our latest frontier model for Physical AI Cosmos 3 is the world's first fully open omnimodel with native vision reasoning, world and action generation. Today we're releasing Super (32B) and Nano (8B) variants."

@NVIDIAAI

"This is THE moment of Physical AI! We are officially announcing Cosmos 3: Omnimodal World Models for Physical AI - Cosmos 3 is an omnimodal world model: within a unified architecture, it can understand and generate language, images, video, audio, and actions. - It is not just a VLM, not just a video generator, not just an audio-visual generative model, and not just a physics simulator / world-action model. It can understand images and videos, generate images, videos, and audio, simulate future worlds, predict actions, and generate robot policies—enabling models to truly begin to "touch the world." - Cosmos 3 is the #1 open-weight reasoner / T2I / I2V / robot policy across many benchmarks."

@mli0603

"1/ NVIDIA just open-sourced Cosmos 3 at GTC Taipei! It's the first fully open "omnimodel" for physical AI - one model that understands the real world, predicts what happens next, and generates the actions a robot should take. Weights, code, datasets. All open. And this is really big. Lets dig into everything:"

@kimmonismus

"Nvidia releases Cosmos3-Super-Image2Video . 64B parametres"

@u/AgeNo5351407

Broadcast

The Big Bang Of AI Just Happened: Cosmos 3

The Future of Physical AI is Here

Jensen's Four AIs: How Cosmos 3 Closes the Physical AI Loop