TECH

NVIDIA Nemotron 3 Ultra release

Strategic Overview

01. On June 4, 2026, NVIDIA released Nemotron 3 Ultra, a 550B-parameter (55B active) open Mixture-of-Experts hybrid Mamba-Transformer model with a 1M-token context window, purpose-built for long-running agents.
02. Nemotron 3 Ultra scores 47.7 on the Artificial Analysis Intelligence Index (48.2 in BF16), making it the most intelligent US open-weights model, though it still trails the Chinese frontier (Kimi K2.6 at 53.9).
03. The model ships with open weights, training data, and recipes under the Linux Foundation's OpenMDW-1.1 license, alongside day-zero distribution across 25+ platforms including NVIDIA NIM, Hugging Face, AWS SageMaker JumpStart, Together AI, Fireworks AI, DeepInfra, OpenRouter, Nebius, Ollama, and Modal.
04. NVIDIA simultaneously launched the Nemotron Coalition with 8 inaugural members (LangChain, Mistral AI, Black Forest Labs, Cursor, Perplexity, Reflection AI, Sarvam, and Thinking Machines Lab) to co-develop open frontier models.

The architecture bet: hybrid Mamba-Transformer at 550B, scaled for million-token agent loops

Nemotron 3 Ultra is not a vanilla dense or MoE transformer. It is a hybrid Mamba-Transformer Mixture-of-Experts: 550B total parameters with 55B active per token, 108 layers, an 8,192 model dimension, 512 experts per layer, and the top 22 routed per token ^[2]. The Mamba state-space layers are what make the 1M-token context window economical: they handle long ranges with linear-time selective scans instead of paying attention's quadratic cost, while a smaller share of attention (transformer) blocks preserves the precise recall that long-horizon agents need.
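To make the routing numbers concrete, here is a minimal sketch of generic top-k softmax gating with 512 experts and 22 selected per token. It is not NVIDIA's router: Nemotron 3 Ultra's scoring function, load-balancing terms, and any shared experts are not described here, and `route_token` and `moe_output` are hypothetical names. The arithmetic at the end also surfaces a gap worth noticing: 22 of 512 experts is about 4.3% of the expert pool, yet 55B of 550B is 10% of parameters, presumably because components every token passes through (Mamba and attention layers, embeddings) count toward the active total.

```python
import math
import random

NUM_EXPERTS = 512  # experts per MoE layer, per the reported config
TOP_K = 22         # experts routed per token, per the reported config

def route_token(logits, k=TOP_K):
    """Generic top-k softmax gating (hypothetical; not NVIDIA's actual router).

    Selects the k highest-scoring experts and renormalizes their gate
    weights so they sum to 1.
    """
    if len(logits) < k:
        raise ValueError("need at least k expert logits")
    # Indices of the k largest router logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only; subtract the max for stability.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: w / total for i, w in exps.items()}

def moe_output(expert_outputs, gates):
    """Gate-weighted sum of the selected experts' outputs (scalars for brevity)."""
    return sum(gates[i] * expert_outputs[i] for i in gates)

if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    gates = route_token(logits)
    print(len(gates), round(sum(gates.values()), 6))         # 22 1.0
    print(f"routed expert fraction: {TOP_K / NUM_EXPERTS:.1%}")  # 4.3%
    print(f"active parameter fraction: {55 / 550:.0%}")          # 10%
```

Only the 22 selected experts run their feed-forward computation for a given token, which is where the MoE compute saving comes from; the other 490 sit idle for that token.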

That design choice is downstream of the workload. Long-running agents accumulate hundreds of tool calls, scratchpads, and retrieved documents per task; a 1M-token window means the agent can carry that working memory inside the model rather than offloading it to a vector store. The benchmarks NVIDIA highlights — RULER@1M of 94.7/95% needle-in-haystack recall and SWE-Bench Verified of 71.9 — are the ones a long-context agent actually fails on when its architecture is wrong ^[1]^[2]. In other words, NVIDIA is not chasing a raw intelligence score; it is chasing the intelligence-at-long-context curve.
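A minimal sketch of what carrying working memory inside the model means operationally: the agent keeps appending tool results to its context until a token budget is reached, and only then has to offload. The 4-characters-per-token estimate, the 64K output reserve, and the `AgentWorkingMemory` class are illustrative assumptions, not part of any NVIDIA or agent-framework API.

```python
from dataclasses import dataclass, field

CONTEXT_WINDOW = 1_000_000  # tokens: Nemotron 3 Ultra's advertised context length

def estimate_tokens(text: str) -> int:
    """Crude estimate of ~4 characters per token (an assumption for illustration).

    A real agent would count with the model's own tokenizer.
    """
    return max(1, len(text) // 4)

@dataclass
class AgentWorkingMemory:
    """Hypothetical in-context scratchpad for a long-running agent."""
    budget: int = CONTEXT_WINDOW
    reserve_for_output: int = 64_000  # hypothetical headroom kept free for generation
    entries: list = field(default_factory=list)
    used: int = 0

    def fits(self, text: str) -> bool:
        """True if `text` still fits without eating into the output reserve."""
        return self.used + estimate_tokens(text) <= self.budget - self.reserve_for_output

    def append(self, role: str, text: str) -> bool:
        """Keep the entry in context if it fits; False tells the caller to offload it."""
        if not self.fits(text):
            return False
        self.entries.append((role, text))
        self.used += estimate_tokens(text)
        return True

if __name__ == "__main__":
    memory = AgentWorkingMemory()
    tool_result = "x" * 12_000  # ~3,000 tokens under the 4-chars-per-token assumption
    calls = 0
    while memory.append("tool", tool_result):
        calls += 1
    print(calls, memory.used)  # 312 936000
```

Under those assumptions, a 1M-token window holds about 312 tool results of roughly 3,000 tokens each before anything has to move to an external store such as a vector database.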

Follow the money: why ~30% cheaper per agent task compounds faster than it looks

NVIDIA's headline claim is that Nemotron 3 Ultra delivers up to ~5-6x higher inference throughput than comparable open LLMs and lowers cost per agentic task by up to 30% ^[1]. The throughput numbers it publishes — 5.9x vs GLM-5.1, 4.8x vs Kimi K2.6, 1.6x vs Qwen-3.5 at 8K input / 64K output — are measured at the long-output regime that defines real agent runs, not at the 200-token chat-completion regime where most benchmarks live ^[2].

The pricing follows: $0.60 per 1M input tokens and $2.60 per 1M output tokens on Artificial Analysis-tracked endpoints, with output speeds of 140 tokens/sec median and peaks above 400 tokens/sec on DeepInfra ^[3]^[4]. For a team running a deep-research agent that emits 50K-200K output tokens per task, a 30% reduction in per-task token consumption stacks on top of a lower per-token rate — the compounding is what BuildFastWithAI flagged as the genuinely interesting economics for infrastructure operators at scale ^[9]. The pressure on closed providers offering 1M-context agent tiers is straightforward: match the price or explain the gap ^[10].
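The per-task arithmetic can be sketched with the published per-token prices. The task shape (8K input tokens, 200K output tokens, the top of the range above), the uniform application of the 30% token reduction, and the 20% price gap passed to `combined_saving` are hypothetical inputs chosen for illustration, not measured figures.

```python
INPUT_PRICE_PER_M = 0.60   # USD per 1M input tokens, as reported for AA-tracked endpoints
OUTPUT_PRICE_PER_M = 2.60  # USD per 1M output tokens, as reported for AA-tracked endpoints

def task_cost(input_tokens: int, output_tokens: int, token_reduction: float = 0.0) -> float:
    """USD cost of one agent task.

    `token_reduction` shrinks input and output usage uniformly, a simplifying
    assumption; real savings may fall unevenly across the two.
    """
    scale = 1.0 - token_reduction
    usd_times_1m = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return scale * usd_times_1m / 1_000_000

def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Decode wall-clock time only: ignores prefill, queueing, and tool latency."""
    return output_tokens / tokens_per_sec

def combined_saving(token_reduction: float, price_reduction: float) -> float:
    """Fractional saving when fewer tokens are billed at a lower rate (multiplicative)."""
    return 1.0 - (1.0 - token_reduction) * (1.0 - price_reduction)

if __name__ == "__main__":
    # Hypothetical deep-research task at the top of the 50K-200K output range.
    inp, out = 8_000, 200_000
    print(f"baseline:       ${task_cost(inp, out):.4f}")        # $0.5248
    print(f"30% fewer tok:  ${task_cost(inp, out, 0.30):.4f}")  # $0.3674
    print(f"decode @140/s:  {generation_seconds(out, 140) / 60:.1f} min")  # 23.8 min
    print(f"decode @400/s:  {generation_seconds(out, 400) / 60:.1f} min")  # 8.3 min
    # The 20% price gap is hypothetical; no comparison rate is given above.
    print(f"stacked saving: {combined_saving(0.30, 0.20):.0%}")  # 44%
```

At those inputs, one 200K-output task costs about $0.52 before the token reduction and about $0.37 after it, and decoding alone takes roughly 24 minutes at the 140 tokens/sec median versus about 8 minutes at 400 tokens/sec. The two discounts stack multiplicatively: 30% fewer tokens at a hypothetical 20% lower rate leaves 0.7 × 0.8 = 56% of the original cost, a 44% saving.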

The 'truly open' trump card: OpenMDW-1.1 ships weights, data, and recipes — not just weights

Most open releases ship weights. Nemotron 3 Ultra ships weights, training data, and recipes under the Linux Foundation's OpenMDW-1.1 license, a permissive license purpose-built for AI model distribution ^[6]^[10]. The training-data manifest itself is unusually specific: 173B refreshed GitHub tokens through September 30, 2025, 35B synthesized Wiki tokens, and 4B synthetic legal tokens are called out by name in NVIDIA's developer post ^[1].

For enterprises, the value is licensing clarity — a single permissive license that covers weights, data, code, and documentation removes the legal ambiguity that has dogged Llama-style 'open' releases for years ^[10]. For the open community, the value is reproducibility: with recipes and data published, fine-tuning a domain-specialized Nemotron variant is not a black-box exercise. That fits the dominant theme in the r/LocalLLaMA top thread, where commenters argued the differentiator isn't the off-the-shelf chat quality but the fact that Nemotron is one of the few truly open frontier-scale bases available for commercial fine-tuning and US-government-certified workloads.

Reality check: the hardware-poor verdict and the 'fine-tune starting point' framing

On launch day NVIDIA's marketing and the broader X conversation were positive: Cline shipped it for free on day zero, AWS SageMaker JumpStart and Together AI listed it, and Artificial Analysis confirmed the US-open-weights crown. But the Reddit reception is materially cooler. The Unsloth GGUF release thread laid out the real hardware floor — roughly 200GB of RAM for 2-bit, 256GB for 3-bit, and 600GB for 8-bit — putting native local inference out of reach for almost every hobbyist. Meanwhile the r/opencodeCLI thread was openly skeptical of off-the-shelf coding quality, with top commenters calling it 'prone to hallucinations' and pointing out that 'the point is to be a fine-tuning starting point.'
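A back-of-the-envelope check on those hardware floors: raw weight storage is parameters × bits per weight ÷ 8 bytes. This sketch counts weights only (no KV cache or Mamba state, no activations, no runtime overhead) and treats every tensor as uniformly quantized, which GGUF quants generally do not, so it should land below Unsloth's figures. The 80 GB per-GPU capacity is that of an H100 80GB part; `weight_gb` and `gpus_needed` are hypothetical helper names.

```python
import math

TOTAL_PARAMS = 550e9  # Nemotron 3 Ultra total parameters

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal GB: params * bits / 8 bytes.

    Excludes KV cache / SSM state, activations, runtime overhead, and the
    higher-precision tensors that mixed-precision GGUF quants typically keep.
    """
    return params * bits_per_weight / 8 / 1e9

def gpus_needed(total_gb: float, gpu_mem_gb: float) -> int:
    """Minimum GPU count to hold `total_gb` of weights, ignoring everything else."""
    return math.ceil(total_gb / gpu_mem_gb)

if __name__ == "__main__":
    for bits in (2, 3, 8, 16):
        print(f"{bits:>2}-bit: {weight_gb(TOTAL_PARAMS, bits):7.1f} GB")
    # BF16 (16-bit) weights spread across 80 GB H100s, weights only.
    print(gpus_needed(weight_gb(TOTAL_PARAMS, 16), 80))  # 14
```

Uniform quantization gives roughly 138 GB at 2-bit, 206 GB at 3-bit, and 550 GB at 8-bit, which leaves plausible headroom for the higher-precision tensors and context memory behind the quoted 200/256/600 GB floors. Unquantized BF16 weights alone come to about 1.1 TB, or at least 14 H100 80GB GPUs before any KV cache is allocated.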

That reframes the launch. Nemotron 3 Ultra is not a democratization story for individual developers; it is an enterprise and fine-tuning play. The democratization rhetoric in Jensen Huang's coalition announcement ^[5] sits uneasily with the operational reality that the people who can actually run this model unquantized are inference providers and enterprises with multi-GPU H100/B200 clusters. The hobbyist surface is the GGUF on Hugging Face and the free 14-day Nous Portal window: useful, but a different product from the one the press release describes.

NVIDIA the model company: a coalition moat around its own silicon

Alongside the model, NVIDIA announced the Nemotron Coalition with eight inaugural labs — LangChain, Mistral AI, Black Forest Labs, Cursor, Perplexity, Reflection AI, Sarvam, and Thinking Machines Lab — covering agentic frameworks, multimodal generation, coding tools, search, sovereign language models, and foundation research ^[5]. Add coalition partner Nous Research, and the lineup looks less like a model release and more like an operating system: NVIDIA provides the open frontier base, coalition members contribute domain-specific post-training, evaluation, and product surfaces, and the entire stack is optimized for NVIDIA's silicon.

This is the strategic shift that the Prompt Engineering video posed as a question — is NVIDIA a model company now? The Computex announcement and OpenMDW-1.1 licensing answer affirmatively: NVIDIA is no longer content to sell the picks and shovels while OpenAI, Anthropic, and Chinese labs define the agent layer. The China gap remains real — 47.7 vs Kimi K2.6's 53.9 means the US open-weights lead is over Llama and other US labs, not over the global frontier ^[8]. But by coupling a frontier-class open model with a coalition of category-leading labs, NVIDIA is buying optionality: if the agent economy consolidates on open weights, the Nemotron stack is the natural default; if it stays closed, NVIDIA still sold the GPUs underneath.

Historical Context

2025-12-15

NVIDIA debuts the Nemotron 3 family (Nano, Super, Ultra). Nano ships immediately; Super and Ultra are slated for H1 2026.

2026-06-01

Nemotron 3 Ultra is announced at Computex 2026 in Jensen Huang's keynote as the largest model in the Nemotron 3 family.

2026-06-04

Nemotron 3 Ultra is released with open weights, training data, and recipes under OpenMDW-1.1, distributed across 25+ platforms simultaneously.

2026-06-04

Nous Research opens free Nemotron 3 Ultra access on Nous Portal through June 18, powered by Nebius, for use with the Hermes Agent harness.

Power Map

Key Players

Subject

NVIDIA Nemotron 3 Ultra release

NVIDIA

Model developer and coalition organizer; released the full open stack (weights, data, recipes) under OpenMDW-1.1 to compete with Chinese open-weights leaders and to anchor agentic workloads on its silicon.

Nous Research

Coalition partner; partnered with Nebius to offer Nemotron 3 Ultra free on Nous Portal from June 4 to June 18 for use with Hermes Agent.

LangChain

Inaugural Nemotron Coalition member; contributes agentic tool-use and long-horizon reasoning evaluation expertise across a framework ecosystem with over 100M monthly downloads.

Inference platforms

Day-zero distribution: NVIDIA NIM, Hugging Face, AWS SageMaker JumpStart, OpenRouter, Together AI, Fireworks AI, DeepInfra, Perplexity, Nebius, Ollama, and Modal carry the model from launch day.

Artificial Analysis

Independent benchmark organization providing the canonical intelligence index ranking; positions Nemotron 3 Ultra as the top US open-weights model while flagging the residual gap to Chinese leaders.

THE SIGNAL.

Analysts

"Frames open models as essential to global AI participation and innovation, casting Nemotron 3 Ultra as NVIDIA's contribution to an open frontier."

Jensen Huang

CEO, NVIDIA

"Argues that frontier models must move beyond raw intelligence to support reliable agentic tool use, long-horizon reasoning, and agent coordination — the design target Nemotron 3 Ultra is post-trained for."

Harrison Chase

CEO, LangChain

"Sees open frontier models as how AI becomes a true platform technology — the philosophical pitch underpinning Mistral's coalition membership."

Arthur Mensch

CEO, Mistral AI

"Ranks Nemotron 3 Ultra as the best US open-weights model on its intelligence index but notes it remains behind the Chinese-led open weights frontier represented by Kimi K2.6 at 53.9."

Artificial Analysis

Independent benchmark organization

"Concludes that the combination of frontier intelligence, higher throughput, and ~30% lower per-task token consumption is the genuinely interesting story for teams operating agent infrastructure at scale."

BuildFastWithAI

Independent technical review

The Crowd

"Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models."

@NVIDIAAI · 3247

"NVIDIA just announced the release of Nemotron 3 Ultra in Jensen Huang's Computex keynote: at 550B parameters (55B active), this is the largest Nemotron 3 model to date, and it is the most intelligent US open weights model. We partnered with @nvidia to evaluate this model for the Artificial Analysis Intelligence Index."

@ArtificialAnlys · 929

"NVIDIA just released Nemotron 3 Ultra, a 550b-parameter agentic coding model with a 1m context window. It was built for token efficiency, and is up to 5x faster and 30% cheaper than other similar models. It's the largest US open-weights model release ever. Free in Cline now!"

@cline · 204

"NVIDIA announces Nemotron 3 Ultra"

u/themixtergames · 412

Broadcast

Introducing NVIDIA Nemotron 3 Ultra: An Open 550B Model for Long-Running Agents

Nemotron 3 Ultra NVIDIA's 550B Open Model

Nemotron 3 Ultra: Is NVIDIA a Model Company Now?