TECH

NVIDIA Vera Rubin Platform Enters Production

38+

Signals

Strategic Overview

01.
NVIDIA announced on May 31, 2026 that its seven-chip Vera Rubin platform has ramped into full production globally, framed as an AI factory engine for agentic workloads with shipments beginning in fall 2026.
02.
Vera is NVIDIA's first in-house datacenter CPU, built with 88 custom Olympus Arm cores, up to 1.5 TB of LPDDR5X memory, 1.2 TB/s memory bandwidth, and a 1.8 TB/s NVLink-C2C link to Rubin GPUs, positioned as the CPU for agents.
03.
Anthropic, OpenAI, SpaceX, and CoreWeave are named early customers, while HPE introduced the ProLiant Compute DL394 Gen12, the first OEM server purpose-built around Vera, available fall 2026.
04.
NVIDIA claims a 10x agent throughput improvement over the prior Grace Blackwell platform and raised its sales projection from $500B through 2026 to $1T through 2027 on the back of inference economics.

Why NVIDIA Built a CPU: The Agentic Workload Has a Different Shape

For a decade the NVIDIA story has been that the GPU is what matters and the host CPU is plumbing. Vera Rubin reverses the framing. The Vera CPU is NVIDIA's first in-house datacenter CPU, with 88 custom Olympus Arm cores and 176 threads via Spatial Multithreading, marketed verbatim as 'the CPU for agents' ^[1]. The reason is that the dominant compute shape inside frontier labs is changing. Reinforcement-learning loops, sandboxed Python execution, tool calls, retrieval, and orchestration are CPU-bound work — high single-thread throughput, large coherent memory, and predictable latency matter more than the next 20 percent of GPU FLOPS. NVIDIA's own pitch is that Vera delivers 1.8x faster agentic sandbox performance versus leading x86 CPUs ^[2], a benchmark category that didn't exist on a server CPU datasheet two years ago.
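The argument that CPU-side work can matter more than extra GPU FLOPS can be sketched with an Amdahl-style model of a single agent step. This is a minimal illustration, not a measurement: the 60 percent CPU share is a hypothetical assumption, the function name `loop_speedup` is invented for this example, and the 1.8x and 1.2x factors simply reuse the sandbox claim and the "next 20 percent of GPU FLOPS" figure from the paragraph above.

```python
def loop_speedup(cpu_fraction: float,
                 cpu_speedup: float = 1.0,
                 gpu_speedup: float = 1.0) -> float:
    """Amdahl-style speedup of one agent step.

    cpu_fraction: share of the step's wall time spent on CPU-side work
                  (sandboxed code, tool calls, retrieval, orchestration).
    cpu_speedup / gpu_speedup: how much faster each side becomes.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0 or gpu_speedup <= 0:
        raise ValueError("speedups must be positive")
    # New wall time relative to an original step normalised to 1.0.
    new_time = cpu_fraction / cpu_speedup + (1.0 - cpu_fraction) / gpu_speedup
    return 1.0 / new_time


# Hypothetical assumption: 60% of an agent step is CPU-bound.
CPU_FRACTION = 0.6

gpu_only = loop_speedup(CPU_FRACTION, gpu_speedup=1.2)  # 20% more GPU FLOPS
cpu_only = loop_speedup(CPU_FRACTION, cpu_speedup=1.8)  # claimed sandbox gain

print(f"GPU +20% only : {gpu_only:.2f}x per step")
print(f"CPU 1.8x only : {cpu_only:.2f}x per step")
```

Under these assumed numbers the CPU upgrade moves the whole loop more than the GPU upgrade does; with a smaller CPU share the ordering can flip, so the fraction is the quantity to measure on a real workload.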

The second tell is what's bolted next to Vera in the rack. The seven-chip platform (Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, Spectrum-6 Ethernet Switch, and a newly integrated NVIDIA Groq 3 LPU) explicitly bundles a Groq-style language-processing unit alongside the GPU ^[3]. Moor Insights & Strategy analyst Matt Kimball read this as NVIDIA saying the quiet part out loud: 'they're quietly acknowledging that their GPUs are not the answer for every single workload' ^[4]. Dario Amodei's framing on the Anthropic side, that agentic Claude usage 'demands infrastructure that can keep pace' ^[3], describes the same workload shift from the customer side. Vera is what NVIDIA built once it accepted that the unit of work is no longer 'a forward pass' but 'an agent loop'.

The Bundle Strategy: Seven Chips, One Rack, and Whose Wallet Share It Eats

Vera Rubin is best read as a vertical-integration play whose target is not just AMD and Intel but the entire supporting cast around the GPU. By selling the rack as one SKU (CPU, GPU, two switches, a SuperNIC, a DPU and an LPU), NVIDIA collects revenue that previously flowed to Intel and AMD on the CPU socket, to Broadcom on Ethernet, and to Arista or merchant DPU vendors on the data plane. NVIDIA's own escalation of its sales projection from $500B through 2026 to $1T through 2027, reported by Data Center Knowledge from GTC 2026 ^[4], indicates how much of that wallet share the company expects to absorb over the next two years of hyperscaler capex.

The early customer roster is what gives the bundle pricing power. Anthropic and OpenAI have committed to training and serving on Vera Rubin ^[3]; CoreWeave is committing to million-GPU AI factories on it ^[5]; HPE has built an OEM server specifically around Vera ^[6]; and NYSE Group is scaling its 1.1-trillion-messages-per-day trading stack on Vera CPUs via Redpanda and HPE ^[7]. That last one matters because it pulls Vera out of pure AI infrastructure and into latency-sensitive finance, which is the buyer profile Intel Xeon has historically owned. Lian Jye Su at Omdia framed the broader pattern as 'increasing demand from enterprises for a more tightly integrated and highly optimized full-stack AI infrastructure' ^[8]. The customer list reads less like a launch press release and more like a coordinated migration off the prior x86-plus-merchant-networking stack.

The 10x Number Has a Believable Core and a Marketing Skin

NVIDIA's headline claim, 10x agent throughput at scale versus the previous-generation Grace Blackwell platform ^[5], is the figure that has driven both the stock-market reaction and the skepticism. Independent data is starting to triangulate the real number. Phoronix ran Linux server benchmarks across roughly 400 workloads and reported Vera leading the field with a 1.5x geomean advantage over a 128-core Intel Xeon 6980P and roughly 10 percent over AMD's EPYC 9575F ^[9], plus a 1.6x geomean improvement over the 72-core Grace predecessor ^[10]. That places Vera unambiguously ahead on general-purpose server work, but well short of 10x on isolated CPU workloads.
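Because the Phoronix figures are geometric means, it helps to see what that summary statistic does and does not say. The sketch below computes a geomean over per-workload speedup ratios; the `sample` list is entirely hypothetical and is not Phoronix data, and the helper name `geomean` is chosen for this example.

```python
import math


def geomean(ratios):
    """Geometric mean of per-workload speedup ratios (each must be > 0)."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    # Average in log space, then map back: exp(mean(log(r))).
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-workload ratios (new CPU time vs. baseline), NOT real data.
sample = [0.9, 1.2, 1.5, 2.0, 4.0]

summary = geomean(sample)
arithmetic = sum(sample) / len(sample)

print(f"geomean    : {summary:.2f}x")
print(f"arithmetic : {arithmetic:.2f}x")
print(f"range      : {min(sample):.1f}x to {max(sample):.1f}x")
```

A geomean damps outliers relative to an arithmetic mean, so a single headline ratio can coexist with individual workloads that regress or that gain several times over. That is why a suite-level figure does not settle what any one customer workload will see.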

The 10x figure is reconcilable, but only if read precisely. It is a platform claim — 10x agent throughput at full rack scale — not a single-CPU claim, and it leans on Rubin GPU memory (HBM4 at 22 TB/s, 2.8x Blackwell) plus 6th-gen NVLink's 260 TB/s copper spine and the new Groq 3 LPU to pull inference off the GPU entirely. Reddit's r/nvidia thread on the launch did not let the gap pass quietly: investor-leaning voices treated the full-stack pivot as the real story and accepted the 10x throughput claim at face value, while developer-leaning commenters insisted the realized number on customer workloads would land closer to 2x. The tension to track is exactly that — the platform-level claim is plausibly true; the per-workload uplift is what enterprises should actually demand to see on their own data before signing capex.

The $7.8M-Per-Rack Capex Concentration Nobody's Pricing

The less-told story is the geometry of who is paying for this ramp. Per-rack Vera Rubin cost has been reported at roughly $7.8 million ^[11], and the ramp involves 350-plus factories across 30 countries with 150-plus ecosystem partners in Taiwan alone ^[5]. The buyers are concentrated: a handful of frontier labs (Anthropic, OpenAI), one or two AI-cloud hyperscalers (CoreWeave), a small set of sovereign and enterprise customers, and the big-three U.S. hyperscalers committing in parallel. That concentration is what underwrites NVIDIA's $1T sales line ^[4], but it is also the risk: if any one anchor customer slows orders, the supply tail snaps back fast.
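To give the per-rack figure a sense of scale, the reported rack cost can be multiplied out for a "million-GPU" build of the kind attributed to CoreWeave elsewhere in this briefing. This is a back-of-the-envelope sketch: the 72-GPUs-per-rack value is an assumption inferred from the NVL72 name, the helper names are invented for this example, and the result covers rack hardware only (no buildings, power, cooling plant, or discounts).

```python
import math

RACK_COST_USD = 7.8e6  # reported per-rack cost cited in the text
GPUS_PER_RACK = 72     # ASSUMPTION: inferred from the "NVL72" product name


def racks_needed(gpu_count: int) -> int:
    """Whole racks required to host gpu_count GPUs."""
    return math.ceil(gpu_count / GPUS_PER_RACK)


def rack_capex_usd(gpu_count: int) -> float:
    """Rack-hardware spend only, at the reported per-rack price."""
    return racks_needed(gpu_count) * RACK_COST_USD


racks = racks_needed(1_000_000)
capex = rack_capex_usd(1_000_000)

print(f"racks for 1M GPUs : {racks:,}")
print(f"rack capex        : ${capex / 1e9:,.1f}B")
```

Under those assumptions a single million-GPU commitment works out to roughly $108B of rack hardware, which illustrates why a slowdown at one anchor customer would be material against a $1T sales line.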

The market is already pricing parts of this. HPE shares jumped more than 7 percent premarket to a record high on the ProLiant DL394 Gen12 launch ^[7], while datacenter cooling suppliers sold off on perceptions of better thermals — even though, as the CNBC factory tour showed, the NVL72 is still 100 percent liquid-cooled with 45°C water and roughly 2x the absolute power draw of Grace Blackwell despite 10x better performance-per-watt. The second-order effects are already visible: an OEM repositioning around Vera-only servers, a power-and-cooling reshuffle, and an inventory cycle for the prior Grace Blackwell generation that just lost its place on the roadmap. Dan Nystedt's supply-chain note on TSMC's N3P process and CoWoS-L tape-out timing surfaced on X in the same window — a useful reminder that the production ramp lives or dies on advanced packaging capacity, not on demand.
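The power and efficiency figures quoted above can be cross-checked with one line of arithmetic, since throughput is performance-per-watt multiplied by power. The sketch assumes, hypothetically, that both ratios were measured on the same workload and against the same Grace Blackwell baseline, which the sources do not confirm; the function name is invented for this example.

```python
def implied_throughput_ratio(perf_per_watt_ratio: float,
                             power_ratio: float) -> float:
    """Throughput ratio implied by an efficiency ratio and a power ratio.

    throughput = (performance / watt) * watts, so the ratios multiply.
    """
    if perf_per_watt_ratio <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_per_watt_ratio * power_ratio


# Figures as quoted in the text: 10x perf-per-watt, roughly 2x power draw.
implied = implied_throughput_ratio(10.0, 2.0)

HEADLINE_THROUGHPUT = 10.0  # the separate 10x agent-throughput claim
print(f"implied throughput : {implied:.0f}x")
print(f"headline claim     : {HEADLINE_THROUGHPUT:.0f}x")
```

Taken at face value the two quoted ratios imply about 20x, not 10x, which suggests the efficiency, power, and throughput numbers were measured on different workloads or baselines. That is a reason to ask which benchmark sits behind each figure rather than evidence that any one of them is wrong.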

Historical Context

2024

Grace Blackwell entered production, setting the single-system compute density benchmark that Vera Rubin is now designed to dethrone.

2025-03-18

At GTC 2025 Jensen Huang first publicly placed Blackwell Ultra, Vera Rubin, and Feynman on the long-term roadmap, naming Rubin a year in advance.

2026-01

At CES 2026, Huang announced the Vera Rubin NVL72 chips were already in full production, pulling the timeline ahead of expectations.

2026-03-16

GTC 2026 unveiled the full seven-chip Vera Rubin platform and the 88-core Vera CPU, formally framing the platform as the engine for agentic AI factories.

2026-05-31

Vera Rubin ramps into full production worldwide across 350+ factories in 30 countries, with HPE simultaneously announcing the first Vera-powered OEM server.

2027

Rubin Ultra is targeted for H2 2027 with a four-GPU package delivering up to 100 petaflops, the next step on the same roadmap Vera Rubin opens.

Power Map

Key Players

NVIDIA

Platform vendor pivoting from GPU supplier to full-stack AI-factory integrator; its bundled rack of CPU, GPU, NVLink, networking, DPU and LPU directly absorbs spend that historically flowed to Intel, AMD, and Broadcom.

Anthropic

Frontier lab using Vera Rubin to scale Claude's agentic workloads; its public commitment validates the platform's design point of complex reasoning loops over single-shot inference.

OpenAI

Frontier lab committing to run more powerful models and agents on Vera Rubin at hundreds-of-millions-of-users scale; a marquee anchor for the production ramp.

CoreWeave

AI-cloud hyperscaler adopting Vera Rubin to build million-GPU AI factories; the primary capacity-provider channel through which other AI startups will rent the platform.

HPE

First OEM to announce a Vera-only server (ProLiant DL394 Gen12, available fall 2026), opening a CPU-only path to Vera outside the full NVL72 rack; the announcement sent HPE shares to a record high.

NYSE Group

Non-AI enterprise reference customer scaling its 1.1-trillion-messages-per-day trading capacity on Vera CPUs alongside Redpanda and HPE, signaling Vera's reach beyond frontier labs into latency-sensitive finance.

Fact Check

11 cited

THE SIGNAL.

Analysts

"Frames the launch as more than a product cycle: 'The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history.'"

Jensen Huang

Founder & CEO, NVIDIA

"Argues agentic Claude usage is what forces the infrastructure change: 'Enterprises and developers are using Claude for increasingly complex reasoning, agentic workflows and mission-critical decisions. That demands infrastructure that can keep pace.'"

Dario Amodei

CEO, Anthropic

"Reads Vera Rubin primarily as latency and reliability headroom: 'With NVIDIA Vera Rubin, we'll run more powerful models and agents at massive scale and deliver faster, more reliable systems to hundreds of millions of people.'"

Sam Altman

CEO, OpenAI

"Reads the integration of Groq-style LPUs alongside Rubin GPUs as a tacit concession: 'Inference is not a one-size-fits-all... they're quietly acknowledging that their GPUs are not the answer for every single workload.'"

Matt Kimball

Analyst, Moor Insights & Strategy

"After running independent Linux benchmarks, reports that 'Vera led the tested CPU field, delivering a 1.5x overall performance advantage compared with a latest-generation 128-core x86 processor,' making it the most credible non-x86 server CPU yet measured."

Michael Larabel

Founder & Principal Author, Phoronix

"Calls Vera Rubin 'a fundamental shift that Nvidia is making to position the company for leadership in agentic AI' — i.e. a strategic repositioning, not an incremental refresh."

Karl Freund

Analyst, Cambrian AI Research

The Crowd

"Vera Rubin is in full production. We just kicked off the next generation of AI infrastructure with the NVIDIA Rubin platform, bringing together six new chips to deliver one AI supercomputer built for AI at scale. Here are the top 5 things to know"

@nvidia

"NVIDIA Vera Rubin is opening the next frontier of AI. #NVIDIAGTC news: The Vera Rubin platform's seven chips are now in full production to scale the world's largest AI factories. Vera CPU, Rubin GPU, NVLink 6, ConnectX-9, BlueField-4, Spectrum-6 and Groq 3 work together as one"

@nvidianewsroom

"Nvidia's next-gen Rubin GPU and Vera CPU chips will finish tape-out at TSMC in June and begin trial production, with sample chips in September, earliest, media report, citing unnamed supply chain sources. TSMC will make Rubin on N3P and CoWoS-L advanced packaging. Mass production"

@dnystedt

"NVIDIA's Vera-Rubin is 10x in energy efficienct than Blackwell"

@tech_1729127

Broadcast

Deconstructing Nvidia's Vera Rubin — The Successor To Blackwell That's 10x More Efficient

Nvidia's Computex 2026 Keynote in Less Than 12 Minutes

2026 Best Choice Award-Golden Award: NVIDIA Vera Rubin NVL72 - The Peak of AI Supercomputing