AI Chip Architecture Landscape: CPUs Reclaim Relevance Alongside GPUs and TPUs

Strategic Overview

  • 01.
    Nvidia unveiled the Vera Rubin platform at GTC 2026, a six-chip AI supercomputer architecture pairing the Vera CPU (88 custom Armv9.2 cores) with the Rubin GPU (50 petaflops), delivering a 10x reduction in inference token cost versus Blackwell.
  • 02.
Arm launched the AGI CPU, its first in-house silicon product in company history: a 136-core data center chip built on a 3nm process and targeting agentic AI workloads, with Meta as lead partner.
  • 03.
    CPU-side tool processing accounts for up to 90.6% of total latency in agentic AI workloads, driving a projected shift to 7:1 CPU-to-GPU ratios in agentic data centers and reversing training-era GPU dominance.
  • 04.
    Five hardware architectures now power AI — CPU, GPU, TPU, NPU, and LPU — each making fundamentally different tradeoffs between flexibility, parallelism, and memory access, while emerging approaches like thermodynamic computing hint at further diversification.

Deep Analysis

Why This Matters

The AI chip landscape is undergoing a structural shift that challenges the GPU-centric orthodoxy of the past decade. For years, the narrative was simple: GPUs dominated AI training and inference, with Nvidia commanding roughly 86% of the AI GPU market. But the rise of agentic AI — systems that autonomously use tools, browse the web, write code, and chain multi-step reasoning — is fundamentally changing where compute bottlenecks occur. Research from Georgia Tech and Intel found that CPU-side tool processing accounts for up to 90.6% of total latency in agentic workloads, meaning the GPU is idle most of the time while the CPU handles orchestration, API calls, and data marshaling.
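The utilization math behind that finding can be sketched in a few lines. This is an illustrative model, not the study's methodology: it assumes CPU and GPU stages run serially within each agentic request, and only the 90.6% latency share comes from the source.

```python
def gpu_busy_fraction(cpu_latency_share: float) -> float:
    """Upper bound on GPU utilization when CPU and GPU stages run serially."""
    return 1.0 - cpu_latency_share

def speedup_from_halving_cpu_time(cpu_share: float) -> float:
    """Amdahl-style end-to-end speedup if only CPU-side latency is halved."""
    return 1.0 / (cpu_share / 2 + (1.0 - cpu_share))

# With 90.6% of latency on the CPU side, the GPU can be busy at most
# ~9.4% of the time; halving CPU-side latency alone yields ~1.8x throughput.
print(round(gpu_busy_fraction(0.906), 3))              # 0.094
print(round(speedup_from_halving_cpu_time(0.906), 2))  # 1.83
```

The second function is the reason faster GPUs alone buy little here: the serial CPU fraction, not GPU speed, dominates end-to-end latency.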

This shift has profound implications for data center economics. Arm projects that CPU-to-GPU ratios in agentic data centers will climb to approximately 7:1, a dramatic inversion from the GPU-heavy ratios seen in training-optimized facilities. Both Nvidia and Arm have responded by placing CPUs at the center of their newest platform strategies — Nvidia with the 88-core Vera CPU in the Vera Rubin platform, and Arm with its historic first silicon product, the 136-core AGI CPU. The $52.92 billion AI chip market (projected to reach $295.56B by 2030) is no longer a one-architecture race; it is fragmenting into specialized lanes where CPUs, GPUs, TPUs, NPUs, and LPUs each serve distinct roles.

How It Works

Five hardware architectures now power AI workloads, each making fundamentally different tradeoffs. CPUs offer maximum flexibility with complex control logic and deep cache hierarchies, making them ideal for the sequential, branching logic of agentic tool use. GPUs provide massive parallelism through thousands of simpler cores optimized for matrix math, dominating training and batch inference. TPUs, exemplified by Google’s Ironwood (TPU v7), are purpose-built ASICs with systolic arrays hardwired for tensor operations — Ironwood delivers 7.4 TB/s HBM bandwidth per chip and scales to 9,216 chips per pod for 42.5 exaflops of compute. NPUs bring lightweight AI inference to edge devices with dedicated neural engines. LPUs, pioneered by Groq, take a fully deterministic compiler-scheduled approach with all model weights stored in on-chip SRAM, eliminating memory bandwidth bottlenecks for ultra-low-latency inference.
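The memory-access tradeoff in that list can be made concrete with a roofline-style estimate: per-token decode latency is floored by the larger of compute time and the time to stream the model weights. The workload numbers below (a 70B-parameter fp16 model, roughly 2 FLOPs per parameter per token) and the SRAM bandwidth figure are illustrative assumptions; only the 7.4 TB/s HBM figure comes from the text.

```python
def decode_latency_s(weight_bytes: float, flops_per_token: float,
                     bandwidth_b_s: float, peak_flops: float) -> float:
    """Roofline lower bound on per-token decode latency: the slower of
    doing the math and streaming the weights sets the floor."""
    return max(flops_per_token / peak_flops, weight_bytes / bandwidth_b_s)

WEIGHTS = 70e9 * 2   # 70B parameters at 2 bytes each (fp16)
FLOPS = 2 * 70e9     # ~2 FLOPs per parameter per decoded token

# HBM-fed chip (7.4 TB/s, as cited for Ironwood) vs. weights resident in
# on-chip SRAM (LPU-style; 80 TB/s is an assumed, illustrative number).
hbm = decode_latency_s(WEIGHTS, FLOPS, bandwidth_b_s=7.4e12, peak_flops=1e15)
sram = decode_latency_s(WEIGHTS, FLOPS, bandwidth_b_s=80e12, peak_flops=1e15)
print(f"{hbm * 1e3:.1f} ms/token vs {sram * 1e3:.2f} ms/token")
```

Both cases come out memory-bound rather than compute-bound, which is exactly why keeping weights in on-chip SRAM attacks the bottleneck that matters for low-latency inference.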

The emerging trend is heterogeneous computing — combining multiple architectures in a single system. Nvidia’s Vera Rubin platform exemplifies this with six codesigned chips including both CPUs and GPUs connected by 260 TB/s total bandwidth. Anyscale has demonstrated that CPU/GPU pipeline disaggregation can achieve an 8x reduction in GPU requirements by offloading appropriate work to CPUs. Meanwhile, entirely novel approaches are emerging: Normal Computing’s thermodynamic chips leverage inherent randomness in physical systems to compute probabilistic AI operations more efficiently than deterministic silicon, targeting the looming data center energy wall expected around 2030.
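The disaggregation arithmetic is simple to sketch. In a monolithic pipeline, a GPU is held for the whole request, including CPU-only stages; splitting the pipeline lets GPUs serve only the GPU stage. The stage times below are assumptions chosen so the ratio lands on the reported 8x; they are not Anyscale's measurements.

```python
def gpus_needed(request_rate_s: float, gpu_stage_s: float,
                cpu_stage_s: float, disaggregated: bool) -> float:
    """GPUs required to sustain a request rate, by Little's law.
    Monolithic: the GPU is held for the full request. Disaggregated: the
    GPU is held only for its own stage; CPU work moves to CPU hosts."""
    gpu_busy_s = gpu_stage_s if disaggregated else gpu_stage_s + cpu_stage_s
    return request_rate_s * gpu_busy_s

RATE = 100.0  # requests/second; agentic requests assumed to take seconds
monolithic = gpus_needed(RATE, gpu_stage_s=1.0, cpu_stage_s=7.0, disaggregated=False)
split = gpus_needed(RATE, gpu_stage_s=1.0, cpu_stage_s=7.0, disaggregated=True)
print(monolithic, split, monolithic / split)  # 800.0 100.0 8.0
```

The larger the CPU-side share of each request, the larger the GPU savings from disaggregation, which is why this technique and the agentic-workload trend reinforce each other.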

By The Numbers

AI chip market projected to grow from $52.9B in 2024 to $295.6B by 2030 at 33.2% CAGR.

The scale of the AI chip market and the performance leaps in new architectures underscore the magnitude of this shift. The global AI chip market was valued at $52.92B in 2024 and is projected to reach $295.56B by 2030 at a 33.2% CAGR. Nvidia holds approximately 86% of the AI GPU market, while Intel retains less than 1% of discrete AI accelerators but claims roughly 22% of broader data center AI revenue when CPUs are included. The data center CPU market alone is projected to reach $76.6B by 2029, growing at 34.9%. On the performance front, Nvidia’s Vera Rubin platform promises a 10x reduction in inference token cost versus Blackwell and requires 4x fewer GPUs for mixture-of-experts model training. Google’s Ironwood TPU pod delivers 42.5 exaflops — over 24x the compute power of the El Capitan supercomputer. Modern AI chips can exceed $500 million in development costs before production begins, and 75 AI chip startups collectively raised over $2B in Q1 2025 alone. Arm’s AGI CPU packs 136 Neoverse V3 cores on a 3nm process, and Arm claims its CPU-centric data center approach could deliver up to $10 billion in capital expenditure savings per gigawatt of capacity.
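The two headline market figures are mutually consistent, as a quick compound-growth check shows (six years of compounding from 2024 to 2030):

```python
def project(value_billions: float, cagr: float, years: int) -> float:
    """Compound a market size forward at a constant annual growth rate."""
    return value_billions * (1 + cagr) ** years

# $52.92B compounded at 33.2% for six years lands on the quoted 2030 figure.
print(round(project(52.92, 0.332, 6), 1))  # 295.6
```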

Impacts & What’s Next

The most immediate impact is the reshaping of data center purchasing decisions. As agentic AI workloads grow, enterprises and hyperscalers will need to rethink their hardware mix. The projected 7:1 CPU-to-GPU ratio in agentic data centers represents a massive revenue shift — Arm is targeting a $15 billion revenue opportunity by FY2031 from this transition alone. Meta’s role as lead partner for the Arm AGI CPU signals that the largest AI infrastructure buyers are already diversifying away from GPU-only strategies. OpenAI’s Sachin Katti confirmed that the Arm AGI CPU will strengthen their orchestration layer, validating the CPU’s role in production AI systems at scale.

Looking further ahead, three trends will shape the next phase. First, inference is overtaking training as the dominant workload — Google’s Ironwood TPU is explicitly branded for "the age of inference," and Nvidia’s 10x inference cost reduction with Vera Rubin targets the same shift. Second, the energy wall looming around 2030 is driving investment in radically different architectures: Normal Computing’s $50M raise for thermodynamic chips represents a bet that the industry cannot simply acquire more energy to sustain current growth trajectories. Third, Arm’s move into direct silicon production risks alienating its existing licensees who now face competition from their own IP provider, potentially accelerating the trend of companies designing fully custom chips. The AI chip landscape is evolving from a GPU monoculture into a diverse ecosystem where workload characteristics — not brand loyalty — determine architecture choice.

The Bigger Picture

The resurgence of CPUs in AI reflects a broader maturation of the industry. During the training-dominated era, raw parallel compute was the scarce resource, and GPUs were the obvious answer. But as AI systems become more complex — chaining dozens of tool calls, managing state across long conversations, orchestrating multi-agent workflows — the workload profile increasingly resembles traditional server computing with its emphasis on branching logic, low-latency I/O, and diverse instruction streams. This is precisely what CPUs were designed for over decades of optimization. The fact that both Nvidia and Arm have simultaneously pivoted toward CPU-centric strategies suggests this is not a marketing narrative but a genuine architectural requirement.

At the same time, the proliferation of architectures — GPUs, TPUs, NPUs, LPUs, and now thermodynamic chips — signals that AI hardware is following the same diversification path that computing hardware has always followed as markets mature. Early general-purpose solutions give way to specialized designs optimized for specific workloads. Social media discourse reinforces this trajectory: a visual explainer comparing all five architectures garnered thousands of engagements, and Arm’s stock surged 18% on the AGI CPU announcement as investors recognized the CPU opportunity. The competitive dynamics are intensifying across every layer — Nvidia defending its CUDA ecosystem moat, Google deepening its TPU vertical integration, Arm expanding from IP licensing to silicon sales, and startups exploring physics-based computing paradigms. The companies that will thrive are those building platforms that can orchestrate across multiple chip types rather than betting on a single architecture.

Historical Context

2020-04-01
A foundational policy report on AI chips was published, noting that specialized AI chips are 'tens to thousands of times more cost-effective' than general-purpose CPUs for AI workloads.
2025-04-09
Google announced Ironwood (TPU v7), its seventh-generation TPU optimized for inference, delivering 7.4 TB/s HBM bandwidth per chip and scaling to 42.5 exaflops per pod.
2026-01-01
At CES 2026, AMD announced the Ryzen AI 400 series with upgraded NPUs for local AI tasks and next-gen Turin data center chips.
2026-03-13
At GTC 2026, Nvidia unveiled the Vera Rubin platform with six new chips, including the Vera CPU and Rubin GPU, marking CPUs taking center stage alongside GPUs in AI supercomputer design.
2026-03-24
Arm launched the AGI CPU — its first in-house silicon product in company history — a 136-core data center chip for agentic AI workloads, with Meta as lead deployment partner.
2026-03-25
Normal Computing raised $50M led by Samsung Catalyst to develop thermodynamic AI chips that leverage inherent randomness in physical systems for more energy-efficient AI computation.

Power Map

Key Players

Nvidia

Dominant AI chip maker (~86% GPU market share) launching the Vera Rubin six-chip platform, reasserting CPUs as essential for agentic AI alongside GPUs.

Arm

Launched its first in-house silicon (AGI CPU) for data centers, targeting a $15B revenue opportunity by FY2031 in CPU-centric agentic AI infrastructure.

Google

Developed the Ironwood TPU v7, a purpose-built inference chip achieving 42.5 exaflops per pod, reinforcing the ASIC approach to AI hardware.

Meta

Lead partner for Arm’s AGI CPU; plans to deploy it alongside its custom MTIA accelerators for AI infrastructure.

AMD

Primary Nvidia challenger with AI chip division projected at $5.6B in 2025 and MI450 Helios rack-scale systems launching Q3 2026.

Normal Computing

Startup developing thermodynamic AI chips to reduce energy use; raised $50M from Samsung Catalyst to tackle the AI energy crisis with fundamentally different hardware.

THE SIGNAL.

Analysts

Positioned the Vera CPU as central to Nvidia’s AI platform strategy alongside the Rubin GPU: "Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof. With our annual cadence of delivering a new generation of AI supercomputers — and extreme codesign across six new chips — Rubin takes a giant leap toward the next frontier of AI."

Jensen Huang
Founder & CEO, Nvidia

Sees CPU infrastructure as critical for orchestrating large-scale AI workloads: "The Arm AGI CPU will play an important role in our infrastructure as we scale, strengthening the orchestration layer that coordinates large-scale AI workloads and improving efficiency, performance, and bandwidth across the system."

Sachin Katti
Head of Industrial Compute, OpenAI

Argues the industry needs fundamentally different hardware to solve the AI energy crisis: "Data centers are expected to hit an energy wall around 2030, and most of the strategy now is to find new ways to acquire more energy—but our position is to solve the problem in terms of the hardware that we’re using."

Faris Sbahi
CEO, Normal Computing

Outlined three CPU use cases in agentic data centers: CPUs as head nodes managing GPU clusters, standalone CPU racks for tool execution, and AI factory-level orchestrators coordinating multi-chip systems.

Mohamed Awad
EVP, Cloud AI Business Unit, Arm

Emphasizes that intelligence scales directly with compute, driving demand for ever more powerful chip architectures: "Intelligence scales with compute. When we add more compute, models get more capable, solve harder problems and make a bigger impact for people."

Sam Altman
CEO, OpenAI

The Crowd

"CPU vs GPU vs TPU vs NPU vs LPU, explained visually: 5 hardware architectures power AI today. Each one makes a fundamentally different tradeoff between flexibility, parallelism, and memory access."

@_avichawla

"These guys literally burned the transformer architecture into their silicon. And built the fastest chip of the world of all time for transformers architecture. 500,000 tokens per second with Llama 70B throughput."

@rohanpaul_ai

"Arm is up +18% after announcing its energy-efficient AGI CPU. CEO Rene Haas makes the case for why AI agents necessitate these chips: AI agents use >15x more tokens than humans, AI agents do general asynchronous tasks that need CPUs, a 1GW data center needs efficient CPU orchestration"

@bearlyai

Broadcast
CPU vs GPU vs TPU vs DPU vs QPU

How Nvidia GPUs Compare To Google's And Amazon's AI Chips

TPUs Are BETTER. But Why No One Uses Them?