TECH

Google unveils 8th-generation TPUs (TPU 8t and TPU 8i) at Cloud Next

Strategic Overview

  • 01.
    At Google Cloud Next 2026, Google introduced the eighth generation of its Tensor Processing Unit as two purpose-built chips: TPU 8t for training and TPU 8i for inference, co-designed with Google DeepMind.
  • 02.
    A single TPU 8t superpod scales to 9,600 chips with two petabytes of shared high-bandwidth memory, while the Virgo network stitches superpods into training clusters of more than one million TPU chips.
  • 03.
TPU 8t delivers up to 2.8x better price/performance than Ironwood for training, TPU 8i delivers 80% better performance-per-dollar than Ironwood for LLM inference, and both chips more than double performance-per-watt versus the previous generation.
  • 04.
    Both chips will be generally available to Google Cloud customers later this year, and Google separately announced a partnership to deploy Nvidia's Vera Rubin chips on Google Cloud alongside the new TPUs.

Deep Analysis

The End of the General-Purpose AI Chip

For twelve years, every TPU generation was a single part number trying to be good at both training and inference. TPU 8 ends that. Google split the line into TPU 8t, engineered to "reduce the frontier model development cycle from months to weeks," and TPU 8i, a serving chip "optimized for post-training and high-concurrency reasoning." The two chips share a fabrication lineage but diverge at the level that matters most: memory hierarchy and network topology.

TPU 8t is built around shared high-bandwidth memory at pod scale — 9,600 chips pooling two petabytes of HBM so that trillion-parameter training runs can treat the superpod as one logical accelerator. TPU 8i goes the other direction: Google packed it with "our highest on-chip SRAM, a new Collectives Acceleration Engine (CAE), and a new serving-optimized network topology called Boardfly." SRAM favors the short, bursty memory accesses of token generation; CAE accelerates the all-reduce operations that dominate multi-chip inference; Boardfly is tuned for many small, latency-sensitive serving groups rather than one giant training fabric. Hyperframe Research's read of this is blunt: the era of the general-purpose AI accelerator is over, and "specialized TPU 8t and 8i architectures signal the end of general-purpose silicon." That thesis, if right, reshapes how every hyperscaler plans its next chip.
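To ground the collectives point, here is a minimal JAX sketch of the all-reduce pattern that dominates sharded inference: each chip computes a partial matmul against its slice of a weight matrix, then a psum sums the partials so every chip holds the full result. The shapes and device counts are illustrative placeholders, and nothing below reflects how the CAE actually implements the collective in hardware, which Google has not detailed.

```python
# A minimal sketch of tensor-parallel inference's dominant collective:
# shard the contraction dimension of a matmul across chips, compute
# partial products locally, then all-reduce (psum) the partials.
import functools

import jax
import jax.numpy as jnp

n = jax.local_device_count()  # e.g. one serving group of chips

@functools.partial(jax.pmap, axis_name="chips")
def sharded_matmul(x_shard, w_shard):
    partial = x_shard @ w_shard                      # local partial product
    return jax.lax.psum(partial, axis_name="chips")  # all-reduce across chips

batch, hidden, out = 4, 128, 64
shard = hidden // n
x = jnp.arange(batch * hidden, dtype=jnp.float32).reshape(batch, hidden)
w = jnp.ones((hidden, out), dtype=jnp.float32)

# Split the hidden dimension across chips: x -> (n, batch, shard),
# w -> (n, shard, out); the psum reconstructs the full x @ w everywhere.
x_shards = x.reshape(batch, n, shard).transpose(1, 0, 2)
w_shards = w.reshape(n, shard, out)
y = sharded_matmul(x_shards, w_shards)  # each replica equals x @ w
```

Every token-generation step in a tensor-parallel model pays for that psum, which is why putting a collectives engine in hardware moves the needle on serving latency.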

Virgo and the Million-Chip Cluster: Scale as a Moat

The more audacious claim buried in the technical deep dive is not about any single chip; it is about the fabric between them. Google's Virgo network now scales "to more than 1 million TPU chips in a single training cluster," connected at 47 Pb/s of bisection bandwidth. A single TPU 8t superpod already delivers 121 ExaFlops of FP4 compute over its 9,600-chip unified-memory domain; Virgo then stitches superpods together across sites.

That topology is what analysts keep pointing to as the structural advantage. Hyperframe Research estimates that "Google's unified memory pool within a TPU 8t superpod is roughly two orders of magnitude larger than NVL72," Nvidia's current flagship shared-memory domain. In plain terms: the largest coherent memory domain you can address as a single accelerator on Google Cloud is roughly 100x larger than the largest you can rent anywhere else. For frontier labs training models whose weights no longer fit on a single pod, that difference stops being a nice-to-have and becomes the gating constraint on how fast they can iterate. The TPU 8 launch is really a Virgo launch with a chip attached.
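The arithmetic behind that claim is easy to check from the announced figures. In the sketch below, the only outside number is NVL72's shared-HBM capacity, which public Nvidia materials put at roughly 13.5 TB for GB200 NVL72; treat that as an assumption rather than a verified spec.

```python
# Back-of-envelope check on the superpod claims, using the announced
# figures plus one assumed outside number (NVL72 shared HBM ~13.5 TB).
CHIPS_PER_SUPERPOD = 9_600
POOLED_HBM_PB = 2.0                 # petabytes of shared HBM per superpod
SUPERPOD_FP4_EXAFLOPS = 121.0       # FP4 compute per superpod
NVL72_SHARED_HBM_TB = 13.5          # assumption: Nvidia GB200 NVL72

hbm_per_chip_gb = POOLED_HBM_PB * 1e6 / CHIPS_PER_SUPERPOD
fp4_per_chip_pflops = SUPERPOD_FP4_EXAFLOPS * 1e3 / CHIPS_PER_SUPERPOD
pool_ratio = POOLED_HBM_PB * 1e3 / NVL72_SHARED_HBM_TB

print(f"~{hbm_per_chip_gb:.0f} GB HBM per TPU 8t chip")     # ~208 GB
print(f"~{fp4_per_chip_pflops:.1f} PFLOPs FP4 per chip")    # ~12.6
print(f"superpod pool ~ {pool_ratio:.0f}x NVL72")           # ~148x
```

A ~148x gap is consistent with the firm's "roughly two orders of magnitude" phrasing.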

Anthropic's 3.5-Gigawatt Vote of Confidence

The single most concrete signal that TPU 8 is commercially real sits in Anthropic's updated Google partnership. Anthropic pre-booked 3.5 gigawatts of TPU compute starting in 2027: not a trial footprint, not a co-marketing arrangement, but a multi-year capacity commitment at data-center scale. For context, 3.5 GW is on the same order of magnitude as the total electrical draw of a mid-sized US city, committed by a single AI lab to a single silicon vendor.
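To put the number in household terms, a quick sketch converts the commitment into annual energy, assuming (hypothetically) the full 3.5 GW ran around the clock; the average-US-household figure of roughly 10,700 kWh per year is an outside assumption, not from the announcement.

```python
# Rough scale of a 3.5 GW compute commitment, under the (hypothetical)
# assumption of flat-out, year-round utilization.
COMMIT_GW = 3.5
HOURS_PER_YEAR = 8_760
HOUSEHOLD_KWH_PER_YEAR = 10_700     # assumed US-average consumption

annual_twh = COMMIT_GW * HOURS_PER_YEAR / 1_000             # GWh -> TWh
homes = annual_twh * 1e9 / HOUSEHOLD_KWH_PER_YEAR           # kWh / kWh

print(f"~{annual_twh:.0f} TWh per year if fully utilized")  # ~31 TWh
print(f"~{homes / 1e6:.1f} million average US homes")       # ~2.9 million
```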

That matters because Anthropic had a choice. As one of the few labs with frontier training demand large enough to negotiate on price and architecture, its decision to lock in TPU capacity years in advance is a statement about total cost of ownership, not about 2026 benchmarks. It also partially answers the question every CFO at every other AI-serving company is quietly asking: can TPU economics actually beat Nvidia over a multi-year commitment once inference volume dominates the bill? Anthropic's answer, in gigawatts, is yes. If even one more frontier lab follows, the competitive gravity around AI infrastructure shifts in a way no per-chip benchmark sheet can convey.

Competing With Nvidia While Hosting Nvidia

The most underreported twist of the announcement was not the TPU numbers but the simultaneous Nvidia deal. At the same event, "Google and Nvidia also announced a partnership to deploy the most advanced Vera Rubin chips on Google Cloud." Google is now pitching customers both the argument that TPU 8i delivers 80% better performance-per-dollar for LLM inference than Ironwood and the option to rent Nvidia's newest silicon on the same cloud, whichever the customer prefers.

This is a careful commercial posture, not a contradiction. Google Cloud's job is to capture AI spend regardless of who wins the silicon war; Google's TPU team's job is to make sure the in-house answer has the better long-run cost curve. Chip analyst Patrick Moorhead captured the historical humility in this position: he "predicted Google's TPU could be problematic for Nvidia back in 2016, though this forecast 'didn't exactly hold up to the test of time.'" A decade later, TPU is still not dethroning Nvidia in the open market — but it does not need to. If TPU 8 wins the workloads Google cares about most (its own models, DeepMind's training runs, and anchor tenants like Anthropic) while Vera Rubin keeps everyone else renting GPUs from Google Cloud, the strategic outcome for Alphabet is the same either way.

The reaction across the creator and enterprise-tech conversation mirrored this dual framing. On X, Google leadership and market-facing finance accounts emphasized the raw efficiency gains — performance-per-watt deltas and the bifurcated training/inference design — while developer-leaning YouTube commentary split between first-party explainers and louder independent framings that cast TPU 8 as a direct Nvidia confrontation. The angle dominating outside coverage was competitive; the angle dominating Google's own channels was architectural. Both can be true, which is precisely what the Vera Rubin partnership lets Google say without contradiction.

Goodput, Not FLOPs: The New Metric That Matters

[Chart: TPU 8t and TPU 8i efficiency gains over the previous-generation Ironwood TPU.]

The quietest argument in the launch materials may end up being the most important. Hyperframe Research reduces the entire race to a one-liner: "The chip that serves the most useful tokens per megawatt is the chip that gets deployed at scale." Peak FLOPs per chip, the number vendors have marketed for a decade, is losing status as a figure of merit. What replaces it is cluster-level goodput — the rate at which a full system actually converts electricity into useful inference or training throughput, net of communication overhead, memory stalls, and scheduling losses.

TPU 8's spec sheet is organized around exactly that shift. The 80% better inference performance-per-dollar on TPU 8i and the 2.8x training price/performance on TPU 8t are cluster-level claims, not chip-level ones. The roughly 2x performance-per-watt figures (a 124% gain for 8t and a 117% gain for 8i over Ironwood) are explicit total-cost-of-compute arguments aimed at hyperscalers who are increasingly power-constrained, not silicon-constrained. Sundar Pichai's framing, "the conversation has gone from 'Can we build an agent?' to 'How do we manage thousands of them?'", is the demand side of the same argument: agentic workloads generate vastly more inference tokens per user session than chat, and whoever serves those tokens cheapest per megawatt-hour wins the deployment. That is the yardstick TPU 8 is designed to win, and it is the yardstick everyone else now has to answer to.
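As a rough illustration of that yardstick, the sketch below expresses useful tokens per megawatt-hour as peak throughput discounted by a goodput fraction that absorbs communication overhead, memory stalls, and scheduling losses. Every input is a placeholder, not a TPU 8 or Nvidia figure.

```python
# Goodput, not FLOPs: useful tokens served per MWh of electricity.
def tokens_per_mwh(peak_tokens_per_sec: float,
                   goodput_fraction: float,
                   cluster_power_mw: float) -> float:
    """Peak throughput, discounted to useful work, per unit of power."""
    useful_per_hour = peak_tokens_per_sec * goodput_fraction * 3_600
    return useful_per_hour / cluster_power_mw

# Hypothetical clusters: A has faster chips, B a tighter system.
a = tokens_per_mwh(peak_tokens_per_sec=2.0e6, goodput_fraction=0.45,
                   cluster_power_mw=10.0)
b = tokens_per_mwh(peak_tokens_per_sec=1.5e6, goodput_fraction=0.70,
                   cluster_power_mw=10.0)
print(f"A: {a:.2e} tokens/MWh  B: {b:.2e} tokens/MWh")  # B wins on goodput
```

On this metric the slower-chip cluster wins, which is exactly the inversion the launch materials are arguing for.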

Historical Context

2013-2018
Google began TPU development in 2013, deployed the chips internally in 2015, and opened TPU access to third-party cloud customers in 2018, establishing a decade-plus head start on vertically integrated AI silicon.
2024-05
Trillium (TPU v6) announced at Google I/O with a claimed 4.7x performance jump over TPU v5e; it entered preview in October 2024 and signaled Google's shift toward efficiency-driven generations.
2025
Ironwood (TPU v7) unveiled at Google Cloud Next '25 as the first TPU designed specifically for inference, shipping with 192 GB of HBM per chip (6x Trillium) and up to 7.37 TB/s of HBM bandwidth; it is the direct predecessor to TPU 8.
2026-04-22
TPU 8t and TPU 8i announced at Google Cloud Next 2026 as the first TPU generation to bifurcate training and inference into separate chips, with general availability to Cloud customers later in 2026.

Power Map

Key Players

GO

Google Cloud / Alphabet

Designed and will commercialize TPU 8t and TPU 8i through its AI Hypercomputer platform; using the launch to reposition Google Cloud as the default home for frontier training and high-concurrency agentic inference.

GO

Google DeepMind

Co-designed the chips with Google Cloud so the silicon matches DeepMind's frontier training and reasoning workloads, giving Google a tighter model-hardware feedback loop than most rivals have.

AN

Anthropic

Anchor frontier-lab customer; expanded its partnership with Google and pre-booked 3.5 gigawatts of TPU compute starting in 2027, validating TPU economics at frontier scale.

NV

Nvidia

Principal competitor in AI accelerators and, awkwardly, a simultaneous Google Cloud partner whose Vera Rubin chips will ship alongside TPU 8 on the same platform.

CI

Citadel Securities

Named reference customer for the new TPU generation, signaling interest in TPU 8 beyond AI labs and into latency-sensitive financial-services workloads.

AM

Amin Vahdat (SVP and Chief Technologist, AI and Infrastructure, Google)

Public face of the TPU 8 rollout and author of Google's official announcement framing the chips as the infrastructure for the agentic era.

THE SIGNAL.

Analysts

"Argues the general-purpose AI chip is fading and that what matters now is cluster-level goodput, not headline per-chip performance: "The real battleground for 2026 and 2027 will not be peak FLOPs per chip but rather cluster-level goodput.""

Hyperframe Research
Industry analyst firm

"On scale: "Google's unified memory pool within a TPU 8t superpod is roughly two orders of magnitude larger than NVL72," a gap the firm frames as the new structural moat in frontier training."

Hyperframe Research
Industry analyst firm

"On inference economics, the firm reduces the entire race to a single yardstick: "The chip that serves the most useful tokens per megawatt is the chip that gets deployed at scale.""

Hyperframe Research
Industry analyst firm

"Tempers the Nvidia-killer narrative — he "predicted Google's TPU could be problematic for Nvidia back in 2016, though this forecast 'didn't exactly hold up to the test of time,'" while still treating TPU 8 as a credible challenger."

Patrick Moorhead
Chip-market analyst, Moor Insights & Strategy

"Framed the Cloud Next 2026 keynote around why the silicon had to change: "The conversation has gone from 'Can we build an agent?' to 'How do we manage thousands of them?'""

Sundar Pichai
CEO, Alphabet / Google
The Crowd

"TPU 8t, optimized for training and TPU 8i, optimized for inference. Looking good!"

@sundarpichai

"Google unveils TPU 8t and TPU 8i at $GOOGL Cloud Next > TPU 8t is built for training frontier models > TPU 8i is built for inference, lower-latency agentic AI workloads, and more complex reasoning tasks > 8t delivers 124% more performance per watt and 8i 117% more than the [previous generation]"

@wallstengine

"The culmination of a decade of development, TPU 8t and TPU 8i are custom-engineered to power the next generation of supercomputing with efficiency and scale! Connect with our experts to build your future on our 8th generation TPUs"

@GoogleCloudTech
Broadcast
How Google's 8th Generation TPUs Power the Agentic Era

Google TPU 8t and TPU 8i: Purpose-built for the Agentic Era

Google's TPU 8 Is A Direct Attack On NVIDIA - And It Rewrites AI Infrastructure Forever