Why Google Split the TPU in Two
The most consequential hardware decision in Next '26 is not that Google shipped a new TPU — it's that Google shipped two of them. TPU 8t (codename 'Sunfish') is a training monster: 9,600 chips per superpod, 2 petabytes of HBM, 121 exaflops of FP4 compute, 3x the processing power of last year's Ironwood, and double Ironwood's interchip bandwidth. TPU 8i is a different animal entirely: 1,152 chips per pod, 11.6 exaflops of FP8, 3x the on-chip SRAM of Ironwood, 19.2 Tbps of bidirectional scale-up bandwidth per chip, and 80% better performance per dollar for LLM inference. The architectural split says the quiet part out loud: training and inference have diverged enough as workloads that one chip can no longer be optimal for both.
The contrast with Nvidia is structural, not marketing. Nvidia's Rubin NVL72 caps its NVLink coherence domain at 576 accelerators; TPU 8t goes to 9,600. Google is trading per-chip peak performance for system-scale bandwidth, betting that frontier training runs are now bottlenecked by how many chips can share memory at speed, not by what a single chip can do. On the inference side, the 8i's enlarged SRAM cache and higher-capacity memory pool specifically target the memory-bandwidth wall that agent inference hits — long context windows, multi-step tool calls, and the kind of always-on reasoning fleets Pichai described when he said the industry has gone from 'Can we build an agent?' to 'How do we manage thousands of them?' The 8t/8i split is what 'managing thousands' looks like in silicon.
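The scale-up gap the paragraph above describes is easy to quantify. A minimal sketch, using the article's quoted domain sizes (not independently verified):

```python
# Scale-up domain comparison implied above: how many accelerators can
# share one coherent fast-interconnect domain. Both figures are as
# quoted in this article, not independently verified.

rubin_nvlink_domain = 576   # accelerators per NVLink coherence domain (quoted)
tpu_8t_superpod = 9_600     # chips per TPU 8t superpod (quoted)

ratio = tpu_8t_superpod / rubin_nvlink_domain
print(f"TPU 8t superpod spans {ratio:.1f}x the quoted NVLink domain")
```

That ~16.7x gap is the concrete form of the bet: per-chip peak matters less than how many chips the interconnect can bind into one training domain.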



