Why a GPU is now a two-ton machine that draws 120 kilowatts
The core of Huang's reframing is that AI workloads no longer fit on a chip; they fit on a rack. NVIDIA is explicitly moving from chip-scale design to rack-scale design and on to infrastructure-scale design, co-designing the GPU, CPU, memory, networking, storage, power, cooling, software, racks, pods and entire data centers as one system [1]. The GB200 NVL72 is the physical proof: 72 Blackwell GPUs and 36 Grace CPUs wired together in a liquid-cooled rack over a single NVLink domain, so the whole rack behaves as one unified AI accelerator rather than 72 independent cards [3].
The specs make the 'computer, not chip' claim concrete. A single NVL72 system delivers 1.44 exaflops of FP4 compute, 13.4 TB of unified GPU memory and 130 TB/s of NVLink bandwidth; it weighs roughly 1.36 metric tons and can draw up to 120 kW, housed in a non-standard 48U OCP rack [3]. Huang has described the machine as a two-ton system with roughly half a million components, priced at around $4 million each [2]. The successor, Vera Rubin, pushes this to about 3.6 exaflops and 1.3 million components per rack-scale system, with assembly time in Taiwan cut from two hours to five minutes [1]. The Vera CPU itself is designed around agentic rather than interactive workloads: it is built for agents, not human users.
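To put the rack-level figures in per-GPU terms, the short Python sketch below divides the quoted totals evenly across the 72 GPUs. It is back-of-the-envelope arithmetic, not a vendor specification: the constant and function names are illustrative, the even split is an assumption, and the power figure is a whole-rack average that also covers the CPUs, switches and cooling rather than a per-GPU power rating. Under those assumptions the split works out to about 20 petaflops of FP4 compute, about 186 GB of memory, about 1.8 TB/s of NVLink bandwidth and about 1.7 kW of rack power per GPU.

```python
# Rack-level figures quoted in the text for one GB200 NVL72 rack [3].
GPUS_PER_RACK = 72
RACK_FP4_EXAFLOPS = 1.44
RACK_GPU_MEMORY_TB = 13.4
RACK_NVLINK_TB_PER_S = 130
RACK_POWER_KW = 120


def per_gpu_share(rack_total: float, gpus: int = GPUS_PER_RACK) -> float:
    """Divide a rack-level total evenly across the GPUs (an assumed even split)."""
    return rack_total / gpus


# Unit conversions use decimal prefixes: 1 exaflop = 1000 petaflops, 1 TB = 1000 GB.
fp4_pflops_per_gpu = per_gpu_share(RACK_FP4_EXAFLOPS * 1000)
memory_gb_per_gpu = per_gpu_share(RACK_GPU_MEMORY_TB * 1000)
nvlink_tb_per_s_per_gpu = per_gpu_share(RACK_NVLINK_TB_PER_S)
# Whole-rack power averaged over GPUs; includes CPUs, switches and cooling overhead.
rack_power_kw_per_gpu = per_gpu_share(RACK_POWER_KW)

if __name__ == "__main__":
    print(f"FP4 compute per GPU: {fp4_pflops_per_gpu:.1f} PFLOPS")
    print(f"GPU memory per GPU:  {memory_gb_per_gpu:.0f} GB")
    print(f"NVLink per GPU:      {nvlink_tb_per_s_per_gpu:.2f} TB/s")
    print(f"Rack power per GPU:  {rack_power_kw_per_gpu:.2f} kW")
```

The same helper can be reused for other rack-scale systems by passing a different total and GPU count, provided the figures are quoted on a comparable basis.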



