Why Inference Economics, Not Training, Is Driving the Bid
The Cerebras thesis is not 'a faster GPU.' It is that the AI industry's center of gravity has shifted from training models once to running them billions of times per day, and that this shift rewards a fundamentally different chip architecture. Cerebras' Wafer-Scale Engine (WSE) keeps an entire silicon wafer intact as one giant chip with roughly 900,000 cores, rather than dicing the wafer into hundreds of smaller dies as Nvidia does. The company markets the WSE as 58x larger than Nvidia's B200, with 19x more transistors, 250x more on-chip memory, and 2,625x more memory bandwidth.
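Two of those multiples can be sanity-checked against numbers that are already public. Below is a minimal back-of-the-envelope sketch, assuming Nvidia's published B200 figures of roughly 208 billion transistors and ~8 TB/s of HBM3e bandwidth as the baseline; both baselines are vendor marketing figures, not independent measurements.

```python
# Back-of-the-envelope check of the marketed Cerebras-vs-B200 multiples.
# Assumed baselines (Nvidia's published B200 figures):
b200_transistors = 208e9   # ~208 billion transistors
b200_hbm_bw_tb_s = 8.0     # ~8 TB/s aggregate HBM3e bandwidth

# Apply Cerebras' marketed ratios to back-derive the implied WSE specs.
wse_transistors = b200_transistors * 19              # 19x more transistors
wse_mem_bw_pb_s = b200_hbm_bw_tb_s * 2_625 / 1_000   # 2,625x more bandwidth, in PB/s

print(f"Implied WSE transistor count: {wse_transistors / 1e12:.1f} trillion")
print(f"Implied WSE memory bandwidth: {wse_mem_bw_pb_s:.0f} PB/s")
```

The implied figures, roughly 4 trillion transistors and 21 PB/s of on-chip bandwidth, match what Cerebras publishes for its current wafer, so the multiples are at least internally consistent with both vendors' own spec sheets.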
For inference decode, the autoregressive token-by-token step that dominates real-world LLM serving, that on-chip memory bandwidth is the binding constraint: each generated token requires streaming essentially all of the model's weights past the compute units, so decode throughput is capped by how fast memory can feed the cores rather than by raw FLOPS. This is why Meta runs Llama 4 inference on Cerebras and why OpenAI signed a multi-year compute deal. CEO Andrew Feldman frames the speed advantage on decode tasks as the explicit reason OpenAI committed. The IPO is effectively the public market's first chance to price the bet that inference, not training, becomes the larger and more durable share of AI compute spend over the next decade.
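To see why bandwidth rather than FLOPS sets the ceiling, here is a rough roofline-style sketch. The inputs are illustrative assumptions, not vendor benchmarks: a generic 70-billion-parameter dense model at 16-bit precision, an ~8 TB/s HBM-class GPU, and Cerebras' marketed ~21 PB/s of on-chip SRAM bandwidth. It ignores batching, KV-cache traffic, quantization, and multi-device sharding, so the outputs are loose upper bounds on a single decode stream.

```python
# Roofline-style upper bound on single-stream decode speed: if weight reads
# dominate, tokens/sec cannot exceed memory bandwidth / bytes of weights read
# per token. All figures below are illustrative assumptions, not benchmarks.

def decode_tokens_per_sec(weight_bytes: float, mem_bw_bytes_per_sec: float) -> float:
    """Upper bound on per-stream decode throughput when weight streaming dominates."""
    return mem_bw_bytes_per_sec / weight_bytes

model_weights = 70e9 * 2   # assumed: 70B-parameter dense model at 16-bit precision (~140 GB)

hbm_gpu_bw    = 8e12       # assumed: ~8 TB/s HBM-class GPU
wafer_sram_bw = 21e15      # assumed: ~21 PB/s on-chip SRAM (Cerebras' marketed figure)

print(f"HBM GPU bound:     ~{decode_tokens_per_sec(model_weights, hbm_gpu_bw):,.0f} tokens/s per stream")
print(f"Wafer-scale bound: ~{decode_tokens_per_sec(model_weights, wafer_sram_bw):,.0f} tokens/s per stream")
```

Real systems batch many streams and never reach these ceilings, but the gap between the two bounds is simply the bandwidth ratio, and that gap is the structural argument behind the inference thesis.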




