Inference, not training, is where the money bleeds
Jalapeno is deliberately an inference chip, not a training chip, and that choice is the whole strategy. Training a frontier model is a periodic capital event; inference is the permanent, compounding cost of every ChatGPT reply, Codex completion, and API call OpenAI serves. By targeting inference, OpenAI is attacking the line item that scales with usage rather than with model releases. The economic stakes are stark: OpenAI is projected to run up more than $200 billion in operating expenses through 2029 [5], and owning the silicon that runs its models is positioned as a primary lever for controlling that spend. Broadcom CEO Hock Tan put a number on the upside, citing roughly 50% cost savings versus typical AI GPUs [3], while OpenAI claims 'substantially better' performance per watt than the current state of the art in early testing [1].
The mechanism is specificity: a chip co-designed around OpenAI's own kernels, memory-movement patterns, and serving behavior can strip out the generality tax a merchant GPU pays to serve every customer's workload. Greg Brockman frames the broader logic as moving toward a 'compute-powered economy' where owning the full stack makes compute more abundant. It is a vertical-integration bet that pays off only if the chip actually runs OpenAI's specific models more cheaply than the Nvidia hardware it would otherwise rent.


