The Rationing Era Arrives at Your Subscription
The most immediate symptom of the AI compute crunch is something paying customers feel directly: their AI tools have started saying no. In late March 2026, heavy Claude users began reporting that they were burning through five-hour usage allotments in as little as 20 minutes during peak hours, and Anthropic explicitly tightened limits during weekday peaks (5am-11am PT / 1pm-7pm GMT). GitHub introduced new Copilot caps on April 10, 2026, citing 'rapid growth, high concurrency, and intensive usage,' and OpenAI shut down Sora to redirect capacity toward Codex as that product scaled to roughly 4 million weekly developers. Anthropic's API availability has fallen to 98.95%, well below the 99.99% industry standard, a gap large enough that enterprise customers are reportedly churning to OpenAI in search of more reliable uptime.
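The 'five-hour allotment' pattern users describe behaves like a rolling-window budget: usage units refill only as old consumption ages out of the window, so sustained heavy use can exhaust the budget in minutes and then lock the user out for hours. This is a minimal sketch of that mechanism under assumed parameters; the class name, budget sizes, and unit accounting are illustrative, not Anthropic's actual implementation.

```python
from collections import deque
import time


class RollingWindowLimiter:
    """Grant usage until a budget over a rolling window is spent.

    Hypothetical sketch: a fixed budget of 'units' that only refills
    as old consumption ages out of a five-hour window.
    """

    def __init__(self, budget_units, window_seconds=5 * 3600, clock=time.monotonic):
        self.budget = budget_units
        self.window = window_seconds
        self.clock = clock            # injectable clock, for testing/simulation
        self.events = deque()         # (timestamp, units) pairs, oldest first

    def _spent(self, now):
        # Drop consumption that has aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        return sum(units for _, units in self.events)

    def try_consume(self, units):
        now = self.clock()
        if self._spent(now) + units > self.budget:
            return False              # rate-limited: the tool says no
        self.events.append((now, units))
        return True
```

With a budget of 100 units, a user who consumes 100 units in the first 20 minutes is refused everything afterward until that usage ages out of the window, which matches the reported experience of exhausting a five-hour allotment almost immediately.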
The mechanism behind this is best articulated by Lennart Heim, Epoch AI cofounder and former RAND compute researcher: 'Using AI 10 times more heavily costs the provider roughly 10 times more money.' Because cost-to-serve scales almost linearly with usage while subscription revenue is fixed, the rational response when capacity is tight is to rate-limit rather than reprice. As Heim put it, 'these companies prefer to rate limit, so everybody gets the experience, rather than raise prices.' That choice has a side effect: the price signal that would normally clear a shortage is suppressed, which means the imbalance can persist for as long as the labs are willing to absorb the goodwill cost. Bank of America analysts now expect AI compute demand to outstrip supply through at least 2029, suggesting this rationing posture is not a quarter-long blip but the default operating mode for the next several years.
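Heim's point can be made concrete with back-of-envelope arithmetic: under a flat subscription, per-user margin falls linearly with usage and goes negative past a break-even point. All figures below are illustrative assumptions for the sketch, not disclosed provider economics.

```python
def monthly_margin(subscription_price, cost_per_unit, units_used):
    """Margin on a flat subscription when cost-to-serve scales linearly with usage."""
    return subscription_price - cost_per_unit * units_used


# Illustrative assumptions (not real provider figures):
price = 20.00          # flat monthly subscription fee
cost_per_unit = 0.02   # provider's marginal cost per usage unit

# A light user vs. one using 10x as much compute, per Heim's rule of thumb.
light_margin = monthly_margin(price, cost_per_unit, 500)    # 20 - 10  = +10
heavy_margin = monthly_margin(price, cost_per_unit, 5000)   # 20 - 100 = -80
```

At these assumed numbers the 10x-heavier user flips a $10 profit into an $80 loss, which is why capping usage, rather than raising the flat price for everyone, is the rational lever when capacity is scarce.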



