Flash at Half the Price: The Inference-Economics Reset
The headline of I/O 2026 isn't a new Pro model — it's the fact that the Flash tier is now the frontier. Sundar Pichai framed Gemini 3.5 Flash as delivering 'frontier-level capabilities at less than half the price of comparable frontier models' [1], and Google's own benchmarks claim it outperforms Gemini 3.1 Pro on most tasks including coding and agentic work while running roughly four times faster on output tokens per second [2]. Demis Hassabis added the in-product number that matters for builders: 800 tokens per second, and 12x faster inside Antigravity. For the first time in the post-GPT-4 era, the cheapest tier of a hyperscaler's lineup is being positioned as Pareto-superior to last quarter's flagship.
The business case is sharpened in Google's own enterprise pitch, which claims Flash can cut industry-wide enterprise AI spend by more than $1B per year [3]. That number only makes sense in the context of Google's infrastructure scale — its API platform now processes roughly 3.2 quadrillion tokens per month, up 7x year over year, and around 19 billion tokens per minute [4]. The implicit message to OpenAI and Anthropic is structural rather than rhetorical: if frontier reasoning is now a Flash-tier good, seat-based and premium-token pricing become harder to defend. Bloomberg Intelligence's 2026 outlook makes the same point from the buyer side, projecting that agentic AI will push software pricing 'from seat-based subscriptions toward usage- and outcome-based models' [5], which directly favors whoever owns the cheapest competent inference.



