An Image Model That Actually Thinks — And Why That Changes the Product Category

The headline feature of gpt-image-2 is not resolution or speed: it is that ChatGPT Images 2.0 is, per OpenAI, the company's 'first image model with thinking capabilities.' In thinking mode the model reasons before it generates, spending variable compute on the plan, and, crucially, it can pull live information from the web mid-generation and self-verify the output before returning it. The Decoder captures the qualitative shift bluntly: the model 'thinks before it generates' and 'can even search the web during that process.' That is not a diffusion refinement; it is a different loop.
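To make the loop concrete, here is a minimal, self-contained sketch of that control flow. It is emphatically not OpenAI's implementation: every type and helper below (Plan, Critique, plan_render, web_search, render, verify) is a hypothetical stand-in for an internal model step. What matters is the shape: plan first, optionally fetch live facts, render, then verify before returning.

```python
# Hypothetical sketch of a "think, search, render, verify" loop.
# All helpers are stubs standing in for internal model steps.

from dataclasses import dataclass, field


@dataclass
class Plan:
    prompt: str
    needs_live_info: bool = False
    facts: list[str] = field(default_factory=list)


@dataclass
class Critique:
    passes: bool
    notes: str = ""


def plan_render(prompt: str) -> Plan:
    """Stand-in for the variable-compute reasoning step."""
    return Plan(prompt=prompt, needs_live_info="current" in prompt)


def web_search(plan: Plan) -> list[str]:
    """Stand-in for the mid-generation web lookup."""
    return [f"fresh fact relevant to: {plan.prompt!r}"]


def render(plan: Plan, feedback: str = "") -> bytes:
    """Stand-in for the actual image decoder."""
    return f"<image for {plan.prompt!r} | {feedback}>".encode()


def verify(image: bytes, plan: Plan) -> Critique:
    """Stand-in for the self-verification pass over the finished pixels."""
    return Critique(passes=True)


def generate_with_thinking(prompt: str, max_revisions: int = 3) -> bytes:
    plan = plan_render(prompt)            # reason before generating
    if plan.needs_live_info:              # e.g. logos, rosters, prices
        plan.facts = web_search(plan)     # pull live information mid-run
    image = render(plan)
    for _ in range(max_revisions):        # self-verify before returning
        critique = verify(image, plan)
        if critique.passes:
            return image
        image = render(plan, feedback=critique.notes)
    return image
```

Swap the stubs for real components and this is a generic agent loop; the novelty OpenAI is claiming is that the loop runs inside the image model itself rather than being bolted on around it.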
The downstream consequence is that gpt-image-2 behaves less like an image renderer and more like a visual agent. OpenAI's own framing, that 'images are a language, not decoration' and that the model 'moves image generation from rendering to strategic design', is doing real work here. Character and object continuity across up to eight outputs, rendering of dense UI mockups with small legible text and iconography, and multilingual typesetting in non-Latin scripts all fall out of having a reasoning step in the pipeline. The same architectural story OpenAI told with its o-series reasoning models is now stapled onto pixels.
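If the existing Images API shape carries over, consuming that agent-like behavior could look like the sketch below. This is a hedged guess: the model identifier 'gpt-image-2', the n=8 support, and the base64 response field are assumptions extrapolated from the current gpt-image-1 endpoint, not confirmed parameters.

```python
# Hedged usage sketch, assuming gpt-image-2 ships on the same Images API
# surface as gpt-image-1. Model name, n=8 support, and the b64_json response
# field are ASSUMPTIONS extrapolated from the current endpoint.

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# n=8 mirrors the continuity claim: one shared character/object brief,
# up to eight mutually consistent renders.
result = client.images.generate(
    model="gpt-image-2",  # ASSUMED identifier
    prompt=(
        "Brand mascot: a blue fox in a lab coat. Render a dashboard UI "
        "mockup featuring the mascot, with small legible labels in "
        "English and Japanese."
    ),
    n=8,
    size="1024x1024",
)

for i, image in enumerate(result.data):
    # gpt-image-1 returns base64-encoded images by default; assuming the same
    with open(f"mascot_{i}.png", "wb") as f:
        f.write(base64.b64decode(image.b64_json))
```

One integration consequence of the variable-compute planning step is that per-call latency would also be variable, in a way a plain rendering endpoint's is not.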
