OpenAI GPT-5.5 release
TECH

Strategic Overview

  • 01.
    OpenAI released GPT-5.5 on April 23, 2026, rolling it out to Plus, Pro, Business, and Enterprise users across ChatGPT and Codex, with API access coming soon and a higher-performance Pro variant for harder work.
  • 02.
    The model is explicitly positioned as an agent runtime rather than a chat assistant, focused on agentic coding, computer use, knowledge work, and scientific research, and matching GPT-5.4's per-token latency while using fewer tokens.
  • 03.
    GPT-5.5 posts state-of-the-art results on Terminal-Bench 2.0 (82.7%), SWE-Bench Pro (58.6%), OSWorld-Verified (78.7%), and GDPval (84.9%), arriving just seven weeks after GPT-5.4 as frontier labs accelerate their release cadence.
  • 04.
    API pricing doubles to $5 per 1M input tokens and $30 per 1M output tokens (Pro at $30/$180) with a 1M-token context window, while Codex users get a 400K context window.

Agent-first runtime: GPT-5.5 as plumbing for OpenAI's super app

The most substantive shift in GPT-5.5 is not a benchmark number — it is OpenAI's explicit reframing of the model from chat assistant to agent runtime. OpenAI and its evaluators describe GPT-5.5 as 'a new class of intelligence for real work' built around agentic coding, computer use, knowledge work, and scientific research, with the headline evals (Terminal-Bench 2.0, SWE-Bench Pro, OSWorld-Verified, GDPval) all chosen to measure multi-step tool use rather than single-turn reasoning.

That framing aligns with the strategy Greg Brockman and Sam Altman are publicly telegraphing: GPT-5.5 is plumbing for a 'super app' spanning ChatGPT, Codex, and an AI browser. Brockman's own framing — 'This model is a real step forward towards the kind of computing that we expect in the future — but it is one step, and we expect to see many in the future' — is less a product pitch than a roadmap disclosure. The 400K-token context window inside Codex and the 1M-token API window are not chat features; they are agent-session budgets. Viewed through that lens, the 'incremental' critique that dominated Reddit misses the architectural point: OpenAI is not trying to ship a smarter chatbot, it is trying to ship the runtime its future products will execute on top of.

The 2x price, 40% fewer tokens paradox: who actually benefits

GPT-5.5's pricing story is genuinely contested. Headline API rates double to $5 per million input tokens and $30 per million output tokens, with the Pro variant landing at $30/$180 — a move Reddit threads described bluntly as telling users to 'expect about half as much usage as you got from 5.4.' The Decoder estimates the net effective increase at roughly 20% once GPT-5.5's lower token consumption is factored in, because the model reportedly emits about 40% fewer tokens per task than GPT-5.4.
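
To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch. It assumes GPT-5.4 rates of $2.50/$15 per 1M tokens (inferred only from "doubles to $5/$30"), applies The Decoder's reported ~40% token reduction to a task's whole footprint (in an agentic loop, shorter outputs also shrink the transcripts re-fed as input on later turns), and uses illustrative token counts rather than measured ones:

    # Back-of-the-envelope check of the "~20% effective increase" estimate.
    # Assumptions: GPT-5.4 rates inferred from "doubles to $5/$30"; the ~40% fewer
    # tokens per task is The Decoder's reported figure, applied to the whole task;
    # token counts are illustrative placeholders, not measurements.

    def task_cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
        """Dollar cost of one task, given per-1M-token rates."""
        return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

    in_tok, out_tok = 200_000, 300_000  # hypothetical multi-step agent task

    cost_54 = task_cost(in_tok, out_tok, in_rate=2.50, out_rate=15.00)
    cost_55 = task_cost(int(in_tok * 0.6), int(out_tok * 0.6), in_rate=5.00, out_rate=30.00)

    print(f"GPT-5.4: ${cost_54:.2f}   GPT-5.5: ${cost_55:.2f}   "
          f"net change: {100 * (cost_55 / cost_54 - 1):+.0f}%")
    # Doubled rates x 0.6 tokens works out to roughly +20% per task; if only outputs
    # shrink while inputs stay flat, the increase is larger, and input-heavy chat
    # calls see closer to the full 2x.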

That math reshapes the winners and losers. Developers running chat-style single-turn workloads see a straight price hike with little offset. But agentic workloads — long tool-using loops where output tokens dominate cost — can come out ahead if the efficiency gains hold: a Reddit defender captured this neatly, arguing that 'it's giving you better results with significantly fewer tokens. That's the real deal.' Enterprise buyers like Box and Harvey, who care about accuracy per completed task rather than tokens per call, report the strongest ROI, with Box measuring a 10-point jump in agent accuracy (77% vs 67%). The pricing is, in other words, a deliberate filter: it nudges the model toward the agent-first workloads OpenAI is optimizing for, and away from casual API usage where cheaper competitors already win.
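
The "accuracy per completed task" framing can also be made concrete. A hedged sketch, assuming a failed agent run falls back to human review and using purely illustrative dollar figures (only the 77% vs 67% accuracy numbers come from Box's eval):

    # Expected cost per task when failures fall back to human review.
    # Accuracy figures are Box's reported 77% vs 67%; the per-attempt and
    # review costs below are assumed for illustration, not reported values.

    def expected_task_cost(attempt_cost: float, accuracy: float, review_cost: float) -> float:
        """Model attempt cost plus expected human-review cost of failures."""
        return attempt_cost + (1.0 - accuracy) * review_cost

    gpt54 = expected_task_cost(attempt_cost=5.00, accuracy=0.67, review_cost=50.00)  # $21.50
    gpt55 = expected_task_cost(attempt_cost=6.00, accuracy=0.77, review_cost=50.00)  # $17.50

    print(f"GPT-5.4: ${gpt54:.2f}/task   GPT-5.5: ${gpt55:.2f}/task")
    # When failure handling dominates cost, a 10-point accuracy gain can more than
    # offset a ~20% higher per-attempt model bill.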

Enterprise evaluators corroborate the gains Reddit wouldn't concede

While consumer-facing sentiment on Reddit and parts of X.com trended toward 'incremental' — with the top threads echoing the joke that this could not possibly be the rumored 'Spud' and that the real step change is still being held back — the enterprise evaluator set tells a different and more consistent story. Harvey reports 91.7% on BigLaw Bench, with 43% of tasks earning a perfect score and 87% scoring above 0.80, and describes 'improvement in legal reasoning, organizational structure, and audience calibration.' Box's Complex Work Eval shows GPT-5.5 beating GPT-5.4 by a full ten points on agent accuracy. Bank of New York, testing in a regulated financial context against Anthropic's models, pointed specifically to response quality as 'really important for a highly regulated institution.'

These evaluators have no incentive to echo OpenAI's marketing, and they are measuring the exact dimensions — domain reasoning, multi-step agent workflows, hallucination resistance in regulated contexts — that determine whether frontier models actually earn enterprise seats. OpenAI's own internal data reinforces the pattern: 85% of OpenAI employees now use Codex weekly, and the finance team processed 24,771 K-1 tax forms totaling over 71,000 pages, cutting the workflow by two weeks. The gap between the enterprise evaluator consensus and the developer-community shrug is itself the story — GPT-5.5 is a bigger deal for the people paying enterprise prices than for the people replacing one chat app with another.

The Claude Mythos problem: shipping beats gating, but only for now

GPT-5.5's benchmark sweep is real but qualified. Against generally available models, OpenAI reclaims the top of Artificial Analysis's Intelligence Index by three points, breaking a three-way tie with Anthropic and Google. Against Anthropic's gated Claude Mythos Preview, however, the story reverses: Mythos wins six of nine directly comparable benchmarks, and even within publicly shipped models, Claude Opus 4.7 still edges GPT-5.5 Pro on Humanity's Last Exam without tools (46.9% vs 43.1%).

That asymmetry shapes the competitive narrative in two directions at once. On one hand, OpenAI wins the 'shipped' race — GPT-5.5 is available today to Plus, Pro, Business, and Enterprise customers while Mythos remains gated — and enterprise procurement runs on what is deployable, not what is previewed. On the other hand, developer-community reaction crystallizes around the suspicion that Anthropic is sandbagging the real frontier; one recurring framing treats GPT-5.5 as an OpenAI comeback that only holds until Anthropic chooses to ship. OpenAI's seven-week turnaround from GPT-5.4 is a direct response: the iteration cadence is now the moat, because any single model's lead is measured in weeks rather than quarters.

The hallucination paradox: benchmark king, honesty regression

The most uncomfortable finding in the launch coverage is that GPT-5.5's intelligence gains appear to come with a significant honesty regression. The Decoder's independent analysis reports a hallucination rate of 86% for GPT-5.5 versus 36% for Claude Opus 4.7 — a gap large enough that the publication concluded: 'Knowing when to pass or admit uncertainty is a trait you want in an AI model. By that measure, GPT-5.5 looks more like a step backward.'

That number sits in direct tension with Bank of New York's emphasis on response quality in regulated contexts, and it is precisely why OpenAI says GPT-5.5 underwent extensive third-party red teaming for cyber and bio risks and why API access is being delayed while external safeguards are hardened. For enterprise buyers, the practical implication is that GPT-5.5's wins on Terminal-Bench and SWE-Bench Pro — environments where correctness is checkable by executing code — do not automatically transfer to open-ended reasoning where the model is trusted to know the limits of its own knowledge. Expect the next several weeks of evaluation to focus not on whether GPT-5.5 is smarter, but on whether it is more confidently wrong, and whether the agentic frame that made the benchmarks possible also makes the hallucinations harder to catch.

Historical Context

2025-08-07
OpenAI launched the original GPT-5, emphasizing faster responses, improved coding, and reduced hallucinations over GPT-4 class models — the foundation for the GPT-5 family lineage.
2025-12-11
GPT-5.2 released with Instant, Thinking, and Pro modes plus a Codex variant, continuing the GPT-5 family's rapid iterative releases.
2026-02-05
GPT-5.3-Codex released, extending the coding-optimized lineage that GPT-5.5 now consolidates into the mainline Codex experience.
2026-03-05
GPT-5.4 launched as the prior frontier model for professional work, alongside GPT-5.4 mini and nano variants — the immediate benchmark GPT-5.5 is pitched against.
2026-04-23
GPT-5.5 (internal codename 'Spud') released, billed as the first fully retrained base model since GPT-4.5 and positioned explicitly as an agent runtime rather than a chat assistant.

Power Map

Key Players

Subject: OpenAI GPT-5.5 release

OpenAI
Released GPT-5.5 on April 23, 2026 across ChatGPT and Codex with a Pro tier, framing it as a fully retrained base model and agent runtime.

Anthropic
Principal competitor whose Claude Opus 4.7 and gated Claude Mythos Preview are the direct benchmark targets, with Mythos winning 6 of 9 head-to-head benchmarks against GPT-5.5.

Google (Gemini 3.1 Pro)
Third frontier competitor benchmarked against GPT-5.5 on agentic and reasoning evaluations as the three-way leadership race tightens.

Bank of New York
Regulated enterprise customer actively testing GPT-5.5 against Anthropic's models, emphasizing response quality and hallucination resistance.

Harvey
Legal AI evaluator that benchmarked GPT-5.5 on BigLaw Bench and reported gains in legal reasoning, organizational structure, and audience calibration.

Box
Enterprise content platform that measured a 10-point agent accuracy lead for GPT-5.5 over GPT-5.4 on complex agentic workflows.

THE SIGNAL.

Analysts

"Brockman framed GPT-5.5 as one concrete step toward a new computing paradigm rather than an endpoint, signaling OpenAI's super app trajectory: 'This model is a real step forward towards the kind of computing that we expect in the future — but it is one step, and we expect to see many in the future.'"

Greg Brockman
President, OpenAI

"Pachocki pushed back on the perception that frontier AI has stalled, implicitly positioning GPT-5.5 as a reacceleration: 'I would say, like, I think the last two years have been surprisingly slow.'"

Jakub Pachocki
Chief Scientist, OpenAI

"BNY singled out response quality rather than raw benchmark wins as the most practical gain for regulated enterprise workflows: 'What we're actually seeing from 5.5, that I think is really important for a highly regulated institution, is the response quality.'"

Bank of New York evaluator
Regulated financial institution testing GPT-5.5

"Box measured material gains on its Complex Work Eval: 'GPT 5.5 achieved a 10-percentage-point lead in overall agent accuracy, scoring 77% against 67%,' with particular strength on report drafting and expert review workflows."

Box evaluation team
Box (enterprise content cloud)

"The Decoder warned that benchmark dominance masks a regression on honesty, citing a sharply higher hallucination rate than Anthropic's model: 'Knowing when to pass or admit uncertainty is a trait you want in an AI model. By that measure, GPT-5.5 looks more like a step backward.'"

The Decoder
AI industry analysis publication

"Harvey judged GPT-5.5 as a clear step up for legal work, reporting 91.7% on BigLaw Bench and noting 'GPT-5.5 shows improvement in legal reasoning, organizational structure, and audience calibration.'"

Harvey evaluators
Harvey (legal AI) research team

The Crowd

"Introducing GPT-5.5 A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex."

@OpenAI

"GPT-5.5 takes OpenAI back to the clear number one in AI. OpenAI's new model tops the Artificial Analysis Intelligence Index by 3 points, breaking a three-way tie with Anthropic and Google. OpenAI gave us pre-release access to test all five reasoning effort levels: xhigh, high, ..."

@ArtificialAnlys

"GPT 5.5 drops Thursday. I haven't been this excited for a model release in a while. I used to be an OpenAI only user. Then they fell behind. Now Spud could outperform Claude Opus 4.7. OpenAI might just retake the frontier. Testing live the second it drops. BridgeBench"

@bridgemindai

"Introducing GPT-5.5 | OpenAI"

u/Gerstlauer839

Broadcast

Introducing GPT-5.5

OpenAI's GPT 5.5 is wild...

GPT-5.5 in 7 Minutes