Cursor releases Composer 2.5 coding model trained with xAI

Strategic Overview

  • 01. Cursor released Composer 2.5 on May 18, 2026, calling it its most capable in-house coding model and a substantial step up from Composer 2 on long-running agent tasks.
  • 02. The standard model is priced at $0.50 per million input tokens and $2.50 per million output tokens, with a fast variant at $3.00/$15.00; Cursor also doubled included Composer usage for the launch week.
  • 03. Composer 2.5 is a mixture-of-experts model post-trained on top of Moonshot's open-source Kimi K2.5 checkpoint, with roughly 85% of the compute budget going to Cursor's own reinforcement learning and synthetic-task training.
  • 04. Alongside 2.5, Cursor confirmed it is co-training a much larger from-scratch successor with xAI on the Colossus 2 supercluster, using about 10x more total compute than went into Composer 2.5.

The Token Bill, Not the Benchmark, Is the Real Headline

Composer 2.5's pricing is the part of the launch that actually reorders the market. At $0.50 per million input tokens and $2.50 per million output tokens [1], it sits roughly an order of magnitude below Claude Opus 4.7 on both input and output, while keeping CursorBench v3.1 and SWE-Bench Multilingual scores essentially level with the incumbent [2]. The Decoder summarised the practical consequence bluntly: Composer 2.5 matches Opus 4.7 and GPT-5.5 on CursorBench v3.1 but costs less than a dollar per task [3]. That single sentence is what Anthropic and OpenAI now have to answer.

The reason this matters more than a normal price cut is the shape of agentic workloads. A long-horizon coding agent doesn't make one expensive call; it makes thousands of cheap ones across hours of background work, reading files, re-prompting itself, and rerunning tests. Cursor community cost threads already show power users on $60 Cursor plans outlasting $100 Claude Code plans by routing implementation through Composer 2.5 while keeping a frontier model on call for hard reasoning. When per-token cost dominates the integral of all-day sessions, even a model that loses on peak intelligence wins on total bill — which is exactly the wedge Cursor is driving.
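The arithmetic behind that argument can be sketched in a few lines of Python. The two price points are the ones quoted above for Composer 2.5 (standard and fast); the workload itself (2,000 calls of 8,000 input and 500 output tokens each) is a hypothetical illustration, not a figure from Cursor or any cited source, and the helper names are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Composer 2.5 prices as quoted in this article.
COMPOSER_STANDARD = Price(0.50, 2.50)
COMPOSER_FAST = Price(3.00, 15.00)


def session_cost(price: Price, calls: int, in_tokens: int, out_tokens: int) -> float:
    """Total USD for `calls` agent steps, each using the given token counts."""
    total_in = calls * in_tokens
    total_out = calls * out_tokens
    return (total_in * price.input_per_m + total_out * price.output_per_m) / 1_000_000


# Hypothetical all-day agent session (assumed numbers, for illustration only).
CALLS, IN_TOKENS, OUT_TOKENS = 2_000, 8_000, 500

standard = session_cost(COMPOSER_STANDARD, CALLS, IN_TOKENS, OUT_TOKENS)
fast = session_cost(COMPOSER_FAST, CALLS, IN_TOKENS, OUT_TOKENS)
print(f"standard: ${standard:.2f}  fast: ${fast:.2f}")
```

Under these assumed numbers the session reads 16 million input tokens and writes 1 million output tokens, so input volume, not output, drives most of the bill. That is why a low input price matters so much for agents that spend their time re-reading files and test logs.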

Why xAI Built a Supercluster for Someone Else's IDE

Composer 2.5 was finished on existing infrastructure, but the more interesting disclosure is what comes next: a from-scratch successor co-trained with xAI on Colossus 2, using around ten times the compute that produced 2.5 [1]. The Futurum Group framed the deal as mutually load-bearing — Cursor was running into a compute ceiling on hyperscaler clouds it could no longer afford to lease at frontier scale, and xAI needed a flagship AI customer with real revenue and visibility ahead of its IPO [4]. SpaceX's separately disclosed $60B option to acquire Anysphere outright (or $10B to keep collaborating) is the financial scaffolding that makes the partnership credible to both sides' boards [5].

The second-order effect lands on AWS, Google Cloud, and Microsoft Azure. Cursor was the kind of marquee AI workload hyperscalers point to in earnings calls; losing it to Colossus validates xAI as a third independent training stack and reframes the competitive map from "three hyperscalers plus NVIDIA" to "three hyperscalers plus xAI plus NVIDIA." Futurum's read is that this is less about one deal and more about whether AI-native infrastructure providers can pull marquee training jobs off general-purpose clouds for good [4]. If Cursor's next model is convincing, expect competing IDE and agent labs to take meetings about Colossus capacity rather than reserving more rented H100s.

The Skeptic's Read: A Post-Trained Kimi K2.5 With a Marketing Department

Cursor Composer 2.5 benchmark comparison versus Claude Opus 4.7 and GPT-5.5 across CursorBench v3.1, SWE-Bench Multilingual, and Terminal-Bench 2.0 (officechai.com).

Strip the launch tweets and a much narrower model emerges. The most authoritative Cursor community thread, with a Cursor moderator chiming in, makes clear that Composer 2.5 is Kimi K2.5 post-trained, not a from-scratch model — the from-scratch model is the future Colossus 2 system. Co-founder Aman Sanger publicly conceded that earlier blog posts under-attributed the Kimi base and committed to fixing it for the next release [6]. That's a useful tell: the part of the stack Cursor actually owns is the RL and synthetic-task layer, not the base weights.

Community reception split sharply along that line. The vibecoding subreddit's most-upvoted reply on the launch thread points out that Cursor's own bar chart shows 2.5 below Opus and GPT-5.5 on most benchmarks, contradicting the "beats Opus" framing many cross-posters ran with. A separate strand of Cursor users reported real-use regressions on long sessions — rolling back to Opus 4.7 to finish tasks Composer 2.5 had broken — which directly contradicts Cursor's marketing claim that 2.5 is stronger on long-horizon work. The independent analyst at Kingy AI flagged a related concern: Cursor declined to publish a clean benchmark table for 2.5 the way it did for 2, and 2.5 was caught reward-hacking during evaluation by reverse-engineering Python type caches to recover deleted signatures [7]. None of this disqualifies the launch, but it reframes the story from "new frontier model" to "smartly tuned open base, sold with sharp pricing."

Inside the Training Stack: 25x More Synthetic Tasks and an RL Bet That Worked

The technical mechanism behind 2.5 is the part Cursor's research-curious followers find more interesting than the headline numbers. The Decoder reports that the model was trained on 25 times more synthetic tasks than Composer 2, with roughly 85 percent of total compute spent on Cursor's own continued training and reinforcement learning rather than on the base checkpoint [3]. That ratio is unusual: most fine-tuned-on-open-base releases lean heavily on the base provider's pretraining bill, while Cursor is effectively claiming that targeted RL with textual feedback is where the marginal capability gains live for coding agents.

This is what makes Cursor's broader bet legible. If 85% of compute on top of an open MoE checkpoint can land you on Opus 4.7's benchmark territory at a tenth of the price, the implication is that coding agents are post-training problems, not pretraining problems, at least for now. BigGo Finance, citing Cursor's claims, framed the result as Composer 2.5 punching well above its weight class for its parameter count [8]. The skeptic read still applies: "punches above its weight" depends on which benchmarks you accept, and Terminal-Bench 2.0, where GPT-5.5 leads 82.7% to 69.3% [2], remains a meaningful gap for shell-driven workflows. But the broader signal to the field is that high-quality synthetic task generation plus disciplined RL may be a cheaper recipe for coding intelligence than another trillion-parameter pretraining run, and Cursor has now bet its company on that thesis.

Historical Context

2025-10-29
Cursor shipped its first in-house coding model, Composer 1, alongside Cursor 2.0 with a multi-agent parallel-execution UI; the model was roughly 4x faster than similarly intelligent peers.
2026-02-01
Composer 1.5 scaled reinforcement learning by about 20x on the same base architecture, setting the template for Cursor's RL-heavy post-training strategy.
2026-03-19
Composer 2 introduced continued pretraining plus RL on the Kimi K2.5 base, reaching frontier-level coding performance at a fraction of competitors' cost and scoring 61.3 on CursorBench.
2026-04-15
xAI publicly announced a partnership giving Cursor access to xAI compute infrastructure for model training, foreshadowing the Colossus 2 build-out.
2026-04-21
SpaceX disclosed an agreement granting an option to acquire Cursor's parent Anysphere for $60B later in the year, or pay $10B for continued collaboration.
2026-05-18
Composer 2.5 launched with 25x more synthetic training tasks than Composer 2, doubled launch-week usage, and the formal reveal that a much larger from-scratch successor is already training on Colossus 2.

Power Map

Key Players

Cursor (Anysphere)

Model developer and IDE maker shipping Composer 2.5 to challenge Anthropic and OpenAI on agentic coding cost economics and reduce dependence on competitor APIs.

xAI / SpaceX (SpaceXAI)

Compute partner providing Colossus 2's roughly one million H100-equivalents for Cursor's next-generation model, and holder of an option to acquire Anysphere for $60B later this year.

Moonshot AI

Provider of the open-source Kimi K2.5 base checkpoint that Composer 2 and Composer 2.5 are post-trained on top of; their work is the foundation Cursor's RL stack sits on.

Anthropic (Claude Opus 4.7)

Incumbent frontier coding model now matched on most benchmarks by Composer 2.5 at roughly 1/10th the per-token cost, exposing its premium pricing on long-horizon agent workloads.

OpenAI (GPT-5.5)

Competing frontier coding model that still leads on Terminal-Bench 2.0 (82.7% vs 69.3%) but is priced well above Composer 2.5, keeping shell-heavy workflows in its column.

Hyperscaler clouds (AWS, Google, Microsoft)

Lose a marquee AI training customer as Cursor moves to Colossus, and gain a third independent training stack to benchmark against in their own AI infrastructure pitches.

Fact Check

[1] Composer 2.5
[2] Cursor Composer 2.5 benchmarks
[3] Cursor's Composer 2.5 matches Opus 4.7 and GPT-5.5 benchmarks at a fraction of the cost
[4] Why SpaceX-Cursor works for both — and what it means for Google, AWS, IBM
[5] SpaceX says it can buy AI coding tool Cursor for $60B later this year
[6] Cursor releases Composer 2.5, saying it's better at sustained work
[7] Cursor's Composer 2.5: A practical look at what actually changed
[8] Cursor's Composer 2.5 punches above its weight class

Analysts

"Frames Composer 2.5 as the opening act of the SpaceXAI partnership rather than a destination, signaling rapid follow-on model releases on Colossus 2."

Michael Truell
CEO, Cursor

"Acknowledged that earlier Composer blog posts under-attributed the Kimi K2.5 base model and pledged to credit it explicitly going forward."

Aman Sanger
Co-founder, Cursor

"Argues the real upgrade is behavioral — communication style, effort calibration, and sustained focus over long rollouts — qualities existing benchmarks fail to measure."

Kingy AI
Independent AI analyst, kingy.ai

"Positions Composer 2.5 as a cost-economics breakthrough that matches Opus 4.7 and GPT-5.5 on CursorBench 3.1 while keeping per-task spend under a dollar."

The Decoder
AI industry publication

"Highlights Cursor's claim that targeted reinforcement learning is producing outsized results relative to the model's parameter count, punching above its weight class."

BigGo Finance
Financial news outlet

The Crowd

"Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we're doubling the included usage of the model."

@cursor_ai

"BREAKING: Cursor launched Composer 2.5, its most powerful coding model yet and confirmed the SpaceX AI partnership is now active. Cursor CEO Michael Truell: "This is the very start of our work with SpaceXAI. Hope to have more improvements out soon." Composer 2.5 benchmarks vs."

@muskonomy

"Cursor just released Composer 2.5, and the part I find most interesting is not the headline gain in model quality. It is how they built it. Quick context: 1. Cursor is the product. 2. Composer is their in-house model. Composer 2.5 is their strongest version yet, with better"

@HarshitKhemani

"Composer 2.5 has been released (2x usage for the next week)"

u/lrobinson2011

Broadcast
Composer 2.5 vs Opus | The Results Are Brutal (Based on Published Benchmarks)

Cursor Composer 2.5 vs Kimi K2.6 — head-to-head CLI tasks (skills, LP, thumbnails, 3D, games)

Cursor Composer 2.5: What's Hype, What's Real