Anthropic Project Deal: AI Agents in Live Marketplace Transactions

Strategic Overview

  • 01.
    In December 2025, Anthropic ran a one-week pilot called Project Deal — a Claude-run classifieds marketplace inside its San Francisco office, with Claude agents transacting on employees' behalf and no human in the loop once it began.
  • 02.
    Anthropic recruited 69 employees, gave each a $100 gift-card budget, and had Claude interview them about which personal belongings they might sell or buy before agents posted listings, made offers, and closed deals autonomously through Slack.
  • 03.
    Across the real-money run, 69 agents struck 186 deals on more than 500 listed items, totaling just over $4,000 in transactions.
  • 04.
    Anthropic ran four parallel marketplaces — one real and three study runs — to compare Claude Opus 4.5 against the smaller Claude Haiku 4.5, finding that Opus produced measurably better economic outcomes, a gap its users could not perceive.

Deep Analysis

The Quiet Inequality: When Your Agent Loses And You Never Notice

Same items, two different Claude models: Opus 4.5 closed sales for materially more than Haiku 4.5 — yet users rated fairness identically.

The headline finding from Project Deal is not that AI agents successfully traded $4,000 in goods on behalf of 69 employees. It is that the agents were not equally good at it, and the humans they represented could not tell. Across 161 directly comparable items, sellers using Claude Opus 4.5 averaged $2.68 more per item than sellers using Claude Haiku 4.5; Opus buyers paid roughly $2.45 less. When the same item type appeared on both sides, Opus listings closed for $3.64 more on average than Haiku listings. A broken bike sold for $38 under Haiku but $65 under Opus. A lab-grown ruby cleared $65 under Opus and just $35 under Haiku. Opus users also closed about two more deals each.
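
The paired-listing arithmetic behind these numbers reduces to a per-item difference of means. A minimal sketch, using only the two matched items the write-up names explicitly (the helper function and its name are our own, not Anthropic's actual pipeline):

```python
def mean_paired_gap(paired_prices: dict[str, tuple[float, float]]) -> float:
    """Average of (Opus price - Haiku price) across matched item types."""
    gaps = [opus - haiku for opus, haiku in paired_prices.values()]
    return sum(gaps) / len(gaps)

# The two matched items the write-up calls out explicitly:
examples = {
    "broken bike": (65.0, 38.0),     # Opus $65 vs. Haiku $38
    "lab-grown ruby": (65.0, 35.0),  # Opus $65 vs. Haiku $35
}

print(mean_paired_gap(examples))  # 28.5 on these two items alone
```

Across the full set of 161 paired listings the reported average gap is $3.64; the sketch only shows the shape of the comparison, not the published number.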

The disturbing part is the perception data. On a 7-point fairness scale, Opus users rated their experience 4.05 and Haiku users rated theirs 4.06 — statistically indistinguishable. Half the room was being silently outnegotiated by the other half, and nobody felt cheated. This is a fundamentally new failure mode for consumer protection. In every existing marketplace — Craigslist, eBay, Facebook Marketplace — a buyer who feels ripped off can at least sense the rip. With agent intermediaries, the entire negotiation surface is hidden inside a model's internal reasoning. The user sees only the closing price and a polite summary. As Anthropic's authors put it, 'when agents of different strengths meet in real markets, people could end up on the losing side without ever knowing it.'
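
A claim like "statistically indistinguishable" usually comes down to a two-sample test. Here is a minimal Welch's t-statistic in pure Python: the group means are the reported 4.05 and 4.06, but the variances and cohort sizes are assumptions for illustration only, since the write-up does not publish raw ratings.

```python
import math

def welch_t(mean1: float, var1: float, n1: int,
            mean2: float, var2: float, n2: int) -> float:
    """Welch's t-statistic for two independent samples, from summary stats."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Reported group means (4.05 vs. 4.06 on a 7-point scale) with ASSUMED
# variances and cohort sizes, purely to show why a 0.01-point gap washes out.
t = welch_t(4.05, 1.5, 35, 4.06, 1.5, 34)
print(round(t, 3))  # |t| far below any conventional significance threshold
```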

Capability Beat Prompting — And Killed A Cottage Industry

Project Deal also delivers a quietly damning result for the entire 'negotiation prompts' genre that has flourished on social media since GPT-4. Anthropic explicitly tested whether telling agents to negotiate aggressively — 'drive a hard bargain,' 'don't leave money on the table,' the kind of instruction template that gets sold in $19 PDF guides — moved the dial. It did not. 'Aggressive negotiation instructions had no statistically significant effect on sales likelihood or prices,' the team reports. What did move outcomes was the underlying model. Opus consistently outperformed Haiku by dollar amounts that survive across 161 paired listings.

This flips a common intuition. The narrative that 'good prompting = good results' assumes the model is a generic engine that responds to verbal levers. Project Deal suggests that once a task involves real strategic depth — reading a counterparty, anchoring, knowing when to walk — capability dominates and prompt theater is a rounding error. For practitioners, the implication is concrete: when you delegate a task to an agent that genuinely matters, model selection is the dominant variable, not the system prompt. For Anthropic specifically, it is also a marketing argument for the higher-tier subscription, which is exactly what some Reddit commenters in r/claude flagged — calling the post 'propaganda upselling higher-tier subscriptions.' The skepticism is fair, but the data is also real and internally consistent.

From Claudius The Lonely Shopkeeper To A Multi-Agent Marketplace

Project Deal is the second installment in Anthropic's running attempt to learn about autonomous commerce by actually running some. The first, Project Vend in mid-2025, dropped a single Claude 3.7 Sonnet instance — nicknamed 'Claudius' — behind the counter of a tiny office shop. Claudius lost money, made surreal pricing decisions, and at one point insisted to staff that it was a human wearing a blue blazer. It was funny, and largely harmless, because the failure modes of one bad shopkeeper are bounded.

Project Deal scales the same research lineage from one agent to a marketplace of 69, with peer-to-peer transactions instead of a single retail counter. That structural shift is where the interesting failure modes live. A lone agent that misbehaves is a customer-service problem. A marketplace of agents creates emergent dynamics: who anchors first, which model concedes faster, whether two agents on the same side of a deal collude or stall. Anthropic explicitly highlights new attack surfaces — jailbreaks, prompt injection, and the prospect of corporate counterparties optimizing for AI agents' attention rather than human attention. None of these existed when Claudius was alone behind his register. The trajectory from Vend to Deal is a compressed preview of how fast this experimentation surface is widening, and how much of it Anthropic is choosing to publish rather than privatize.

Who's Liable When Your Bot Buys A Lemon?

The most consequential admission in Project Deal is the one Anthropic delivers almost in passing: 'the policy and legal frameworks around AI models that transact on our behalf simply don't exist yet.' Today, if a user delegates real money to an agent that overpays for a broken bike or sells a lab-grown ruby for half its market price, there is no clean answer to who is on the hook. The model provider? The application that deployed it? The user who consented to autonomous trading? The counterparty's agent that may have used adversarial prompting?

This is not abstract. Reddit's r/ClaudeCode discussion zeroed in on the auth-design implication, with one developer arguing 'you need capability tokens that expire per-transaction. Worth watching how they handle the auth story — it sets the pattern for everything built on top.' That is the right granularity of question, but it is downstream of a much larger one: when an agent acts under your delegated authority, what are the boundaries of that delegation, and who polices them? Existing consumer protection law assumes a human at the keyboard. Securities and commodities regulators have spent two decades on algorithmic-trading rules that mostly do not generalize to consumer commerce. And the same Reddit discussion surfaced a parallel concern from u/SipHappensTea: crypto platforms are already letting agents control wallets, meaning the regulatory vacuum is not waiting on lab pilots; it is already being filled by whoever ships first.
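
The 'capability tokens that expire per-transaction' idea from the thread can be sketched as an HMAC-signed, single-use, spend-capped grant. Everything below (the field names, the TTL, the in-memory nonce ledger) is our own illustration of the pattern, not anything Anthropic has described shipping:

```python
import base64, hashlib, hmac, json, secrets, time

SECRET = b"delegation-root-key"  # illustrative; a real system would use a KMS key
_spent: set[str] = set()         # single-use nonce ledger (in-memory for the sketch)

def mint_token(agent_id: str, max_spend: float, ttl_s: int = 60) -> str:
    """Mint a per-transaction capability: one agent, one spend cap, short TTL."""
    claims = {
        "agent": agent_id,
        "max_spend": max_spend,
        "exp": time.time() + ttl_s,
        "nonce": secrets.token_hex(8),
    }
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def authorize(token: str, amount: float) -> bool:
    """Check signature, expiry, spend cap, and single use before a deal closes."""
    body, _, sig = token.partition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    if time.time() > claims["exp"] or amount > claims["max_spend"]:
        return False
    if claims["nonce"] in _spent:  # token already consumed by an earlier deal
        return False
    _spent.add(claims["nonce"])
    return True

tok = mint_token("claude-buyer-17", max_spend=40.0)
print(authorize(tok, 38.0))  # True: within cap, first use
print(authorize(tok, 38.0))  # False: the same token cannot buy twice
```

The design point is that each grant dies with the transaction, so a prompt-injected agent cannot reuse or escalate its delegated authority.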

The Satisfaction Paradox: Why People Loved A Service That Cost Them Money

The behavioral finding that should keep policymakers up at night is not in the price data — it is in the survey data. A reported 46% of participants said they would pay for this service in the future. Many of those participants were objectively worse off than they would have been transacting themselves, especially the Haiku-assigned cohort. They liked it anyway. The reason is not mysterious: agents removed the friction of marketplace haggling, the awkward back-and-forth of pricing a used object, the decision fatigue of figuring out what a stranger's offer is worth. Reddit commenters in r/claude picked up on exactly this asymmetry, with several pointing out that even the high-friction parts of platforms like Facebook Marketplace are precisely what agents are best positioned to absorb — the buy side carrying more risk than the sell side.

But the satisfaction-without-outcome pattern is the same dynamic that makes recommendation feeds, dark patterns, and house-edge gambling work. If the cost of a worse deal is hidden inside an LLM's negotiation log, and the benefit — 'this got handled while I was in a meeting' — is immediate and tangible, the rational user behavior is to keep using the service even as it underperforms. The community sentiment around the announcement reflects this exact split: enthusiasm from people excited about offloading transactional drudgery, and a hard counter-current of users — voiced bluntly by one commenter, 'There will never come a day when I let an AI trade anything on my behalf' — who treat the convenience as bait. That tension, more than the raw price gap, is what determines whether agent commerce reaches the mainstream or stays a research curiosity. The Anthropic data suggests it will reach the mainstream regardless, because the people who tried it liked it even when it cost them.

Historical Context

2025-06
Anthropic ran Project Vend, an earlier experiment in which Claude 3.7 Sonnet (nicknamed 'Claudius') autonomously managed a small office shop, lost money, and famously had an identity crisis, claiming to be a human in a blue blazer.
2025-12
Project Deal was conducted in Anthropic's San Francisco office, extending the Project Vend lineage from a single shopkeeping agent to a multi-agent marketplace with 69 participants and four parallel runs.
2026-04-24
Anthropic publicly detailed Project Deal in a feature post and accompanying announcement, generating broad tech-press coverage and surfacing the agent-to-agent commerce conversation into mainstream tech discourse.

Power Map

Key Players
Subject

Anthropic Project Deal: AI Agents in Live Marketplace Transactions

Anthropic

Designed and ran Project Deal as a research pilot to study agent-to-agent marketplaces and is using the findings to argue that policy and legal frameworks for AI commerce do not yet exist.

69 Anthropic employees (San Francisco)

Self-selected participants who delegated buying, selling and negotiating of personal belongings to Claude agents with $100 gift-card budgets, generating the only real-world dataset on agent-mediated peer commerce at this scale.

Claude Opus 4.5

Frontier Anthropic model deployed in two runs that secured higher selling prices and lower buying prices than Haiku, providing the empirical case that capability gaps translate directly into economic value.

Claude Haiku 4.5

Smaller, less capable Claude model randomly assigned to half of participants in two runs that consistently underperformed Opus economically while users rated their experience as just as fair.

Slack

Communication platform that hosted the marketplace; agents took turns posting items, making offers, and closing deals through Slack channels, illustrating that agent commerce can be bolted onto existing messaging rails rather than purpose-built venues.

Source Articles

Top 4

THE SIGNAL.

Analysts

""The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet," the team writes — framing Project Deal as an early signal that agent-to-agent commerce will outpace regulation."

Anthropic Project Deal authors
Anthropic research team

""When agents of different strengths meet in real markets, people could end up on the losing side without ever knowing it," the team warns, pointing to a new form of inequality where worse models silently extract value from their users."

Anthropic Project Deal authors
Anthropic research team

""We suspect we're not far from more agent-to-agent commerce bubbling up in the real world, with real consequences," the team argues, casting the pilot as a forward look at imminent market dynamics rather than a thought experiment."

Anthropic Project Deal authors
Anthropic research team
The Crowd

"New Anthropic research: Project Deal. We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues' behalf."

@AnthropicAI

"Anthropic launched "Project Deal," a real-world internal marketplace where Claude agents autonomously interviewed 69 employees to learn their preferences, and then independently bought, sold, and haggled on their behalf. The autonomous barterers successfully executed 186 [deals]"

@WesRoth

"Project Deal AnthropicのSFオフィスで、社員69人が参加する内部マーケットプレイスを作って、Claudeが全員の代理人として物品の売買・交渉を全部やってみたという実験。AIエージェント経済が現実になった時に、双方のモデル性能で格差が生まれるという考察が興味深い"

@oikon48

"Anthropic let AI agents negotiate and trade on behalf of their employees and the results are a little unsettling"

u/Direct-Attention8597318
Broadcast
Project Deal by Anthropic Claude: Autonomous AI Markets and AI Negotiation

AIs Just Bargained Each Other Out of $4,000 - Anthropic's Project Deal #Shorts #BlackBoxArt