Project Deal: The Uncomfortable Economics of Agent Inequality
Anthropic just ran the most important AI experiment nobody asked for — and the results are deeply uncomfortable.
In December 2025, 69 Anthropic employees spent a week letting AI agents trade real goods on their behalf. The agents wrote listings, found buyers, haggled in natural language, and closed deals across 186 transactions worth $4,000+. Some agents ran Claude Opus 4.5. Some ran Claude Haiku 4.5. The humans didn't know which was which.
The finding: stronger agents systematically extracted better deals. And here's the part that should worry everyone — the humans getting worse prices rated their experience as equally fair.
This is "Project Deal," and it's not a product. It's a warning about what happens when AI agents enter real markets with unequal capabilities.
The Experiment Design
Anthropic built a Slack-based classified marketplace mimicking Craigslist. Each participant got a $100 budget. Agents operated autonomously: writing listings, finding counterparties, negotiating prices, closing deals. Humans only re-entered at the end to physically exchange goods.
The real insight came from running four parallel marketplaces:
- Run A: All agents used Claude Opus 4.5 (frontier model)
- Runs B & C: Roughly half Opus 4.5, half Haiku 4.5 (smallest/fastest model)
- Run D: All Opus 4.5 as a second baseline
Participants weren't told which run was "real" or which model represented them until after surveys.
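To make the mechanics concrete, here is a toy sketch of the kind of alternating-offer loop the agents were effectively running, with a single per-tier "concession rate" standing in for model capability. The tier labels and rates are assumptions for illustration; this is not Anthropic's implementation.

```python
# Toy bargaining loop: a per-tier concession rate stands in for capability.
# Tier labels and rates are assumed for illustration, not taken from the study.
CONCESSION = {"opus-4.5": 0.05, "haiku-4.5": 0.20}  # weaker model gives ground faster

def negotiate(buyer_tier: str, seller_tier: str, asking: float,
              rounds: int = 12) -> float | None:
    """Alternating offers; returns a closing price, or None if talks stall."""
    offer, ask = asking * 0.5, asking                     # buyer opens at half the ask
    for _ in range(rounds):
        offer += (ask - offer) * CONCESSION[buyer_tier]   # buyer moves up
        ask -= (ask - offer) * CONCESSION[seller_tier]    # seller moves down
        if ask - offer < 1.00:                            # close enough: split it
            return round((ask + offer) / 2, 2)
    return None

print(negotiate("haiku-4.5", "opus-4.5", asking=30.0))  # weak buyer closes high (~$27)
print(negotiate("opus-4.5", "haiku-4.5", asking=30.0))  # strong buyer closes low (~$18)
```

Even this crude model reproduces the qualitative result: whichever side concedes more slowly keeps more of the surplus, on both sides of the table.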
The Numbers Nobody Wanted to See
The results revealed a stark capability gap that translated directly into dollar outcomes:
| Metric | Result | What It Means |
|--------|--------|---------------|
| Seller premium | +$2.68 per sale | Opus sellers extracted more money |
| Buyer discount | -$2.45 per purchase | Opus buyers paid less |
| Opus vs. Haiku price | $24.18 vs. $18.63 | ~30% price gap in cross-tier deals |
| Fairness rating (Opus) | 4.05 / 7 | Users felt treated fairly |
| Fairness rating (Haiku) | 4.06 / 7 | Users felt equally fairly treated |
The median deal was $12; average was $20.05. In this price range, a $2-3 difference per transaction is enormous — roughly 10-25% of deal value.
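That range is easy to sanity-check against the reported figures:

```python
median_deal, mean_deal = 12.00, 20.05        # reported deal sizes
seller_premium, buyer_discount = 2.68, 2.45  # reported Opus edges per transaction

print(f"premium vs. median deal:   {seller_premium / median_deal:.0%}")  # ~22%
print(f"discount vs. average deal: {buyer_discount / mean_deal:.0%}")    # ~12%
```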
The critical finding isn't the price gap. It's the invisibility of the gap. Haiku users rated fairness 4.06 out of 7. Opus users rated it 4.05. Statistically identical. The people getting worse deals literally couldn't tell.
Why Prompting Doesn't Fix This
Some participants tried engineering around the gap. They instructed agents to "negotiate hard," "lowball," or adopt aggressive tactics. The result? Nothing meaningful.
Aggressive sellers set higher opening prices but saw no statistically significant difference in:
- Sale likelihood
- Final closing price (once opening prices were controlled for)
- Purchase price
The instruction effect was negligible compared to the model effect. As Anthropic's data shows, you cannot prompt your way out of a capability disadvantage: model choice maps directly to dollar outcomes, and clever system prompts don't close the gap.
This demolishes a common builder assumption: that prompt engineering can compensate for weaker models. In adversarial or negotiation contexts, it can't.
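Anthropic hasn't published its analysis code, but "no difference once opening prices were controlled for" is the kind of claim you would check with an ordinary least-squares regression along these lines. The data and column names here are hypothetical stand-ins.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical transaction log; values and column names are illustrative only.
df = pd.DataFrame({
    "final_price":   [22.0, 18.5, 25.0, 12.0, 30.0, 15.5],
    "opening_price": [30.0, 25.0, 35.0, 18.0, 40.0, 20.0],
    "aggressive":    [1, 0, 1, 0, 1, 0],   # seller instructed to "negotiate hard"
    "opus_seller":   [1, 1, 0, 0, 1, 0],   # 1 = Opus-tier seller, 0 = Haiku-tier
})

# Does the "negotiate hard" instruction move the closing price once the
# opening price and the model tier are held constant?
fit = smf.ols("final_price ~ aggressive + opening_price + opus_seller", data=df).fit()
print(fit.params)   # the study's claim: coefficient on `aggressive` is ~0,
print(fit.pvalues)  # while the model-tier term carries the real effect
```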
The Invisible Exploitation Problem
Anthropic calls this an "uncomfortable implication" — which is corporate speak for "this is genuinely alarming."
Consider what happens when this scales:
Procurement: Your company uses Haiku-tier agents to negotiate vendor contracts. Your vendor uses Opus-tier agents. You're systematically overpaying by 10-25% on every deal. Your agents report the contracts as "fair." You never know.
Real estate: Buyer agents with different capabilities negotiate property prices. The buyer with weaker agents pays more without knowing it. The transaction feels fair because both sides used "AI assistance."
Labor markets: Recruiting agents negotiate salaries. Candidates with weaker agents get lower offers. They rate the process as fair. The wage gap compounds silently.
The fairness paradox is the core problem: subjective satisfaction is uncorrelated with objective outcomes. People feel treated fairly while being systematically disadvantaged. Traditional market safeguards — transparency, competition, consumer protection — assume participants can detect bad deals. This experiment proves they can't.
What Scale Breaks
Project Deal worked because the conditions were artificially favorable:
- Cooperative participants: 69 colleagues swapping personal items, not adversarial strangers
- Low stakes: Snowboards, lab-grown rubies, broken bikes. Not commercial contracts
- Trusted environment: No prompt injection, no scams, no malicious counterparties
- Equal information: Everyone saw the same marketplace, same items
- No legal ambiguity: Disputes were resolved informally. No binding commitments at risk
As one analysis put it: "The conditions that made it work — motivated participants, no adversarial behavior, volunteer dynamics — are exactly the conditions that won't hold when this scales to actual commerce."
Real markets introduce three catastrophic risks:
Prompt injection: An attacker crafts messages to manipulate the opposing agent. In Project Deal, a "broken bike" listing was a funny anecdote. In real procurement, a manipulated agent committing to a bad deal is a lawsuit.
Invisible exploitation at scale: If 10% of transactions involve asymmetric agents carrying 15% price disadvantages, the aggregate value transfer runs into the billions, and market participants never perceive any of it (rough math below). It's not theft if the victim doesn't know they're a victim.
Legal vacuum: Who's liable when an agent commits to a bad deal? What constitutes a binding agent commitment? Anthropic itself admits: "The policy and legal frameworks around AI models that transact on our behalf simply don't exist yet."
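The scale of the second risk is worth making explicit. Only the 10% share and 15% disadvantage come from the scenario above; the market size is an assumed round number.

```python
agent_commerce = 1_000_000_000_000  # assume $1T/year of agent-mediated commerce
asymmetric_share = 0.10             # 10% of that volume pairs unequal agents
price_disadvantage = 0.15           # 15% worse prices on that volume

transfer = agent_commerce * asymmetric_share * price_disadvantage
print(f"${transfer / 1e9:.0f}B per year, transferred without anyone noticing")  # $15B
```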
The Infrastructure Gap
Current multi-agent frameworks (CrewAI, AutoGen, LangGraph) focus on collaborative task completion, not adversarial negotiation. MCP (Anthropic's Model Context Protocol) and A2A (Google's Agent2Agent protocol) define how agents communicate and access tools. Neither addresses the core problem Project Deal reveals: how to prevent capability asymmetry from creating systematic exploitation.
There's no standardized protocol for:
- Agent capability disclosure in marketplaces
- Negotiation fairness verification
- Binding commitment enforcement
- Dispute resolution between autonomous agents
- Liability assignment for agent actions
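None of this is standardized today, so any concrete shape is speculative. But the first item is easy to sketch: a disclosure record an agent could attach to every negotiation. Every field name below is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CapabilityDisclosure:
    """Hypothetical record attached to each negotiation. MCP and A2A cover
    transport and tool access; nothing like this field set exists yet."""
    model_id: str     # e.g. "claude-haiku-4.5"
    model_tier: str   # coarse capability band, e.g. "frontier" or "efficient"
    operator: str     # who runs the agent and accepts liability for it
    issued_at: str    # ISO-8601 timestamp
    attestation: str  # signature from a (hypothetical) certifying authority

disclosure = CapabilityDisclosure(
    model_id="claude-haiku-4.5",
    model_tier="efficient",
    operator="marketplace-user-42",
    issued_at=datetime.now(timezone.utc).isoformat(),
    attestation="<signed-by-certifier>",
)
print(disclosure.model_tier)
```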
The infrastructure to make agent commerce fair doesn't exist. The infrastructure to make it work is barely emerging. We're building the plane while it's in the air, and the passengers don't know some seats have better engines than others.
Why Anthropic Ran This
Project Deal isn't a product announcement. It's research — specifically, research into the societal implications of agent capabilities. Anthropic has been unusually focused on "societal impact" studies, from election integrity to economic effects of AI.
The subtext is clear: Anthropic is building the most capable agents in the world, and they're worried about what happens when those agents interact with less capable ones in economic contexts. They're essentially saying: "We can build this. We're not sure we should let everyone use different versions of it in markets."
This is responsible research — but it's also strategic positioning. By documenting the problem first, Anthropic positions itself as the company that understands agent commerce risks. When regulation inevitably comes, they'll be the ones who warned everyone.
The Three Futures
Project Deal suggests three possible trajectories for agent commerce:
Future 1: Capability Disclosure. Marketplaces require agents to disclose their model tier. "This negotiation is conducted by Claude Haiku 4.5" becomes as standard as "seller rating: 4.5 stars." But disclosure alone doesn't fix the asymmetry; it just makes it visible. Buyers with weaker agents still get worse deals.
Future 2: Standardized Agent Tiers. Marketplaces mandate that all participants use equivalent agent capabilities. Like weight classes in boxing, agent commerce gets regulated by capability bands. This prevents exploitation but creates its own problems: who enforces the standard? Who certifies agent capabilities? What happens when new models emerge?
Future 3: Wild West. No regulation, no disclosure, no standards. Capability asymmetry becomes a competitive advantage. Companies invest in better agents the way they invest in better lawyers, as a cost of doing business. The result: agent quality becomes a new form of economic inequality, where those who can afford frontier models systematically outperform those who can't.
Anthropic's own survey suggests Future 3 is the default: 46% of participants said they'd pay for this service. There's product-market fit for agent-mediated commerce. The market will build it before the regulators understand it.
What Builders Should Do Now
Three immediate takeaways for anyone building agent systems:
1. Model tier is a feature, not an implementation detail. If your product involves any negotiation, prioritization, or adversarial interaction, the model you choose directly impacts user outcomes in dollar terms. Don't bury model selection in backend configs. Make it visible, and make it a choice users understand (see the sketch after this list).
2. Build disclosure mechanisms. Before regulators force you to, build ways for users to see what model tier their counterparty is using. Transparency won't prevent asymmetry, but it creates market pressure toward standardization.
3. Design for equivalent capabilities in adversarial contexts. If you're building a marketplace, a procurement tool, or any system where agents negotiate against each other, design for capability parity. The alternative is systematic exploitation that your users won't even perceive.
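Here is a minimal sketch of what the first two takeaways could look like in an API surface, with all names hypothetical: the model tier is a required, visible argument, and a disclosed counterparty tier triggers a warning instead of staying buried in a config file.

```python
from dataclasses import dataclass

TIER_RANK = {"efficient": 0, "frontier": 1}  # assumed two-band scale

@dataclass
class NegotiationSession:
    """Hypothetical API: the caller must see and choose the model tier."""
    item: str
    model_tier: str                       # explicit, user-visible choice
    counterparty_tier: str | None = None  # disclosed tier of the other agent

    def asymmetry_warning(self) -> str | None:
        if (self.counterparty_tier is not None
                and TIER_RANK[self.counterparty_tier] > TIER_RANK[self.model_tier]):
            return ("Counterparty is running a higher-capability agent; "
                    "expect a price disadvantage you may not perceive.")
        return None

session = NegotiationSession(item="used snowboard", model_tier="efficient",
                             counterparty_tier="frontier")
print(session.asymmetry_warning())
```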
The Uncomfortable Truth
Project Deal's most important finding isn't about agents. It's about humans.
We assumed that if AI created economic asymmetries, people would notice. They'd complain. They'd switch providers. Market forces would correct the imbalance.
Anthropic just proved that's wrong. People can't perceive capability-driven price gaps. They rate unfair deals as fair. A weaker agent that feels helpful is, from the user's side, indistinguishable from a stronger one, even while it quietly delivers worse outcomes.
This breaks the standard economic assumption that consumers act in their own rational self-interest. When the mechanism of exploitation is invisible, market correction doesn't happen. The invisible hand needs information to work. Project Deal shows that information is missing.
The question isn't whether agent commerce will happen. 46% of Anthropic's own employees would pay for it today. The question is whether we'll build the governance infrastructure before the market creates systematic, invisible exploitation at scale.
Anthropic's answer, translated from corporate research-speak: probably not.
Demystify AI is an independent publication covering the business, technology, and strategy of artificial intelligence. We don't take sponsorships from AI labs. Our analysis is funded by readers like you.