Why the Harness Matters More Than the Model in Coding Agents
Strategic Overview

  • 01. Coding agent architecture consists of six core components: Live Repo Context, Prompt Shape and Cache Reuse, Structured Tools with Validation and Permissions, Context Reduction and Output Management, Transcripts/Memory/Resumption, and Delegation with Bounded Subagents.
  • 02. The agent harness -- the software infrastructure surrounding the model -- is now considered at least as important as the underlying model itself, and every major coding agent implements the ReAct (Reason + Act) pattern of Read, Plan, Act, Observe.
  • 03. The industry is shifting from a conductor model (single agent, synchronous) to an orchestrator model (multiple specialized agents, asynchronous, parallel), with Gartner reporting a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025.
  • 04. 95% of engineers now use AI tools weekly and 41% of code in 2025 was AI-generated, but AI-coauthored pull requests show 1.7x more issues, PR sizes up 150%, and bugs up 9%.

You Don't Ship a CPU: Why the Harness Became the Product

The most consequential insight in coding agent development is a counterintuitive one: the model matters less than the infrastructure wrapping it. Phil Schmid's analogy -- the model is the CPU, the harness is the operating system -- captures a fundamental shift in how the industry thinks about AI coding tools. This explains why Claude Code, Cursor, and Copilot can use overlapping foundation models yet deliver starkly different developer experiences and satisfaction ratings.

The harness encompasses everything the developer never sees but always feels: how repository context is gathered and compressed, how tools are validated and permissioned, how conversation history is compacted when context windows fill up, and how long-running sessions maintain continuity. Anthropic's published engineering guidance reveals that their approach to session continuity uses an Initializer Agent that reads progress notes and git history before handing off to a Coding Agent -- a pattern that solves the common problem of agents forgetting what they were doing after context resets. Sebastian Raschka's six-component taxonomy formalizes these concerns into a reference architecture: Live Repo Context, Prompt Shape and Cache Reuse, Structured Tools with Validation and Permissions, Context Reduction and Output Management, Transcripts/Memory/Resumption, and Delegation with Bounded Subagents. The taxonomy signals that coding agent development has matured from ad-hoc experimentation into an engineering discipline with identifiable, repeatable components.
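The Initializer-to-Coder handoff can be sketched in a few lines of Python. The progress-file name, the summary format, and the ten-commit window below are illustrative assumptions, not Anthropic's actual implementation:

```python
import subprocess
from pathlib import Path

def build_resume_context(repo: Path, progress_file: str = "PROGRESS.md") -> str:
    """Gather what an Initializer Agent would hand to the Coding Agent:
    human-readable progress notes plus recent git history."""
    progress = repo / progress_file
    notes = progress.read_text() if progress.exists() else "(no notes yet)"
    try:
        log = subprocess.run(
            ["git", "-C", str(repo), "log", "--oneline", "-10"],
            capture_output=True, text=True,
        ).stdout
    except FileNotFoundError:  # git not installed in this environment
        log = ""
    return f"## Progress notes\n{notes}\n\n## Recent commits\n{log}"
```

The point of the pattern is that this summary, not the stale conversation transcript, becomes the first thing the next agent reads after a context reset.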

From Conductors to Orchestrators: The Multi-Agent Inflection Point

The coding agent industry is undergoing a structural transition from what Addy Osmani calls the conductor model -- a single agent handling tasks synchronously -- to an orchestrator model where multiple specialized agents work asynchronously in parallel. This is not a theoretical shift. Gartner's 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025 reflects real enterprise demand. Stripe already processes over 1,000 pull requests per week from AI agents, demonstrating that multi-agent coding is operating at production scale.

The Deep Agent architecture provides a concrete blueprint for this transition. It separates concerns into three specialized agent types: an Orchestrator that coordinates strategy but has no direct code access, an Explorer with read-only repository access for analysis, and a Coder with read-write access for implementation. This separation of privileges mirrors established software engineering principles like least-privilege access, applied at the agent level. The orchestrator pattern also unlocks parallelism -- an Explorer can analyze one part of the codebase while a Coder implements changes in another, coordinated by the Orchestrator.
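A minimal sketch of that least-privilege split might look like the following; the role names follow the Deep Agent description, but the specific tool names are hypothetical:

```python
# Role-scoped tool permissions mirroring the Deep Agent split: the
# Orchestrator coordinates but never touches code, the Explorer is
# read-only, and the Coder gets read-write. Tool names are illustrative.
ROLE_TOOLS = {
    "orchestrator": {"delegate", "report"},
    "explorer": {"read_file", "search", "list_dir"},
    "coder": {"read_file", "write_file", "run_tests"},
}

def authorize(role: str, tool: str) -> bool:
    """Least-privilege gate checked before any tool call is dispatched."""
    return tool in ROLE_TOOLS.get(role, set())
```

Centralizing the check in one function means a prompt-injected or confused agent cannot escalate its own privileges; the harness, not the model, decides what each role may do.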

The Verification Bottleneck Nobody Planned For

Addy Osmani's observation that "the bottleneck is no longer generation -- it's verification" points to an uncomfortable truth the industry is still reckoning with. AI coding agents can now produce code at extraordinary speed and volume -- 41% of code in 2025 was AI-generated, projected to exceed 50% by late 2026. But the data on quality tells a more sobering story: AI-coauthored pull requests exhibit 1.7x more issues, PR sizes have increased 150%, and bugs are up 9%. The throughput gains are real, but they shift the burden from writing code to reviewing it.

This verification gap is reshaping both agent architecture and development workflows. Simon Willison's advocacy for a red/green test-first pattern -- where agents write failing tests before implementation -- represents one architectural response. The IMPACT framework (Intent, Memory, Planning, Authority, Control Flow, Tools) addresses it through the Authority and Control Flow components, emphasizing that agents need bounded permissions and clear success criteria rather than open-ended generation authority. At the workforce level, the consequences are already visible: junior developer employment (ages 22-25) fell approximately 20% between 2022 and 2025, while staff-plus engineers lead agent adoption at 63.5%. The emerging pattern suggests that coding agents amplify senior developers' leverage while reducing demand for junior code production -- precisely because verification, code review, and architectural judgment remain bottlenecks that require human expertise.
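A red/green gate can be enforced directly in the harness loop. The sketch below is one way to do it; the `agent` interface and the retry bound are assumptions for illustration, not Willison's implementation:

```python
import subprocess

def red_green_gate(test_cmd: list[str]) -> str:
    """Run the test suite and classify the result for the agent loop."""
    result = subprocess.run(test_cmd, capture_output=True)
    return "green" if result.returncode == 0 else "red"

def run_task(agent, test_cmd: list[str], max_attempts: int = 5) -> None:
    """Enforce red-before-green: the agent must produce a failing test
    before any implementation attempt is accepted."""
    agent.write_failing_test()
    if red_green_gate(test_cmd) != "red":
        raise RuntimeError("test must fail before implementation (red phase)")
    for _ in range(max_attempts):  # bounded authority, not open-ended
        agent.implement()
        if red_green_gate(test_cmd) == "green":
            return
    raise RuntimeError("implementation never turned the test green")
```

Note that the pass/fail criterion lives in the harness, not the prompt: the agent cannot declare its own work done, which is exactly the bounded-authority stance the IMPACT framework argues for.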

Context Engineering: The Invisible Art That Separates Good Agents from Great Ones

If the harness is the operating system, then context engineering is its memory management layer. Every coding agent faces a fundamental constraint: the model can only reason about what fits in its context window. How that window is populated -- what repository knowledge is surfaced, how conversation history is compressed, what gets retrieved just-in-time versus pre-loaded -- determines the quality of every action the agent takes. Barry Zhang of Anthropic recommends that agent builders simulate the agent's limited-context perspective to understand what information the agent actually has access to at each decision point.
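One concrete way to follow that advice is to render, at each turn, exactly the text the model will receive. The sketch below uses a rough four-characters-per-token heuristic as an assumption; a real harness would call the model's own tokenizer:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def render_agent_view(system_prompt: str, transcript: list[str],
                      budget: int = 8000) -> str:
    """Reconstruct what the model actually sees this turn: the system
    prompt plus as many recent messages as fit in the token budget."""
    kept, used = [], approx_tokens(system_prompt)
    for msg in reversed(transcript):  # newest messages win the budget
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return "\n".join([system_prompt, *reversed(kept)])
```

Running this debug view against a real session quickly reveals which repository facts, earlier decisions, and tool outputs have already fallen out of the window.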

The industry has converged on several context engineering patterns. CLAUDE.md files provide persistent project-level instructions that survive across sessions. Just-in-time retrieval pulls relevant code and documentation only when needed, avoiding context window bloat. Conversation compaction summarizes older exchanges to free space for new information. Sub-agent isolation gives each specialized agent its own context window, preventing cross-contamination between exploration and implementation tasks. Perhaps most pragmatically, the filesystem-as-context pattern treats the repository itself as extended memory -- agents write notes, progress markers, and intermediate results to files rather than trying to hold everything in the conversation. A freeCodeCamp tutorial demonstrated that a functional coding agent needs only four core tools (read, write, info, execute), but the sophistication lies entirely in how context flows between those tool invocations.
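A minimal version of that four-tool surface can be sketched with a single dispatch chokepoint, which is the natural place for validation, permissions, and output truncation to live. The tool names follow the tutorial's read/write/info/execute split, but the signatures and the truncation limit are assumptions:

```python
import subprocess
from pathlib import Path

# Four core tools; each is deliberately dumb, because the intelligence
# lives in how the harness routes context between invocations.
TOOLS = {
    "read": lambda path: Path(path).read_text(),
    "write": lambda path, content: Path(path).write_text(content),
    "info": lambda path: str(sorted(p.name for p in Path(path).iterdir())),
    "execute": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True).stdout,
}

def dispatch(name: str, *args) -> str:
    """Single chokepoint for every tool call: add permission checks and
    validation here, and truncate long outputs before they hit context."""
    result = TOOLS[name](*args)
    return str(result)[:4000]  # crude context-reduction guard
```

The truncation on the last line is the simplest possible form of the Context Reduction component in Raschka's taxonomy: no tool result is allowed to flood the window.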

Historical Context

2024-Q1
Start of the period over which Gartner recorded a 1,445% surge in multi-agent system inquiries from enterprise clients.
2025-05
Anthropic released Claude Code, which became the top-rated AI coding tool with 46% developer approval.
2025
swyx (Shawn Wang) coined "agent engineering" at the AI Engineer Summit, formalizing the discipline of building production-grade agent infrastructure.
2026-02
Addy Osmani published Conductors to Orchestrators on O'Reilly Radar, framing the industry shift.
2026-04
Sebastian Raschka published his comprehensive six-component taxonomy of coding agent architecture.

Power Map

Key Players

Anthropic (Claude Code)

Leading AI coding tool with a 46% "love it" rating among developers. Their published engineering guidance on effective harnesses and context engineering has become a de facto reference for the industry.

GitHub (Copilot)

Dominant in enterprise adoption with 56% penetration at companies with 10,000+ employees. Pioneered the AI pair-programming category but now faces pressure from agent-native competitors.

Cursor

Fastest-growing AI code editor, up 35% and now launching background agents -- representing the IDE-integrated approach to coding agent architecture.

OpenAI (Codex)

Late entrant to the coding agent space but already reaching 60% of Cursor usage, leveraging its model ecosystem to compete on the agent infrastructure layer.

Cognition Labs (Devin)

Pioneer of the AI software engineer concept, pushing the boundary from copilot-style assistance toward fully autonomous coding agents with orchestrator-style architecture.

THE SIGNAL.

Analysts

"Much of the recent progress in practical LLM systems is not just about better models, but about how we use them." Published a six-component taxonomy of coding agent architecture.

Sebastian Raschka
AI Researcher and Author

"The model is the CPU, the harness is the operating system. You do not ship a CPU to end users." Argues that the software infrastructure wrapping the model is the true differentiator.

Phil Schmid
Agent Engineering Practitioner

"The bottleneck is no longer generation. It is verification." Advocates for delegating tasks with clear pass/fail criteria, and describes the conductor-to-orchestrator shift.

Addy Osmani
Engineering Leader and Author

Coined the term "agent engineering" at the AI Engineer Summit, arguing that "LLM + tools + loop" omits critical production components.

swyx (Shawn Wang)
AI Engineer, coined Agent Engineering

Advocates a red/green test-first pattern for coding agents, where agents write failing tests before implementation.

Simon Willison
Developer and Open Source Advocate
The Crowd

"Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation."

@rasbt

"Anthropic just accidentally taught you how to build the best AI agent harness. Here's everything inside Claude Code's source code and how you can use it to build something smarter."

@rohit4verse

"Stanford paper shows that AI agents get better when you optimize the harness around the model, not just the model itself. On TerminalBench-2 with Claude Haiku 4.5, the optimized harness scored 37.6%, ahead of Claude Code at 27.5%."

@rohanpaul_ai
Broadcast
How We Build Effective Agents: Barry Zhang, Anthropic

Building Agents with Model Context Protocol - Full Workshop with Mahesh Murag of Anthropic

Guide to Agentic AI - Build a Python Coding Agent with Gemini