AI Engineering for Agentic Systems

Strategic Overview

  • 01.
    Garry Tan's 'Thin Harness, Fat Skills, Fat Code' framework has become the dominant architectural philosophy for agentic engineering, pushing fuzzy judgment into markdown skills and deterministic operations into code while keeping the runtime loop minimal.
  • 02.
    AI engineering guidebooks now codify a canonical decision order — prompt, then RAG, then fine-tune — alongside a stack of MCP, evaluation, and LLMOps as the foundational concepts every agent builder needs.
  • 03.
    The model layer is catching up to agentic workloads: Ant Group's Ling-2.6-flash (104B total / 7.4B active MoE) was post-trained with Agentic RL specifically for tool calling and multi-step planning.
  • 04.
    Despite the framework and model momentum, only ~5% of surveyed teams have agents in production, and 86-89% of agent pilots fail before shipping — fueling skepticism about jumping straight to multi-agent architectures.

The Harness Wars: Why 'Thin Harness, Fat Skills' Won the Argument

Garry Tan's framework crystallizes a debate that has been simmering across agent engineering: where should intelligence actually live? In Tan's formulation, the harness — the program that runs the LLM — does only four things: runs the model in a loop, reads and writes files, manages context, and enforces safety. Everything else is pushed up into markdown 'skills' (fuzzy human judgment expressed as prompts and playbooks) or down into deterministic code. The provocation is sharp: 'The secret sauce isn't the model. It's the thing wrapping the model: the harness' — and that thing should be deliberately small.
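Tan's four responsibilities fit in a few dozen lines. The sketch below is hypothetical, not Tan's code: `call_model`, the action schema, and the crude character-count context budget are all stand-ins for illustration.

```python
# Hypothetical "thin harness": the runtime does only four jobs --
# run the model in a loop, read/write files, manage context, enforce safety.
# Skills (judgment) live in markdown elsewhere; tools (execution) live in code.

from pathlib import Path

MAX_CONTEXT_CHARS = 50_000  # crude context budget (assumption)

def call_model(context: str) -> dict:
    """Placeholder for any LLM API call; returns an 'action' dict."""
    return {"type": "done", "text": "stub"}

def is_safe(action: dict) -> bool:
    """Safety gate: only allow-listed action types pass."""
    return action["type"] in {"read", "write", "done"}

def run(task: str, workdir: Path) -> str:
    context = task
    while True:
        context = context[-MAX_CONTEXT_CHARS:]       # 3. manage context
        action = call_model(context)                 # 1. run the model in a loop
        if not is_safe(action):                      # 4. enforce safety
            context += "\n[blocked unsafe action]"
            continue
        if action["type"] == "read":                 # 2. read files
            context += (workdir / action["path"]).read_text()
        elif action["type"] == "write":              # 2. write files
            (workdir / action["path"]).write_text(action["text"])
            context += f"\n[wrote {action['path']}]"
        elif action["type"] == "done":
            return action["text"]
```

Everything a better model would make obsolete (planning heuristics, retry strategies, decomposition logic) is deliberately absent from this loop; that is the point of keeping the harness thin.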

The framework's appeal is partly architectural and partly economic. A fat harness with bespoke orchestration logic gets out of date the moment the underlying model improves; thin harnesses ride the model curve. The Hex engineering team's experience reinforces this from another angle — Izzy Miller noted that harness decisions, not model selection, dominate architecture quality, and Hex's production system carries roughly 100K tokens of tool definitions, suggesting the real surface area is in how skills and tools are designed, not in the runtime.

The Reliability Wall: When 90% Per Step Becomes 35% End-to-End

The most circulated piece of agent-skeptic math this month came from a Reddit comment on a 10-agent Obsidian system: 'Chain 10 steps at 90% accuracy each and your overall pipeline success rate is ~35%.' That single line explains why 86-89% of enterprise agent pilots fail before reaching production and why Shopify's Andrew McNamara explicitly tells teams to 'avoid multi-agent architectures early.'
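The arithmetic behind that one-liner is just compounding independent per-step failure probabilities:

```python
# Per-step success probability p, chained over n independent steps,
# gives an end-to-end success rate of p**n.
def pipeline_success(p: float, n: int) -> float:
    return p ** n

print(round(pipeline_success(0.90, 10), 2))  # 0.35: ten 90% steps compound badly
print(round(pipeline_success(0.99, 10), 2))  # 0.9: why per-step reliability dominates
```

The steps-are-independent assumption is generous to the agent; in practice an early plausible-but-wrong action can make later steps worse than their baseline accuracy.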

The data backs the skepticism: only 95 of 1,837 surveyed teams (~5%) report agents live in production, and 70% of regulated enterprises rebuild their stack every three months or faster. The most-skipped engineering piece, according to practitioners who have shipped agents to 20+ startups, is observability and decision-trace logging — agentic workflows fail silently with 'plausible-but-wrong actions' that pass shallow eval but corrupt downstream state. Bri Kopecki's 7-skill framing on IBM Technology lands the same point from a curriculum angle: reliability, evaluation, and security are listed as first-class disciplines alongside system design and retrieval, not as afterthoughts.

The Model Layer Finally Optimizes For Agents

For most of 2024-2025, agent builders treated frontier models as general-purpose engines and bolted on tool-calling via prompting and fine-tuning. Ant Group's Ling-2.6 family inverts that. Ling-2.6-flash is a 104B-parameter Mixture-of-Experts with only 7.4B active per token, and crucially it was post-trained with Agentic RL specifically for tool calling and multi-step planning — described by analysts as 'sharpened for AI agent scenarios.' The trillion-parameter Ling-1T sibling already hits ~70% tool-call accuracy on the BFCL V3 benchmark with only light instruction tuning.

The economic story is the intelligence-to-token ratio: agentic workloads burn through context and tool calls aggressively, and a sparse MoE optimized for that pattern costs less to run at scale than a dense generalist. Combined with the 'thin harness' philosophy, this suggests an architectural convergence — a small, fast harness orchestrating skills against a model that was itself trained to be the agent, rather than coaxed into being one.
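A back-of-envelope version of that ratio, under the simplifying assumption that per-token decode compute scales with *active* (not total) parameters:

```python
# Rough arithmetic for the intelligence-to-token ratio argument.
# Assumption: per-token decode cost scales with active parameters.
total_params  = 104e9   # Ling-2.6-flash total parameters
active_params = 7.4e9   # parameters active per token (sparse MoE routing)

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per token")

compute_ratio = total_params / active_params
print(f"~{compute_ratio:.0f}x less per-token compute than a same-size dense model")
```

Memory and serving overheads complicate the real cost picture, but for workloads that emit millions of tool-call tokens, the active-parameter fraction is the number that dominates the bill.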

MCP: The Quiet Infrastructure Win

Amid the louder debates about frameworks and multi-agent orchestration, the Model Context Protocol has done something rare in AI infrastructure: it became a de facto standard without an obvious fight. Anthropic shipped MCP in late November 2024 as a way to standardize how LLMs talk to tools — guidebooks now describe it as 'USB-C for AI.' By February 2026 the protocol had reached 97 million monthly SDK downloads and over 10,000 active public MCP servers.

Google's A2A protocol followed in April 2025, and the Linux Foundation's Agentic AI Foundation arrived in late 2025 to formalize the standards layer. The practical implication for AI engineers is that the 'tool integration' skill on Kopecki's 7-skill list is increasingly a configuration problem rather than a bespoke integration problem — you wire your agent to a registry of MCP servers rather than writing one-off adapters. That, more than any single framework release, is what makes the agent stack feel finally portable.
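Concretely, "configuration rather than integration" looks like the `mcpServers` block used by MCP clients such as Claude Desktop. The server packages below are published reference servers, but the path and token are placeholders:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<token>" }
    }
  }
}
```

Swapping a tool means editing this file, not rewriting an adapter; that is the portability the 10,000-server ecosystem delivers.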

The Hype-Reality Gap, By The Numbers


The numbers tell two stories at once. Gartner projects 40% of enterprise applications will embed AI agents by end of 2026, up from less than 5% in 2025 — an 8x jump that anchors most vendor decks. Cleanlab's production survey paints the opposite picture: ~5% of teams have agents live, 70% of regulated enterprises rebuild their stack every quarter, and 'only five percent of the five percent of companies with agents in production even worry about accurate tool calling.' k4i.com's tracking of pilots puts the failure rate at 86-89%.

The split isn't necessarily contradictory — Gartner is measuring application-level embedding (often shallow copilots), while Cleanlab and k4i are measuring genuine autonomous-loop deployments. Reddit's reception of a 424-page 'Agentic Design Patterns' PDF from a senior Google engineer captured the mood: half the comments called it a well-structured introduction; the other half called it AI slop rehashing known concepts. That bimodal reception is the cluster's tell — the field has clearly accumulated enough best practices to fill a 424-page book, but not enough production-grade systems to validate which of those practices actually matter.

Historical Context

2024-11-25
Anthropic released the Model Context Protocol in late 2024, providing a standardized way for LLMs to connect to external tools — what guidebooks now call 'USB-C for AI.'
2025-04-01
Google followed MCP with the Agent2Agent (A2A) protocol introduced in April 2025, expanding agent interoperability standards beyond a single vendor's stack.
2025-10-01
LangGraph reached 1.0 GA in October 2025, marking the maturation of stateful agent orchestration tooling for production workloads.
2025-12-01
The Linux Foundation announced the Agentic AI Foundation in late 2025 to establish shared standards akin to W3C for the agent ecosystem.
2026-04-09
Garry Tan published the 'Thin Harness, Fat Skills' essay that hit nearly a million views on X and crystallized an architectural philosophy for agent engineering.
2026-04-21
Ant Group officially released Ling-2.6-flash, a 104B-param MoE with 7.4B active parameters tuned with Agentic RL for tool use and multi-step planning.

Power Map

Key Players


Anthropic

Creator of the Model Context Protocol (MCP) and the Claude family used as leading reasoning models for multi-step agent workflows; also publishes an Agent SDK.


Garry Tan (Y Combinator)

President and CEO of Y Combinator; popularized the 'Thin Harness, Fat Skills, Fat Code' framework now influencing how startups architect agents.


Ant Group / inclusionAI

Releases the open-weight Ling family (Ling-1T, Ling-2.5-1T, Ling-2.6-1T, Ling-2.6-flash) trained with Agentic RL specifically for tool-use and multi-step agent workflows.


LangChain / LangGraph

Major agent engineering platform with 97k+ GitHub stars; LangGraph hit 1.0 GA in October 2025 and shipped deep agent templates and distributed runtime support by March 2026.


Linux Foundation

In late 2025 announced the Agentic AI Foundation to establish shared standards and best practices for an interoperable agent ecosystem.


Shopify

Engineering org that publicly cautions against premature multi-agent architectures and provides production guidance to other teams.

Source Articles

Top 5

THE SIGNAL.

Analysts

"Argues the value-creating layer in agentic systems is the combination of fat skills and fat code — push intelligence up into markdown skills, push execution down into deterministic tooling, and keep the harness thin."

Garry Tan
President & CEO, Y Combinator

"Recommends teams avoid multi-agent architectures in early stages and favor simpler single-agent designs first, citing production complexity and reliability cost."

Andrew McNamara
Engineering, Shopify

"Argues agentic systems require new architecture and security thinking because they take actions, not just suggest them — you are no longer securing software that suggests, you are securing software that acts."

Anurag Gurtu
CEO, AIRRIVED

"Skeptical that 2025 was truly the 'year of AI agents' — he has only seen three or four use cases with agents in production, while most companies remain in evaluation or development."

Michael Hannecke
Bluetuple.ai

"Quality of curated data matters far more than data volume when building production agentic systems; thoughtful curation outperforms raw scale."

Jackie Brosamer
Block
The Crowd

"A senior Google engineer dropped a 424-page doc called Agentic Design Patterns"

@u/sibraan_1600

"I am a PhD student in AI and I built a 10-agent Obsidian crew because my brain could not keep up with my life anymore"

@u/Routine_Round_84911200

"I built AI agents for 20+ startups this year. Here is the engineering roadmap to actually getting started."

@u/Warm-Reaction-456553
Broadcast
Building AI Agents that actually work (Full Course)

The 7 Skills You Need to Build AI Agents

How Hex Builds AI Agents: Making Agents Reason Like Human Data Analysts | Izzy Miller, AI Engineer