The Harness Wars: Why 'Thin Harness, Fat Skills' Won the Argument
Garry Tan's framework crystallizes a debate that has been simmering across agent engineering: where should intelligence actually live? In Tan's formulation, the harness — the program that runs the LLM — does only four things: runs the model in a loop, reads and writes files, manages context, and enforces safety. Everything else is pushed up into markdown 'skills' (fuzzy human judgment expressed as prompts and playbooks) or down into deterministic code. The provocation is sharp: 'The secret sauce isn't the model. It's the thing wrapping the model: the harness' — and that thing should be deliberately small.
The framework's appeal is partly architectural and partly economic. A fat harness with bespoke orchestration logic goes stale the moment the underlying model improves; a thin harness rides the model curve. The Hex engineering team's experience reinforces this from another angle: Izzy Miller observed that harness design decisions, not model selection, dominate agent quality. Hex's production system carries roughly 100K tokens of tool definitions, which suggests the real surface area lies in how skills and tools are designed, not in the runtime.



