The 75% Illusion: Why Trimming Output Barely Dents Your Bill
The headline numbers are seductive. Caveman benchmarks show 1,214 tokens shrinking to 294 — a 75% reduction. The claude-token-efficient CLAUDE.md cuts output words from 465 to 170 — a 63% reduction. These figures are real, reproducible, and almost entirely beside the point.
The disconnect lies in what developers actually pay for. Claude Code’s system prompt alone consumes roughly 19,000 tokens — about 10% of the 200,000-token context window — before a single user message is processed. Each tool invocation injects its results into the context. As Monali Dambre pointed out, the hidden payload per message runs 15,000 to 40,000+ tokens. Against that backdrop, shaving a few hundred tokens off the visible response text is rearranging deck chairs.

The claude-token-efficient project’s own independent benchmark confirms this: despite 63% fewer output words, actual cost savings landed at just 17.4%. The CLAUDE.md file itself consumes input tokens on every message, so the net benefit only materializes when output volume is high enough to offset that persistent input cost.
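The arithmetic is easy to sanity-check yourself. The sketch below is a back-of-envelope model, not a measurement: the per-token rates follow published Claude 3.5 Sonnet pricing ($3/M input, $15/M output), the 25,000-token hidden payload is an assumed mid-point of the 15,000–40,000 range above, and the 800-token CLAUDE.md overhead and the word-to-token conversions are illustrative guesses.

```python
# Back-of-envelope model: why cutting output tokens barely cuts the bill.
# All constants are assumptions for this sketch, not measured values.

INPUT_RATE = 3.00 / 1_000_000    # $ per input token (Sonnet-style pricing, assumed)
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token (assumed)

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single message under the assumed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

hidden_input = 25_000       # assumed mid-range hidden payload per message
claude_md_overhead = 800    # assumed input cost of the CLAUDE.md file itself

# ~465 words of output vs ~170 words, roughly converted to tokens.
baseline = message_cost(hidden_input, 620)
trimmed = message_cost(hidden_input + claude_md_overhead, 230)

output_cut = 1 - 230 / 620
bill_cut = 1 - trimmed / baseline
print(f"output tokens cut: {output_cut:.0%}, bill cut: {bill_cut:.1%}")
```

Under these particular assumptions the bill shrinks by only a few percent, even though output drops 63%; sessions with a higher share of output spend (long generations, short contexts) land closer to the benchmark's 17.4%. The general shape holds regardless of the exact constants: savings are capped at output's share of total cost, and output is the small slice.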


