What the 80%, 8x and 52x numbers actually measure
Anthropic's case for taking recursive self-improvement seriously rests on a stack of internal metrics that look much sharper when unpacked individually. The headline figure is that Claude now authors more than 80% of code merged into Anthropic's codebase, up from low single digits before Claude Code launched in research preview in February 2025 [1]. Sitting beneath that is an 8x quarter-over-quarter increase in code shipped per engineer between 2024 and Q2 2026, plus 800 internal API fixes that Anthropic estimates would have taken about four years of human effort to complete [1][6]. None of these are public benchmarks; they are productivity readings from a single lab on its own product, so the absolute level matters less than the slope.
The more load-bearing number is Mythos Preview's 52x mean speedup on a CPU-only LM training-optimization task, where skilled human engineers manage roughly 3x and take four to eight hours to do it [4][8]. That benchmark is the closest thing in the post to an actual recursive-self-improvement signal: an AI system optimizing the code that trains AI systems. Outside Anthropic, METR independently places Claude Mythos Preview's 50% time horizon at 'at least 16 hours' — the ceiling of METR's current task suite — with a 95% confidence interval from 8.5 to 55 hours, and notes the horizon has been roughly doubling every four months [2]. Put together, the picture is less 'AI replacing engineers' and more 'AI compressing the iteration loop that produces better AI', which is the specific mechanism Anthropic is asking regulators to think about.


