Capability jump: what Mythos actually does that prior Claude models could not

Mythos is not a marginal improvement to Claude's coding chops; it is a step-change in autonomous offensive capability. In one benchmark, the model built working exploits 181 times and gained register control on 29 additional tries, shredding Opus 4.6's baseline. The U.K. AI Security Institute clocked it at a 73% success rate on expert-level hacking tasks, and Anthropic reports an 89% agreement between Mythos's severity judgments and human validators — meaning it not only finds bugs but triages them credibly.
The depth is as striking as the breadth: a 27-year-old OpenBSD vulnerability, a 17-year-old FreeBSD RCE, and a 16-year-old FFmpeg flaw all surfaced in testing. Mozilla's Firefox 150 shipped fixes for 271 Mythos-found vulnerabilities in a single release cycle. The economic signal is just as important as the technical one: more than 1,000 runs through Anthropic's scaffold cost under $20,000 in compute. Exploit generation has historically been bottlenecked by scarce senior talent; Mythos collapses that bottleneck into a line item that any motivated actor can afford.



