The Capability Jump: From 2 Exploits to 181
The most striking data point in the Mythos announcement is the scale of improvement over previous models. On Firefox vulnerability exploitation, Mythos Preview produced 181 working exploits compared to just 2 from Opus 4.6, roughly a 90-fold increase in a single generational jump. Its exploitation success rate reached 84%, compared to 15% for the previous best model. On the CyberGym benchmark, Mythos scored 83.1% versus Opus 4.6's 66.6%. These are not incremental gains; they represent a qualitative shift in what AI can do autonomously in the cybersecurity domain.
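The headline comparisons above are simple arithmetic, and a short sketch makes the sizes of the gaps explicit. The snippet below uses only the figures quoted in this section; the function and variable names are illustrative assumptions, not part of any published tooling.

```python
# Figures quoted in the paragraph above; all names here are illustrative.

def fold_increase(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


def point_gap(new_pct: float, old_pct: float) -> float:
    """Return the difference between two percentages, in percentage points."""
    return round(new_pct - old_pct, 1)


# Firefox exploitation: 181 working exploits vs. 2.
exploit_ratio = fold_increase(181, 2)

# Exploitation success rate: 84% vs. 15%.
success_gap = point_gap(84, 15)

# CyberGym benchmark: 83.1% vs. 66.6%.
cybergym_gap = point_gap(83.1, 66.6)

print(f"Firefox exploits: {exploit_ratio:.1f}x")
print(f"Success rate gap: {success_gap} points")
print(f"CyberGym gap: {cybergym_gap} points")
```

The exploit count ratio works out to 90.5, which is where the "roughly 90-fold" figure comes from. The benchmark gaps are expressed in percentage points rather than ratios, since a ratio of two success rates is harder to interpret.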
Beyond raw numbers, the nature of the vulnerabilities discovered underscores the model's sophistication. Mythos found a 27-year-old flaw in OpenBSD, a 16-year-old FFmpeg vulnerability that had evaded 5 million automated test iterations, and a 17-year-old FreeBSD NFS remote code execution bug. In one case, it wrote a browser exploit chaining four vulnerabilities together using a complex JIT heap spray that escaped both renderer and OS sandboxes. Anthropic's red team reported that expert penetration testers said exploits Mythos wrote in hours would have taken them weeks to develop. The model also performs strongly beyond security, scoring 93.9% on SWE-bench Verified (vs. 80.8%) and 97.6% on USAMO 2026 (vs. 42.3%), which suggests the cybersecurity capability emerges from broadly superior reasoning rather than narrow specialization.



