From 22 to 271: What Actually Changed Between Opus 4.6 and Mythos

The most concrete way to read the Mythos jump is to put the two Mozilla collaborations side by side. In 2025, Anthropic's Claude Opus 4.6 helped Mozilla close 22 security-sensitive bugs in Firefox 148. Six months later, Mythos Preview helped close 271 vulnerabilities in Firefox 150: roughly a twelvefold increase on the same target, with the same defender, in the same harness. Anthropic's own red-team note pushes the contrast harder on the offensive side: where Opus 4.6 had near-zero success producing working Firefox exploits, Mythos 'developed working exploits 181 times.' Mozilla's CTO Bobby Holley sums up the lived experience as 'Defenders finally have a chance to win, decisively,' while still cautioning that none of the bugs were beyond an elite human researcher's reach.

What changed is not the class of bug being found but the unit economics of finding it. AISI's independent evaluation puts numbers on this. Mythos solves expert-level CTF tasks 73% of the time (tasks AISI says no model could complete before April 2025) and is the first model to fully solve a simulated network takeover (TLO) range, succeeding in 3 of 10 attempts and averaging 22 of 32 attack steps. That is the inflection. A capability that previously required either months of expert human labor or an orchestrated, brittle pipeline of older models has compressed into a single prompt-and-tool-loop run. Reports referenced by Fox News claim Mythos surfaced over 2,000 previously unknown software vulnerabilities in roughly seven weeks. Whether all of those are 'critical' is a separate fight, and the most-upvoted r/Anthropic critique points out that Anthropic manually verified fewer than 200 of them. Even so, the verified subset alone reframes vulnerability research from artisanal craft to industrial throughput.
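
For readers who want the ratios behind these figures, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation on the numbers as quoted above, none of which are independently verified here. The variable names are illustrative, and the verified share is an upper bound because the text gives only 'over 2,000' reported and 'under 200' verified.

```python
# Figures exactly as quoted in the text; variable names are illustrative only.
opus_bugs = 22              # bugs closed in Firefox 148 with Opus 4.6
mythos_bugs = 271           # vulnerabilities closed in Firefox 150 with Mythos Preview

tlo_successes, tlo_attempts = 3, 10      # full solves of the AISI TLO range
tlo_avg_steps, tlo_total_steps = 22, 32  # average attack steps completed

reported_floor = 2000       # "over 2,000" reported vulnerabilities
verified_ceiling = 200      # "under 200" manually verified

# How many times more bugs were closed in the second collaboration.
fold_increase = mythos_bugs / opus_bugs

# Share of TLO attempts that ended in a full takeover.
tlo_success_rate = tlo_successes / tlo_attempts

# Average fraction of the attack chain completed per attempt.
tlo_avg_progress = tlo_avg_steps / tlo_total_steps

# Upper bound on the verified share: fewer than 200 out of more than 2,000.
verified_share_upper_bound = verified_ceiling / reported_floor

print(f"Fold increase in closed bugs: {fold_increase:.1f}x")
print(f"TLO full-solve rate: {tlo_success_rate:.0%}")
print(f"TLO average progress: {tlo_avg_progress:.0%}")
print(f"Verified share: under {verified_share_upper_bound:.0%}")
```

On these inputs the jump works out to a little over twelvefold, and fewer than one in ten of the reported vulnerabilities were manually verified, which is the gap the critique is pointing at.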



