The whole crisis hinges on three words: 'fix this code'
Strip away the national-security framing and the trigger is startlingly mundane. The government's concern was a method of 'jailbreaking' Fable 5 to bypass the safeguards that gate the underlying Mythos model's cybersecurity abilities, and Anthropic says that technique amounted to asking the model to read a codebase and fix its software flaws [1]. Amazon's researchers reportedly distilled the exploit down to a 'fix this code' prompt, which Jassy then escalated to the White House and Treasury [3]. The framing matters because the same capability looks like an attack from one chair and like routine work from another.
That split is exactly what the cybersecurity community seized on. Katie Moussouris, founder of Luta Security, argued the flagged behavior is core defensive security and cannot be 'fixed' away without crippling defense itself: 'Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security' [4]. Anthropic, for its part, says the government supplied only verbal evidence and that its own review found the vulnerabilities to be minor, previously known, and reproducible with other publicly available models — disputing that 'the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people' [1]. The disagreement is not really about whether the jailbreak exists; it is about whether asking an AI to patch software should ever have been gated in the first place.


