The Three Words That Took Down a Frontier Model
The entire crisis turned on a phrasing trick. Amazon researchers found that when they asked Anthropic's Fable 5 to 'review the code for security issues,' the model refused, behaving exactly as its guardrails intended. But when they re-asked the model to 'fix this code,' it produced working patches for the same vulnerabilities [4]. That swap of three words was enough for the administration to frame the model as a cybersecurity national-security risk and pull it offline. It is a rare case where a deployed frontier system was recalled not because of what it did wrong, but because of a single prompt that exposed how thin the line is between defensive remediation and offensive capability.
Anthropic's rebuttal cut straight at the logic. The company argued it disagreed that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people [1], warning the standard would halt every frontier deployment in the industry. Crucially, Anthropic and outside experts said the same code-review behavior exists in competing models that face no controls at all — OpenAI's GPT-5.5, Anthropic's own Claude Opus and Sonnet, and Chinese models such as Moonshot AI's Kimi 2.7 can all perform similar reviews of code for security flaws [4]. If the capability is universal, the enforcement was not, and that asymmetry is what transformed a security debate into a fairness debate.


