The Safety Paradox: Softening Internal Pledges While Fighting the Government Over External Ones
Anthropic's safety positioning in early 2026 presents a striking contradiction, one that reveals the strategic complexity of operating a safety-branded AI company at scale. In February, the company quietly removed its previous hard commitment to pause training if model capabilities outpaced safety measures, replacing binding pledges with what Chief Science Officer Jared Kaplan described as non-binding public goals. The stated rationale, that pausing training 'wouldn't actually help anyone,' marks a significant philosophical shift from the company's founding premise: that it would accept commercial disadvantage in exchange for safety assurance. CEO Dario Amodei articulated this tension himself in a widely viewed 60 Minutes interview (913K views on YouTube), framing Anthropic's mission as navigating between the genuine existential risks of AI and the competitive necessity of continued development, a message carefully calibrated for mainstream audiences even as the company loosened its internal constraints.
Yet just one day later, Anthropic took the opposite posture toward external actors, refusing to remove two specific safety guardrails from Claude's military deployment (no mass surveillance of U.S. citizens and no lethal autonomous warfare) even under direct Pentagon pressure and a hard deadline. The Kobeissi Letter's tweet highlighting the Pentagon's ultimatum to Anthropic amplified public awareness of the standoff, framing it as a pivotal moment for corporate resistance to government AI overreach. The result was a paradox: internally, Anthropic gave itself more flexibility to push capability boundaries; externally, it drew a line that cost it its entire federal government business. Dr. Heidy Khlaaf of the AI Now Institute compared the move directly to OpenAI's trajectory, calling it 'the very same bait and switch playbook... where safety is a PR tool to gain public trust before profits are prioritised.' Max Tegmark's viral tweet ('Anthropic 2024: You can trust that we'll keep all our safety promises. Anthropic 2026: Nvm') captured growing skepticism within the safety research community. The question this paradox raises is whether Anthropic's safety brand is a genuine constraint on behavior or a strategically deployed asset: one that can be relaxed internally when it slows growth but wielded publicly when it generates sympathetic headlines.


