Google, Microsoft, xAI grant US government pre-release access to AI models via CAISI

Strategic Overview

  • 01.
    The Center for AI Standards and Innovation (CAISI), housed inside NIST under the Commerce Department, signed agreements with Google DeepMind, Microsoft, and xAI for pre-deployment evaluations of frontier AI models, expanding earlier partnerships with OpenAI and Anthropic so that every major US frontier lab now participates.
  • 02.
CAISI will study models with reduced or removed safeguards in order to probe unmitigated capabilities relevant to cybersecurity, biosecurity, and chemical weapons risks, and has completed more than 40 evaluations to date, including of unreleased state-of-the-art systems.
  • 03.
    The deals coincide with the White House publicly studying an FDA-style executive order that would require frontier AI to be 'proven safe' through pre-release review before deployment.
  • 04.
    OpenAI and Anthropic, which had partnered with the predecessor US AI Safety Institute in 2024, renegotiated their agreements to align with Trump's AI Action Plan and Commerce Secretary Howard Lutnick's reorientation of the body toward national security risks.

Deep Analysis

The Mythos Moment: How One Cyber Benchmark Rewired Washington

Anthropic's Mythos Preview produced 181 working Firefox 147 exploits in internal benchmarks vs 2 for Opus 4.6 — the capability jump that triggered the federal pre-release oversight push.

The trigger event for this policy shift is hiding in plain sight in an Anthropic red-team blog post. In April 2026, Anthropic previewed Claude Mythos, and the cyber capability jump was not incremental. Anthropic's internal benchmark recorded the model producing working JavaScript exploits against Firefox 147 in 181 of its trials, compared with just 2 for the prior Opus 4.6 generation. Anthropic engineers also reported leaving Mythos running overnight on requests for remote code execution exploits and finding complete, working ones by morning. More than 99% of the vulnerabilities Anthropic identified with Mythos remain unpatched, and the company says it has surfaced thousands of additional high- or critical-severity flaws.

That is the capability profile that landed on policymakers' desks weeks before the CAISI deals were signed. White House National Economic Council Director Kevin Hassett's pitch for an FDA-style executive order on AI lands very differently when read against a model that can produce browser exploits at industrial scale. The Mythos preview did not just spook safety researchers; it gave an administration that had spent two years framing AI rules as an innovation tax a politically usable national security justification to pull frontier models into a federal review queue.

A 30-Person Agency Auditing the Frontier

The mechanics of these deals only make sense once you grasp how thinly resourced CAISI actually is. The agency runs on roughly 30 staff and has received about $30 million in total funding since 2024. Congress added another $10 million earmarked specifically for CAISI expansion in January 2026, on top of $55 million for broader NIST AI research. That is a rounding error against the compute footprints of the labs CAISI is meant to evaluate.

This is why the agreements are structured the way they are. As Georgetown's Jessica Ji puts it, the partnerships exist precisely because CAISI lacks the manpower, technical staff, and compute access that big tech companies have for rigorous testing. The labs supply the model, and crucially they supply versions with safeguards reduced or removed so CAISI can probe the unmitigated capability surface. CAISI brings methodology, classified testing environments, and the federal imprimatur. It is less a regulator inspecting a factory and more a small measurement lab borrowing the factory's tools to do the inspection. That dependency shapes everything that follows: enforcement leverage is limited, but methodological learning compounds with every model passed through the pipeline.

From AISI to CAISI: The Quiet Renaming That Reshaped the Mission

The institutional story behind these deals is a rebrand that changed the substance, not just the letterhead. The Biden-era US AI Safety Institute, which signed its first MOUs with Anthropic and OpenAI in August 2024, was reorganized in mid-2025 by Commerce Secretary Howard Lutnick into the Center for AI Standards and Innovation. The word 'safety' was dropped. The new mandate centers on demonstrable national security risks: cyber, bio, chemical, and adversary AI influence operations.

That reframing is what allowed OpenAI and Anthropic to renegotiate their old AISI agreements without political friction, and it is what made the new Google DeepMind, Microsoft, and xAI deals possible in a Trump administration. By recasting evaluation as national security measurement rather than 'AI safety,' the Commerce Department moved the same underlying activity into a category Republicans broadly support. Lutnick's designation of CAISI as industry's primary government point of contact for AI testing then concentrated a fragmented set of bilateral conversations into one funnel. The labs got a single counterparty; the administration got a single chokepoint where it could observe what frontier models can actually do.

Voluntary Today, Mandatory Tomorrow? The Executive Order Question

Every signature on a CAISI MOU today is voluntary, and the Reddit policy-watcher discourse has split between cautious optimism that the structure will hold and skepticism that anything voluntary can constrain capability releases when commercial pressure mounts. The policy ground is shifting underneath that debate. Hassett's public comments about studying an FDA-style executive order are not abstract: he frames frontier models as products that should be 'released in the wild after they've been proven safe, just like an FDA drug.' That is a regulatory model with mandatory pre-market review, not a voluntary partnership.

If such an order materializes, the existing CAISI agreements become the de facto template. Google, Microsoft, xAI, OpenAI, and Anthropic have already accepted the operational pattern: hand over an unreleased model, often with safeguards stripped, and accept a federal evaluation. Codifying that into a binding rule would not require building a new institution, just upgrading the legal status of the one the labs are already cooperating with. Humane Intelligence CEO Rumman Chowdhury captured the political whiplash bluntly, calling it 'a 180 for the Trump administration.' The 180 is significant precisely because it suggests the voluntary phase may be a runway, not a destination.

The Contradiction: Safety Partner, Security Risk

The cleanest illustration that CAISI participation is not a free pass came in March 2026, when the Department of Defense designated Anthropic a security risk despite the company's then-active CAISI partnership and its status as one of the original 2024 AISI signatories. Cooperating on pre-release evaluation buys a seat at the measurement table. It does not buy procurement protection or exempt a lab from broader national security determinations made elsewhere in government.

That split signal matters for how every other lab reads these deals. xAI, Microsoft, and Google DeepMind are now inside the same evaluation tent as Anthropic, but the Anthropic precedent shows the tent does not insulate them from adversarial decisions on contracts, export controls, or downstream restrictions. OWASP AI Exchange founder Rob van der Veer's framing captures the realistic posture: 'AI models will remain fragile, no matter how much we test them…so yes, test the models. Vet them. Improve them.' Testing is becoming a prerequisite for deployment legitimacy, not a guarantee of government favor. The labs that signed this week did so understanding that distinction, which is itself the most underappreciated tell about how much the political economy of frontier AI has changed.

Historical Context

2024-08
AISI signed initial MOUs with Anthropic and OpenAI granting access to major models pre- and post-release for safety research.
2025-06
AISI was rebranded and restructured as CAISI, dropping the word 'safety' and reorienting toward national security risks under Trump's AI Action Plan.
2026-01
Congress approved $55M for NIST AI research and an additional $10M earmarked specifically for CAISI expansion.
2026-04-07
Anthropic released a limited preview of Claude Mythos showing markedly elevated cybersecurity capabilities, including discovery of working browser exploits and high/critical-severity vulnerabilities at scale.
2026-05-05
CAISI announced new pre-deployment evaluation agreements with Google DeepMind, Microsoft, and xAI, completing the roster of major US frontier labs participating in voluntary government testing.

Power Map

Key Players

CAISI (Center for AI Standards and Innovation)

NIST/Commerce body running pre-deployment national security evaluations of frontier AI; designated by Secretary Lutnick as industry's primary government point of contact for AI testing.

Google DeepMind, Microsoft, and xAI

New CAISI signatories granting pre-release access to frontier models, in some cases with safeguards reduced, for federal probing of unexpected behaviors before public deployment.

OpenAI and Anthropic

Earlier 2024 AISI partners that renegotiated their MOUs to align with Trump's AI Action Plan and CAISI's updated national security mandate.

Howard Lutnick (Commerce Secretary)

Reorganized AISI into CAISI in mid-2025 and directs the agency's focus toward demonstrable cyber, bio, and chemical risks plus adversary AI influence.

Kevin Hassett (NEC Director)

Public face of the White House push to formalize pre-release AI safety review through possible executive action modeled on FDA drug approval.

Chris Fall (CAISI Director)

Former Energy Department official and ex-MITRE applied sciences VP now leading CAISI's measurement science program and external industry engagement.

Source Articles

Analysts

"Independent measurement science is essential for understanding the national security implications of frontier AI, and expanded industry collaborations let CAISI scale public-interest work at a critical moment. Quote: 'Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications.'"

Chris Fall
Director, CAISI

"The administration is weighing an FDA-style executive order so that frontier AI is released only after a defined safety process. Quote: 'We're studying possibly an executive order to give a clear road map to everybody about how this is going to go and how future AI that also potentially create vulnerabilities should go through a process so that they're released in the wild after they've been proven safe, just like an FDA drug.'"

Kevin Hassett
Director, White House National Economic Council

"The administration's openness to pre-release AI testing is a sharp reversal of its previously anti-regulation posture. Quote: 'The is a 180 for the Trump administration, that has very explicitly been anti-any sort of regulation.'"

Rumman Chowdhury
CEO, Humane Intelligence

"Industry partnerships could supply CAISI with the manpower, technical staff, and compute it lacks for rigorous testing. Quote: 'the partnerships could make it easier for CAISI to test AI by providing more resources, as they lack the manpower, technical staff, and compute access that big tech companies have for rigorous testing.'"

Jessica Ji
Senior Research Analyst, Georgetown Center for Security and Emerging Technology

"CAISI is the right institutional home for federal-level frontier model evaluation and can advance global safety and security alignment. Quote: 'Today's announcement reinforces CAISI's role as the right institutional home within government for advancing evaluation and measurement science and convening AI companies and stakeholders on a voluntary basis around responsible practices.'"

Aaron Cooper
SVP of Global Policy, Business Software Alliance

The Crowd

"NIST's Center for AI Standards and Innovation (CAISI) signs expanded collaborations with @GoogleDeepMind, @Microsoft, and @xai for pre-deployment evaluations and other research to support frontier AI national security testing. Learn more:"

@NIST

"The Commerce Department on Tuesday announced it has signed agreements with Google, Microsoft and xAI to test the companies' frontier artificial intelligence models to understand their "national security implications." @BenBrodyDC and @Dareasmunhoz have the story:"

@PunchbowlNews

"Our evaluations benefited from both OpenAI and Anthropic providing us with the in-depth model access necessary to carry out this work. We are excited to continue our collaboration with CAISI and frontier AI companies to measure and strengthen the security of AI systems."

@AISecurityInst

"CAISI [Center for AI Standards and Innovation] Signs Agreements Regarding Frontier AI National Security Testing With Google DeepMind, Microsoft and xAI"

u/Tinac448

Broadcast
Trump Admin Will Test New AI Models From Google, Microsoft And XAI Before Release Under New Deal

White House Ramps Up Frontier AI Testing; Anthropic-Google Cloud Ink $200 Billion Deal Over 5 Years

State of Evals: Lessons from U.S. CAISI's Evaluations of Cyber Capabilities and Security in AI Models