Zhipu GLM-5.2 matches Claude Mythos in cybersecurity

Strategic Overview

01.
China's Zhipu AI (Z.ai) released GLM-5.2 on June 13, 2026 under a permissive open-weight license, freely downloadable and runnable on consumer-grade hardware worldwide, unlike Anthropic's export-controlled Claude Mythos.
02.
In Semgrep's IDOR vulnerability-detection benchmark, GLM-5.2 scored 39% F1 under a minimal prompt and harness, beating Claude Code (37% on Opus 4.6, 28% on Opus 4.8/4.7) at roughly $0.17 per vulnerability found - about one-sixth the cost of frontier models.
03.
The Wall Street Journal reported that security researchers found GLM-5.2 can match the latest US models at finding security bugs, though the comparison is specifically about vulnerability discovery rather than broad cybersecurity capability.
04.
Graphistry's Botsbench evaluation found GLM-5.2 solved 28 of 59 agentic cybersecurity investigations, making it the top open-weight model on that benchmark and tying it with closed proprietary models.

The Headline Says Mythos. The Footnote Says the Harness Won.

The viral framing - a Chinese open-weight model 'matched Claude Mythos in cybersecurity' - compresses a much narrower result into a geopolitical headline. The benchmark that everyone is citing comes from Semgrep, and Semgrep was unusually careful to fence off what it actually showed: GLM-5.2 beat Claude Code on exactly one vulnerability class, IDOR (insecure direct object reference), when both models were handed the same minimal prompt and the same bare harness ^[1].

The nuance that gets lost is that the model is only one ingredient. Semgrep's own production pipeline - the same underlying frontier models wrapped in endpoint discovery and multimodal scaffolding - scored far higher than any bare-model run, with the GPT-5.5 configuration at 61% F1 and the Opus 4.8 configuration at 53% ^[1]. In other words, the scaffolding around a model contributed more lift than the gap between GLM-5.2 and Claude Code. Semgrep states plainly that 'this is not an apples-to-apples comparison of raw model ability' ^[1]. The honest read is not 'China matched Mythos' but 'a cheap open model is now good enough that the harness, not the model, is the real differentiator in agentic security work.'
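
The lift-versus-gap comparison can be checked with a few lines of arithmetic. The sketch below uses only the F1 figures quoted in this article; the dictionary keys are illustrative labels, not Semgrep's naming, and it assumes the bare Opus 4.8 run and the scaffolded Opus 4.8 configuration are comparable enough to subtract, which Semgrep itself cautions is not a strict apples-to-apples comparison.

```python
# F1 scores (percent) as quoted in this article from Semgrep's IDOR benchmark.
# Keys are illustrative labels, not Semgrep's own configuration names.
bare = {
    "GLM-5.2": 39,
    "Claude Code (Opus 4.6)": 37,
    "Claude Code (Opus 4.8)": 28,
}
scaffolded = {
    "Semgrep pipeline (GPT-5.5)": 61,
    "Semgrep pipeline (Opus 4.8)": 53,
}

# Lift from scaffolding on the one model that appears in both setups.
scaffold_lift = scaffolded["Semgrep pipeline (Opus 4.8)"] - bare["Claude Code (Opus 4.8)"]

# Widest lead GLM-5.2 holds over any bare Claude Code configuration.
model_gap = max(bare["GLM-5.2"] - score
                for name, score in bare.items() if name != "GLM-5.2")

print(f"scaffolding lift: {scaffold_lift} points")
print(f"widest GLM-5.2 lead: {model_gap} points")
```

On these figures the scaffolding lift is more than double GLM-5.2's widest lead over a bare Claude Code run.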

By The Numbers

On Semgrep's IDOR detection benchmark, the open-weight GLM-5.2 posted a 39% F1 score under a minimal prompt-and-harness setup, edging out Claude Code running Opus 4.6 at 37% and clearly beating the Opus 4.8 and 4.7 configurations at 28% ^[1]. That bare-model ranking is the basis for the 'beats Claude' claim, and it held while GLM-5.2 cost roughly $0.17 per vulnerability found - about one-sixth the cost of the frontier alternatives ^[1].
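
For readers unfamiliar with the metric, F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes Semgrep uses that standard definition; the article does not report the underlying true-positive, false-positive, or false-negative counts, so the numbers in the example are hypothetical and chosen only to land on 39%.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Standard F1: harmonic mean of precision and recall, from raw counts."""
    denom = 2 * true_pos + false_pos + false_neg
    # With no predictions and no real findings, F1 is defined here as 0.
    return 2 * true_pos / denom if denom else 0.0

# Hypothetical counts (not from Semgrep): 39 real IDORs flagged,
# 61 false alarms, 61 real IDORs missed.
example = f1_score(true_pos=39, false_pos=61, false_neg=61)
print(f"{example:.0%}")
```

Because F1 penalizes both false alarms and misses, a 39% score can reflect a noisy detector, a detector that misses many bugs, or a mix of the two; the headline number alone does not say which.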

The same chart that makes GLM-5.2 look dominant also shows where the ceiling really is. Semgrep's fully scaffolded multimodal pipeline scored 61% F1 on GPT-5.5 and 53% on Opus 4.8 - well above every bare-model result ^[1]. A separate evaluation, Graphistry's Botsbench, found GLM-5.2 solving 28 of 59 agentic cybersecurity investigations (about 47%), the top score among open-weight models and a tie with closed proprietary ones ^[2]. Read together, the numbers tell a two-part story: GLM-5.2 wins the cost-adjusted bare-model race, but full scaffolding still belongs to the frontier stacks.

Did It Copy Mythos, or Out-Train It?

Because GLM-5.2 lands so close to Claude on a security benchmark, the first accusation is the obvious one - that Zhipu simply distilled the frontier labs, training its model to imitate Claude and GPT-5.5 outputs. One theory in the discourse even attributes the progress to data captured by grey-market API 'transfer stations' ^[1]. But the more technical reading pushes back. Researcher Patrick C. Toulme argues that distillation only solved the cold-start problem in reinforcement learning, not the quality match itself; the real climb, he says, came from RL on the model's own self-generated trajectories ^[3].

That distinction matters for anyone trying to predict what comes next. If GLM-5.2's capability were borrowed wholesale through distillation, it would plateau the moment frontier labs cut off the data. If, instead, the gains come from a reproducible post-training method - and independent analyst Weijin Research credits a 'slime' framework with online policy distillation that compressed RL training into roughly two days ^[4] - then the parity is durable and will keep improving without further copying. The export-control logic assumes the first story. The engineering evidence points at the second.

Follow the Money - and the Weights

The most disruptive fact about GLM-5.2 is not its benchmark line but its price tag and its license. At roughly $0.17 per vulnerability, about one-sixth the cost of frontier models, the economics flip from 'nice to have' to 'why are we paying six times more' for any team running security agents at scale ^[1]. Developer sentiment on the cost angle is already loud. The strongest commercial proof point circulating is the claim that Coinbase moved to open-weight Chinese models and cut its internal AI spending by nearly half. The broader developer community is treating GLM-5.2 as a genuine cost-competitive contender, even as skeptics warn that it sits a tier below the true frontier and caution about a well-funded hype cycle.
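
To see what the cost ratio means at volume, here is a rough back-of-the-envelope sketch. It takes the $0.17 figure from the article and treats "about one-sixth" as an exact 6x multiplier, which is an approximation; the finding volume of 1,000 and the function name are hypothetical, used for illustration only.

```python
GLM_COST_PER_VULN = 0.17  # USD per vulnerability found, as cited in the article
FRONTIER_MULTIPLIER = 6   # "about one-sixth the cost", treated as exactly 6x (approximation)

def projected_spend(vulns_found: int) -> tuple[float, float]:
    """Return (GLM-5.2 spend, implied frontier spend) in USD for a given volume."""
    glm = vulns_found * GLM_COST_PER_VULN
    frontier = glm * FRONTIER_MULTIPLIER
    return round(glm, 2), round(frontier, 2)

# 1,000 findings is a hypothetical volume, not a figure from the article.
glm_spend, frontier_spend = projected_spend(1_000)
print(f"GLM-5.2: ${glm_spend:,.2f}  implied frontier: ${frontier_spend:,.2f}")
```

The implied frontier price of roughly $1 per finding is derived from the ratio, not reported directly, and it covers model cost only, not the engineering cost of the surrounding harness.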

The license is where the geopolitics gets paradoxical. The US restricts exports of Mythos and Fable specifically to deny adversaries autonomous vulnerability-detection capability, but an open-weight release routes around those controls entirely - anyone can download GLM-5.2 and run it on consumer hardware ^[2]. The export regime was built to gate access to capability through a small number of American firms; a freely downloadable Chinese model that matches them on a security task makes that gate irrelevant. That is precisely why the result registered in Washington - and why the prediction-market odds of a Chinese company leading AI by the end of 2026 climbed to around 14% ^[5].

Historical Context

2019

Founded as a commercial spin-out of Tsinghua University's Knowledge Engineering Group by professors Tang Jie and Li Juanzi.

2025-07

Released GLM-4.5 and GLM-4.5 Air and rebranded internationally as Z.ai.

2026-01

Completed a Hong Kong IPO raising roughly HKD 4.35 billion (about $558M), directly funding GLM-5's development.

2026-02-11

Released GLM-5, its frontier large language model, ahead of the GLM-5.1 (April 2026) and GLM-5.2 (June 2026) iterations.

2026-06-13

Released GLM-5.2 as an open-weight model that reportedly matches Claude Mythos on cybersecurity vulnerability-detection benchmarks.

Power Map

Key Players

Zhipu AI (Z.ai)

Beijing-based developer of GLM-5.2 and a 2019 Tsinghua University spinout. Its open-weight releases narrow the US-China frontier gap and undercut US firms on cost, which is the core of the competitive shock.

Anthropic

Maker of Claude Mythos, positioned as the gold standard for security tasks. Its model is subject to US export controls, and the US government reportedly ordered a halt on exports of a less-capable variant on national-security grounds.

US Government / White House

Treats advanced models like Mythos and Fable as national-security assets and uses export controls to deny adversaries autonomous vulnerability-detection capability. An open-weight release that matches those models directly challenges that strategy.

Semgrep

Security tooling firm that ran the IDOR benchmark. Its goal was to separate raw model ability from the contribution of harness and scaffolding for customers deploying AI agents in security workflows - making its caveats as important as its headline.

360 Security Technology

Chinese provider that unveiled its 'Yitian Tulong' cyber tools (Tulongfeng vulnerability detection and Yitianzhen cyber defense), signaling a broader national push into AI-driven cybersecurity beyond a single model.

THE SIGNAL.

Analysts

Argues a Chinese open-weight model now equals available US frontier models: "We now have a Chinese open-weight model that is as good as the currently available models from OpenAI and Anthropic." He frames US regulatory constraints as a competitive disadvantage.

David Sacks

White House AI and Crypto Czar

Believes China is narrowing the capability gap with US frontier models on security tasks: "China is steadily closing the gap."

Lior Div

CEO, security firm 7AI

Says GLM-5.2 used distillation from Claude and GPT-5.5, but "distillation is not how they matched Opus quality. Distillation only fixed the cold start problem in RL." He argues the real gains came from reinforcement learning on self-generated trajectories.

Patrick C. Toulme (teortaxesTex)

AI researcher and commentator

Credits GLM-5.2's 'slime' post-training framework with online policy distillation (OPD) for enabling RL training in roughly two days: "OPD plus slime is a perfect marriage of algorithm and engineering." Also argues Chinese open-source models may shift from optional to essential under US export restrictions.

Weijin Research

Independent analyst (Substack)

The Crowd

"JUST IN: China's new open-source AI model, GLM-5.2 by Zhipu AI, reportedly matches Anthropic's Claude Mythos in security bug detection. The model outperformed some Claude versions in independent cybersecurity benchmarks and can achieve Mythos-level performance. Unlike Claude Mythos, GLM-5.2 is open-source and significantly cheaper to run."

@CryptoTweets629

"This is the moment Chinese AI beat American AI. One of the largest public crypto companies in the world just DUMPED OpenAI and Anthropic. Coinbase switched to open-weight Chinese models from Zhipu and DeepSeek, and shaved nearly 50% off the company's internal AI spending."

@Ric_RTP2343

"China's New Zhipu AI Reportedly Matches Claude Mythos in Vulnerability Detection Zhipu AI's open-weight GLM-5.2 model is reportedly performing on par with Anthropic's restricted Claude Mythos in specific cybersecurity and software vulnerability detection benchmarks."

@The_Cyber_News73

"GLM 5.2 via Claude Code is the first non-Claude model that feels close to Opus"

u/nseavia71501285

Broadcast

China's GLM 5.2 vs Claude Fable 5 - The 90-Minute AI Shutdown That Changed Everything

China launches new AI: Experts say it rivals Claude Mythos from Anthropic

GLM 5.2 vs Claude: Which AI Finds Bugs Better?