How Hybrid Attention Drained the Cost Out of a Million Tokens
The technical heart of V4 is a new hybrid attention design that pairs Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA). In conventional transformer attention, each new token attends to the full prior context, which is why long-context inference has historically hurt on two fronts: compute (FLOPs) grows quadratically with context length, and the KV cache (the memory of prior tokens the GPU must hold) grows linearly with it. DeepSeek's claim is that at a 1M-token context, V4-Pro needs only 27% of the per-token FLOPs and just 10% of the KV cache that V3.2 required. That is not a tuning win; it is a structural rewrite of where attention spends its budget.
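DeepSeek has not published the internals of CSA or HCA, but the shape of the trade is easy to show in miniature. The sketch below is a hypothetical stand-in, not the actual design: it mean-pools the context into per-block summaries (playing the heavily compressed role) and reads only a handful of top-scoring blocks in full (playing the sparse role), so each query touches a few hundred vectors instead of the whole context. Every name, block size, and pooling choice here is illustrative.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def hybrid_attention(q, K, V, block=128, top_blocks=4):
    """One decoding step under a sparse + compressed budget.

    Hypothetical stand-in for CSA/HCA (neither is publicly specified):
      - compress: mean-pool the context into n/block summary tokens;
      - select:   rank blocks by the query's score against the summaries
                  and keep the raw tokens of the top few blocks;
      - attend:   one softmax over (selected raw KV + remaining summaries).
    Per-token work is O(top_blocks * block + n / block) instead of O(n).
    """
    d = q.shape[-1]
    n = (K.shape[0] // block) * block       # drop any ragged tail for clarity
    Kb = K[:n].reshape(-1, block, d)
    Vb = V[:n].reshape(-1, block, d)

    K_sum, V_sum = Kb.mean(axis=1), Vb.mean(axis=1)   # compressed KV
    coarse = K_sum @ q / np.sqrt(d)                   # cheap block scores
    keep = np.argsort(coarse)[-top_blocks:]           # blocks read in full
    rest = np.ones(len(K_sum), dtype=bool)
    rest[keep] = False                                # summaries for the rest

    K_mix = np.concatenate([Kb[keep].reshape(-1, d), K_sum[rest]])
    V_mix = np.concatenate([Vb[keep].reshape(-1, d), V_sum[rest]])
    w = softmax(K_mix @ q / np.sqrt(d))
    return w @ V_mix

rng = np.random.default_rng(0)
K, V = rng.standard_normal((8192, 64)), rng.standard_normal((8192, 64))
q = rng.standard_normal(64)
out = hybrid_attention(q, K, V)   # attends to 572 vectors, not 8192
print(out.shape)                  # (64,)
```

Note that this sketch still keeps all raw KV in memory; in a production design the cache savings would come from retaining only the summaries for blocks that are rarely selected, which is presumably where the 10% figure lives.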
The knock-on effect of that budget shift is that "long context" stops being a premium tier. DeepSeek made 1M tokens the default across all of its official services on launch day, rather than a paid add-on you have to opt into. For builders, this changes the unit economics of agentic workloads (codebase-wide refactors, multi-document legal review, long conversation memory) that previously made sense only as research demos. The architecture, in other words, is what makes the headline price possible. CSA + HCA is the mechanism; the $0.28-per-million-output-tokens sticker on V4-Flash is the receipt.
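To make the KV-cache claim concrete, here is the back-of-envelope arithmetic. The layer, head, and precision numbers below are placeholders (DeepSeek has not published V4's dimensions); only the 10% ratio comes from the claim above.

```python
# Hypothetical model dimensions: assumed GQA with 8 KV heads, bf16 cache.
layers, kv_heads, head_dim, dtype_bytes = 60, 8, 128, 2

per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V planes
ctx = 1_000_000
dense_gib = per_token * ctx / 2**30
print(f"dense KV cache @ 1M tokens: {dense_gib:,.0f} GiB")         # ~229 GiB
print(f"at the claimed 10% ratio:   {dense_gib * 0.10:,.0f} GiB")  # ~23 GiB
```

On those assumed dimensions, the dense cache alone would overflow a single accelerator, while the compressed one fits with room to spare. That gap is the difference between 1M tokens as a paid tier and 1M tokens as the default.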