TECH

DeepSeek V4-Pro permanent 75% price cut

35+

Signals

Strategic Overview

01.
DeepSeek made the 75% V4-Pro discount permanent on May 22, 2026, freezing the promotional rate originally set to expire on May 31.
02.
List pricing now sits at $0.435/M input (cache-miss), $0.003625/M (cache-hit), and $0.87/M output — roughly a quarter of the launch price.
03.
Against Claude Opus 4.7 and GPT-5.5 PRO at roughly $30/M output, V4-Pro now lands roughly 28-34x cheaper on output tokens.
04.
V4-Pro is a 1.6-trillion-parameter model optimized for Huawei Ascend 950 chips rather than Nvidia silicon, tying the price floor to domestic Chinese accelerator supply.

Why this cut is permanent: Ascend 950 supply meets long-context engineering

The 75% reduction is not a promo extended one more cycle — it is a permanent reset of the list price, made possible by two things landing at once. First, Huawei's Ascend 950 ramp. Mass production of the 950PR began in April, with Huawei targeting roughly 750,000 units shipped in 2026 and full-scale volume in H2 ^[1]. ByteDance, Tencent and Alibaba placed new orders within days of the V4 launch, confirming that DeepSeek is not the only customer absorbing the supply ^[1]. Second, V4-Pro was engineered around the dominant cost driver of agentic workloads: long-context inference. Greyhound Research's Sanchit Vir Gogia notes the architecture was engineered to cut the cost of long-context inference ^[2], and the cache-hit input rate of $0.003625/M tokens — 1/120th of the cache-miss rate — points to aggressive KV-cache reuse on the 1M-context window DeepSeek shipped a month earlier ^[3]. The April 26 cut of cache-hit pricing across all DeepSeek models to 1/10 of launch was the foreshadowing ^[4]; the permanent V4-Pro cut is the structural follow-through. The point: this is a hardware-plus-architecture price floor, not a marketing flag.

The Western lab business-model problem

The damage is not that V4-Pro is cheaper. It is that V4-Pro is cheaper at frontier-tier quality. Counterpoint Research's Neil Shah argues V4-Pro has effectively closed the performance gap on critical tasks like complex math and reasoning, while aggressively leading the market on openness and inference costs ^[2]. That collapses the historical defense — yes it's cheap, but you need the smart one for hard work — at exactly the moment Anthropic and OpenAI are pricing premium tiers above $30/M output to justify capex-heavy training budgets ^[5]. Implicator.ai's Marcus Schuler is blunter: Western labs cannot match the price without breaking the revenue models their valuations depend on ^[6]. Pricing matching DeepSeek would zero out the margin enterprise contracts assume; not matching means watching cost-sensitive workloads — agentic loops, code generation, RAG over long documents — migrate to a 28-34x cheaper alternative. The likely Western response is a shift away from per-token monetization toward outcome- or seat-based contracts, which is the only frame in which a $30/M output price can survive next to a $0.87/M one.

The procurement question the price cut cannot answer

There is a second layer no spreadsheet resolves. Buyers cannot simply route production traffic through DeepSeek's API on geopolitical grounds. The model runs on Huawei Ascend silicon at a moment when the White House has escalated AI IP-theft accusations against Chinese labs including DeepSeek ^[7], and export-controls on advanced chipmaking equipment remain the lever Washington uses to cap Chinese accelerator output ^[7]. For US-headquartered enterprises, sending sensitive prompts and customer data through a Chinese-hosted endpoint raises Cloud Act, data-residency, and procurement-review questions that no per-token discount eliminates. Schuler's frame captures the bind: enterprise buyers just got a new benchmark and a geopolitical dilemma that no per-token price can resolve ^[6]. Ankura Consulting's Amit Jaju adds the practical pivot — for most enterprises the relevant comparison is not DeepSeek's direct API but the cost of running the open weights themselves ^[2]. That points to the actually-achievable outcome of this price cut for Western buyers: not migration to deepseek.com, but pressure on incumbent labs and a serious look at self-hosted V4 inference on whatever silicon is locally permissible.

What developers are actually doing with it

While the analyst class argues monetization, developers have already moved. The dominant pattern in the community is plugging V4-Pro into the Claude Code harness via OpenRouter or BYOK proxies and running overnight agentic loops that were previously cost-prohibitive at Opus 4.7 rates. Reddit threads on r/DeepSeek are full of receipts — one user reported 6M input plus 80K output tokens billing out at sixteen cents — the kind of number that makes 'just leave the agent running' a viable strategy rather than a luxury. The community's running quality ranking — Opus 4.7 above V4-Pro above Sonnet 4.6 above V4 Flash above Haiku — frames V4-Pro as a credible Sonnet-class daily driver at a Haiku-class price, not a top-shelf Opus replacement for the hardest reasoning. The viral X framing — 5.75M output tokens for the cost of a Starbucks latte, roughly 14,000 pages of generated prose — is the marketing version of the same point: the unit economics of 'let the agent retry, replan, and re-emit' just changed. Bloomberg's Robert Lea reads the broader signal as commoditization rather than a wow moment ^[5], and that is the honest read — V4-Pro is not a leap, it is a reset of what the rest of the curve should cost.

Historical Context

2025-08-21

Published the V3.1 pricing update and ended the long-running off-peak discount schedule at 16:00 UTC.

2026-04-26

Cut the input cache-hit price across all models to 1/10 of launch — foreshadowing the V4-Pro reduction.

2026-04-29

Ascend 950 orders surged after the V4 launch as ByteDance, Tencent and Alibaba placed chip orders within days.

2026-05-22

Announced the 75% V4-Pro discount would become permanent and not roll back on May 31.

2026-05-23

International coverage breaks of the permanent V4-Pro price cut and the resulting pricing-war framing.

Power Map

Key Players

Subject

DeepSeek V4-Pro permanent 75% price cut

DeepSeek

Chinese AI lab pricing V4-Pro inference as a commodity input rather than a premium product, applying structural pressure on incumbents.

Huawei

Supplier of Ascend 950 chips that V4-Pro is optimized for; targeting roughly 750,000 950PR units in 2026 with full-scale shipments in H2.

OpenAI, Anthropic, Google

Western frontier labs whose premium per-token monetization is now structurally challenged at roughly 28-34x the new V4-Pro output rate.

ByteDance, Tencent, Alibaba

Chinese cloud giants placing new Ascend 950 orders within days of V4 launch, validating demand pull onto domestic silicon.

White House / US government

Maintains export controls on advanced chipmaking equipment and has escalated IP-theft accusations against Chinese AI firms including DeepSeek.

Enterprise buyers

Face a new cost benchmark and a procurement decision about routing data through Chinese-hosted inference vs self-hosting open weights.

Fact Check

8 cited

Source Articles

Top 4

THE SIGNAL.

Analysts

"Argues V4-Pro has closed the performance gap on critical tasks like math and reasoning while aggressively leading on openness and inference cost."

Neil Shah

Vice President, Counterpoint Research

"Western labs structurally cannot match the price without breaking the revenue models their valuations depend on."

Marcus Schuler

Editor-in-Chief and Founder, Implicator.ai

"Frames the cut as creating both a new enterprise cost benchmark and a geopolitical procurement dilemma no per-token price can resolve."

Marcus Schuler

Editor-in-Chief and Founder, Implicator.ai

"V4-Pro is purpose-built to drive down the cost of long-context inference."

Sanchit Vir Gogia

Chief Analyst and CEO, Greyhound Research

"For most enterprises the meaningful comparison is not DeepSeek's API but the total cost of running the model themselves."

Amit Jaju

Senior Managing Director, Ankura Consulting

The Crowd

"DeepSeek's pricing is insane. > $0.87 per 1M output tokens > 5.75M output tokens with the price of a Starbucks coffee (~$5) > that's almost 14,000 pages of books"

@@Hesamation3039

"DeepSeek permanently reduced pricing for DeepSeek V4 Pro by 75%! > $0.003625 per million input tokens (with cache) > $0.435 per million input tokens. > $0.87 per million output tokens. Cache is almost free"

@@testingcatalog1842

"DeepSeek has made its temporary 75% price cut on the first-party V4 Pro API permanent, putting V4 Pro on the Pareto frontier of Intelligence Index vs Cost to Run Intelligence Index alongside V4 Flash @deepseek_ai's first-party V4 Pro API is now $0.435/1M input, $0.87/1M output,"

@@ArtificialAnlys786

"deepseek v4 pro price will be reduced to the current price permanently accoding to official website"

@u/InsideElk6329704

Broadcast

DeepSeekV4 + Claude Code = 100X Cheaper

Why DeepSeek V4 Has Everyone Freaking Out

Why DeepSeek V4 Impresses Despite Lack of 'Wow' Factor