TECH

DeepSeek V4-Pro permanent 75% price cut

35+

Signals

Strategic Overview

01.
DeepSeek announced on May 23, 2026 that it is making the 75% discount on its flagship V4-Pro API permanent, with input tokens at $0.435 per million and output tokens at $0.87 per million.
02.
The discount was originally a limited-time promotion scheduled to expire on May 31, 2026, but DeepSeek is now locking it in as the standing rate.
03.
Cached input tokens are priced at $0.003625 per million, making cache hits roughly 120x cheaper than non-cached input and the dominant cost lever for production workloads.
04.
On output tokens V4-Pro is now roughly 34.5x cheaper than GPT-5.5, with The Next Web's comparison putting Claude Opus 4.7 at $5/$25 and GPT-5 at $2.50/$10 per million tokens.

The Cache Discount Is the Real Number; the Headline Buries It

Most coverage led with the 75% figure, but the more consequential line in DeepSeek's pricing table is the cached-input rate: $0.003625 per million tokens, against $0.435 for non-cached input and $0.87 for output ^[1]. That is roughly a 120x discount on input bytes DeepSeek has already seen in a recent context window, and it reshapes the unit economics of any application that reuses a system prompt, a long retrieved document, or a stable codebase across many calls.

Practitioners are already running the math out loud. The dominant thread on r/DeepSeek and r/hermesagent is that real bills land far below what the headline price would imply once roughly 95% of input tokens hit the cache - one developer described feeding 6M input tokens plus 80K output tokens for about sixteen cents. The benchmark account Artificial Analysis framed the same point at a higher altitude, putting V4-Pro on the Pareto frontier of intelligence versus cost to run intelligence alongside V4 Flash ^[2].

The practical consequence is that the V4-Pro price sheet rewards a specific style of application - long-lived sessions, repeated prefixes, RAG with stable retrieval - and quietly penalizes one-shot, low-reuse calls where the cache never warms. Builders who treat V4-Pro as a drop-in for a model billed at a flat per-token rate will see the smaller discount; teams that re-architect prompts around persistent prefixes will see the larger one. The structural lesson for the rest of the industry is that the next round of frontier pricing competition will be fought on cache mechanics, not list prices.

Why Now: Huawei's Silicon Comes Online

The timing of the permanence announcement - one month after V4-Pro launched, and one week before the original May 31 promotion was set to expire - points to a supply-side cause rather than a marketing decision. Yahoo Finance's report tied the move directly to hardware, noting that "Pro pricing was expected to fall sharply once Huawei Ascend 950 supernodes are launched in large quantities in the second half of the year" ^[3]. TheTechPortal made the same connection, attributing DeepSeek's "confidence to sustain lower pricing permanently" to the rising availability of the Ascend 950 and 950PR ^[4].

The Council on Foreign Relations adds a layer of state direction to that story. CFR reports that V4's most novel feature is being "optimized for inference on Huawei's Ascend chips rather than Nvidia's, reportedly at Beijing's direction" ^[5]. Read together, the picture is of a coordinated industrial bet: DeepSeek tunes its inference stack to domestic silicon, Huawei ramps that silicon, and the cost base for serving a 1.6T-parameter mixture-of-experts model drops enough to make the discount sustainable rather than promotional.

The constraint worth watching is on Huawei's end. Yahoo flagged that "separate curbs on chipmaking equipment exports have limited Huawei's ability to scale up Ascend production," meaning the permanent price floor depends on a supply chain that is itself rate-limited by export controls ^[3]. If Ascend volume falls short of the second-half ramp DeepSeek is implicitly betting on, V4-Pro inference capacity, not pricing, becomes the next bottleneck.

The 60-Day Window for Western Providers

AIWeekly's read on the news is the bluntest forward-looking call in the coverage set: "OpenAI and Anthropic face accelerating enterprise churn to DeepSeek V4-Pro for cost-sensitive workloads if no competitive pricing response emerges within the next 60-90 days" ^[6]. The cost gap is large enough to make that thesis credible - The Next Web's price comparison shows GPT-5 at $2.50 input / $10 output and Claude Opus 4.7 at $5 / $25 per million, putting V4-Pro's output tokens roughly 11x cheaper than GPT-5 and 28x cheaper than Opus 4.7 ^[2]. SCMP reported that the cost per GPT-5.5 conversation is about 32x DeepSeek V4's ^[7].

The more interesting question is what the Western incumbents do instead of matching on price. AIWeekly's same alert sketches the alternative path: "Anthropic and OpenAI have a window to compete on trust, compliance, and data-residency guarantees rather than price, potentially accelerating enterprise-tier features and SOC 2 / FedRAMP positioning" ^[6]. CFR's framing reinforces this - Horowitz argues V4 is "priced for mass deployment at least four times cheaper than American competitors" and that "the country that deploys AI fastest across its economy and government will shape the character of the AI era" ^[5].

There is also a non-pricing overhang that complicates a clean migration: Anthropic's unresolved public accusation that DeepSeek engaged in "distillation attacks" on Claude responses ^[8], plus CFR's note that U.S. officials have asserted V4 was trained on smuggled Nvidia Blackwell chips ^[5]. For regulated buyers, that IP and provenance overhang is exactly the surface area where Western providers can plausibly hold the line - selling not cheaper tokens, but defensible ones.

DeepSeek vs. Itself: Pro, Flash, and the Collapse of an Internal Tier

The angle that the broader coverage almost completely missed: the permanent cut doesn't just compress DeepSeek's lead over OpenAI and Anthropic, it compresses DeepSeek's lead over its own cheaper sibling. The benchmark account Artificial Analysis explicitly noted that the move puts V4-Pro on the Pareto frontier of intelligence versus cost alongside V4 Flash ^[2]- same frontier, not above it. At launch on April 24, Pro initially cost up to 12x Flash ^[3]; the permanent rate collapses much of that gap.

Community discussion on r/DeepSeek and r/hermesagent flags the same tension from the user side. The recurring observation is that V4 Flash with the xhigh reasoning setting often outperforms V4 Pro max on everyday tasks, particularly coding, while costing significantly less and avoiding Pro's slower latency. Several heavy users describe monthly Flash bills in the $37-40 range, with Pro spend running noticeably higher for ambiguous quality gains. That practitioner view is consistent with The Decoder's read of DeepSeek's broader posture as a "shift strategy: away from the best model and toward the cheapest one that's still good enough" ^[1].

If Flash xhigh meaningfully beats Pro max on the workloads developers actually run, the permanent price cut starts to look as much like cannibalization defense as it does competitive aggression. Pro now has to justify its existence on tasks where the extra parameters, the 1M context window, and the 384K output ceiling pay off - long-document reasoning, large-scale code refactors, planning chains - rather than on a general quality lead ^[9]. That's a narrower moat than the launch pricing implied, and it raises a structural question for DeepSeek's roadmap: is the company quietly converging its own product tiers, and if so, what does a V5 release look like when Flash is already on the frontier?

Historical Context

2023

DeepSeek founded in Hangzhou, China as an AI lab focused on open-weights frontier models.

2025-09-29

Released V3.2-Exp with a roughly 50% price cut, dropping input tokens to under three cents per million and signaling an aggressive pricing trajectory.

2026-04-24

Launched V4 Pro alongside V4 Flash, with Pro initially priced up to 12x Flash due to inference compute constraints.

2026

Chinese rivals Kimi K2.6 and Zhipu GLM-5.1 went the opposite direction and raised prices on their newest flagship models, isolating DeepSeek as the lone Chinese commoditizer.

2026-05-23

Announced the 75% discount on V4-Pro is permanent rather than expiring on May 31, locking the discounted rate in as the standing price.

Power Map

Key Players

Subject

DeepSeek V4-Pro permanent 75% price cut

DeepSeek

Hangzhou-based model developer issuing the permanent price cut and positioning V4-Pro as the cost-effective default for AI agents, prioritizing market share over per-token margin.

Huawei

Supplier of the Ascend 950 and 950PR supernodes whose rising availability gives DeepSeek the inference cost base needed to make the cut permanent.

OpenAI

Most exposed Western incumbent; GPT-5.5 output tokens are about 34x more expensive than V4-Pro, putting cost-sensitive enterprise workloads at churn risk.

Anthropic

Claude Opus 4.7 list pricing of $5/$25 per million tokens places it at the high end of the market; also publicly accused DeepSeek of distillation attacks, an unresolved IP overhang.

Google

Gemini 3.5 Flash at $0.15/$0.60 already competes on price, leaving Google less squeezed than OpenAI or Anthropic but still facing margin compression at the premium tier.

Beijing

Reportedly directed DeepSeek to optimize V4 for inference on Huawei Ascend rather than Nvidia chips, tying the pricing move to a broader industrial-policy push.

Fact Check

10 cited

Source Articles

Top 5

THE SIGNAL.

Analysts

""V4 does contain real engineering achievements...And while DeepSeek V4 is more expensive than its Chinese competitors, it remains cheaper than comparable U.S. models." McGuire also notes the company "appears to still be extraordinarily dependent on U.S. technology" with V4 allegedly trained on smuggled Nvidia Blackwell chips."

Chris McGuire

Senior Fellow for China and Emerging Technologies, Council on Foreign Relations

""V4 is open source...and priced for mass deployment at least four times cheaper than American competitors," arguing that "the country that deploys AI fastest across its economy and government will shape the character of the AI era.""

Michael C. Horowitz

Senior Fellow for Technology and Innovation, Council on Foreign Relations

""The era of high-margin AI tokens may be ending faster than anyone expected" - framing the V4-Pro move as a structural shift rather than a one-off promotion."

The Next Web (analysis)

Technology news outlet

"Reads DeepSeek's move as a deliberate "shift strategy: away from the best model and toward the cheapest one that's still good enough.""

The Decoder (analysis)

AI industry publication

""OpenAI and Anthropic face accelerating enterprise churn to DeepSeek V4-Pro for cost-sensitive workloads if no competitive pricing response emerges within the next 60-90 days," with the alternative being a pivot to trust, compliance, and data-residency differentiation."

AIWeekly (analysis)

AI industry newsletter

The Crowd

"DeepSeek permanently reduced pricing for DeepSeek V4 Pro by 75%! > $0.003625 per million input tokens (with cache) > $0.435 per million input tokens. > $0.87 per million output tokens. Cache is almost free"

@@testingcatalog1835

"DeepSeek has made its temporary 75% price cut on the first-party V4 Pro API permanent, putting V4 Pro on the Pareto frontier of Intelligence Index vs Cost to Run Intelligence Index alongside V4 Flash. @deepseek_ai's first-party V4 Pro API is now $0.435/1M input, $0.87/1M output"

@@ArtificialAnlys755

"DeepSeek just made its V4-Pro API discount permanent. The 75% promo ends on May 31, 2026. After that, the API stays at 1/4 of the original price. Pricing: Cached input $0.0147 → $0.0037 / 1M tokens Uncached input $1.76 → $0.44 / 1M tokens Output $3.53 → $0.88 / 1M tokens"

@@hqmank101

"deepseek v4 pro price will be reduced to the current price permanently accoding to official website"

@u/InsideElk6329671

Broadcast

UNLIMITED FREE Deepseek-V4 PRO AI Coder: THIS IS CRAZY!

JUST $1 For Billions of Tokens?! (DeepSeek v4 Pro)

Why DeepSeek V4 Has Everyone Freaking Out