Coming soon

DeepSeek V4 status hub

DeepSeek V4 is positioned as the next major step in the DeepSeek lineup, with a large-scale Mixture-of-Experts design and a strong focus on reasoning, code, and long-context workloads. The working profile centers on trillion-class capacity with sparse activation, balancing large knowledge coverage with production-ready throughput.

The expected performance focus spans code generation, math accuracy, and structured reasoning, with evaluation typically framed around benchmarks such as MMLU, HumanEval, GSM8K, and MATH. Multimodal capability remains a core goal, with deep image understanding and a roadmap toward richer video workflows. Official launch timing and final specifications are pending, so this page summarizes the target profile and what teams can do now.

Scale and MoE routing
Trillion-class capacity with sparse activation keeps inference practical at production scale.
Long-context reasoning
Targets emphasize 100K-class context for large documents, codebases, and multi-step workflows.
Code and math strength
Evaluation focus remains on code generation, math accuracy, and structured reasoning tasks.
Multimodal expansion
Image understanding is expected to deepen with a roadmap toward richer video capabilities.

Model snapshot

The V4 target profile emphasizes scale without runaway inference cost. A large MoE backbone keeps total capacity high while activating a smaller subset of experts per token. That balance is meant to preserve throughput for production use cases such as retrieval-augmented workflows, long-document analysis, and multi-step reasoning.

Total parameters
~1T
Mixture-of-Experts capacity target
Active parameters
~320B
Sparse activation per token
Expert layout
1 shared + 256 routed
Top-k routing (k=8) at inference
Context target
100K-class
Designed for long-form reasoning
Capability profile
Relative emphasis across core evaluation areas.
Reasoning depth: math and logic tasks
Code generation: high-precision outputs
Long-context handling: large documents
Multimodal readiness: image + video roadmap
Efficiency: sparse compute path
Scale vs. activation
Sparse activation keeps inference efficient while retaining massive capacity.
Active parameters: ~320B
Total capacity: ~1T
The active slice is intentionally smaller than total capacity, enabling higher throughput without discarding large-scale knowledge coverage.
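The shared-plus-routed layout in the snapshot can be sketched as a toy top-k router. The expert counts (1 shared + 256 routed) and k = 8 follow the target profile above; the token count, random gating logits, and array shapes are purely illustrative, not DeepSeek's implementation.

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int):
    """Pick the top-k routed experts per token and softmax-normalize their gates."""
    idx = np.argpartition(logits, -k, axis=-1)[..., -k:]       # ids of the k largest logits
    gate_logits = np.take_along_axis(logits, idx, axis=-1)
    gates = np.exp(gate_logits - gate_logits.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)                      # gates sum to 1 per token
    return idx, gates

# Target layout from the snapshot: 1 shared + 256 routed experts, top-k with k = 8.
NUM_ROUTED, TOP_K = 256, 8
rng = np.random.default_rng(0)
router_logits = rng.normal(size=(4, NUM_ROUTED))               # 4 example tokens
experts, gates = topk_route(router_logits, TOP_K)

# Each token activates the shared expert plus its k routed experts,
# so only 9 of 257 experts run per token -- the source of the ~320B / ~1T gap.
active_per_token = 1 + TOP_K
```

The point of the sketch is the shape of the computation: routing cost is a cheap argmax-style selection, while the expensive expert FFNs run only for the selected slice.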

V4 briefings


DeepSeek V4 briefing 01 (AIM Network)
Revisits the market shock after DeepSeek R1 and frames V4 as a strategic shift: multimodal support, open access, and optimization for domestic chips (Huawei, Cambricon) rather than NVIDIA. It highlights China’s fast-moving model race (Qwen, Seed, Moonshot, Zhipu, MiniMax) and argues that efficiency plus local silicon could reshape the global AI balance.
DeepSeek V4 briefing 02 (The Information)
Interview-style coverage emphasizing internal benchmark confidence in coding performance, tracing the V3 → R1 → V4 arc, and presenting V4 as a direct open-source challenge to closed-model leaders. It also notes the growing global footprint of open-source model usage led by China-based teams.
DeepSeek V4 briefing 03 (Fahd Mirza)
Focuses on launch timing narratives and positions V4 as multimodal with deeper domestic chip collaboration. It revisits V3’s scale and cost advantage, rejects unreliable benchmark chatter, and frames the release window as strategically meaningful while noting competitive disputes around distillation and hardware access.
DeepSeek V4 briefing 04 (Universe of AI)
A technical update sweep: reported 1M-token context, native multimodal support, potential Blackwell compatibility, and DeepGEMM advances (Manifold constraints, FP4 inference). It also references broader competitive signals and the intersection of capability, hardware, and geopolitics.
Shared themes
All four briefings frame DeepSeek V4 as a 2026 inflection point: top-tier capability with lower compute cost, strong multimodal ambition, and a hardware strategy that leans into domestic silicon. A suggested viewing order is Video 1 for macro impact, Video 2 for benchmark positioning, Video 3 for launch timing narratives, and Video 4 for the latest technical update set.

How to prepare today

V4 access will open after the official launch, but teams can prepare by validating workloads on the current DeepSeek lineup. Focus on prompt structures, evaluation harnesses, and routing strategies so that the switch to V4 is a controlled migration rather than a fresh integration. Keep your internal benchmarks aligned with code, math, and long-context tasks to make the eventual comparison straightforward.

  • Benchmark tasks using V3.1, R1, Math-7B, Janus-Pro-7B, and VL2.
  • Document latency, cost, and quality trade-offs per model.
  • Prepare evaluation sets for long-context and tool use.
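One way to make the checklist concrete is a small harness that scores each current model on the same cases and records latency alongside accuracy, so the eventual V4 comparison is a one-line model swap. This is a minimal sketch: the model ids and the `ask` stub are placeholders, and in practice `ask` would wrap your actual API client.

```python
import statistics
import time
from typing import Callable

# Placeholder model ids; substitute whatever identifiers your gateway exposes.
CANDIDATE_MODELS = ["deepseek-v3.1", "deepseek-r1"]

def run_suite(ask: Callable[[str, str], str], model: str, cases: list) -> dict:
    """Score one model on {'prompt': ..., 'expect': ...} cases, tracking
    substring-match accuracy and wall-clock latency per call."""
    latencies, correct = [], 0
    for case in cases:
        t0 = time.perf_counter()
        answer = ask(model, case["prompt"])
        latencies.append(time.perf_counter() - t0)
        correct += case["expect"] in answer
    return {
        "model": model,
        "accuracy": correct / len(cases),
        "p50_latency_s": statistics.median(latencies),
    }

# Offline stub standing in for a real model call so the harness runs anywhere.
def fake_ask(model: str, prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "unsure"

cases = [{"prompt": "What is 2 + 2?", "expect": "4"}]
report = [run_suite(fake_ask, m, cases) for m in CANDIDATE_MODELS]
```

Keeping the harness model-agnostic (the model id is just a string passed through) is what turns the V4 switch into a controlled migration rather than a rewrite.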
Start with legacy models
While V4 prepares for launch, the current lineup is free for the first 30 days, and you can switch models instantly inside the Playground.