Unified API lineup

DeepSeek models

Choose the right model for the job. Current models are production-ready today, with V4 reserved as the next-generation multimodal release.

The lineup is designed around clear job-to-model matches: V3.1 for general chat and long-context workflows, R1 for deliberate reasoning, Math-7B for cost-efficient numeric accuracy, Janus-Pro-7B for multimodal generation, and VL2 for OCR and document understanding. All models share the same unified API surface, so you can switch with a single parameter change.
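
Here is a minimal sketch of that single-parameter swap, assuming an OpenAI-compatible Python client (DeepSeek's public API follows this convention). The model IDs and prompts are illustrative; verify them against the platform's current model list.

    # Minimal sketch: only the `model` argument changes between calls.
    from openai import OpenAI

    client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

    def ask(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Illustrative model IDs; check the current model list before use.
    summary = ask("deepseek-chat", "Summarize this clause: ...")
    proof = ask("deepseek-reasoner", "Prove that sqrt(2) is irrational.")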

Series overview at a glance:

  • V1 dense stack: 7B and 67B, 4K context, ~2T tokens.
  • V2 MoE + latent attention: 236B total, ~21B active, 128K context.
  • V2-Lite trims to ~16B total and ~2.4B active for smaller clusters.
  • V3 MoE: 671B total, ~37B active, 256 experts, ~14T tokens.
  • R1 keeps the V3 backbone and adds reinforcement learning for reasoning.
  • Tracks cover base/chat, coder, math/prover, and vision (VL2/Janus).

The charts below summarize the series-level data from the research report. The figures come from public reporting and may change as official documentation is updated, so treat them as directional guidance rather than fixed guarantees.

Base model milestones
Series-level shifts summarized from the comparison report.
  • V1 dense stack: 7B and 67B, 4K context, ~2T tokens.
  • V2 MoE: 236B total, ~21B active, 128K context, ~8T tokens.
  • V3 MoE: 671B total, ~37B active, 256 experts, ~14T tokens.
Specialized lines
Task-focused variants that extend the base family.
  • Coder V1/V2 for code generation and debugging tasks.
  • R1 for multi-step reasoning and math logic, trained via reinforcement learning.
  • Math and Prover models for formal reasoning workflows.
  • VL2 and Janus for vision, OCR, and multimodal generation.
  • Lite and 16B MoE for constrained deployments.
Deployment cues
Practical guidance for choosing the right tier.
  • Use the smallest model that clears your quality bar.
  • MoE active parameters matter more than total count.
  • Long context requires memory and latency planning.
  • Keep prompts and eval sets consistent across models (see the sketch after this list).
  • Swap models via one API parameter when needed.
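
The last two cues in one hedged sketch: run a fixed eval set against each candidate, varying only the model ID. It reuses the ask helper from the earlier snippet; the prompts and IDs are placeholders.

    # Hold prompts constant; vary only the model ID (placeholder IDs shown).
    EVAL_SET = [
        "Extract the total from: Invoice #1001, amount due $415.20.",
        "What is 17 * 24?",
    ]
    CANDIDATES = ["deepseek-chat", "deepseek-reasoner"]

    # `ask` is the helper from the unified-API sketch above.
    results = {m: [ask(m, p) for p in EVAL_SET] for m in CANDIDATES}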
Parameter scale
Total vs active parameters in billions (B). MoE models activate a smaller subset per token; the sketch after this list quantifies the gap.
  • V1-7B: 7B total / 7B active
  • V1-67B: 67B total / 67B active
  • V2: 236B total / 21B active
  • V3: 671B total / 37B active
  • R1: 671B total / 37B active
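
To make the total-versus-active gap concrete, a quick calculation over the reported figures above (directional, as noted earlier):

    # Activation fraction per token, from the reported figures above.
    models = {"V1-67B": (67, 67), "V2": (236, 21), "V3": (671, 37)}
    for name, (total_b, active_b) in models.items():
        print(f"{name}: {active_b / total_b:.1%} of parameters active per token")
    # V2 activates ~8.9% and V3 ~5.5%: per-token compute tracks the active column.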
Context window growth
Reported maximum context sizes in thousands of tokens (K); the sketch after this list estimates the memory cost.
  • V1: 4K
  • V2: 128K
  • V3: 128K
  • R1: 128K
  • Prover-V2: 163K
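
Long context is mostly a memory problem. The estimate below sizes a plain-attention KV cache; the hyperparameters are placeholders rather than published specs, and latent-attention variants compress the cache well below this bound.

    # Back-of-envelope KV-cache size for long-context capacity planning.
    def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                    seq_len: int, bytes_per_elem: int = 2) -> float:
        # 2x for keys and values; 2 bytes per element for fp16/bf16.
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

    # Placeholder hyperparameters, not official model specs.
    print(kv_cache_gb(layers=60, kv_heads=8, head_dim=128, seq_len=128_000))
    # -> ~31.5 GB per sequence at 128K, before any cache compression.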
Training highlights
Recurring themes across the reported series progressions.
  • Data scale rises from ~2T tokens (V1) to ~8T (V2) and ~14T (V3).
  • MoE routing and load balancing improve expert utilization.
  • FP8 mixed-precision training is noted for efficiency in later series.
  • Long-context stability becomes a first-class optimization target.
Evaluation focus
What teams typically compare before choosing a tier.
  • General reasoning tasks and long-context retrieval.
  • Code generation and repo-scale comprehension.
  • Math accuracy and step-by-step verification.
  • Vision OCR, charts, and document understanding.
Selection guide
Start with the model that matches your workload, then iterate; a routing sketch follows this list.
  • General chat and long documents: DeepSeek V3.1.
  • Multi-step reasoning and verification: DeepSeek R1.
  • Math tutoring and numeric workflows: Math-7B.
  • Text-to-image and multimodal generation: Janus-Pro-7B.
  • OCR, charts, and document QA: DeepSeek VL2.
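
The guide condenses into a routing table; the workload keys and model IDs below are illustrative placeholders, not official identifiers.

    # Workload-to-model routing table mirroring the selection guide.
    ROUTES = {
        "chat": "deepseek-v3.1",                # general chat, long documents
        "reasoning": "deepseek-r1",             # multi-step reasoning
        "math": "deepseek-math-7b",             # numeric workflows, tutoring
        "image-gen": "deepseek-janus-pro-7b",   # text-to-image generation
        "doc-qa": "deepseek-vl2",               # OCR, charts, document QA
    }

    def pick_model(workload: str) -> str:
        # Fall back to the general-purpose tier for unknown workloads.
        return ROUTES.get(workload, "deepseek-v3.1")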
What to expect from V4
Research notes summarize a larger MoE model and multimodal roadmap.

Public analysis points to trillion-scale capacity with sparse activation, long-context ambitions, and expanded multimodal capabilities. Exact specifications and pricing remain unconfirmed, so V4 stays in waitlist mode until official launch details are available.

Series architecture notes
Highlights distilled from the comparison report.

The family's progression is defined by architecture shifts. V1 models used dense Transformer stacks at 7B and 67B with 4K context windows and roughly 2T training tokens. V2 introduced Mixture-of-Experts plus multi-head latent attention, enabling 236B total parameters while activating ~21B per token and stretching context to 128K.

V3 expanded to 671B total parameters with ~37B active per token and a 256-expert MoE layout (eight routed experts activated per token), trained on roughly 14T tokens. R1 built on the V3 backbone but added reinforcement learning for deeper reasoning. The coder line reused the MoE stack in both 16B-lite and 236B variants, while math and prover models specialized for formal reasoning.
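
As a toy illustration of that routing step, here is a top-k sketch assuming softmax gating; production stacks add shared experts, load-balancing losses, and capacity limits.

    # Toy top-k MoE router: score all experts, keep the k best per token.
    import numpy as np

    def route(hidden: np.ndarray, router_w: np.ndarray, k: int = 8):
        logits = router_w @ hidden            # one score per expert
        topk = np.argsort(logits)[-k:]        # indices of the k best experts
        gates = np.exp(logits[topk] - logits[topk].max())
        gates /= gates.sum()                  # softmax over the selected experts
        return topk, gates

    rng = np.random.default_rng(0)
    experts, gates = route(rng.normal(size=512), rng.normal(size=(256, 512)))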

Model family map
How the series splits by task focus.
  • General base models: V1, V2, V3 for broad language tasks.
  • Chat alignment: DeepSeek-Chat variants tuned for dialogue.
  • Coder line: code generation, debugging, and long repo tasks.
  • Reasoning line: R1 and distills for multi-step logic.
  • Math & prover: formal reasoning and theorem workflows.
  • Lite MoE: smaller footprints for constrained hardware.

Use this map to decide where to pilot: pick the smallest model that matches your workload, then scale up only when evaluations demand it.

Available now

Production-ready models for text, reasoning, math, and vision.

Text · Available
DeepSeek V3.1
Fast, general-purpose MoE model with long-context variants and strong coding ability.
Best for: General chat, Code generation, Long documents
Pricing: $1.00 / 1M tokens
Reasoning · Available
DeepSeek R1
Reasoning-first MoE model optimized for multi-step logic, math, and complex planning.
Best for: Multi-step reasoning, Math and proofs, Complex planning
Pricing: $1.50 / 1M tokens
Math · Available
DeepSeek Math-7B
Compact math specialist tuned for high-accuracy numeric reasoning and proofs.
Best for: Math tutoring, Symbolic reasoning, Cost-efficient math
Pricing: $1.00 / 1M tokens
Multimodal · Available
DeepSeek Janus-Pro-7B
Unified multimodal model for image understanding and text-to-image generation.
Best for: Text-to-image, Visual reasoning, Creative generation
Pricing: $0.02 / image
Multimodal · Available
DeepSeek VL2
Vision-language model for OCR, documents, charts, and visual Q&A.
Best for: OCR, Document analysis, Chart interpretation
Pricing: $0.02 / image
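
For budgeting, a quick spend estimate from the per-token prices listed above (prices subject to change):

    # Monthly spend estimate from the listed per-million-token prices (USD).
    PRICE_PER_M_TOKENS = {"DeepSeek V3.1": 1.00, "DeepSeek R1": 1.50}

    def monthly_cost(model: str, tokens_per_day: int, days: int = 30) -> float:
        return PRICE_PER_M_TOKENS[model] * tokens_per_day * days / 1_000_000

    print(monthly_cost("DeepSeek R1", tokens_per_day=2_000_000))  # -> 90.0 USD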

Coming soon

Next-generation multimodal releases.

Multimodal · Coming soon
DeepSeek V4
Next-generation multimodal MoE model. Launch details and pricing are coming soon.
Focus: Next-gen multimodal use cases, High-complexity reasoning, Early access exploration