DeepSeek models
Choose the right model for the job. The current lineup is production-ready today, while V4 is reserved as the next-generation multimodal release.
The lineup is designed around clear job-to-model matches: V3.1 for general chat and long-context workflows, R1 for deliberate reasoning, Math-7B for cost-efficient numeric accuracy, Janus-Pro-7B for multimodal generation, and VL2 for OCR and document understanding. All models share the same unified API surface, so you can switch with a single parameter change.
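The one-parameter switch described above can be sketched as an OpenAI-style chat request where only the `model` field changes. The identifiers `deepseek-chat` and `deepseek-reasoner` below are illustrative assumptions; confirm them against the official model list before calling the API.

```python
# Sketch: swapping models by changing a single request field.
# Model identifiers below are illustrative; confirm them against the
# official model list before use.

def build_chat_request(model: str, prompt: str, **options) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **options,
    }

general = build_chat_request("deepseek-chat", "Summarize this contract.")
reasoner = build_chat_request("deepseek-reasoner", "Summarize this contract.")

# The two payloads differ only in the "model" field.
changed = {key for key in general if general[key] != reasoner[key]}
```

Because the rest of the payload is untouched, an evaluation harness can sweep models by iterating over identifiers alone.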
Series overview at a glance:
- V1 dense stack: 7B and 67B, 4K context, ~2T tokens.
- V2 MoE + latent attention: 236B total, ~21B active, 128K context, ~8T tokens.
- V2-Lite trims to ~16B total and ~2.4B active for smaller clusters.
- V3 MoE: 671B total, ~37B active, 256 experts, ~14T tokens.
- R1 keeps the V3 backbone and adds reinforcement learning for reasoning.
- Tracks cover base/chat, coder, math/prover, and vision (VL2/Janus).
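A quick sanity check on the MoE figures above: the fraction of parameters active per token falls sharply from V2 to V3, which is why active count matters more than total. The numbers are the reported series figures, not guarantees.

```python
# Reported series figures from the overview above (billions of parameters).
SERIES = {
    "V2":      {"total": 236.0, "active": 21.0},
    "V2-Lite": {"total": 16.0,  "active": 2.4},
    "V3":      {"total": 671.0, "active": 37.0},
}

def active_fraction(name: str) -> float:
    """Fraction of total parameters activated per token."""
    entry = SERIES[name]
    return entry["active"] / entry["total"]

for name in SERIES:
    print(f"{name}: ~{active_fraction(name):.1%} of parameters active per token")
```

V3 activates roughly 5.5% of its parameters per token versus V2's roughly 8.9%, so total parameter count alone overstates the per-token compute gap.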
The lists below summarize the series-level data from the research report. These figures are reported values and can change as official documentation updates, so treat them as directional guidance rather than fixed guarantees.
Model tracks:
- Coder V1/V2 for code generation and debugging tasks.
- R1 for reasoning honed with reinforcement learning, including math logic.
- Math and Prover models for formal reasoning workflows.
- VL2 and Janus for vision, OCR, and multimodal generation.
- Lite and 16B MoE for constrained deployments.
Practical guidance:
- Use the smallest model that clears your quality bar.
- MoE active parameters matter more than total count.
- Long context requires memory and latency planning.
- Keep prompts and eval sets consistent across models.
- Swap models via one API parameter when needed.
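On the long-context planning point above, a rough KV-cache estimate helps size memory before committing to 128K windows. This sketch assumes plain multi-head or grouped-query attention with hypothetical shape numbers; MLA-style latent attention (used from V2 onward) compresses the cache well below this bound, so treat the result as an upper limit.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2,
                 batch: int = 1) -> float:
    """Upper-bound KV-cache size in GiB for standard attention:
    one K and one V vector per layer, per KV head, per position."""
    elems = 2 * layers * kv_heads * head_dim * context_len * batch
    return elems * bytes_per_elem / 1024 ** 3

# Hypothetical 60-layer model, 8 KV heads of dim 128, 128K context, fp16:
cache = kv_cache_gib(layers=60, kv_heads=8, head_dim=128,
                     context_len=128 * 1024)
# → 30.0 GiB per sequence before any KV compression
```

Even a modest batch at full context multiplies this figure, which is why long-context serving needs explicit memory and latency budgets.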
Training and scaling notes:
- Data scale rises from ~2T tokens (V1) to ~8T (V2) and ~14T (V3).
- MoE routing and load balancing improve expert utilization.
- FP8-friendly training is noted for later-stage efficiency.
- Long-context stability becomes a first-class optimization target.
Strong workloads:
- General reasoning tasks and long-context retrieval.
- Code generation and repo-scale comprehension.
- Math accuracy and step-by-step verification.
- Vision OCR, charts, and document understanding.
Quick picks by job:
- General chat and long documents: DeepSeek V3.1.
- Multi-step reasoning and verification: DeepSeek R1.
- Math tutoring and numeric workflows: Math-7B.
- Text-to-image and multimodal generation: Janus-Pro-7B.
- OCR, charts, and document QA: DeepSeek VL2.
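The job-to-model matches above reduce to a small lookup table. A minimal sketch follows; the workload keys are illustrative labels of this example, not API identifiers.

```python
# Workload-to-model map mirroring the quick picks above; keys are
# illustrative labels, not official API identifiers.
QUICK_PICKS = {
    "chat": "DeepSeek V3.1",
    "long_documents": "DeepSeek V3.1",
    "reasoning": "DeepSeek R1",
    "math": "Math-7B",
    "image_generation": "Janus-Pro-7B",
    "ocr": "DeepSeek VL2",
    "document_qa": "DeepSeek VL2",
}

def pick_model(workload: str) -> str:
    """Return the suggested model, failing loudly on unknown workloads."""
    if workload not in QUICK_PICKS:
        raise ValueError(
            f"unknown workload {workload!r}; known: {sorted(QUICK_PICKS)}")
    return QUICK_PICKS[workload]
```

Failing loudly on unknown workloads keeps routing decisions explicit instead of silently defaulting to the largest model.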
Public analysis points to trillion-scale capacity with sparse activation, long-context ambitions, and expanded multimodal capabilities. Exact specifications and pricing remain unconfirmed, so V4 stays in waitlist mode until official launch details are available.
The family's progression is defined by architecture shifts. V1 models used dense Transformer stacks at 7B and 67B with 4K context windows and roughly 2T training tokens. V2 introduced Mixture-of-Experts plus multi-head latent attention, enabling 236B total parameters while activating ~21B per token and stretching context to 128K.
V3 expanded to 671B total parameters with ~37B active per token and a 256-expert MoE layout (eight routed experts activated per token), trained on roughly 14T tokens. R1 built on the V3 backbone but added reinforcement learning for deeper reasoning. The coder line reused the MoE stack in both 16B-lite and 236B variants, while math and prover models specialized for formal reasoning.
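The top-k-of-256 routing described above can be sketched in a few lines. This is a toy gate (softmax over logits, keep the top eight, renormalize) that omits the shared experts, capacity limits, and load-balancing losses of a production MoE implementation.

```python
import math
import random

def route_token(gate_logits: list, k: int = 8) -> list:
    """Toy MoE router: keep the k highest-scoring experts for one token
    and renormalize their softmax weights to sum to 1."""
    top = sorted(range(len(gate_logits)),
                 key=gate_logits.__getitem__, reverse=True)[:k]
    weights = [math.exp(gate_logits[i]) for i in top]
    total = sum(weights)
    return [(i, w / total) for i, w in zip(top, weights)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(256)]  # one token, 256 experts
assignment = route_token(logits)
# Exactly 8 of the 256 experts fire; their weights sum to 1.
```

Only the selected experts run their feed-forward blocks, which is why the ~37B active figure, not the 671B total, drives per-token compute.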
- General base models: V1, V2, V3 for broad language tasks.
- Chat alignment: DeepSeek-Chat variants tuned for dialogue.
- Coder line: code generation, debugging, and long repo tasks.
- Reasoning line: R1 and distills for multi-step logic.
- Math & prover: formal reasoning and theorem workflows.
- Lite MoE: smaller footprints for constrained hardware.
Use this map to decide where to pilot: pick the smallest model that matches your workload, then scale up only when evaluations demand it.
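That pilot loop can be made mechanical. A sketch, assuming a hypothetical size-ordered ladder and your own eval scores on a 0-to-1 scale:

```python
# Hypothetical escalation ladder, smallest to largest; substitute the
# models you actually piloted and your own eval metric.
LADDER = ["Math-7B", "DeepSeek V3.1", "DeepSeek R1"]

def smallest_passing(scores: dict, quality_bar: float) -> str:
    """Return the first (smallest) model whose eval score clears the bar;
    fall back to the largest model if none do."""
    for model in LADDER:
        if scores.get(model, 0.0) >= quality_bar:
            return model
    return LADDER[-1]

# With scores {"Math-7B": 0.72, "DeepSeek V3.1": 0.88} and a 0.85 bar,
# the picker escalates past Math-7B and settles on V3.1.
```

Keeping the bar and eval set fixed while only the model varies makes the escalation decision reproducible.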
Available now
Production-ready models for text, reasoning, math, and vision.
Coming soon
Next-generation multimodal releases.