DeepSeek releases DeepSeek-V2, a 236B MoE model with novel attention architecture

The open-weight model, which activates only 21B parameters per token and claims to cut KV cache by 93.3%, also comes with an OpenAI-compatible API and commercial-use terms.

DeepSeek today introduced DeepSeek-V2, a 236-billion-parameter Mixture-of-Experts language model that activates only 21B parameters per token, positioning it as a direct competitor to models like LLaMA 3 70B and Mixtral 8x22B. The code repository is MIT-licensed, and the model weights are available on Hugging Face under DeepSeek’s model license. The accompanying paper — submitted to arXiv on May 7 — details two architectural innovations: Multi-head Latent Attention, which compresses the key-value cache into a latent vector to reduce KV cache by 93.3%, and the DeepSeekMoE architecture for economical sparse computation.

Pre-trained on 8.1 trillion tokens with a 128K context window, DeepSeek-V2 posts competitive scores on English (MMLU 78.5), Chinese (C-Eval 81.7), and math (GSM8K 79.2) benchmarks. The chat version, fine-tuned with SFT and RL, scores 7.91 on Alignbench, trailing only GPT-4-1106-preview among evaluated models. DeepSeek also offers an OpenAI-compatible API through DeepSeek Platform, and says pay-as-you-go access is available at “an unbeatable price.”

The model series also supports commercial use, according to the model card.

The record

The room reactsas it happened

DeepSeek

The team claims the model achieves 'top-tier performance among open-source models' while saving 42.5% of training costs and boosting throughput 5.76× over its predecessor.

One year later — open only if you can handle spoilers

DeepSeek-V2 indeed kicked off a dramatic price war among Chinese AI providers, with ByteDance, Alibaba, Baidu and Tencent all cutting API prices within weeks. The model’s latent attention mechanism later influenced several open-source projects, though DeepSeek remained a niche player in the West until the release of DeepSeek-V3 in late 2025.

Replay thisPost on X Reddit HN LinkedIn