In January 2025, a Chinese AI lab most people had never heard of dropped a model that sent Nvidia’s stock down 17% in a single day — wiping out nearly $600 billion in market cap. That’s not a typo. DeepSeek released DeepSeek-R1, a reasoning model that matched or beat OpenAI’s o1 on several benchmarks, and did it at a fraction of the training cost. The AI world, which had largely assumed American labs held an uncatchable lead, had to reckon with something uncomfortable: the gap was far smaller than anyone thought.
This isn’t a story about hype. DeepSeek is a real, usable, genuinely capable AI system that you can access today — for free or near-free — and it raises serious questions about the future of the AI race, the value of compute, and whether the “moat” that companies like OpenAI and Anthropic have been building is as deep as assumed. Here’s everything you actually need to know.
What Is DeepSeek and Who Built It?
DeepSeek is an AI research lab founded in 2023 by Liang Wenfeng, who also co-founded the Chinese quantitative hedge fund High-Flyer Capital Management. That background matters: quantitative finance is obsessed with doing more with less, finding signal in noise, and optimizing under constraints. That ethos shows up directly in how DeepSeek builds models.
The lab is based in Hangzhou and operates with a relatively small team by the standards of frontier AI. They’re not a product company first — they publish research, release open weights, and seem genuinely motivated by advancing the science. Whether that’s strategic, idealistic, or both is hard to say from the outside.
Their model lineup, as of early 2026, includes:
- DeepSeek-V3 — A massive mixture-of-experts (MoE) model with 671 billion total parameters, but only about 37 billion activated per token. Strong general-purpose performance across coding, reasoning, and language tasks.
- DeepSeek-R1 — Their reasoning-focused model, trained using reinforcement learning to “think” through problems step by step. Comparable to OpenAI o1 on math and coding benchmarks.
- DeepSeek-R1-Zero — A research artifact showing that chain-of-thought reasoning can emerge from pure RL, without supervised fine-tuning. Andrej Karpathy called the result “fascinating” and flagged it as significant and worth studying.
- DeepSeek-V3-0324 and subsequent updates — Iterative improvements released throughout 2025, continuing to push the performance envelope.
All of these models are open-weight, meaning you can download and run them yourself. That’s a deliberate choice that has made DeepSeek enormously influential beyond just the model quality itself.
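To make “open-weight” concrete, here is a minimal local-inference sketch using the Hugging Face transformers library. The full 671B-parameter V3/R1 weights need a multi-GPU cluster, so this loads one of the smaller distilled R1 checkpoints DeepSeek published alongside R1; the model id and generation settings are illustrative, not a recommendation.

```python
# Minimal local-run sketch: loads a distilled R1 checkpoint, which fits on a
# single high-memory consumer GPU (unlike the full 671B V3/R1 weights).
# Requires: pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # published distilled checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs/CPU
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens (the reasoning trace plus the answer).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```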
The Thing That Actually Shocked Everyone: The Cost
The benchmark numbers were impressive. The cost numbers were what caused the real vertigo.
DeepSeek claimed to have trained DeepSeek-V3 for approximately $5.6 million in compute costs. For context, estimates for training GPT-4 ranged from $50 million to over $100 million. Meta’s Llama 3 training runs were in similar territory. OpenAI CEO Sam Altman had been publicly discussing the need for hundreds of billions of dollars in infrastructure investment — the Stargate project — to stay at the frontier.
DeepSeek appeared to get frontier-level results for roughly 5-10% of that cost. Even if you’re skeptical of the headline figure (and some researchers were — it covers only the final training run, excluding hardware acquisition and earlier research runs), the efficiency delta is real and significant.
How did they do it? A few key techniques:
- Mixture-of-Experts (MoE) architecture — Instead of activating all 671B parameters for every token, the model routes each token to a small subset of “expert” networks. This massively reduces the compute needed per forward pass (see the toy routing sketch after this list).
- Multi-head Latent Attention (MLA) — A novel attention mechanism that compresses the key-value cache, reducing memory requirements significantly during inference.
- FP8 mixed precision training — Training at lower numerical precision to reduce memory bandwidth and speed up computation, without meaningful quality loss.
- Efficient use of constrained hardware — US export controls barred China from buying Nvidia’s top-end H100 chips, so DeepSeek trained on the H800, a bandwidth-limited export variant of the same chip. Constraints, it turns out, can drive innovation.
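To make the MoE idea concrete, here is a toy top-k router in PyTorch. It is a simplified illustration, not DeepSeek’s implementation: the V3 report describes a more elaborate scheme (reportedly selecting 8 of 256 routed experts per token, plus shared experts and auxiliary-loss-free load balancing) layered on this basic pattern.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer, for illustration only."""
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # router: scores each expert per token
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). Each token picks its top-k experts.
        scores = F.softmax(self.gate(x), dim=-1)               # (tokens, experts)
        weights, idx = scores.topk(self.top_k, dim=-1)         # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen k

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                       # which (token, slot) pairs chose expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            # Only these tokens' activations flow through expert e.
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

layer = ToyMoELayer(dim=64)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); each token used only 2 of 8 experts
```

The key point is visible in the loop: each token’s activations pass through only its chosen k experts, so compute per token scales with k, not with the total expert count.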
The broader implication, which shook markets, was this: if you don’t need hundreds of billions in compute to reach the frontier, the entire investment thesis behind Stargate and similar buildouts gets complicated. It doesn’t invalidate them — inference at scale still requires massive infrastructure — but it changes the calculus.
How Does DeepSeek Actually Perform? Real Capabilities and Honest Limits
Let’s be specific about where DeepSeek is strong and where it isn’t.
Where it’s genuinely strong
- Coding: DeepSeek-V3 and R1 are among the best available models for code generation, debugging, and explanation. On HumanEval and similar benchmarks, they compete directly with GPT-4o and Claude 3.5 Sonnet. Developers who’ve used it in production report it handles complex multi-file refactors and architecture questions well.
- Mathematics and logical reasoning: R1 was specifically designed for this. On MATH and AIME benchmarks, it performs at or above o1 levels. If you’re doing quantitative research, financial modeling, or anything that requires rigorous step-by-step reasoning, R1 is a serious option.
- Long-context tasks: V3 supports a 128K token context window, making it viable for processing long documents, large codebases, and extended conversations.
- Cost efficiency: Via the DeepSeek API, pricing is dramatically cheaper than OpenAI or Anthropic equivalents. As of early 2026, input tokens on V3 were priced around $0.27 per million tokens — compare that to GPT-4o at several dollars per million. Pricing changes frequently; check platform.deepseek.com for current rates. A minimal API example follows this list.
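Getting started is low-friction because DeepSeek’s API is OpenAI-compatible: the standard openai Python client works if you point it at DeepSeek’s base URL. A minimal sketch, assuming a DEEPSEEK_API_KEY environment variable; the model names follow DeepSeek’s docs at the time of writing (deepseek-chat for V3, deepseek-reasoner for R1), but verify current names and pricing before relying on them.

```python
# Minimal DeepSeek API call via the OpenAI-compatible endpoint.
# Requires: pip install openai, and DEEPSEEK_API_KEY set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # or "deepseek-reasoner" for R1's step-by-step output
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a function that merges two sorted lists."},
    ],
)
print(response.choices[0].message.content)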
Where it has real limitations
- Censorship on sensitive topics: DeepSeek will refuse to discuss Tiananmen Square, Taiwan’s political status, criticism of the CCP, and a range of other topics that Chinese regulations require it to suppress. This is not subtle or ambiguous — it’s a hard constraint baked into the model. For most use cases it doesn’t matter. For journalism, research, or anything touching on Chinese politics, it matters a lot.
- Data privacy concerns: DeepSeek’s privacy policy is written under Chinese law, meaning data sent to the hosted service could be subject to government access requests. For enterprise use cases with sensitive data, this is a serious consideration. Running the open-weight versions locally (as in the sketch earlier) mitigates this, but adds infrastructure complexity.
- Reliability and uptime: The hosted API has experienced significant capacity constraints and outages, particularly during the surge in demand after R1’s launch. For production workloads that need guaranteed uptime, self-hosting the open weights or going through a third-party provider is the safer path.
