Conclusion: The US-China Model Gap Is Being Quantified and Tracked
A key finding in the latest AI model evaluation report from the US National Institute of Standards and Technology (NIST) has drawn industry attention: DeepSeek V4's performance on multiple core benchmarks has reached the level of GPT-5, which was released 8 months ago.
This is not a one-sided conclusion from some evaluation agency, but an independent assessment from an official US technical institution. If the current catch-up trend continues, the report predicts Chinese models could reach GPT-5.5 (approximately Mythos level) by February 2027.
Benchmark Breakdown
NIST report comparison across key dimensions:
| Dimension | DeepSeek V4 | GPT-5 (8 months ago) | Gap |
|---|---|---|---|
| General Reasoning | Close | Baseline | ≈ Parity |
| Code Generation | Close | Baseline | ≈ Parity |
| Mathematical Reasoning | Slightly lower | Baseline | -3 to -5 points |
| Multimodal Understanding | Significantly behind | Baseline | -8 to -10 points |
| Long Context | Close | Baseline | ≈ Parity |
| Chinese Language | Significantly ahead | — | Chinese model advantage |
Key finding: In the two most practical dimensions, general reasoning and code generation, DeepSeek V4 has already caught up to GPT-5. The gap is concentrated in multimodal understanding, which reflects DeepSeek V4's deliberate design trade-off: prioritizing text-reasoning efficiency.
Catching-Up Trend: A Predictable Timeline
The report provides a noteworthy extrapolation:
- 2025.09 — GPT-5 release (US baseline)
- 2026.01 — DeepSeek V4 reaches GPT-5 level (~4-month lag)
- 2026.09 — GPT-5.5 release (expected)
- 2027.02 — Chinese models reach GPT-5.5 level (expected ~5-month lag)
If this trend is accurate, it means:
- Catch-up speed is accelerating: the lag has shrunk from 12-18 months for earlier model generations to 4-5 months
- Gap is narrowing but will not disappear: US models maintain a one-iteration-cycle lead
- Huge cost-performance advantage: Chinese models deliver near-parity capability at significantly lower cost
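The lag figures in the timeline follow from simple date arithmetic. A minimal sketch (dates taken from the report's timeline above; the `lag_months` helper is ours, not from the report):

```python
from datetime import date

# Dates from the report's timeline (day fixed to 1; only year/month are given)
gpt5_release = date(2025, 9, 1)      # GPT-5 release (US baseline)
v4_parity = date(2026, 1, 1)         # DeepSeek V4 reaches GPT-5 level
gpt55_release = date(2026, 9, 1)     # GPT-5.5 release (expected)
cn_parity_gpt55 = date(2027, 2, 1)   # Chinese models reach GPT-5.5 level (expected)

def lag_months(a: date, b: date) -> int:
    """Whole calendar months from date a to date b."""
    return (b.year - a.year) * 12 + (b.month - a.month)

print(lag_months(gpt5_release, v4_parity))        # → 4
print(lag_months(gpt55_release, cn_parity_gpt55)) # → 5
```

The 4-month and 5-month lags quoted in the report are exactly these differences; the "accelerating" claim compares them against the 12-18 month lags of earlier model generations.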
Behind the Technical Path Differences
DeepSeek V4's catch-up was not achieved by “throwing compute” at the problem, but through a different technical route:
| Comparison | US Model Path | DeepSeek Path |
|---|---|---|
| Architecture | Dense Transformer | Sparse MoE (Mixture of Experts) |
| Training Strategy | Massive data + post-training | Efficient data selection + reinforcement learning |
| Compute Dependency | 10,000+ GPU clusters | 1,000+ GPUs, efficiency optimization |
| Cost | Hundreds of millions per round | Significantly lower than US peers |
Long-term implications of this path difference:
- DeepSeek's MoE architecture activates only a subset of parameters per inference step, keeping running costs low
- US models' dense architectures may learn faster during training but incur higher inference costs
- If the MoE route proves sustainable for catching up, it could change the underlying logic of global AI competition
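The partial-activation claim above can be illustrated with a toy top-k routing sketch. This is a deliberately simplified stand-in (random "experts" and a random gate, hypothetical dimensions), not DeepSeek's actual architecture; the point is only that with top-k routing, most expert parameters sit idle on any given forward pass:

```python
import random

def make_expert(dim: int, seed: int) -> list[list[float]]:
    """A toy 'expert': a dim x dim weight matrix as nested lists."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

def gate_scores(x: list[float], num_experts: int, seed: int = 0) -> list[float]:
    """Toy router: score each expert for input x (stand-in for a learned gate)."""
    rng = random.Random(seed)
    gate = [[rng.uniform(-1, 1) for _ in range(len(x))] for _ in range(num_experts)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in gate]

def moe_forward(x: list[float], experts: list, top_k: int = 2):
    """Route x to the top_k highest-scoring experts; only those experts run."""
    scores = gate_scores(x, len(experts))
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    out = [0.0] * len(x)
    for i in chosen:  # dense model equivalent: loop over ALL experts
        y = [sum(w * xi for w, xi in zip(row, x)) for row in experts[i]]
        out = [o + yi for o, yi in zip(out, y)]
    return out, chosen

dim, num_experts, top_k = 8, 16, 2
experts = [make_expert(dim, s) for s in range(num_experts)]
x = [0.5] * dim
out, chosen = moe_forward(x, experts, top_k)

total_params = num_experts * dim * dim   # parameters held in memory
active_params = top_k * dim * dim        # parameters touched this forward pass
print(f"experts used: {len(chosen)}/{num_experts}, "
      f"active params: {active_params}/{total_params}")
```

With 2 of 16 experts active, only 12.5% of expert parameters participate in each step; that ratio, not total parameter count, is what drives inference cost in an MoE model.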
Implications for Chinese Developers
- Production deployment window is open: DeepSeek V4's performance in general reasoning and code generation is sufficient for most production scenarios
- Multimodal remains a weakness: strong multimodal capability requires waiting for next-generation models or pairing with dedicated vision models
- Price advantage is significant: combined with DeepSeek V4 Pro's 75% limited-time discount (extended to May 31), this is the optimal deployment window
Implications for US Developers
- Competitive pressure is increasing: If Chinese models deliver near-parity capability at 1/10th the cost, API pricing will face long-term downward pressure
- MoE architecture deserves attention: DeepSeek technical route may represent a more sustainable development direction
- Do not underestimate catch-up speed: the gap to GPT-5's launch-day capability has already closed to zero. What happens in the next 8 months?
Uncertainties
The NIST report's extrapolation is based on historical trends, but the following factors could change the catch-up rhythm:
- Compute limitations: DeepSeek's catch-up may be constrained by access to high-end chips
- Data quality: Access to high-quality English data may become a bottleneck
- Algorithm breakthroughs: Any architectural innovation from either side could break the current trend
- Geopolitics: Export controls and policy changes could accelerate or delay catching up
The significance of this NIST report lies not only in quantifying the capability gap between US and Chinese models, but more importantly in confirming a trend: the question around Chinese models has shifted from “can they catch up” to “how long until they catch up.”