
Chinese Open-Source Models Tie Claude/GPT on SWE-Bench: Equal Performance at One-Third the Cost

Core Conclusion

“Chinese AI is two years behind” — this claim no longer holds true in May 2026.

The State of AI May 2026 report highlighted a fact that has been widely underestimated: Chinese open-source models such as DeepSeek V4 and Kimi K2.6 have matched Claude Opus 4.7 and GPT-5.5 on SWE-Bench Pro, at roughly one-third the API cost. This is not “close” — it’s “tied.” More importantly, frontier model cyberattack capabilities are doubling every 4 months, and Chinese models are keeping pace.

SWE-Bench Pro Score Comparison

| Model | SWE-Bench Pro | API Cost (Relative) | Open Status |
|---|---|---|---|
| Claude Opus 4.7 | Baseline | 1.0x | Closed |
| GPT-5.5 | Baseline | 1.0x | Closed |
| DeepSeek V4 | ≈ Baseline | ~0.33x | Open source |
| Kimi K2.6 | ≈ Baseline | ~0.33x | Open weights |
| Gemini 3.1 Pro | Near baseline | 0.8x | Closed |
| Grok 4.3 | Slightly lower | 0.4x | Closed |

Note: SWE-Bench Pro measures AI’s ability to fix issues in real GitHub repositories — currently the most practically valuable coding benchmark.

Why This Catch-Up Matters

1. Cost Advantage Is Structural

Chinese models’ cost advantage is not a temporary price war — it stems from:

  • Mature MoE architecture: DeepSeek V4 and Kimi K2.6 both use Mixture of Experts, with activated parameters far below total parameters
  • Domestic compute adaptation: DeepSeek’s deep collaboration with Huawei Ascend reduces inference costs
  • Engineering optimization: Chinese models generally have better token efficiency than American counterparts
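The cost logic of MoE can be shown with a back-of-the-envelope calculation. All parameter counts below are illustrative placeholders, not the actual specifications of DeepSeek V4, Kimi K2.6, or any real model:

```python
# Back-of-the-envelope: why MoE inference is cheaper per token.
# All numbers here are hypothetical placeholders, not real model specs.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of parameters activated per token in an MoE model."""
    return active_params_b / total_params_b

# Hypothetical dense model: every token touches all parameters.
dense_active = 300.0  # billions of parameters, all active

# Hypothetical MoE model: large total count, small activated subset.
moe_total = 600.0     # billions of parameters in total
moe_active = 40.0     # billions activated per token

# Per-token compute scales roughly with activated parameters,
# so the MoE model's relative inference cost is:
relative_cost = moe_active / dense_active

print(f"MoE activates {active_fraction(moe_total, moe_active):.0%} of its parameters")
print(f"Relative per-token compute vs the dense model: {relative_cost:.2f}x")
```

The point is structural: a model can keep growing its total parameter count (and capability) while per-token compute, and therefore serving cost, tracks only the activated subset.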

2. Open Source vs Closed Source Paradigm Difference

| Dimension | Chinese Open-Source | American Closed-Source |
|---|---|---|
| Auditability | Fully auditable | Black box |
| Local deployment | Supported | Not supported |
| Custom fine-tuning | Free to fine-tune | Restricted |
| Supply chain security | Self-controlled | Dependent on US suppliers |
| Community ecosystem | Rapidly growing | Closed |

3. Catch-Up Speed Is Accelerating

Frontier model capabilities double every 4 months, and Chinese models’ catch-up speed is not lagging. The leap from DeepSeek V3 to V4 took less than 6 months; Kimi’s iteration from K2.5 to K2.6 was equally rapid.
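A 4-month doubling period implies steep compounding. A quick sketch of the arithmetic (the doubling period is the report's claim; the rest is just exponent math):

```python
# Compounding of a capability metric that doubles every 4 months.
DOUBLING_PERIOD_MONTHS = 4

def growth_factor(months: float) -> float:
    """How many times the metric has multiplied after `months`."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)

print(growth_factor(12))  # one year:  2^3 = 8x
print(growth_factor(24))  # two years: 2^6 = 64x
```

At this rate, being “two years behind” would mean a 64x capability gap, which is why a benchmark tie in 2026 undercuts that narrative so sharply.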

Landscape Assessment

Impact on American Models

Chinese open-source models’ catch-up is compressing American models’ pricing power. DeepSeek V4 is already the cheapest SOTA-class model (roughly one-third the cost of Opus 4.7), and if Kimi K2.6 and other Chinese models join the price war, “high performance + low cost” may become the defining label for Chinese models.

Significance for Enterprise Decision-Makers

| Scenario | Recommended Solution | Reason |
|---|---|---|
| Code fixes / agent programming | DeepSeek V4 / Kimi K2.6 | Performance tied, 1/3 the cost, locally deployable |
| Creative writing / multimodal | Claude / GPT | Still hold an advantage |
| Sensitive-data scenarios | DeepSeek / Kimi local deployment | Data stays domestic |
| Large-scale API calls | DeepSeek V4 | Cost-performance dominates |
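The cost deltas above compound quickly at scale. A minimal sketch, using hypothetical absolute prices (only the ~0.33x ratio comes from the comparison earlier; the $/token figures and workload size are made up for illustration):

```python
# Monthly API spend for two models with a ~3x price gap.
# Absolute prices and token volume are illustrative assumptions;
# only the ~0.33x price ratio is taken from the comparison table.

def monthly_cost(tokens_per_month: float, price_per_million: float) -> float:
    """API spend in dollars for a given token volume and $/1M-token rate."""
    return tokens_per_month / 1_000_000 * price_per_million

TOKENS = 5_000_000_000            # 5B tokens/month: a mid-size agent workload (assumed)
closed_price = 15.00              # $/1M tokens, hypothetical closed-source rate
open_price = closed_price * 0.33  # ~1/3 of that, per the cost comparison

closed_monthly = monthly_cost(TOKENS, closed_price)
open_monthly = monthly_cost(TOKENS, open_price)

print(f"Closed-source: ${closed_monthly:,.0f}/month")
print(f"Open-source:   ${open_monthly:,.0f}/month")
print(f"Annual savings: ${12 * (closed_monthly - open_monthly):,.0f}")
```

For a workload of this assumed size, a 3x price gap is the difference between a rounding error and a line item the CFO asks about.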

Actionable Advice

  • CTOs/Technical decision-makers: Prioritize testing DeepSeek V4 and Kimi K2.6 in coding and Agent scenarios — cost savings could be significant
  • AI engineers: The fine-tunability of Chinese open-source models means you can deeply optimize for vertical scenarios — something closed-source models cannot do
  • Investors: Watch for Chinese AI model companies’ global expansion opportunities — “cost-effective SOTA” is a powerful global narrative