
Chinese Open-Source Models Tie Claude/GPT on SWE-Bench: Equal Performance at One-Third the Cost

Core Conclusion

“Chinese AI is two years behind” — this claim no longer holds true in May 2026.

The State of AI May 2026 report highlighted a fact that has been widely underestimated: Chinese open-source models such as DeepSeek V4 and Kimi K2.6 have matched Claude Opus 4.7 and GPT-5.5 on SWE-Bench Pro, at roughly one-third the API cost. This is not “close” — it’s “tied.” More importantly, frontier model cyberattack capabilities are doubling every 4 months, and Chinese models are keeping pace.

SWE-Bench Pro Score Comparison

| Model | SWE-Bench Pro | API Cost (Relative) | Open Status |
|---|---|---|---|
| Claude Opus 4.7 | Baseline | 1.0x | Closed |
| GPT-5.5 | Baseline | 1.0x | Closed |
| DeepSeek V4 | ≈ Baseline | ~0.33x | Open source |
| Kimi K2.6 | ≈ Baseline | ~0.33x | Open weights |
| Gemini 3.1 Pro | Near baseline | 0.8x | Closed |
| Grok 4.3 | Slightly lower | 0.4x | Closed |

Note: SWE-Bench Pro measures AI’s ability to fix issues in real GitHub repositories — currently the most practically valuable coding benchmark.

Why This Catch-Up Matters

1. Cost Advantage Is Structural

Chinese models’ cost advantage is not a temporary price war — it stems from:

  • Mature MoE architecture: DeepSeek V4 and Kimi K2.6 both use Mixture of Experts, with activated parameters far below total parameters
  • Domestic compute adaptation: DeepSeek’s deep collaboration with Huawei Ascend reduces inference costs
  • Engineering optimization: Chinese models generally have better token efficiency than American counterparts
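The cost logic of MoE can be shown with a back-of-the-envelope calculation. All parameter counts below are illustrative placeholders, not the actual specifications of DeepSeek V4, Kimi K2.6, or any real model:

```python
# Back-of-the-envelope: why MoE inference is cheaper per token.
# All numbers here are hypothetical placeholders, not real model specs.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of parameters activated per token in an MoE model."""
    return active_params_b / total_params_b

# Hypothetical dense model: every token touches all parameters.
dense_active = 300.0  # billions of parameters, all active

# Hypothetical MoE model: large total count, small activated subset.
moe_total = 600.0     # billions of parameters in total
moe_active = 40.0     # billions activated per token

# Per-token compute scales roughly with activated parameters,
# so the MoE model's relative inference cost is:
relative_cost = moe_active / dense_active

print(f"MoE activates {active_fraction(moe_total, moe_active):.0%} of its parameters")
print(f"Relative per-token compute vs the dense model: {relative_cost:.2f}x")
```

The point is structural: a model can keep growing its total parameter count (and capability) while per-token compute, and therefore serving cost, tracks only the activated subset.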

2. Open Source vs Closed Source Paradigm Difference

| Dimension | Chinese Open-Source | American Closed-Source |
|---|---|---|
| Auditability | Fully auditable | Black box |
| Local deployment | Supported | Not supported |
| Custom fine-tuning | Free to fine-tune | Restricted |
| Supply chain security | Self-controlled | Dependent on US suppliers |
| Community ecosystem | Rapidly growing | Closed |

3. Catch-Up Speed Is Accelerating

Frontier model capabilities double every 4 months, and Chinese models’ catch-up speed is not lagging. The leap from DeepSeek V3 to V4 took less than 6 months; Kimi’s iteration from K2.5 to K2.6 was equally rapid.
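A 4-month doubling period implies steep compounding. A quick sketch of the arithmetic (the doubling period is the report's claim; the rest is just exponent math):

```python
# Compounding of a capability metric that doubles every 4 months.
DOUBLING_PERIOD_MONTHS = 4

def growth_factor(months: float) -> float:
    """How many times the metric has multiplied after `months`."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)

print(growth_factor(12))  # one year:  2^3 = 8x
print(growth_factor(24))  # two years: 2^6 = 64x
```

At this rate, being “two years behind” would mean a 64x capability gap, which is why a benchmark tie in 2026 undercuts that narrative so sharply.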

Landscape Assessment

Impact on American Models

Chinese open-source models’ catch-up is compressing American models’ pricing power. DeepSeek V4 is already the cheapest SOTA-class model (roughly one-third the cost of Opus 4.7), and if Kimi K2.6 and other Chinese models join the price war, “high performance + low cost” may become the defining label for Chinese models.

Significance for Enterprise Decision-Makers

| Scenario | Recommended Solution | Reason |
|---|---|---|
| Code fixes / agent programming | DeepSeek V4 / Kimi K2.6 | Performance tied, 1/3 the cost, locally deployable |
| Creative writing / multimodal | Claude / GPT | Still hold an advantage |
| Sensitive-data scenarios | DeepSeek / Kimi local deployment | Data stays domestic |
| Large-scale API calls | DeepSeek V4 | Cost-performance dominates |
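The cost deltas above compound quickly at scale. A minimal sketch, using hypothetical absolute prices (only the ~0.33x ratio comes from the comparison earlier; the $/token figures and workload size are made up for illustration):

```python
# Monthly API spend for two models with a ~3x price gap.
# Absolute prices and token volume are illustrative assumptions;
# only the ~0.33x price ratio is taken from the comparison table.

def monthly_cost(tokens_per_month: float, price_per_million: float) -> float:
    """API spend in dollars for a given token volume and $/1M-token rate."""
    return tokens_per_month / 1_000_000 * price_per_million

TOKENS = 5_000_000_000            # 5B tokens/month: a mid-size agent workload (assumed)
closed_price = 15.00              # $/1M tokens, hypothetical closed-source rate
open_price = closed_price * 0.33  # ~1/3 of that, per the cost comparison

closed_monthly = monthly_cost(TOKENS, closed_price)
open_monthly = monthly_cost(TOKENS, open_price)

print(f"Closed-source: ${closed_monthly:,.0f}/month")
print(f"Open-source:   ${open_monthly:,.0f}/month")
print(f"Annual savings: ${12 * (closed_monthly - open_monthly):,.0f}")
```

For a workload of this assumed size, a 3x price gap is the difference between a rounding error and a line item the CFO asks about.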

Actionable Advice

  • CTOs/Technical decision-makers: Prioritize testing DeepSeek V4 and Kimi K2.6 in coding and Agent scenarios — cost savings could be significant
  • AI engineers: The fine-tunability of Chinese open-source models means you can deeply optimize for vertical scenarios — something closed-source models cannot do
  • Investors: Watch for Chinese AI model companies’ global expansion opportunities — “cost-effective SOTA” is a powerful global narrative