Key Conclusion
The focus of AI infrastructure competition is fundamentally shifting from GPU compute cores to HBM (High Bandwidth Memory) capacity and bandwidth. This is based on two key signals:
- Wuhan 260B RMB storage expansion: YMTC Phase 3 plus the Wuhan Xinxin expansion, targeting 3D NAND and DRAM, with mass production expected by the end of 2026
- Token economics first principles: GPU architecture evolution points to per-GPU HBM demand growing exponentially, with no sign of stopping
Why HBM Is the New Bottleneck
In AI inference and training, GPU compute is no longer the limiting factor. The real bottleneck is the speed of data movement from memory to compute units.
First principles derivation: in memory-bound decoding, every generated token must stream the model weights through HBM, so
Token throughput ≈ HBM bandwidth / (parameter count × bytes per parameter)
HBM capacity does not raise this ceiling directly, but it bounds the model size, batch size, and context length that can be held at all, and batch size is what converts raw bandwidth into aggregate throughput.
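To make the ceiling concrete, here is a minimal back-of-envelope sketch in Python. All numbers (an 8B-parameter model in FP16, roughly HBM3-class bandwidth) are illustrative assumptions, not vendor specifications:

```python
# Back-of-envelope decode throughput for a memory-bound model.
# All numbers below are illustrative assumptions, not vendor specs.

hbm_bandwidth_gb_s = 3_350   # assumed HBM3-class bandwidth, GB/s
params_billion = 8           # hypothetical 8B-parameter model
bytes_per_param = 2          # FP16/BF16 weights

# Weight footprint in GB (billions of params times bytes each):
weight_gb = params_billion * bytes_per_param

# At batch size 1, every generated token streams all weights through HBM once,
# so bandwidth divided by weight bytes bounds the token rate:
tokens_per_s = hbm_bandwidth_gb_s / weight_gb

print(f"Weights: {weight_gb} GB")
print(f"Decode ceiling: ~{tokens_per_s:.0f} tokens/s per GPU at batch 1")
```

Larger batches amortize the weight streaming across more tokens per pass, which is exactly why extra HBM capacity (room for more concurrent sequences and their KV caches) translates into higher aggregate throughput.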
Why HBM Demand Won’t Stop
| Driver | Explanation | Impact |
|---|---|---|
| Model size growth | Frontier model parameters continue growing | Single GPU needs more HBM capacity |
| Context length expansion | 1M-token context becoming standard | KV cache consumes large amounts of HBM (see the sketch after this table) |
| Multimodal input | Images/video/audio processed simultaneously | Intermediate activations explode |
| Agent workflows | Multi-round tool calls maintain state | HBM usage accumulates during inference |
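To put a number on the KV cache row above, here is a rough sizing sketch. The layer count, KV-head count, and head dimension are assumed values for a generic grouped-query-attention model, not any specific architecture:

```python
# Rough KV cache footprint for long-context decoding.
# Layer count, KV-head count, and head dim are assumed, not a real model's config.

n_layers = 32
n_kv_heads = 8            # grouped-query attention (assumed)
head_dim = 128
bytes_per_value = 2       # FP16
context_tokens = 1_000_000

# Per token, each layer stores one K and one V vector per KV head:
kv_bytes_per_token = n_layers * n_kv_heads * head_dim * 2 * bytes_per_value
kv_cache_gb = kv_bytes_per_token * context_tokens / 1e9

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at 1M tokens: ~{kv_cache_gb:.0f} GB per sequence")
```

Under these assumptions a single 1M-token sequence needs on the order of 131 GB of HBM for its KV cache alone, more than an entire 80 GB accelerator, before counting the model weights.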
Investment & Action Recommendations
For the chip industry
- The HBM supply chain is a more certain growth track than GPU chips: every GPU vendor needs HBM, yet production capacity is concentrated in three companies (SK hynix, Samsung, and Micron)
For AI application developers
- Consider HBM requirements when choosing models: bigger isn't always better if the model overflows HBM and weights must be paged in from host memory
- True cost of 1M context: long context doesn't just consume more tokens; it also demands far more HBM for the KV cache, as the sketch below illustrates
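A hypothetical fit check, reusing the per-token KV figure from the earlier sketch. The hbm_footprint_gb helper and all of its inputs are illustrative assumptions, not a published sizing rule:

```python
# Hypothetical fit check: do weights + KV cache fit in available HBM?
# hbm_footprint_gb and every input are illustrative assumptions.

def hbm_footprint_gb(params_billion, bytes_per_param, kv_bytes_per_token,
                     context_tokens, batch_size, overhead_frac=0.1):
    """Estimate HBM needed in GB: weights + KV cache, plus an overhead margin."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9
    kv_gb = kv_bytes_per_token * context_tokens * batch_size / 1e9
    return (weights_gb + kv_gb) * (1 + overhead_frac)

needed_gb = hbm_footprint_gb(params_billion=8, bytes_per_param=2,
                             kv_bytes_per_token=131_072,  # from the earlier sketch
                             context_tokens=1_000_000, batch_size=1)
available_gb = 80  # e.g. one 80 GB accelerator (assumed)

print(f"Needed: ~{needed_gb:.0f} GB, available: {available_gb} GB, "
      f"fits: {needed_gb <= available_gb}")
```

Even a small 8B model blows past a single 80 GB device once the full 1M-token KV cache is counted, which is the sense in which long context is paid for in HBM, not just in tokens.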
For investors
- Storage semiconductor expansion is the "second wave" of AI infrastructure investment: the first wave funded GPUs; this one funds HBM and storage