Bottom Line First
The four tech giants (Amazon, Google, Meta, Microsoft) are projected to spend a combined ~$715 billion on capex in 2026, with nearly all of the incremental spend AI-driven.
Meanwhile, the bottleneck for AI compute is shifting from GPUs to HBM (High Bandwidth Memory): on the latest earnings call, Micron's CEO said 2026 HBM supply is already sold out, covering only 50-65% of customer demand.
Data Breakdown
2026 AI Capex by Company
| Company | 2026 Capex Ceiling | YoY Growth | Primary Use |
|---|---|---|---|
| Amazon (AWS) | ~$200B | Accelerating (Bedrock spend reaccelerated to its fastest pace in 15 quarters) | GPU clusters + data centers + power |
| Google | ~$190B | Sustained growth | TPU + GPU + data center infrastructure |
| Microsoft | ~$190B | Maintaining high levels | Azure AI + OpenAI infrastructure |
| Meta | ~$135B | Significant increase | Llama training + AI ads + metaverse |
| Total | ~$715B | — | — |
Supply Chain Bottleneck Shift
| Phase | Bottleneck | Current Status |
|---|---|---|
| 2023-2024 | GPU capacity (NVIDIA A100/H100) | Massive capacity expansion, easing |
| 2025 | Advanced packaging (CoWoS) | TSMC expanding |
| 2026 | HBM memory | Industry-wide sold out, supply crunch |
HBM Market Landscape
| Supplier | Market Share | 2026 Status | Notes |
|---|---|---|---|
| SK Hynix | ~50% | Q1 revenue tripled YoY, surpassing ₩50T for the first time | Announced $13B expansion plan |
| Micron | ~25% | Can only meet 50-65% of demand | Multi-year volume and pricing agreements locked |
| Samsung | ~20% | Catching up | HBM3E production ramping |
| Others | ~5% | — | — |
Why AI Is Becoming “Memory-First”
Micron’s CEO delivered a key signal on the earnings call:
“AI is becoming a memory-first industry—because models and agents need longer ‘thinking’ time and more context retention.”
Technical Logic
Token Throughput ≈ HBM Bandwidth ÷ Bytes Read per Token (weights + KV cache), capped by HBM Capacity

Longer agent thinking → larger context windows → KV cache bloat → superlinear HBM demand growth
When models scale from 7B to 70B parameters, and context windows from 8K to 128K, HBM demand grows far beyond linear.
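A back-of-envelope calculation makes this concrete. The sketch below assumes a hypothetical 70B-class decoder (80 layers, grouped-query attention with 8 KV heads, head dimension 128) and an FP16 cache; none of these figures come from a specific model.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Per-sequence KV cache: K and V for every layer at every position."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx_len

# Hypothetical 70B-class config: 80 layers, GQA with 8 KV heads,
# head_dim 128, FP16 cache (2 bytes/element). Assumed, not a real model.
cfg = dict(layers=80, kv_heads=8, head_dim=128)
for ctx in (8_192, 131_072):
    gib = kv_cache_bytes(ctx_len=ctx, **cfg) / 2**30
    print(f"{ctx:>7}-token context -> {gib:5.1f} GiB KV cache per sequence")
```

Under these assumptions, the KV cache alone grows from about 2.5 GiB to 40 GiB per sequence as the context goes from 8K to 128K, on top of roughly 140 GB of FP16 weights for the 70B model itself; multiply the model-size and context-length factors together and total HBM demand grows far faster than either factor alone.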
SanDisk’s AI Reversal
SanDisk’s earnings also validate this trend:
- Last year: a loss of $0.30 per share; this quarter: EPS of $23.41 (vs. the $14.50 estimate)
- Revenue: $5.95B (vs. $4.70B estimate)
- 5 AI companies signed long-term supply agreements
The storage industry has reversed from losses to windfall profits driven by AI demand.
Landscape Judgment
Short-term Impact (2026)
- HBM supply tightness will persist throughout the year, driving up GPU inference costs
- Model optimization will increasingly focus on memory efficiency: quantization, MoE, and KV cache compression (see the MoE sketch after this list)
- Chinese domestic alternatives (e.g., CXMT) will receive policy-driven acceleration
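As referenced above, the toy comparison below illustrates the MoE point: an MoE model stores more parameters than it reads per generated token. All parameter counts are illustrative assumptions, not figures for any real model.

```python
# Toy comparison: parameters stored vs. parameters read per generated token.
# Every number here is an illustrative assumption, not a real model's config.
dense_params = 70e9                   # dense 70B: all weights read each token

shared_params = 12e9                  # assumed attention + embedding weights
expert_params = 8e9                   # assumed FFN weights per expert
total_experts, active_experts = 8, 2  # assumed top-2 routing

moe_stored = shared_params + total_experts * expert_params    # 76B in memory
moe_read   = shared_params + active_experts * expert_params   # 28B per token

print(f"dense: {dense_params/1e9:.0f}B stored, {dense_params/1e9:.0f}B read/token")
print(f"MoE:   {moe_stored/1e9:.0f}B stored, {moe_read/1e9:.0f}B read/token")
```

The tradeoff: MoE streams fewer bytes per generated token (bandwidth relief) while potentially storing more total weights (capacity pressure), so it helps most when bandwidth, not capacity, is the binding constraint.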
Medium-term Trends (2027-2028)
- HBM4 standard release may ease some supply pressure
- CXL memory pooling technology may change memory allocation paradigms
- “Compute-in-memory” chip architectures may become a new competitive dimension
Investment Logic
| Track | Certainty | Upside | Representative Targets |
|---|---|---|---|
| HBM manufacturers | ★★★★★ | ★★★☆☆ | SK Hynix, Micron |
| GPU vendors | ★★★★☆ | ★★★★☆ | NVIDIA, AMD |
| Data center REITs | ★★★★☆ | ★★☆☆☆ | Data center real estate funds |
| Memory optimization software | ★★★☆☆ | ★★★★★ | Quantization/compression toolchains |
Action Recommendations
For AI Application Teams
- Immediately evaluate your model’s memory usage efficiency; prioritize frameworks supporting quantized inference
- Consider MoE architectures: they read far fewer parameters per token at comparable quality, easing the bandwidth side of HBM demand (though total weight storage can grow)
- Watch KV cache optimization techniques (PagedAttention, FlashDecoding); a toy sketch of the paged-cache idea follows below
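Here is a toy sketch of the paged-KV-cache bookkeeping that PagedAttention popularized: the cache lives in fixed-size blocks, and each sequence maps logical positions to physical blocks via a block table. Block and pool sizes are arbitrary assumptions, and production engines (e.g., vLLM) track far more state.

```python
BLOCK_SIZE = 16      # tokens per KV block (assumed)
NUM_BLOCKS = 1024    # total blocks in the HBM pool (assumed)

free_blocks = list(range(NUM_BLOCKS))
block_tables = {}    # seq_id -> ordered list of physical block ids
seq_lengths = {}     # seq_id -> tokens cached so far

def append_token(seq_id):
    """Reserve KV space for one new token; allocate a block only on overflow."""
    n = seq_lengths.get(seq_id, 0)
    if n % BLOCK_SIZE == 0:  # last block is full (or sequence is new)
        if not free_blocks:
            raise MemoryError("KV pool exhausted: evict or preempt a sequence")
        block_tables.setdefault(seq_id, []).append(free_blocks.pop())
    seq_lengths[seq_id] = n + 1

def release(seq_id):
    """Return a finished sequence's blocks to the shared pool."""
    free_blocks.extend(block_tables.pop(seq_id, []))
    seq_lengths.pop(seq_id, None)
```

Because blocks are allocated on demand rather than reserved for the maximum context up front, fragmentation drops and far more concurrent sequences fit in the same HBM.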
For Hardware Procurement
- HBM supply tightness may last 12-18 months; consider locking in supply contracts early
- Evaluate AMD MI series as an NVIDIA alternative (better price-performance in some scenarios)
For Developers
- Learn model quantization techniques (INT4/INT8) to run larger models on limited hardware; a minimal INT8 sketch follows this list
- Watch memory optimization updates in local inference frameworks like llama.cpp and MLX
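To make the quantization item concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 weight quantization; production toolchains use per-channel or group-wise scales plus calibration data, so treat this as the idea only.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # one hypothetical layer
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"{w.nbytes / 2**20:.0f} MiB (FP32) -> {q.nbytes / 2**20:.0f} MiB (INT8), "
      f"mean abs error {err:.4f}")
```

INT8 cuts FP32 weight memory 4x (2x vs. FP16), and INT4 halves it again at some accuracy cost.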