ChaoBro

NVIDIA Rubin 35x Inference Leap: Hyperscaler $600B Capex Shift, Inference Chips Become New Battleground

Core Conclusion

The AI chip market is undergoing a structural shift: from NVIDIA’s monopoly in the training era to multi-chip competition in the inference era. NVIDIA’s Vera Rubin architecture promises a 35x improvement in inference throughput, but competitors such as AMD, Groq, and Cerebras are eroding its share across market segments. Hyperscalers’ $600B+ annual AI capex is shifting from “buying GPUs for training” to “buying inference chips for services.”

NVIDIA Rubin: Technical Details of the 35x Leap

NVIDIA disclosed key information about its next-generation inference architecture in late April 2026:

| Metric | Hopper (H200) | Blackwell (B200) | Vera Rubin (GB300) |
|---|---|---|---|
| Inference throughput | Baseline | ~5x | ~35x |
| Power efficiency | Baseline | ~3x | ~10x |
| Memory bandwidth | 3.35 TB/s | 8 TB/s | 12+ TB/s |
| Shipping | 2024 Q1 | 2025 Q2 | 2026 Q3 (ahead of schedule) |
| Primary scenario | Training + inference | Training-focused | Inference-optimized |

Key insight: Rubin shipping ahead of schedule indicates NVIDIA is already feeling competitive pressure from AMD and custom ASICs.
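The generational multipliers in the table imply large shifts in serving economics. A back-of-envelope sketch of what they mean for energy per token (the baseline throughput and board power below are illustrative assumptions, not NVIDIA figures; only the 5x/35x and 3x/10x multipliers come from the table):

```python
# Back-of-envelope serving economics from the generational multipliers above.
# BASELINE_TOKENS_PER_SEC and BASELINE_POWER_KW are assumed, not vendor data.
BASELINE_TOKENS_PER_SEC = 1_000   # assumed H200 inference throughput
BASELINE_POWER_KW = 0.7           # assumed H200 board power

gens = {
    # name: (throughput multiple, power-efficiency multiple) from the table
    "Hopper (H200)":      (1.0, 1.0),
    "Blackwell (B200)":   (5.0, 3.0),
    "Vera Rubin (GB300)": (35.0, 10.0),
}

for name, (tput_x, eff_x) in gens.items():
    tokens_per_sec = BASELINE_TOKENS_PER_SEC * tput_x
    # efficiency = tokens per joule, so power scales as throughput / efficiency
    power_kw = BASELINE_POWER_KW * tput_x / eff_x
    joules_per_token = power_kw * 1_000 / tokens_per_sec
    print(f"{name:20s} {tokens_per_sec:>8,.0f} tok/s  "
          f"{power_kw:5.2f} kW  {joules_per_token:.3f} J/token")
```

Under these assumptions, a 35x throughput gain at only 10x efficiency means the Rubin part draws roughly 3.5x the baseline power while still cutting energy per token by ~3.5x.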

Hyperscaler Capex: Where Is $600B Flowing?

According to the latest analyst forecasts (April 29, 2026), hyperscaler AI capex is trending as follows:

| Year | Google | Amazon | Microsoft | Meta | Total |
|---|---|---|---|---|---|
| 2024 | $52B | $75B | $48B | $38B | ~$213B |
| 2025 | $75B | $100B | $65B | $55B | ~$295B |
| 2026E | $90B+ | $130B+ | $80B+ | $65B+ | $365B+ |
| Annual run rate (next 4-5 years) | | | | | $600B+ |
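The per-company figures above can be sanity-checked against the totals, and the implied year-over-year growth computed directly (numbers copied from the table; the 2026E row uses the “+” figures as floors):

```python
# Hyperscaler AI capex from the table above, in USD billions.
# 2026 entries are the "+" floor estimates, so its total is a lower bound.
capex = {
    2024: {"Google": 52, "Amazon": 75, "Microsoft": 48, "Meta": 38},
    2025: {"Google": 75, "Amazon": 100, "Microsoft": 65, "Meta": 55},
    2026: {"Google": 90, "Amazon": 130, "Microsoft": 80, "Meta": 65},
}

totals = {year: sum(by_co.values()) for year, by_co in capex.items()}
for year, total in totals.items():
    print(f"{year}: ${total}B")

# Year-over-year growth of the combined total
growth_25 = totals[2025] / totals[2024] - 1
growth_26 = totals[2026] / totals[2025] - 1
print(f"2025 YoY: {growth_25:.0%}, 2026E YoY: {growth_26:.0%}")
```

The table’s totals check out: ~$213B, ~$295B, and $365B+, i.e. roughly 38% growth in 2025 and at least ~24% in 2026.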

Structural shift in Capex:

  1. From training to inference: training accounted for ~60% of AI capex in 2025; inference is expected to exceed 50% in 2026
  2. From general-purpose to specialized: procurement of custom inference chips (ASICs) is increasing
  3. From a single GPU vendor to a diverse mix: AMD’s MI series, Groq’s LPU, and Cerebras’s Wafer-Scale parts are winning more orders

AMD’s Inference Counterattack

AMD is transforming from “training follower” to “inference leader”:

AMD Halo Box: New Species for Edge Inference

  • Hardware: Ryzen AI MAX+ 395 (16 Zen 5 cores + 40 RDNA 3.5 CUs + XDNA 2 NPU)
  • Memory: 128GB unified memory
  • Positioning: Personal/edge AI inference device
  • Shipping: June 2026
  • Price: Estimated $1,500-$2,000
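The 128GB unified memory figure is what makes the Halo Box interesting for edge inference. A rough sizing sketch of which model scales fit at common quantization levels (the 80% usable-memory factor is an assumption covering KV cache, activations, and OS/runtime overhead, not an AMD specification):

```python
# Rough estimate of model sizes that fit in 128 GB unified memory.
# OVERHEAD is an assumed fraction of memory usable for weights after
# KV cache, activations, and OS/runtime use; it is not a vendor figure.
MEMORY_GB = 128
OVERHEAD = 0.8

bytes_per_param = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    max_params_b = MEMORY_GB * OVERHEAD / nbytes  # billions of parameters
    print(f"{fmt}: up to ~{max_params_b:.0f}B parameters")
```

Under these assumptions, the box can hold roughly a 50B-parameter model at fp16, or a 200B-class model at 4-bit quantization, which is why unified-memory devices are pitched as a distinct edge-inference track.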

AMD MI Series: Datacenter Inference

  • Hyperscalers confirmed increasing AMD MI350/MI400 procurement
  • MI350 offers better price-performance than NVIDIA H200 for inference
  • AMD datacenter GPU revenue expected to grow 80%+ in 2026

Inference Chip Competitive Landscape

| Player | Solution | Advantage Scenario | Market Share Trend |
|---|---|---|---|
| NVIDIA | Vera Rubin / GB300 | High-performance inference | Dominant but declining share |
| AMD | MI350 / Halo Box | Cost-performance + edge | Rapidly rising |
| Groq | LPU | Ultra-low-latency inference | Niche growth |
| Cerebras | Wafer-Scale | Large-model inference | Niche |
| Google | TPU v5p/v6 | Internal use | Stable |
| Amazon | Trainium/Inferentia | AWS internal | Growing |
| Huawei | Ascend 910C | China market | Rapid growth |

Investment Logic

Positive Directions

  • AI semiconductor full stack: Not just GPUs, but EDA software, custom ASICs, advanced packaging, optical interconnects, HBM memory
  • Edge inference: AMD Halo Box represents a new track for personal AI inference
  • Inference optimization software: vLLM, TensorRT-LLM will grow with hardware

Risk Factors

  • NVIDIA’s valuation already prices in most growth expectations
  • Intensifying competition in inference chips may trigger price wars
  • Advances in model compression may reduce inference hardware demand

Action Recommendations

For technology decision-makers:

  • For H2 2026 inference hardware procurement, evaluate multiple vendors rather than defaulting to NVIDIA
  • Assess AMD Halo Box feasibility for edge inference scenarios
  • Monitor inference optimization software stack maturity

For investors:

  • AI semiconductors are no longer a “just buy NVIDIA” trade; look for opportunities across the full stack
  • Edge inference, HBM memory, and advanced packaging are high-conviction growth directions
  • Watch whether AMD delivers growth in both the datacenter and edge markets