Core Takeaway
On May 6, 2026, Meta, together with five other tech giants (AMD, Broadcom, Intel, Microsoft, and NVIDIA), released the Multipath Reliable Connection (MRC) open network protocol. MRC is a new network protocol designed specifically for large-scale AI training clusters, with three core goals: reducing GPU wait time, minimizing training interruptions caused by network failures, and improving overall training efficiency.
The announcement tweet received 4,485 likes, 488 retweets, and 1,250 bookmarks on the day of publication, with views exceeding 580,000, sparking unusually intense discussion in the AI infrastructure community.
What Happened
The core positioning of the MRC protocol: making large-scale AI training clusters run faster and more stably, reducing GPU time waste.
Participant Lineup
| Company | Role | Position in AI Infrastructure |
|---|---|---|
| Meta | Initiator | Ultra-large model training demand side (Llama series) |
| AMD | Co-publisher | GPU/CPU computing power supplier |
| Broadcom | Co-publisher | AI network chip custom design |
| Intel | Co-publisher | CPU/network processor supplier |
| Microsoft | Co-publisher | Cloud infrastructure operator (Azure) |
| NVIDIA | Co-publisher | GPU and network solution supplier (InfiniBand) |
The significance of this lineup is that it covers almost the entire chain of AI training infrastructure — from computing chips to network hardware, from cloud operations to model training parties.
What Problem MRC Protocol Solves
The core network challenges faced by large-scale AI training clusters:
Traditional approach problems:
┌─────┐ ┌─────┐ ┌─────┐
│GPU 0│────│GPU 1│────│GPU 2│ ← Single path dependency, any link failure causes training interruption
└─────┘ └─────┘ └─────┘
│ │ │
└──────────┴──────────┘
Single network path
MRC approach improvement:
┌─────┐ ┌─────┐ ┌─────┐
│GPU 0│═══│GPU 1│═══│GPU 2│ ← Multipath reliable connection, automatic failover
└─────┘ └─────┘ └─────┘
│ ╲ │ ╲ │
│ ╲ │ ╲ │
│ ╲ │ ╲ │
└══════╲═┴══════╲═┘
Multipath redundancy + reliable transport
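The failover behavior sketched in the diagram can be illustrated with a toy model. All class and path names here are hypothetical; this is not the MRC wire protocol, just a minimal sketch of the multipath-with-automatic-failover idea:

```python
class Path:
    """A hypothetical network path with a health flag (illustration only)."""
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def send(self, payload):
        if not self.healthy:
            raise ConnectionError(f"path {self.name} is down")
        return f"sent {len(payload)} bytes via {self.name}"


class MultipathSender:
    """Toy model of multipath failover: try each path in order and
    fall back to the next when one fails. With a single path (the
    traditional approach), the same failure would stall training."""
    def __init__(self, paths):
        self.paths = paths

    def send(self, payload):
        for path in self.paths:
            try:
                return path.send(payload)
            except ConnectionError:
                continue  # automatic failover to the next path
        raise ConnectionError("all paths down; training would stall")


sender = MultipathSender([Path("spine-A"), Path("spine-B")])
sender.paths[0].healthy = False       # simulate a link failure
print(sender.send(b"gradients"))      # traffic fails over to spine-B
```

A real implementation would also spray traffic across healthy paths concurrently and retransmit at the reliability layer, but the failover path selection above is the essential contrast with a single-path design.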
Technical Advantages
| Dimension | Traditional Approach | MRC Protocol |
|---|---|---|
| Network path | Single path, failure means interruption | Multipath redundancy, automatic failover |
| Reliability | Depends on physical link stability | Reliable connection layer, software-level fault tolerance |
| GPU utilization | Network issues cause GPU idle waiting | Reduced GPU wait time |
| Openness | Vendor-proprietary protocol (e.g., InfiniBand) | Open protocol, cross-vendor compatibility |
| Ecosystem support | Lock-in to specific vendor solutions | Six major tech giants jointly support, open standard |
Why It Matters
1. AI Training Bottleneck Shifting from Compute to Network
As model sizes grow from hundreds of billions to trillions of parameters, the number of GPUs in a training cluster increases from hundreds to tens of thousands. At that scale, network communication overhead rises sharply, and the probability that at least one link fails during a run approaches certainty.
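The effect of scale on reliability can be made concrete with a simple independence assumption: if each link fails during a run with probability p, the chance that at least one of n links fails is 1 - (1 - p)^n. A quick sketch, using illustrative numbers rather than measured failure rates:

```python
# Back-of-envelope: probability that at least one of n links fails
# during a training run, assuming independent per-link failures.
# p = 0.001 is an illustrative assumption, not measured data.
def cluster_failure_prob(n_links, p_link_failure):
    return 1 - (1 - p_link_failure) ** n_links

for n in (100, 1_000, 10_000):
    print(n, round(cluster_failure_prob(n, 0.001), 3))
```

With these numbers, a 100-link cluster sees a failure in roughly one run in ten, while at 10,000 links some failure is essentially guaranteed, which is why per-failure recovery (rather than failure avoidance) becomes the design goal.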
A typical trillion-parameter model training task:
- Requires thousands of GPUs working simultaneously
- Parameter synchronization between GPUs consumes significant network bandwidth
- Network failure of any single GPU can cause the entire training task to pause
The MRC protocol directly addresses this pain point, reducing the impact of network failures on training through multipath redundancy and reliable connection layers.
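To see why parameter synchronization dominates network traffic, here is a back-of-envelope estimate. All numbers are illustrative assumptions (fp16 gradients, a 4,096-GPU data-parallel cluster, ring all-reduce), not figures from the MRC announcement:

```python
# Rough estimate of per-step gradient traffic for data-parallel training.
params = 1_000_000_000_000      # 1T parameters (illustrative)
bytes_per_grad = 2              # fp16 gradients
n_gpus = 4096                   # hypothetical cluster size

grad_bytes = params * bytes_per_grad
# A ring all-reduce moves roughly 2 * (n - 1) / n of the gradient
# volume through each GPU's links on every step.
per_gpu_traffic = 2 * (n_gpus - 1) / n_gpus * grad_bytes
print(f"{per_gpu_traffic / 1e12:.1f} TB per GPU per step")
```

Even at this coarse level, each GPU must move terabytes of gradient data every step, so any stall on the network path translates directly into idle GPU time.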
2. Open Protocol vs. Proprietary Protocol Competition
Current AI training cluster networking is dominated by NVIDIA’s proprietary InfiniBand. The emergence of MRC as an open protocol means:
- Reduced vendor lock-in risk: Cluster operators can mix network equipment from different vendors
- Lower infrastructure costs: Competition from open protocols may reduce network equipment prices
- Accelerated technical innovation: Multi-vendor participation drives protocol iteration
3. AMD’s 80% Datacenter AI Growth Signal
On the same day, AMD announced that its datacenter AI business is expected to grow 80%, primarily driven by GPU/CPU orders from cloud and infrastructure operators. AMD specifically noted: market forecasts are now catching up to actual deployment cycles, signaling sustained demand ahead.
This echoes the release of the MRC protocol — the AI infrastructure market is at a turning point from planning to large-scale deployment.
Impact on the Industry
For Model Training Parties
- Higher training stability: Reduced training interruptions and restarts caused by network issues
- Lower GPU idle costs: Less GPU wait time for network, improved training efficiency
- More flexible hardware choices: No longer locked into specific vendors’ network solutions
For Cloud Service Providers
- Infrastructure differentiation competition: Cloud platforms supporting MRC protocol gain training efficiency advantage
- Reduced operations complexity: Multipath redundancy reduces dependency on physical network stability
For Chip Vendors
- New competitive dimension: Competition at the network protocol level will affect GPU/network chip market dynamics
- Open ecosystem opportunities: Smaller vendors can enter the AI infrastructure market by supporting MRC protocol
Landscape Assessment
The release of the MRC protocol is a watershed event in the AI infrastructure domain. It marks:
- Shifting bottleneck awareness in AI training: from “need more GPUs” to “need better networks”
- Open protocols challenging proprietary monopolies: InfiniBand’s moat is being eroded
- Industry giants jointly setting standards: the joint participation of Meta, NVIDIA, AMD, Intel, and others shows AI infrastructure standardization is accelerating
For China’s AI industry, there are two reasons to watch MRC protocol development: domestic large model training faces the same cluster network bottleneck issues; and the emergence of open protocols may lower the threshold for domestic vendors to access AI training infrastructure.