The Signal
Kimi K2.6 has barely announced its June launch, and intelligence about Moonshot AI’s next flagship, Kimi K3, is already leaking. According to multiple cross-verified sources, K3 is in intensive internal testing and is expected to launch officially in Q3 this year.
The core specs are striking: total parameter count exceeds 2.5 trillion, and internal experiments have already pushed context lengths well past 1 million tokens.
Key Increments
2.5 Trillion Parameters: Another Leap in MoE Architecture
Kimi K2.6 was already a 1.1-trillion-parameter MoE model; K3 pushes the total scale past 2.5 trillion. This is not simply “parameter stacking”: under a Mixture of Experts (MoE) architecture, only a small subset of experts is activated for each token, keeping actual compute manageable while model capacity and knowledge density take a qualitative leap, as sketched below.
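To make that activation pattern concrete, here is a minimal top-k routing sketch in PyTorch. The layer sizes, expert count, and top_k value are illustrative assumptions, not K3’s actual configuration.

```python
# Minimal sketch of top-k expert routing in a Mixture of Experts (MoE) layer.
# All dimensions below are illustrative, not Kimi K3's real configuration.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)                # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out  # each token touched only top_k of n_experts

y = TinyMoE()(torch.randn(10, 64))  # 10 tokens, each runs 2 of 8 experts
```

The pattern is the whole trick: total capacity scales with the number of experts, while per-token compute scales only with top_k, which is how a 2.5-trillion-parameter model can keep inference cost manageable.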
Notably, DeepSeek V4 Flash / Pro has already driven the price of 1M-token context to extremely low levels, and Kimi K3 is doubling down on the same dimension. This signals that long context + large-scale MoE has become a consensus technical route among top-tier domestic models.
Million-Level Context: Not a Technical Problem, a Compute Problem
According to internal sources, the main reason K3’s 1-million-token context is being held back from public release is not a technical bottleneck but computing resources.
This sentence carries a lot of information. It implies two things:
- Model capability is ready: in internal test environments, 1M+ context has already been exercised with acceptable results.
- Inference cost is the real barrier: KV cache memory grows linearly with sequence length, so million-token contexts place extreme demands on GPU cluster VRAM and bandwidth (see the back-of-envelope estimate after this list).
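To see why compute, not capability, is the gate, here is a rough KV cache estimate; the layer count, KV-head count, and head dimension are hypothetical placeholders, since K3’s architecture is undisclosed.

```python
# Rough KV cache size for one long-context request under standard attention.
# Config numbers are assumptions for illustration, not Kimi K3's real specs.
def kv_cache_gib(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Each layer stores one K and one V vector per KV head, per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return seq_len * per_token / 2**30

# Hypothetical config: 96 layers, 8 KV heads (GQA), head_dim 128, fp16 cache.
for seq_len in (128_000, 1_000_000):
    print(f"{seq_len:>9,} tokens -> {kv_cache_gib(seq_len, 96, 8, 128):6.1f} GiB")
```

Under these assumed numbers, a single million-token request consumes a few hundred GiB of cache before any model weights are even counted, and concurrent users multiply that. Cache compression and quantization shrink the constant, but the growth with sequence length stays linear.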
This also explains why Moonshot AI, after launching Kimi K2.6, simultaneously ramped up promotions across various API relay platforms: the “grind tokens for JD gift cards” campaign is essentially about expanding use cases and feeding the data flywheel while accumulating operational experience for K3’s compute demands.
K2.6’s Transitional Role
Kimi K2.6’s positioning is clear: it is not the destination, but a bridge to K3.
K2.6’s keywords are “open weights” and “built for agents”: 1.1 trillion parameters, fully open weights, designed specifically for sustained autonomous execution. These features lay the ecosystem groundwork for K3: the developer community can first get comfortable with MoE-based agent workflows on K2.6, then upgrade smoothly when K3 arrives, as the sketch below shows.
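One way to keep that upgrade path smooth is to treat the model name as configuration. A minimal sketch, assuming Moonshot’s OpenAI-compatible chat endpoint; the model identifiers “kimi-k2.6” and “kimi-k3” are hypothetical placeholders.

```python
# Agent call with the model name externalized, so a K2.6 -> K3 upgrade is a
# config change. Model IDs here are hypothetical; base_url is Moonshot's
# OpenAI-compatible endpoint.
import os
from openai import OpenAI

MODEL = os.environ.get("KIMI_MODEL", "kimi-k2.6")  # flip to "kimi-k3" later

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.cn/v1",
)

def run_agent_step(messages, tools):
    # The agent loop depends only on the chat-completions interface,
    # so swapping the underlying model never touches this code.
    return client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools,
        temperature=0.3,
    )
```

Because nothing but the environment variable names the model, the eventual migration costs one config change rather than a rewrite.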
That said, there are user reports of K2.6 showing instability on some basic tasks, with some bluntly stating “it feels unusable until K3 arrives.” Such transition-period growing pains are not uncommon in fast-iteration release cycles, but they also mean Moonshot AI needs to deliver a more convincing answer on K3’s stability.
Industry Impact
Once Kimi K3 launches, it will directly rewrite the competitive landscape of domestic LLMs:
- Long Context Track: Currently, only a handful of domestic models can handle million-token context. If K3 ships it stably, it will hold a significant advantage in document analysis, codebase understanding, and long-video analysis.
- Open Source vs. Closed Source: K2.6 already chose to open its weights, and K3 is highly likely to continue down that path, further squeezing the room left for closed-source models.
- Agent Ecosystem: Million-token context plus the MoE architecture means agents can carry more “memory” and “tools” and execute longer-horizon autonomous tasks.
Actionable Advice
- Agent developers: Build Agent workflows with K2.6 first, monitor actual performance after its June launch, and prepare for the K3 upgrade in Q3.
- Enterprise users: If you need million-token context, start evaluating compute solutions now. Demand could surge after K3 launches, so plan GPU resources in advance.
- Researchers: The training and inference strategies behind a 2.5-trillion-parameter MoE deserve close attention; this could be another key milestone in open-source models approaching closed-source performance.