Core Release
Google released Gemini 3.1 Ultra this month—rated by AI Tools Recap as "the most important infrastructure-level release of the month."
Three key features are worth highlighting separately:
2 Million Token Context Window
This is no small number. 2 million tokens roughly equate to 1.5 million English words, or about 1.5 times the length of a 600-page novel. Within this context window, Gemini can:
- Read an entire technical manual in one go
- Analyze hours of meeting transcripts
- Process the entire codebase of a large project
For comparison, OpenAI's GPT-4o has a 128K token context window, and Claude Opus 4 has 200K. Gemini 3.1 Ultra's context window is 10-15 times larger than its competitors'.
Truly Native Multimodality
Gemini 3.1 Ultra's "native multimodality" is not just marketing jargon. It operates directly across text, images, audio, and video, eliminating the need for intermediate transcription layers.
What does this mean? In the past, multimodal models processing video would typically convert video frames into text descriptions first, then analyze them—a process that loses a significant amount of visual and temporal information. Gemini 3.1 Ultra operates directly on raw video frames, preserving complete spatiotemporal data.
Built-in Sandbox Code Execution
Gemini 3.1 Ultra comes with a sandboxed Code Execution tool—the model can write and run code directly within a conversation. This isn't just "recommending a code snippet to you," but rather executing it directly in a secure sandbox and returning the results to you.
For scenarios like data analysis, scientific computing, and visualization, this essentially eliminates the entire workflow of "copy code → open Jupyter → paste → run → check results."
Google's Timeline
This release is not an isolated event. Google is currently in a dense AI release cycle:
- May 12: Live Google Android Show, teasing Android 17 and Gemini agentic updates
- May 19-20: Google I/O 2026 Conference
The timing of the Gemini 3.1 Ultra release is clearly a warm-up for the I/O conference. It's reasonable to expect more product announcements related to the Gemini ecosystem at I/O.
Competitive Landscape
Google's position in the model race is undergoing subtle shifts:
| Dimension | Google Gemini 3.1 Ultra | Anthropic Claude | OpenAI GPT-5.5 |
|---|---|---|---|
| Context Window | 2M tokens | 200K tokens | 128K tokens |
| Native Multimodality | ✅ Text/Image/Audio/Video | ✅ Text/Image | ✅ Text/Image/Audio |
| Code Execution | ✅ Built-in Sandbox | ❌ Requires Claude Code | ❌ Requires Codex |
| Open Source Strategy | Partially Open Source | Closed Source | Closed Source |
Google's strategy is becoming increasingly clear: building a technological moat using infrastructure advantages (compute, context window, multimodal depth), while maintaining a partially open-source strategy to attract the developer community.
Potential Concerns
A 2 million token context window does not come for free. Inference costs will grow exponentially, especially when processing at full capacity. How Google prices this and balances performance with cost will be key to determining whether this feature can be deployed at scale.
Furthermore, the assumption that "bigger context is always better" itself needs validation. Research shows that when context windows become excessively large, a model's attention allocation can become inefficient—it may "see" all the information but struggle to precisely focus on the most relevant parts.