Google Gemini 3.1 Ultra Released: 2 Million Token Context Window, The Era of Native Multimodality is Here

Core Release

Google released Gemini 3.1 Ultra this month—rated by AI Tools Recap as "the most important infrastructure-level release of the month."

Three key features are worth highlighting separately:

2 Million Token Context Window

This is no small number. 2 million tokens roughly equate to 1.5 million English words, or about 1.5 times the length of a 600-page novel. Within this context window, Gemini can:

Read an entire technical manual in one go
Analyze hours of meeting transcripts
Process the entire codebase of a large project

For comparison, OpenAI's GPT-4o has a 128K token context window, and Claude Opus 4 has 200K. Gemini 3.1 Ultra's context window is 10-15 times larger than its competitors'.

Truly Native Multimodality

Gemini 3.1 Ultra's "native multimodality" is not just marketing jargon. It operates directly across text, images, audio, and video, eliminating the need for intermediate transcription layers.

What does this mean? In the past, multimodal models processing video would typically convert video frames into text descriptions first, then analyze them—a process that loses a significant amount of visual and temporal information. Gemini 3.1 Ultra operates directly on raw video frames, preserving complete spatiotemporal data.

Built-in Sandbox Code Execution

Gemini 3.1 Ultra comes with a sandboxed Code Execution tool—the model can write and run code directly within a conversation. This isn't just "recommending a code snippet to you," but rather executing it directly in a secure sandbox and returning the results to you.

For scenarios like data analysis, scientific computing, and visualization, this essentially eliminates the entire workflow of "copy code → open Jupyter → paste → run → check results."

Google's Timeline

This release is not an isolated event. Google is currently in a dense AI release cycle:

May 12: Live Google Android Show, teasing Android 17 and Gemini agentic updates
May 19-20: Google I/O 2026 Conference

The timing of the Gemini 3.1 Ultra release is clearly a warm-up for the I/O conference. It's reasonable to expect more product announcements related to the Gemini ecosystem at I/O.

Competitive Landscape

Google's position in the model race is undergoing subtle shifts:

Dimension	Google Gemini 3.1 Ultra	Anthropic Claude	OpenAI GPT-5.5
Context Window	2M tokens	200K tokens	128K tokens
Native Multimodality	✅ Text/Image/Audio/Video	✅ Text/Image	✅ Text/Image/Audio
Code Execution	✅ Built-in Sandbox	❌ Requires Claude Code	❌ Requires Codex
Open Source Strategy	Partially Open Source	Closed Source	Closed Source

Google's strategy is becoming increasingly clear: building a technological moat using infrastructure advantages (compute, context window, multimodal depth), while maintaining a partially open-source strategy to attract the developer community.

Potential Concerns

A 2 million token context window does not come for free. Inference costs will grow exponentially, especially when processing at full capacity. How Google prices this and balances performance with cost will be key to determining whether this feature can be deployed at scale.

Furthermore, the assumption that "bigger context is always better" itself needs validation. Research shows that when context windows become excessively large, a model's attention allocation can become inefficient—it may "see" all the information but struggle to precisely focus on the most relevant parts.

Core Release

2 Million Token Context Window

Truly Native Multimodality

Built-in Sandbox Code Execution

Google's Timeline

Competitive Landscape

Potential Concerns

Related

Claude Code Supports Artifacts: Coding Agents Finally Start Delivering 'Viewable Live Pages'

Claude Adds Enterprise Managed Authorization for MCP Connectors: Identity Checks Come Before Agents Enter the Enterprise

Claude Platform Supports Workload Identity Federation: Another Step Back for the API Key Era