Just dropped 1M context, then immediately added image mode
DeepSeek’s update pace is honestly unreasonable.
V4 with its 1M context window barely had time to settle in the community before image mode quietly appeared. No press conference, no PR blast — a researcher posted a message on social media, deleted it, and the feature showed up in the app.
Classic DeepSeek.
Not OCR. It actually understood.
The test was simple: upload a photo of Guilin’s Elephant Trunk Hill with zero text on it.
DeepSeek V4 named the landmark, described its shape, and inferred where it is located.
This isn’t “there’s text in the image, let me read it for you.” This is genuine visual understanding — it “saw” the scene and matched it against its knowledge base.
Put simply: the last major Chinese LLM without vision support has finally closed that gap.
Why Didn’t It Have This Before?
DeepSeek took a different path from the start.
Tongyi Qianwen, ERNIE, Kimi, Zhipu GLM — these competitors added multimodal input from early on. DeepSeek focused its energy on text reasoning and coding, pushing a pure-text model into the top tier.
That choice was controversial at the time. Many felt that not supporting images in 2025 meant the model was “crippled.” But DeepSeek’s logic might have been: max out text capability first, add vision incrementally.
Looking back, that strategy worked. V4’s text prowess is proven across multiple benchmarks, and image mode removes the last obvious gap.
The Benefits of Going Multimodal Incrementally
DeepSeek didn’t build a multimodal model from scratch. It added a visual encoder on top of the existing architecture, which brings three benefits.
Unified experience. No need to switch products or modes — text and images in the same dialog box.
Faster iteration. No need to wait for V5 — existing architecture extends to new capabilities.
Better cost control. Incremental training costs far less than training a multimodal model from zero.
Of course, this incremental approach may have limits: on complex visual reasoning tasks, it might take more iterations to match dedicated multimodal models. Still, the direction is right.
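To make "a visual encoder on top of the existing architecture" concrete, here is a minimal sketch of one common way to do it: image patches are encoded, projected into the text model's embedding space, and placed in the same input sequence as the text tokens. This is an illustration under stated assumptions, not DeepSeek's confirmed design. Every name and dimension below (TEXT_DIM, VISION_DIM, project, build_input_sequence) is hypothetical, and random numbers stand in for real encoder outputs.

```python
import random

# Hypothetical sizes, chosen only for illustration.
TEXT_DIM = 8     # hidden size of the existing text model
VISION_DIM = 4   # output size of the added vision encoder


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(patch_features, weights):
    """Linearly map each vision feature vector into the text embedding space.

    weights has shape VISION_DIM x TEXT_DIM, so zip(*weights) iterates
    over its TEXT_DIM columns.
    """
    return [
        [sum(x * w for x, w in zip(patch, column)) for column in zip(*weights)]
        for patch in patch_features
    ]


def build_input_sequence(text_embeddings, patch_features, projector):
    """Prepend projected image patches to the text token embeddings."""
    image_tokens = project(patch_features, projector)
    return image_tokens + text_embeddings


rng = random.Random(0)
projector = make_matrix(VISION_DIM, TEXT_DIM, rng)  # new adapter weights (assumed)
patches = make_matrix(3, VISION_DIM, rng)           # stand-in for 3 encoded image patches
text = make_matrix(5, TEXT_DIM, rng)                # stand-in for 5 text token embeddings

sequence = build_input_sequence(text, patches, projector)
print(len(sequence), len(sequence[0]))  # prints "8 8"
```

In this sketch the text model sees image patches as extra tokens in an ordinary sequence, which is why the text side of the architecture can be left unchanged. That would be consistent with the benefits listed above, though the actual training recipe is not described here.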
Still in Gray Release
Image mode is currently in a gray release (a staged rollout to a subset of users), so some users may not see the entry point yet. The official recommendation is to upgrade the app to the latest version.
If you already see the “Image Mode” icon in your app — congratulations, your DeepSeek V4 just unlocked its final piece.