Core Thesis
Junyang Lin, former technical lead of the Qwen (Tongyi Qianwen) team, published his first systematic judgment on the direction of large model development after leaving in late March 2026: “The next phase of large models is not about making them think longer, but about making them think for action.”
Lin directly led the technical development of the Qwen3 series, giving him first-hand insight into the evolution of Qwen’s technical roadmap. His judgment is not academic speculation—it is a conclusion drawn from large-scale model training and deployment practice.
Why “Thinking Longer” Is Not the Answer
The mainstream direction of the current large model race is extending reasoning time—from Chain-of-Thought to o1-series structured reasoning to various “long thinking” approaches. But Lin points out a fundamental limitation in this route:
| Dimension | “Think Longer” Route | “Think for Action” Route |
|---|---|---|
| Goal | Improve static QA accuracy | Improve dynamic task completion rate |
| Output | Long text reasoning chains | Executable action sequences |
| Feedback | Offline evaluation benchmarks | Real-time environmental feedback |
| Bottleneck | Inference cost grows exponentially | Action efficiency and tool-call precision |
| Ceiling | Limited by training data distribution | Continuously evolves through environmental interaction |
He implies that once a model’s static reasoning ability crosses a certain threshold, the marginal return on additional reasoning steps drops sharply. Rather than spending 100 reasoning steps answering a question it could settle in 5 steps of actual operation, a model is better off being trained to act directly and verify against the environment.
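The contrast can be made concrete with a toy sketch (entirely illustrative; the tool name and setup are my assumptions, not from Lin’s post): for a claim that is mechanically checkable, an action-oriented model issues one tool call and verifies the result, instead of producing a long textual reasoning chain.

```python
# Toy contrast between "think longer" and "think for action".
# All names here are illustrative assumptions, not Lin's code or Qwen's API.

def tool_is_prime(n: int) -> bool:
    """A deterministic tool the model can invoke instead of reasoning in text."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def think_for_action(n: int) -> str:
    # One tool call replaces a many-step textual argument: the answer is
    # verified by actual computation rather than argued token by token.
    verified = tool_is_prime(n)
    return f"{n} is {'prime' if verified else 'not prime'} (verified by tool call)"

print(think_for_action(104729))
```

The point of the sketch is the feedback source: the “think longer” route would judge the same claim against its training distribution, while the action route gets a ground-truth signal from execution.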
What This Means for the Qwen Ecosystem
Although Lin has left the company, his influence on the Qwen team’s technical decisions runs deep. This judgment aligns closely with Qwen’s recent technical moves:
- Qwen-Agent framework continues to iterate: The Qwen team has been consistently strengthening agent capabilities rather than pure language model abilities
- Tool-use capability prioritized: Qwen3 series stands out on tool-use benchmarks—this is no accident
- Multimodal interaction enhanced: Improved visual understanding capabilities directly serve the “see→act” closed loop
This route creates differentiated competition with OpenAI’s o-series in the agent application layer: OpenAI bets on long reasoning, Qwen bets on action efficiency.
Industry Landscape Judgment
The proposal of the “thinking for action” paradigm marks an important industry inflection point:
- Evaluation systems will shift: From static benchmarks like SWE-bench and MMLU to dynamic environment interaction evaluations like WebArena and OSWorld
- Model architectures will change: Reasoning engines need native support for action-output formats, not just text-output
- Training data will expand: From pure text corpora to operation logs, tool-call trajectories, and environmental state changes
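The second bullet’s “action-output format” can be sketched as a structured message an executor validates before acting. This is a minimal sketch loosely modeled on common function-calling schemas; the field names (`type`, `tool`, `arguments`) are illustrative assumptions, not any vendor’s actual API.

```python
import json

# Hypothetical action-output message: structured and machine-executable,
# unlike a free-text reasoning chain. Field names are illustrative only.
action = {
    "type": "tool_call",
    "tool": "file_search",
    "arguments": {"pattern": "*.log", "path": "/var/log"},
}

def validate_action(msg: dict) -> bool:
    """Minimal structural check an executor might run before acting."""
    return (
        msg.get("type") == "tool_call"
        and isinstance(msg.get("tool"), str)
        and isinstance(msg.get("arguments"), dict)
    )

# The message round-trips through JSON, so the environment (not the model)
# decides whether and how to execute it.
print(json.dumps(action), validate_action(action))
```

The design point is that a malformed action is rejected by a cheap structural check before it touches the environment, which is what makes “tool-call precision” (from the table above) measurable at all.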
For developers and enterprise users, this means model selection criteria must shift from “who answers most accurately” to “who completes tasks most reliably.”
Action Recommendations
- Focus on tool-use benchmarks when selecting models: Don’t just look at MMLU/GSM8K—pay attention to BFCL, τ²-Bench, and other tool-call evaluations
- Prioritize testing agent framework integration: Native support for Qwen-Agent, LangChain, OpenClaw, etc. directly impacts deployment efficiency
- Reserve architectural space for agentization: Even if you’re only using models for Q&A today, your system architecture should reserve interfaces for tool-use and action-output capabilities
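The last recommendation, reserving interfaces for tool-use even in a Q&A-only system, can be sketched as follows. This is a hypothetical design sketch, not a pattern prescribed by the article; the `Tool` protocol and `Assistant` class are my own illustrative names.

```python
from typing import Any, Optional, Protocol

class Tool(Protocol):
    """Interface slot reserved for future agent capabilities (illustrative)."""
    name: str
    def run(self, **kwargs: Any) -> str: ...

class Assistant:
    def __init__(self, tools: Optional[list] = None):
        # Today: plain Q&A, so `tools` may stay empty.
        self.tools = {t.name: t for t in (tools or [])}

    def answer(self, question: str) -> str:
        # Placeholder Q&A path; a real system would call a model here.
        return f"(model answer to: {question})"

    def act(self, tool_name: str, **kwargs: Any) -> str:
        # The action path is wired in from day one, even if unused for now.
        if tool_name not in self.tools:
            raise KeyError(f"unknown tool: {tool_name}")
        return self.tools[tool_name].run(**kwargs)

class EchoTool:
    """Trivial example tool satisfying the Tool protocol."""
    name = "echo"
    def run(self, **kwargs: Any) -> str:
        return kwargs.get("text", "")

qa_only = Assistant()             # today's deployment: pure Q&A
agent = Assistant([EchoTool()])   # same architecture, tools plugged in later
print(qa_only.answer("What is Qwen-Agent?"))
print(agent.act("echo", text="hello"))
```

The payoff of reserving the `act` interface now is that upgrading from Q&A to agentic operation becomes a matter of registering tools, not rearchitecting the system.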