Core Thesis
Junyang Lin, former technical lead of the Qwen (Tongyi Qianwen) team, published his first systematic judgment on the direction of large model development after leaving in late March 2026: “The next phase of large models is not about making them think longer, but about making them think for action.”
Lin directly led the technical development of the Qwen3 series, giving him first-hand insight into the evolution of Qwen’s technical roadmap. His judgment is not academic speculation—it is a conclusion drawn from large-scale model training and deployment practice.
Why “Thinking Longer” Is Not the Answer
The mainstream direction of the current large model race is extending reasoning time—from Chain-of-Thought to o1-series structured reasoning to various “long thinking” approaches. But Lin points out a fundamental limitation in this route:
| Dimension | “Think Longer” Route | “Think for Action” Route |
|---|---|---|
| Goal | Improve static QA accuracy | Improve dynamic task completion rate |
| Output | Long text reasoning chains | Executable action sequences |
| Feedback | Offline evaluation benchmarks | Real-time environmental feedback |
| Bottleneck | Inference cost grows exponentially | Action efficiency and tool-call precision |
| Ceiling | Limited by training data distribution | Continuously evolves through environmental interaction |
He implies that once a model’s static reasoning ability crosses a certain threshold, the marginal return on additional reasoning steps drops sharply. Rather than spending 100 reasoning steps answering a question it could settle in 5 steps of actual operation, a model is better off being trained to act directly and verify against the environment.
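The contrast can be made concrete with a toy sketch (entirely illustrative; the tool name and setup are my assumptions, not from Lin’s post): for a claim that is mechanically checkable, an action-oriented model issues one tool call and verifies the result, instead of producing a long textual reasoning chain.

```python
# Toy contrast between "think longer" and "think for action".
# All names here are illustrative assumptions, not Lin's code or Qwen's API.

def tool_is_prime(n: int) -> bool:
    """A deterministic tool the model can invoke instead of reasoning in text."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def think_for_action(n: int) -> str:
    # One tool call replaces a many-step textual argument: the answer is
    # verified by actual computation rather than argued token by token.
    verified = tool_is_prime(n)
    return f"{n} is {'prime' if verified else 'not prime'} (verified by tool call)"

print(think_for_action(104729))
```

The point of the sketch is the feedback source: the “think longer” route would judge the same claim against its training distribution, while the action route gets a ground-truth signal from execution.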
What This Means for the Qwen Ecosystem
Although Lin has left the company, his influence on the Qwen team’s technical decisions runs deep. This judgment aligns closely with Qwen’s recent technical moves:
- Qwen-Agent framework continues to iterate: The Qwen team has been consistently strengthening agent capabilities rather than pure language model abilities
- Tool-use capability prioritized: Qwen3 series stands out on tool-use benchmarks—this is no accident
- Multimodal interaction enhanced: Improved visual understanding capabilities directly serve the “see→act” closed loop
This route creates differentiated competition with OpenAI’s o-series in the agent application layer: OpenAI bets on long reasoning, Qwen bets on action efficiency.
Industry Landscape Judgment
The proposal of the “thinking for action” paradigm marks an important industry inflection point:
- Evaluation systems will shift: From static benchmarks like SWE-bench and MMLU to dynamic environment interaction evaluations like WebArena and OSWorld
- Model architectures will change: Reasoning engines need native support for action-output formats, not just text-output
- Training data will expand: From pure text corpora to operation logs, tool-call trajectories, and environmental state changes
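The second bullet’s “action-output format” can be sketched as a structured message an executor validates before acting. This is a minimal sketch loosely modeled on common function-calling schemas; the field names (`type`, `tool`, `arguments`) are illustrative assumptions, not any vendor’s actual API.

```python
import json

# Hypothetical action-output message: structured and machine-executable,
# unlike a free-text reasoning chain. Field names are illustrative only.
action = {
    "type": "tool_call",
    "tool": "file_search",
    "arguments": {"pattern": "*.log", "path": "/var/log"},
}

def validate_action(msg: dict) -> bool:
    """Minimal structural check an executor might run before acting."""
    return (
        msg.get("type") == "tool_call"
        and isinstance(msg.get("tool"), str)
        and isinstance(msg.get("arguments"), dict)
    )

# The message round-trips through JSON, so the environment (not the model)
# decides whether and how to execute it.
print(json.dumps(action), validate_action(action))
```

The design point is that a malformed action is rejected by a cheap structural check before it touches the environment, which is what makes “tool-call precision” (from the table above) measurable at all.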
For developers and enterprise users, this means model selection criteria must shift from “who answers most accurately” to “who completes tasks most reliably.”
Action Recommendations
- Focus on tool-use benchmarks when selecting models: Don’t just look at MMLU/GSM8K—pay attention to BFCL, τ²-Bench, and other tool-call evaluations
- Prioritize testing agent framework integration: Native support for Qwen-Agent, LangChain, OpenClaw, etc. directly impacts deployment efficiency
- Reserve architectural space for agentization: Even if you’re only using models for Q&A today, your system architecture should reserve interfaces for tool-use and action-output capabilities
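The last recommendation, reserving interfaces for tool-use even in a Q&A-only system, can be sketched as follows. This is a hypothetical design sketch, not a pattern prescribed by the article; the `Tool` protocol and `Assistant` class are my own illustrative names.

```python
from typing import Any, Optional, Protocol

class Tool(Protocol):
    """Interface slot reserved for future agent capabilities (illustrative)."""
    name: str
    def run(self, **kwargs: Any) -> str: ...

class Assistant:
    def __init__(self, tools: Optional[list] = None):
        # Today: plain Q&A, so `tools` may stay empty.
        self.tools = {t.name: t for t in (tools or [])}

    def answer(self, question: str) -> str:
        # Placeholder Q&A path; a real system would call a model here.
        return f"(model answer to: {question})"

    def act(self, tool_name: str, **kwargs: Any) -> str:
        # The action path is wired in from day one, even if unused for now.
        if tool_name not in self.tools:
            raise KeyError(f"unknown tool: {tool_name}")
        return self.tools[tool_name].run(**kwargs)

class EchoTool:
    """Trivial example tool satisfying the Tool protocol."""
    name = "echo"
    def run(self, **kwargs: Any) -> str:
        return kwargs.get("text", "")

qa_only = Assistant()             # today's deployment: pure Q&A
agent = Assistant([EchoTool()])   # same architecture, tools plugged in later
print(qa_only.answer("What is Qwen-Agent?"))
print(agent.act("echo", text="hello"))
```

The payoff of reserving the `act` interface now is that upgrading from Q&A to agentic operation becomes a matter of registering tools, not rearchitecting the system.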