Qwen3.5 from Alibaba: Next-Gen Open-Source Multimodal LLM with Native Agent Capabilities

Date: February 16, 2026
Category: Artificial Intelligence, Open Source LLMs, Multimodal AI
Reading Time: 8 Minutes


The open-source AI landscape witnessed a pivotal moment on February 16, 2026: the Qwen Team at Alibaba Cloud officially released Qwen3.5, comprising two new models, Qwen3.5-Plus and Qwen3.5-397B-A17B (the first open-weight model in the Qwen3.5 series).

Qwen3.5-Plus is the latest model in the Qwen3.5 series, offered as a managed API, while Qwen3.5-397B-A17B is the flagship of its open-source line. Both models support text and multimodal tasks.

Moving beyond the industry's obsession with raw parameter scaling, Qwen3.5 represents a "humble leap" forward. It prioritizes architectural efficiency, native multimodal understanding, and massive-scale reinforcement learning to deliver a model that is accessible yet performs at the frontier level.

In this technical review, we explore the specifications, architecture, benchmark performance, and deployment strategies for Qwen3.5, helping developers understand why this model is a significant upgrade for AI agents.

Qwen3.5-397B-A17B – Key Highlights


🚀 First open-weight model of the Qwen3.5 series – now released!
🖼️ Native multimodal, built & trained for real-world agents
✨ Hybrid linear attention + sparse MoE + massive RL scaling
⚡ 8.6–19.0× faster decoding than Qwen3-Max
🌍 Supports 201 languages & dialects
📜 Apache 2.0 licensed – fully open

Architecture: Efficiency Through Innovation

The defining characteristic of Qwen3.5 is its ability to do more with less. While the headline parameter count is 397 billion, the model utilizes a sophisticated Sparse Mixture-of-Experts (MoE) architecture that activates only 17 billion parameters per forward pass.

This design allows Qwen3.5 to maintain the vast knowledge base of a 400B+ model while running with the inference latency and cost profile of a much smaller model.
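
To make the sparse-activation idea concrete, here is a minimal, self-contained sketch of top-k expert routing in Python. It is purely illustrative: the dimensions, expert count, and routing details are placeholders, not the Qwen Team's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-k Mixture-of-Experts layer: only k experts run per token."""
    def __init__(self, d_model=1024, d_ff=4096, n_experts=64, k=4):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                           # x: (tokens, d_model)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                  # dispatch each token to its chosen experts
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

The point of the design is that per-token compute scales with k (here 4 experts), not with the total expert count, which is how a 397B-parameter model can run within a 17B-activation budget.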

Key Technical Specifications

According to the official technical report, the Qwen3.5-397B-A17B features:

  • Total Parameters: 397B (17B Activated)
  • Architecture: Hybrid Gated DeltaNet (Linear Attention) + MoE
  • Layer Structure: 60 Layers. Layout: 15 blocks of [3× (Gated DeltaNet → MoE) → 1× (Gated Attention → MoE)]
  • Context Window: 262,144 tokens (native), extensible to 1,010,000 tokens
  • Vocabulary Size: 248,320 (Expanded for multilingual efficiency)
  • Hidden Dimension: 4096
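
As a quick sanity check on that layout, the 60 layers decompose into 15 repeats of a four-layer block: 45 linear-attention layers plus 15 full-attention layers. A few lines of Python make the schedule explicit (the names are illustrative labels, not the actual module names):

# Expand the published layout: 15 × [3 × (Gated DeltaNet → MoE), then 1 × (Gated Attention → MoE)]
layout = []
for _ in range(15):
    layout += ["gated_deltanet_moe"] * 3 + ["gated_attention_moe"]

assert len(layout) == 60                    # matches the 60-layer spec
print(layout.count("gated_deltanet_moe"))   # 45 linear-attention blocks
print(layout.count("gated_attention_moe"))  # 15 full-attention blocks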

The "Gated DeltaNet" Advantage

By integrating Gated Delta Networks, a form of linear attention, Qwen3.5 significantly optimizes memory usage. In standard 32k context scenarios, decoding throughput is 8.6x higher than the previous Qwen3-Max. For ultra-long context tasks (256k), throughput improves by up to 19x, with a reported 60% reduction in deployment VRAM usage.
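
The intuition is that a linear-attention layer carries a fixed-size recurrent state instead of a key-value cache that grows with context length. Below is a simplified, sequential sketch of a gated delta-rule update in that spirit; real implementations use chunked, hardware-friendly kernels, and the gating details here are placeholders rather than Qwen3.5's exact formulation.

import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Toy gated delta-rule recurrence over one sequence.

    q, k, v:      (T, d) per-token queries, keys, values (keys assumed unit-norm)
    alpha, beta:  (T,)   per-token forget and write gates in (0, 1)
    Returns outputs of shape (T, d). The state S stays (d, d) no matter how long T is,
    which is why memory does not grow with context length.
    """
    T, d = q.shape
    S = torch.zeros(d, d)
    outs = []
    for t in range(T):
        kt, vt = k[t], v[t]
        # decay old memory, erase the slot addressed by k_t, then write the new association
        S = alpha[t] * S @ (torch.eye(d) - beta[t] * torch.outer(kt, kt)) + beta[t] * torch.outer(vt, kt)
        outs.append(S @ q[t])
    return torch.stack(outs)

Because decoding only reads and updates this fixed-size state, per-token cost stays flat as the context grows, which is where the reported 8.6–19× throughput gains and VRAM savings come from.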

Native Multimodal Capabilities

Unlike previous generations that stitched vision encoders onto text models, Qwen3.5 is natively multimodal. It was trained from scratch on a massive dataset of interleaved text, image, and video tokens.

This "early fusion" approach allows Qwen3.5 to "see" the world more like a human does.

  • Video Understanding: With a 1M token context, the model can process and analyze up to 2 hours of continuous video.
  • Visual Coding: It can interpret hand-drawn UI sketches and generate functional frontend code directly.
  • Spatial Reasoning: The model demonstrates improved performance in robotics planning and spatial analysis tasks.
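
As an illustration of the visual-coding use case above, the request below sends an image plus an instruction to a self-hosted Qwen3.5 endpoint. It assumes the server from the Deployment Guide below exposes the standard OpenAI-compatible chat route with image inputs; the image URL is a placeholder.

from openai import OpenAI

# Points at a locally served model (see the vLLM command in the Deployment Guide).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",
    messages=[{
        "role": "user",
        "content": [
            # Placeholder URL: a photo or scan of a hand-drawn UI sketch
            {"type": "image_url", "image_url": {"url": "https://example.com/ui-sketch.png"}},
            {"type": "text", "text": "Turn this hand-drawn sketch into a working HTML/CSS page."},
        ],
    }],
)
print(response.choices[0].message.content)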

Benchmark Performance


The Qwen Team has released extensive comparison data. Qwen3.5 consistently achieves parity with, and often surpasses, proprietary frontier models.

Language & Reasoning

In pure logic and knowledge tasks, the model's 17B activated parameters hold their own against much larger frontier models.

Benchmark | Qwen3.5 (397B-A17B) | GPT-5.2 | Claude 4.5 Opus | Gemini-3 Pro
MMLU-Pro (Knowledge) | 87.8 | 87.4 | 89.5 | 89.8
GPQA (STEM) | 88.4 | 92.4 | 87.0 | 91.9
IFBench (Instruction) | 76.5 | 75.4 | 58.0 | 70.4
LiveCodeBench v6 | 83.6 | 87.7 | 84.8 | 90.7

Vision-Language & Agents

This is where Qwen3.5 truly shines, particularly in agentic workflows and visual reasoning.

Benchmark | Qwen3.5 (397B-A17B) | Qwen3-VL | Gemini-3 Pro
MathVision | 88.6 | 74.6 | 86.6
RealWorldQA | 83.9 | 81.3 | 83.3
OmniDocBench 1.5 | 90.8 | 88.5 | 88.5
BFCL-V4 (General Agent) | 72.9 | 67.7 | 72.5

Note: Benchmark data sourced from the official Qwen3.5 release blog, February 2026.

Reinforcement Learning & Agents

Qwen3.5 has been fine-tuned using a scalable asynchronous Reinforcement Learning (RL) framework. The model was trained across millions of agent environments, learning to plan, use tools, and correct its own errors.

This makes Qwen3.5 highly effective for:

  1. Computer Control: Automating tasks across desktop and mobile operating systems (OSWorld).
  2. Web Research: Autonomously browsing, filtering, and summarizing complex topics.
  3. "Vibe Coding": Working seamlessly with IDE agents like Qwen Code to iterate on software projects using natural language.

Global Accessibility: 201 Languages

In a push for inclusivity, Qwen3.5 supports 201 languages and dialects. The vocabulary expansion to ~250k tokens improves encoding efficiency for low-resource languages by 10–60%, making it a truly global foundation model.

The series also performs strongly on core benchmarks covering reasoning, coding, and agent tasks, with significantly lower deployment costs and much higher inference efficiency than its predecessor. And because the 397B-A17B weights are Apache 2.0 licensed and can be downloaded for local deployment, the cost-effectiveness is exceptional.
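
If you want to sanity-check the encoding-efficiency claim for a language you care about, comparing token counts across tokenizer generations is enough. The model IDs below are this release and an earlier open Qwen checkpoint used purely for contrast; both tokenizers are assumed to ship with their Hugging Face repositories.

from transformers import AutoTokenizer

# Previous-generation tokenizer vs. the expanded ~248k-token Qwen3.5 vocabulary.
old_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B")
new_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-397B-A17B")

sample = "Habari za asubuhi, karibu kwenye mkutano wetu."  # Swahili: "Good morning, welcome to our meeting."
for name, tok in [("Qwen3", old_tok), ("Qwen3.5", new_tok)]:
    print(name, len(tok.encode(sample)), "tokens")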

Deployment Guide

Developers can access Qwen3.5 immediately via open weights or managed APIs.

Open Source Deployment

The weights are available on Hugging Face and ModelScope. Due to the MoE architecture, using the latest versions of inference engines is recommended.

Using vLLM (Recommended for Production):

vllm serve Qwen/Qwen3.5-397B-A17B \
  --port 8000 \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --reasoning-parser qwen3

Using SGLang:

python -m sglang.launch_server \
  --model-path Qwen/Qwen3.5-397B-A17B \
  --tp-size 8 \
  --context-length 262144
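
Once either server is up, a plain request confirms the deployment end to end. Both engines expose an OpenAI-compatible API; with vLLM's --reasoning-parser flag the thinking trace is returned in a separate field (the field name below reflects current vLLM behavior and should be treated as an assumption).

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3.5-397B-A17B",
    messages=[{"role": "user", "content": "Explain Mixture-of-Experts routing in two sentences."}],
)

msg = resp.choices[0].message
print(getattr(msg, "reasoning_content", None))  # separated thinking trace, if the parser is enabled
print(msg.content)                              # final answer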

Managed API (Qwen3.5-Plus)

For those preferring a managed solution, Qwen3.5-Plus is available on Alibaba Cloud Model Studio. It enables "Thinking Mode" by default and costs approximately 0.8 RMB per million tokens, roughly 1/18th the price of Gemini 3 Pro, making it highly cost-effective at scale.
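
A hedged sketch of calling the managed model is below. The OpenAI-compatible base URL matches Model Studio's current documentation, and the "qwen3.5-plus" model name is an assumption based on this release's naming; check the API link at the end of this post for the exact identifier.

import os
from openai import OpenAI

# Model Studio's OpenAI-compatible endpoint (international region); requires a DashScope API key.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3.5-plus",  # assumed model ID for this release
    messages=[{"role": "user", "content": "Draft a three-step plan for evaluating a multimodal agent."}],
)
print(resp.choices[0].message.content)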

Conclusion

Qwen3.5 is more than just an upgrade; it is a validation of efficient, hybrid architectures. By delivering frontier-class intelligence with a 17B active parameter footprint, the Qwen Team has lowered the barrier to entry for advanced AI.

Whether you are building complex multimodal agents, analyzing long-form video, or deploying multilingual applications, Qwen3.5 offers a robust, open-source foundation. We look forward to seeing the innovations the community will build on top of this impressive release.

🔗 Dive in

GitHub: https://github.com/QwenLM/Qwen3.5
Chat: https://chat.qwen.ai
API: https://modelstudio.console.alibabacloud.com/ap-southeast-1/?tab=doc#/doc/?type=model&url=2840914_2&modelId=group-qwen3.5-plus
Qwen Code: https://github.com/QwenLM/qwen-code
Hugging Face: https://huggingface.co/collections/Qwen/qwen35
ModelScope: https://modelscope.cn/collections/Qwen/Qwen35
Blog: https://qwen.ai/blog?id=qwen3.5

For more details, visit the Official Qwen GitHub or the Hugging Face Collection.
