Technical specifications of Qwen3.5-397B-A17B
| Item | Qwen3.5-397B-A17B (open-weight post-trained) |
|---|---|
| Model family | Qwen3.5 (Tongyi Qwen series, Alibaba) |
| Architecture | Hybrid Mixture-of-Experts (MoE) + Gated DeltaNet; early-fusion multimodal training |
| Total parameters | ~397 billion (total) |
| Active parameters (A17B) | ~17 billion active per-token (sparse routing) |
| Input types | Text, Image, Video (multimodal early-fusion) |
| Output types | Text (chat, code, RAG outputs), image-to-text, multimodal responses |
| Native context window | 262,144 tokens (native input sequence length) |
| Extensible context | Up to ~1,010,000 tokens via YaRN/RoPE scaling (platform-dependent) |
| Max output tokens | Framework/serving-dependent (guides show examples of 81,920–131,072) |
| Languages | 200+ languages and dialects |
| Release date | February 16, 2026 (open-weight release) |
| License | Apache‑2.0 (open weights on Hugging Face / ModelScope) |
What is Qwen3.5-397B-A17B
Qwen3.5-397B-A17B is the first open‑weight release in Alibaba’s Qwen3.5 family: a large, multimodal mixture‑of‑experts foundation model trained with early‑fusion vision–language objectives and optimized for agentic workflows. The model exposes the full capacity of a 397B‑parameter architecture while using sparse routing (the “A17B” suffix) so that only ~17B parameters are active per token, balancing knowledge capacity against inference efficiency.
This release is intended for researchers and engineering teams who need an open, deployable, and multimodal foundation model capable of long‑context reasoning, visual understanding, and retrieval‑augmented/agentic applications.
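The capacity/efficiency trade-off described above can be made concrete with a little arithmetic. The sketch below uses the parameter counts from the spec table and the standard rough approximation that per-token forward-pass compute scales with the number of active parameters:

```python
# Rough comparison of per-token compute for the sparse MoE model versus a
# hypothetical dense model of the same total size. Standard approximation:
# forward-pass FLOPs per token ~ 2 * (active parameters).

TOTAL_PARAMS = 397e9    # total parameters (from the spec table)
ACTIVE_PARAMS = 17e9    # parameters active per token (the "A17B" suffix)

activation_ratio = ACTIVE_PARAMS / TOTAL_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS   # if all 397B were dense
moe_flops_per_token = 2 * ACTIVE_PARAMS    # sparse routing activates ~17B

print(f"Active fraction per token: {activation_ratio:.1%}")
print(f"Approx. compute reduction vs. same-size dense model: "
      f"{dense_flops_per_token / moe_flops_per_token:.1f}x")
```

This is only a first-order estimate: it ignores router overhead, attention cost, and memory bandwidth, which is why MoE speedups in practice are smaller than the raw parameter ratio suggests.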
Main features of Qwen3.5-397B-A17B
- Sparse MoE with active-parameter efficiency: Large global capacity (397B) with per‑token activity comparable to a 17B dense model, lowering FLOPS per token while preserving knowledge diversity.
- Native multimodality (early fusion): Trained to handle text, images, and video via a unified tokenization and encoder strategy for cross-modal reasoning.
- Very long-context support: Native input sequence length of 262K tokens and documented paths to extend to ~1M tokens using YaRN/RoPE scaling for retrieval and long-document pipelines.
- Thinking mode & agent tooling: Support for internal reasoning traces and an agentic execution pattern; examples include enabling tool calls and code interpreter integration.
- Open-weight & broad compatibility: Released under Apache‑2.0 on Hugging Face and ModelScope, with first‑party integration guides for Transformers, vLLM, SGLang and community frameworks.
- Enterprise-friendly language coverage: Extensive multilingual training (200+ languages), plus instructions and recipes for deployment at scale.
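The long-context extension mentioned above can be sketched as a RoPE-scaling configuration. The `rope_scaling` dict below follows the convention used by recent Qwen releases in Hugging Face Transformers, but the exact keys and values for this model are an assumption, so verify against the model card before deploying:

```python
# Illustrative sketch: extending the native 262,144-token window toward ~1M
# tokens with YaRN-style RoPE scaling. Config shape is an assumption based on
# the rope_scaling convention of recent Qwen models -- check the model card.

NATIVE_CONTEXT = 262_144
TARGET_CONTEXT = 1_010_000

# YaRN scales positions by a factor relative to the original training length.
factor = TARGET_CONTEXT / NATIVE_CONTEXT  # ~3.85; round up in practice

rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # ceiling of the ratio above
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(f"Scaling factor needed: {factor:.2f} -> using {rope_scaling['factor']}")
```

A common recipe is to enable scaling only when requests actually exceed the native window, since static scaling can slightly degrade quality on short sequences.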
Qwen3.5-397B-A17B vs Selected models
| Model | Context window (native) | Strength | Typical trade-offs |
|---|---|---|---|
| Qwen3.5-397B-A17B | 262K (native) | Multimodal MoE, open weights, 397B capacity with 17B active | Large model artifacts, requires distributed hosting for full performance |
| GPT-5.2 (representative closed) | ~400K (reported for some variants) | High single‑model dense reasoning accuracy | Closed weights, higher inference cost at scale |
| LLaMA‑style dense 70B | ~128K (varies) | Simpler inference stack, lower VRAM for dense runtimes | Less parameter capacity relative to MoE global knowledge |
Known limitations & operational considerations
- Memory footprint: Sparse MoE still requires storing all expert weights; hosting demands significant storage and device memory compared with a genuinely 17B dense model.
- Engineering complexity: Optimal throughput requires careful parallelism (tensor/pipeline) and frameworks like vLLM or SGLang; naive single‑GPU hosting is impractical.
- Token economics: While per‑token compute is reduced, very long contexts still increase I/O, KV cache size, and billing for managed providers.
- Safety & guardrails: Open weights increase flexibility but shift responsibility for safety filtering, monitoring, and deployment guardrails to the operator.
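The KV-cache point above can be quantified with a back-of-envelope estimate. The layer/head/dimension figures below are placeholders (the real architecture details are on the model card), and the formula covers only the standard attention layers; the Gated DeltaNet layers in this hybrid architecture keep a constant-size state instead:

```python
# Back-of-envelope KV-cache sizing for long contexts. All architecture
# numbers below are hypothetical, for illustration only; the formula itself
# is the standard one for attention KV caches.

def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes for keys + values across all layers for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical configuration: 60 attention layers, 8 KV heads, dim 128, fp16.
est = kv_cache_bytes(seq_len=262_144, n_layers=60, n_kv_heads=8, head_dim=128)
print(f"~{est / 2**30:.1f} GiB of KV cache at the native context length")
```

Estimates like this explain why long-context serving is dominated by memory rather than raw FLOPs, and why grouped-query attention and paged KV caches matter at this scale.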
Representative use cases
- Research & model analysis: Open weights enable reproducible research and community-driven evaluation.
- On‑premise multimodal services: Enterprises needing data residency can deploy and run vision+text workloads locally.
- RAG and long‑document pipelines: Native long‑context support aids single‑pass reasoning over large corpora.
- Code intelligence & agent tooling: Analyze monorepos, generate patches, and run agentic tool‑call loops in controlled environments.
- Multilingual applications: High‑coverage language support for global products.
How to access and integrate Qwen3.5-397B-A17B
Step 1: Sign Up for API Key
Log in to cometapi.com; if you do not have an account yet, register first. In your CometAPI console, open the API token section of the personal center, click “Add Token”, and copy the generated key (format: sk-xxxxx). This key is the access credential for the API.
Step 2: Send Requests to Qwen3.5-397B-A17B API
Select the “Qwen3.5-397B-A17B” endpoint and set the request body. The request method and body schema are documented in our website’s API doc, and an Apifox test is provided there for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key. Requests use the chat format: insert your question or request into the content field of a message—this is the text the model will respond to.
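A minimal request sketch, using only the Python standard library. The base URL and payload shape assume an OpenAI-compatible chat-completions endpoint (a common convention for such gateways); verify the exact path and fields against the API doc:

```python
# Minimal chat-format request sketch. The endpoint URL and payload fields
# assume an OpenAI-compatible /v1/chat/completions route -- this is an
# assumption; confirm the exact path against the CometAPI docs.
import json
import urllib.request

API_KEY = "<YOUR_API_KEY>"  # replace with your CometAPI key (sk-...)
URL = "https://api.cometapi.com/v1/chat/completions"  # assumed endpoint

def build_payload(question: str) -> dict:
    """The user's question goes into the `content` field of a chat message."""
    return {
        "model": "Qwen3.5-397B-A17B",
        "messages": [{"role": "user", "content": question}],
    }

payload = build_payload("Summarize this design doc in three bullet points.")
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": f"Bearer {API_KEY}",
             "Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment once a real key is set
```

Keeping payload construction in a small helper makes it easy to add parameters such as temperature or max output tokens later without touching the transport code.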
Step 3: Retrieve and Verify Results
Parse the API response to extract the generated answer. The response body includes the task status along with the output data.
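Extraction can be sketched as follows. The sample response below is a hand-written stand-in, with field names following the common chat-completions convention; confirm the real schema in the API doc:

```python
# Sketch of extracting the answer from an OpenAI-style chat response body.
# `sample_response` is a hand-written stand-in for a real API response;
# the field layout is an assumption -- verify it against the API doc.

def extract_answer(response: dict) -> str:
    """Return the generated text from the first choice."""
    return response["choices"][0]["message"]["content"]

sample_response = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Here is the summary..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 42, "completion_tokens": 12},
}

print(extract_answer(sample_response))
```

In production code, also check `finish_reason` (e.g. a `length` value means the output was truncated by the max-token limit) before treating the answer as complete.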