
qwen3.5-397b-a17b

Input:$0.48/M
Output:$2.88/M
Qwen3.5-397B-A17B is a native vision-language model in the Qwen3.5 series, built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts design for higher inference efficiency.

Technical specifications of Qwen3.5-397B-A17B

Model: Qwen3.5-397B-A17B (open-weight post-trained)
Model family: Qwen3.5 (Tongyi Qwen series, Alibaba)
Architecture: Hybrid Mixture-of-Experts (MoE) + Gated DeltaNet; early-fusion multimodal training
Total parameters: ~397 billion
Active parameters (A17B): ~17 billion active per token (sparse routing)
Input types: Text, image, video (multimodal early fusion)
Output types: Text (chat, code, RAG outputs), image-to-text, multimodal responses
Native context window: 262,144 tokens (native ISL)
Extensible context: Up to ~1,010,000 tokens via YaRN/RoPE scaling (platform-dependent)
Max output tokens: Framework/serving-dependent (guides show 81,920–131,072)
Languages: 200+ languages and dialects
Release date: February 16, 2026 (open-weight release)
License: Apache-2.0 (open weights on Hugging Face / ModelScope)

What is Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is the first open‑weight release in Alibaba’s Qwen3.5 family: a large, multimodal mixture‑of‑experts foundation model trained with early‑fusion vision–language objectives and optimized for agentic workflows. The model exposes the full capacity of a 397B‑parameter architecture while using sparse routing (the “A17B” suffix) so that only ~17B parameters are active per token—giving a balance between knowledge capacity and inference efficiency.

This release is intended for researchers and engineering teams who need an open, deployable, and multimodal foundation model capable of long‑context reasoning, visual understanding, and retrieval‑augmented/agentic applications.


Main features of Qwen3.5-397B-A17B

  • Sparse MoE with active-parameter efficiency: Large global capacity (397B) with per‑token activity comparable to a 17B dense model, lowering FLOPS per token while preserving knowledge diversity.
  • Native multimodality (early fusion): Trained to handle text, images, and video via a unified tokenization and encoder strategy for cross-modal reasoning.
  • Very long-context support: Native input sequence length of 262K tokens and documented paths to extend to ~1M+ tokens using YaRN/RoPE scaling for retrieval and long-document pipelines.
  • Thinking mode & agent tooling: Support for internal reasoning traces and an agentic execution pattern; examples include enabling tool calls and code interpreter integration.
  • Open-weight & broad compatibility: Released under Apache‑2.0 on Hugging Face and ModelScope, with first‑party integration guides for Transformers, vLLM, SGLang and community frameworks.
  • Enterprise-friendly language coverage: Extensive multilingual training (200+ languages), plus instructions and recipes for deployment at scale.
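Thinking mode and tool calls are exercised through the request body. As a sketch, the request below enables a single tool using the OpenAI-style tools schema; the get_weather tool and its parameters are hypothetical, and whether CometAPI's hosted endpoint honors this field for this model should be verified against the API doc.

```python
# Illustrative chat-completions request body with one tool attached
# (OpenAI-style schema; the tool itself is hypothetical).
request_body = {
    "model": "qwen3.5-397b-a17b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool name
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(request_body["tools"][0]["function"]["name"])  # prints: get_weather
```

When the model decides to call the tool, the response carries a tool_calls entry instead of plain text; the client runs the tool and sends the result back in a follow-up message.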

Qwen3.5-397B-A17B vs Selected models

Model | Context window (native) | Strength | Typical trade-offs
Qwen3.5-397B-A17B | 262K | Multimodal MoE, open weights, 397B capacity with 17B active | Large model artifacts; requires distributed hosting for full performance
GPT-5.2 (representative closed model) | ~400K (reported for some variants) | High single-model dense reasoning accuracy | Closed weights; higher inference cost at scale
LLaMA-style dense 70B | ~128K (varies) | Simpler inference stack; lower VRAM for dense runtimes | Less parameter capacity relative to MoE global knowledge

Known limitations & operational considerations

  • Memory footprint: Sparse MoE still requires storing all expert weights; hosting demands far more storage and device memory than a genuinely 17B-parameter dense model.
  • Engineering complexity: Optimal throughput requires careful parallelism (tensor/pipeline) and frameworks like vLLM or SGLang; naive single‑GPU hosting is impractical.
  • Token economics: While per‑token compute is reduced, very long contexts still increase I/O, KV cache size, and billing for managed providers.
  • Safety & guardrails: Open weights increase flexibility but shift responsibility for safety filtering, monitoring, and deployment guardrails to the operator.
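To make the token-economics and memory points concrete, here is a back-of-the-envelope KV-cache estimate. The layer count, KV-head count, and head dimension below are illustrative assumptions for the sketch, not published figures for this model.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # K and V each store layers * kv_heads * head_dim values per token,
    # hence the leading factor of 2; bytes_per_elem=2 assumes FP16/BF16.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem

# Illustrative values only -- not the real Qwen3.5 config.
gib = kv_cache_bytes(tokens=262_144, layers=60, kv_heads=8, head_dim=128) / 2**30
print(f"{gib:.1f} GiB")  # prints: 60.0 GiB
```

Even with only ~17B parameters active per token, a full native-length context can add tens of GiB of cache on top of the weights, which is why long-context serving usually needs multi-GPU setups.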

Representative use cases

  1. Research & model analysis: Open weights enable reproducible research and community-driven evaluation.
  2. On‑premise multimodal services: Enterprises needing data residency can deploy and run vision+text workloads locally.
  3. RAG and long‑document pipelines: Native long‑context support aids single‑pass reasoning over large corpora.
  4. Code intelligence & agent tooling: Analyze monorepos, generate patches, and run agentic tool‑call loops in controlled environments.
  5. Multilingual applications: High‑coverage language support for global products.

How to access and integrate Qwen3.5-397B-A17B

Step 1: Sign Up for API Key

Log in to cometapi.com (register first if you do not have an account). In your CometAPI console, open the API token page in the personal center, click “Add Token”, and copy the generated key, which looks like sk-xxxxx.
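A common pattern is to export the key as an environment variable so the code samples below can read it without hard-coding secrets (the sk-xxxxx value is the placeholder from above, not a real key):

```shell
# Store the CometAPI key in an environment variable
# (placeholder value; substitute your real key).
export COMETAPI_KEY="sk-xxxxx"

# Sanity check: print only the prefix, never the full key.
echo "$COMETAPI_KEY" | cut -c1-3  # prints: sk-
```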

Step 2: Send Requests to Qwen3.5-397B-A17B API

Select the “Qwen3.5-397B-A17B” endpoint and set the request body; the request method and body schema are documented in the API doc on our website, which also provides an Apifox test page for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. The model is called in the chat format: insert your question or request into the content field; this is the text the model will respond to.

Step 3: Retrieve and Verify Results

The API responds with the task status and the output data; parse the response to extract the generated answer.
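As a minimal sketch of that verification step, the JSON body returned by /v1/chat/completions (the OpenAI-compatible shape is assumed here, with a mocked payload) can be checked and unpacked like this:

```python
# Mocked example of the assumed OpenAI-compatible response shape.
response = {
    "id": "chatcmpl-123",
    "model": "qwen3.5-397b-a17b",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}

def extract_answer(resp: dict) -> str:
    """Return the generated text, raising if the reply ended abnormally."""
    choice = resp["choices"][0]
    if choice.get("finish_reason") not in ("stop", None):
        raise RuntimeError(f"generation ended early: {choice['finish_reason']}")
    return choice["message"]["content"]

print(extract_answer(response))  # prints: Hello! How can I help?
```

Checking finish_reason catches truncated generations (e.g. a "length" stop) before the text is passed downstream.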

FAQ

Is Qwen3.5-397B-A17B available as open weights for local hosting and research?

Yes. The Qwen3.5-397B-A17B weights are released under Apache-2.0 on Hugging Face and ModelScope, and the project provides serving recipes for Transformers, vLLM, and SGLang.

What does the "A17B" suffix mean in Qwen3.5-397B-A17B?

A17B indicates the model's sparse routing design uses roughly 17 billion active parameters per token (active experts), while the global model capacity is ~397 billion parameters.

What is the native context window and can I extend it for very long documents?

The model ships with a native input sequence length of 262,144 tokens and includes documented methods to extend context to ~1,010,000 tokens via YaRN/RoPE scaling, depending on serving framework.
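As a sketch of the YaRN route, the override below follows the rope_scaling convention earlier Qwen releases have used in their Hugging Face config.json; the keys and the factor are assumptions to verify against the model card. Note that 262,144 × 4 = 1,048,576 positions, close to but not exactly the ~1,010,000 figure above, so the documented factor may differ.

```python
# Illustrative YaRN context-extension override for config.json
# (key names follow the convention of earlier Qwen releases; verify
# against the model card before deploying).
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262_144,
}

extended = rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
print(int(extended))  # prints: 1048576
```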

Which input modalities does Qwen3.5-397B-A17B support?

It is a unified vision-language model trained with early-fusion; supported inputs include text, images, and video tokens for multimodal reasoning and generation.

How does inference efficiency compare to a 17B dense model?

Per-token inference compute is similar to 17B dense-class models thanks to sparse MoE routing, but model artifacts and memory requirements are larger because full weights must be stored and distributed across devices.
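The compute gap can be sketched with the common rule of thumb of roughly 2 FLOPs per active parameter per forward-pass token (a rough estimate that ignores attention and routing overhead):

```python
def forward_flops_per_token(active_params: float) -> float:
    # Rule-of-thumb estimate: ~2 FLOPs per active parameter per token.
    return 2.0 * active_params

moe_a17b = forward_flops_per_token(17e9)     # ~17B parameters active per token
dense_397b = forward_flops_per_token(397e9)  # hypothetical dense model of equal capacity

print(f"{moe_a17b / dense_397b:.3f}")  # prints: 0.043
```

By this estimate, sparse routing cuts per-token compute to about 4% of an equally large dense model, while the full ~397B weights still have to be resident across devices.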

Features for qwen3.5-397b-a17b

Explore the key features of qwen3.5-397b-a17b, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for qwen3.5-397b-a17b

Explore competitive pricing for qwen3.5-397b-a17b, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how qwen3.5-397b-a17b can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input: $0.48/M, Output: $2.88/M | Input: $0.6/M, Output: $3.6/M | -20%
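The discount column can be sanity-checked directly from the per-million-token rates:

```python
comet = {"input": 0.48, "output": 2.88}     # CometAPI, USD per 1M tokens
official = {"input": 0.60, "output": 3.60}  # official list price

for kind in ("input", "output"):
    discount = 1 - comet[kind] / official[kind]
    print(f"{kind}: {discount:.0%} off")
# prints:
# input: 20% off
# output: 20% off
```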

Sample code and API for qwen3.5-397b-a17b

Access comprehensive sample code and API resources for qwen3.5-397b-a17b to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of qwen3.5-397b-a17b in your projects.
POST
/v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="qwen3.5-397b-a17b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const openai = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const completion = await openai.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" }
  ],
  model: "qwen3.5-397b-a17b",
});

console.log(completion.choices[0].message.content);

Curl Code Example

#!/bin/bash

# Get your CometAPI key from https://api.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"

curl https://api.cometapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "qwen3.5-397b-a17b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

More Models


Claude Opus 4.7

Input:$3/M
Output:$15/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model engineered for extremely complex logic and professional demands, representing the highest standard of deep reasoning and precise analytical capabilities.

GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.

GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.

GPT 5.5 ALL

Input:$4/M
Output:$24/M
GPT-5.5 excels in code writing, online research, data analysis, and cross-tool operations. The model not only improves its autonomy in handling complex multi-step tasks but also significantly improves reasoning capabilities and execution efficiency while maintaining the same latency as its predecessor, marking an important step toward AI-driven office automation.

Related Blog

Google Gemma 4: The Complete Guide to Google's Open-Source AI Model (2026)
Apr 5, 2026


Gemma 4 is Google DeepMind’s latest open model family, launched on March 31, 2026 and announced publicly on April 2, 2026. It is designed for advanced reasoning, agentic workflows, multimodal understanding, and efficient deployment across phones, laptops, workstations, and edge devices. Google says the family ships in four versions — E2B, E4B, 26B A4B, and 31B Dense — with up to 256K context, support for more than 140 languages, open weights, and an Apache 2.0 license.
What Is Qwen 3.5-Max? Makes a Stunning Debut: Jumps to Fifth Place in Global Ranking
Mar 22, 2026

Qwen 3.5-Max is a next-generation large language model (LLM) developed by Alibaba under the Qwen 3.5 family. It leverages Mixture-of-Experts (MoE) architecture, advanced reasoning capabilities, and agentic AI features to deliver state-of-the-art performance across coding, mathematics, multimodal reasoning, and autonomous task execution. Early benchmarks show it outperforming many competing models and ranking among the top global AI systems in 2026.
How to Use Qwen 3.5 API
Feb 18, 2026

On Lunar New Year’s Eve (Feb 16–17, 2026), Alibaba Group released its next-generation model, Qwen 3.5 — a multimodal, agent-capable model positioned for what the company calls an “agentic AI” era. Industry coverage highlighted claims of large gains in efficiency and cost, and rapid support from hardware and cloud vendors. CometAPI is an option for developers who want hosted API access or an OpenAI-compatible integration, while AMD announced Day-0 GPU support for the model on its Instinct line. ByteDance is one of the principal domestic competitors that released upgrades around the same holiday window. OpenAI remains a reference point for comparison in benchmarks and integration style.
Qwen 3.5 vs Minimax M2.5 vs GLM 5: Which is Better in 2026
Feb 17, 2026

Qwen 3.5 targets large-scale, low-cost agentic multimodal workloads with a sparse Mixture-of-Experts (MoE) design and massive activated capacity; Minimax M2.5 emphasizes cost-efficient, realtime agent throughput at low running costs; GLM-5 focuses on heavy reasoning, long-context agents and engineering workflows via a very large MoE-style architecture optimized for token efficiency. The “best” depends on whether you prioritize raw reasoning/coding quality, agent throughput and cost, or open-source flexibility and long-context engineering workflows.
Qwen-3.5 on Lunar New Year — does it beat the closed-source top tier in 2026
Feb 16, 2026

Alibaba’s new Qwen3.5 is a major step forward — it closes the gap with, and in some agentic / multimodal workloads claims parity or advantage over, certain frontier closed-source models on a number of public benchmarks and internal tests. However, “outperform” depends on the workload: on agentic tool-use, multimodal document/video understanding, and cost-per-inference Qwen3.5 is reported to be extremely competitive (and in some vendor charts ahead). The practical takeaway: Qwen3.5 appears to be a genuine frontier contender in early 2026 — for many enterprise agentic and multimodal use cases it is now viable as a primary option.