
qwen3.5-397b-a17b

Input:$0.48/M
Output:$2.88/M
The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency.

Technical specifications of Qwen3.5-397B-A17B

Item: Qwen3.5-397B-A17B (open-weight post-trained)
Model family: Qwen3.5 (Tongyi Qwen series, Alibaba)
Architecture: Hybrid Mixture-of-Experts (MoE) + Gated DeltaNet; early-fusion multimodal training
Total parameters: ~397 billion
Active parameters (A17B): ~17 billion active per token (sparse routing)
Input types: Text, image, video (multimodal early fusion)
Output types: Text (chat, code, RAG outputs), image-to-text, multimodal responses
Native context window: 262,144 tokens (native input sequence length)
Extensible context: Up to ~1,010,000 tokens via YaRN/RoPE scaling (platform-dependent)
Max output tokens: Framework/serving-dependent (guides show 81,920–131,072)
Languages: 200+ languages and dialects
Release date: February 16, 2026 (open-weight release)
License: Apache-2.0 (open weights on Hugging Face / ModelScope)

What is Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is the first open‑weight release in Alibaba’s Qwen3.5 family: a large, multimodal mixture‑of‑experts foundation model trained with early‑fusion vision–language objectives and optimized for agentic workflows. The model exposes the full capacity of a 397B‑parameter architecture while using sparse routing (the “A17B” suffix) so that only ~17B parameters are active per token—giving a balance between knowledge capacity and inference efficiency.

This release is intended for researchers and engineering teams who need an open, deployable, and multimodal foundation model capable of long‑context reasoning, visual understanding, and retrieval‑augmented/agentic applications.


Main features of Qwen3.5-397B-A17B

  • Sparse MoE with active-parameter efficiency: Large global capacity (397B) with per‑token activity comparable to a 17B dense model, lowering FLOPS per token while preserving knowledge diversity.
  • Native multimodality (early fusion): Trained to handle text, images, and video via a unified tokenization and encoder strategy for cross-modal reasoning.
  • Very long-context support: Native input sequence length of 262K tokens and documented paths to extend to ~1M+ tokens using RoPE/YaRN scaling for retrieval and long-document pipelines.
  • Thinking mode & agent tooling: Support for internal reasoning traces and an agentic execution pattern; examples include enabling tool calls and code interpreter integration.
  • Open-weight & broad compatibility: Released under Apache‑2.0 on Hugging Face and ModelScope, with first‑party integration guides for Transformers, vLLM, SGLang and community frameworks.
  • Enterprise-friendly language coverage: Extensive multilingual training (200+ languages), plus instructions and recipes for deployment at scale.

Qwen3.5-397B-A17B vs Selected models

  • Qwen3.5-397B-A17B: 262K native context. Strength: multimodal MoE, open weights, 397B capacity with 17B active. Trade-offs: large model artifacts; requires distributed hosting for full performance.
  • GPT-5.2 (representative closed model): ~400K context (reported for some variants). Strength: high single-model dense reasoning accuracy. Trade-offs: closed weights; higher inference cost at scale.
  • LLaMA-style dense 70B: ~128K context (varies). Strength: simpler inference stack, lower VRAM for dense runtimes. Trade-offs: less parameter capacity relative to MoE global knowledge.

Known limitations & operational considerations

  • Memory footprint: Sparse MoE still requires storing the full weight files; hosting demands far more storage and device memory than a 17B dense model.
  • Engineering complexity: Optimal throughput requires careful parallelism (tensor/pipeline) and frameworks like vLLM or SGLang; naive single‑GPU hosting is impractical.
  • Token economics: While per‑token compute is reduced, very long contexts still increase I/O, KV cache size, and billing for managed providers.
  • Safety & guardrails: Open weights increase flexibility but shift responsibility for safety filtering, monitoring, and deployment guardrails to the operator.
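To make the long-context cost concrete, here is a back-of-envelope KV-cache estimate for the standard-attention layers. All architecture numbers below (layer count, KV heads, head dimension) are hypothetical placeholders, not published Qwen3.5-397B-A17B values, and the hybrid Gated DeltaNet layers would shrink this further; substitute real config values before relying on it.

```python
# Back-of-envelope KV-cache sizing for one long-context sequence.
# All architecture numbers are HYPOTHETICAL placeholders, not published
# Qwen3.5-397B-A17B values.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes for the K and V caches across all layers for one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical config: 60 layers, 8 KV heads, head_dim 128, fp16 cache.
full_context = kv_cache_bytes(seq_len=262_144, n_layers=60,
                              n_kv_heads=8, head_dim=128)
print(f"KV cache at 262K tokens: {full_context / 1e9:.1f} GB")  # ~64.4 GB
```

Even with these modest placeholder dimensions, a single full-context request would occupy tens of gigabytes of cache, which is why long-context serving dominates I/O and memory budgets.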

Representative use cases

  1. Research & model analysis: Open weights enable reproducible research and community-driven evaluation.
  2. On‑premise multimodal services: Enterprises needing data residency can deploy and run vision+text workloads locally.
  3. RAG and long‑document pipelines: Native long‑context support aids single‑pass reasoning over large corpora.
  4. Code intelligence & agent tooling: Analyze monorepos, generate patches, and run agentic tool‑call loops in controlled environments.
  5. Multilingual applications: High‑coverage language support for global products.

How to access and integrate Qwen3.5-397B-A17B

Step 1: Sign Up for API Key

Log in at cometapi.com, or register first if you do not yet have an account. In the CometAPI console, open the API token page in the personal center, click “Add Token”, and copy the generated key (it looks like sk-xxxxx). This key is the credential you will use to authenticate API calls.

Step 2: Send Requests to Qwen3.5-397B-A17B API

Select the “Qwen3.5-397B-A17B” endpoint and set the request body; the request method and body schema are documented in our website API doc, and an Apifox test collection is provided for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. The model is called through the chat format.

Put your question or request into the content field of a user message; this is the text the model will respond to.

Step 3: Retrieve and Verify Results

After sending the request, the API responds with the task status and output data; parse the response to extract the generated answer.
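A minimal parsing sketch for a chat-completions style response body follows. The JSON payload here is an illustrative sample constructed for the example, not a captured API response; field names follow the standard OpenAI-compatible chat format.

```python
import json

# Illustrative SAMPLE response body, not a captured API response.
raw = json.dumps({
    "id": "chatcmpl-123",
    "model": "qwen3.5-397b-a17b",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello! How can I help?"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
})

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
finish = resp["choices"][0]["finish_reason"]
print(answer)          # the generated text
print(resp["usage"])   # token accounting, useful for cost tracking
# A finish_reason of "length" (rather than "stop") would indicate the
# output was truncated by the max-token limit.
```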

FAQ

Is Qwen3.5-397B-A17B available as open weights for local hosting and research?

Yes. The Qwen3.5-397B-A17B weights are released under Apache-2.0 on Hugging Face and ModelScope, and the project provides serving recipes for Transformers, vLLM, and SGLang.

What does the "A17B" suffix mean in Qwen3.5-397B-A17B?

A17B indicates the model's sparse routing design uses roughly 17 billion active parameters per token (active experts), while the global model capacity is ~397 billion parameters.
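The arithmetic behind that suffix is simple and worth seeing once: only a small fraction of the global capacity participates in any single forward pass.

```python
# Active-parameter ratio implied by the "A17B" suffix (illustrative arithmetic).
total_params = 397e9   # global MoE capacity
active_params = 17e9   # routed experts per token
ratio = active_params / total_params
print(f"~{ratio:.1%} of parameters active per token")  # ~4.3%
```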

What is the native context window and can I extend it for very long documents?

The model ships with a native input sequence length of 262,144 tokens and includes documented methods to extend context to ~1,010,000 tokens via YaRN/RoPE scaling, depending on serving framework.
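As a sketch of what such an extension looks like in practice, below is a YaRN `rope_scaling` entry in the shape commonly used in Hugging Face `config.json` files for recent Qwen releases. The scaling factor here is an assumption chosen so that 262,144 × factor covers ~1M tokens; check the model card for the recommended values before deploying.

```python
# Hypothetical YaRN context-extension entry (config.json style).
# The factor is an ASSUMPTION, not an official recommendation.
native_context = 262_144
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": native_context,
}
extended = int(native_context * rope_scaling["factor"])
print(extended)  # 1048576 positions, enough to cover ~1M tokens
```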

Which input modalities does Qwen3.5-397B-A17B support?

It is a unified vision-language model trained with early-fusion; supported inputs include text, images, and video tokens for multimodal reasoning and generation.
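A multimodal request can be sketched with the OpenAI-compatible content-parts format shown below. The image URL is a placeholder, and whether a given provider accepts video via an analogous part type is an assumption to verify against the serving docs.

```python
# Building a multimodal chat message with OpenAI-compatible content parts.
# The image URL is a placeholder for illustration only.
messages = [
    {"role": "user",
     "content": [
         {"type": "text", "text": "Describe this image in one sentence."},
         {"type": "image_url",
          "image_url": {"url": "https://example.com/cat.jpg"}},
     ]}
]

part_types = [part["type"] for part in messages[0]["content"]]
print(part_types)  # ['text', 'image_url']
```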

How does inference efficiency compare to a 17B dense model?

Per-token inference compute is similar to 17B dense-class models thanks to sparse MoE routing, but model artifacts and memory requirements are larger because full weights must be stored and distributed across devices.
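The gap between per-token compute and hosting cost falls out of simple size arithmetic; the figures below are storage estimates from the parameter counts, not benchmarks.

```python
# Why a sparse 397B MoE is cheap per token but expensive to host:
# simple weight-storage estimates from the parameter counts.

def weight_gb(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1e9

total = 397e9   # global capacity (must be stored and sharded)
active = 17e9   # routed per token (drives per-token FLOPs)

print(f"fp16 weights on disk: ~{weight_gb(total, 2):.0f} GB")   # ~794 GB
print(f"fp8 weights on disk:  ~{weight_gb(total, 1):.0f} GB")   # ~397 GB
print(f"fp16 active slice:    ~{weight_gb(active, 2):.0f} GB per token")  # ~34 GB
```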

Pricing for qwen3.5-397b-a17b

Pricing for qwen3.5-397b-a17b is usage-based: you pay per million tokens, at a discount relative to the official list price.
Comet Price (USD / M tokens): Input $0.48 / Output $2.88
Official Price (USD / M tokens): Input $0.60 / Output $3.60
Discount: -20%

Sample code and API for qwen3.5-397b-a17b

Access comprehensive sample code and API resources for qwen3.5-397b-a17b to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of qwen3.5-397b-a17b in your projects.
Python
from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="qwen3.5-397b-a17b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)
