
qwen3.5-397b-a17b

Input:$0.48/M
Output:$2.88/M
Qwen3.5-397B-A17B is a native vision-language model in the Qwen3.5 series, built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts design for higher inference efficiency.

Technical specifications of Qwen3.5-397B-A17B

Model: Qwen3.5-397B-A17B (open-weight post-trained)
Model family: Qwen3.5 (Tongyi Qwen series, Alibaba)
Architecture: Hybrid Mixture-of-Experts (MoE) + Gated DeltaNet; early-fusion multimodal training
Total parameters: ~397 billion
Active parameters (A17B): ~17 billion active per token (sparse routing)
Input types: Text, image, video (multimodal early fusion)
Output types: Text (chat, code, RAG outputs), image-to-text, multimodal responses
Native context window: 262,144 tokens (native ISL)
Extensible context: Up to ~1,010,000 tokens via YaRN/RoPE scaling (platform-dependent)
Max output tokens: Framework/serving-dependent (guides show 81,920–131,072)
Languages: 200+ languages and dialects
Release date: February 16, 2026 (open-weight release)
License: Apache-2.0 (open weights on Hugging Face / ModelScope)

What is Qwen3.5-397B-A17B

Qwen3.5-397B-A17B is the first open‑weight release in Alibaba’s Qwen3.5 family: a large, multimodal mixture‑of‑experts foundation model trained with early‑fusion vision–language objectives and optimized for agentic workflows. The model exposes the full capacity of a 397B‑parameter architecture while using sparse routing (the “A17B” suffix) so that only ~17B parameters are active per token—giving a balance between knowledge capacity and inference efficiency.

This release is intended for researchers and engineering teams who need an open, deployable, and multimodal foundation model capable of long‑context reasoning, visual understanding, and retrieval‑augmented/agentic applications.


Main features of Qwen3.5-397B-A17B

  • Sparse MoE with active-parameter efficiency: Large global capacity (397B) with per‑token activity comparable to a 17B dense model, lowering FLOPS per token while preserving knowledge diversity.
  • Native multimodality (early fusion): Trained to handle text, images, and video via a unified tokenization and encoder strategy for cross-modal reasoning.
  • Very long-context support: Native input sequence length of 262K tokens and documented paths to extend to ~1M+ tokens using YaRN/RoPE scaling for retrieval and long-document pipelines.
  • Thinking mode & agent tooling: Support for internal reasoning traces and an agentic execution pattern; examples include enabling tool calls and code interpreter integration.
  • Open-weight & broad compatibility: Released under Apache‑2.0 on Hugging Face and ModelScope, with first‑party integration guides for Transformers, vLLM, SGLang and community frameworks.
  • Enterprise-friendly language coverage: Extensive multilingual training (200+ languages), plus instructions and recipes for deployment at scale.
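Thinking mode and tool calls are exercised through the request body. As a sketch, the request below enables a single tool using the OpenAI-style tools schema; the get_weather tool and its parameters are hypothetical, and whether CometAPI's hosted endpoint honors this field for this model should be verified against the API doc.

```python
# Illustrative chat-completions request body with one tool attached
# (OpenAI-style schema; the tool itself is hypothetical).
request_body = {
    "model": "qwen3.5-397b-a17b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool name
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(request_body["tools"][0]["function"]["name"])  # prints: get_weather
```

When the model decides to call the tool, the response carries a tool_calls entry instead of plain text; the client runs the tool and sends the result back in a follow-up message.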

Qwen3.5-397B-A17B vs Selected models

Model | Context window (native) | Strength | Typical trade-offs
Qwen3.5-397B-A17B | 262K | Multimodal MoE, open weights, 397B capacity with 17B active | Large model artifacts; requires distributed hosting for full performance
GPT-5.2 (representative closed model) | ~400K (reported for some variants) | High single-model dense reasoning accuracy | Closed weights; higher inference cost at scale
LLaMA-style dense 70B | ~128K (varies) | Simpler inference stack; lower VRAM for dense runtimes | Less parameter capacity relative to MoE global knowledge

Known limitations & operational considerations

  • Memory footprint: Sparse MoE still requires storing all expert weights; hosting demands far more storage and device memory than a genuinely 17B-parameter dense model.
  • Engineering complexity: Optimal throughput requires careful parallelism (tensor/pipeline) and frameworks like vLLM or SGLang; naive single‑GPU hosting is impractical.
  • Token economics: While per‑token compute is reduced, very long contexts still increase I/O, KV cache size, and billing for managed providers.
  • Safety & guardrails: Open weights increase flexibility but shift responsibility for safety filtering, monitoring, and deployment guardrails to the operator.
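To make the token-economics and memory points concrete, here is a back-of-the-envelope KV-cache estimate. The layer count, KV-head count, and head dimension below are illustrative assumptions for the sketch, not published figures for this model.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # K and V each store layers * kv_heads * head_dim values per token,
    # hence the leading factor of 2; bytes_per_elem=2 assumes FP16/BF16.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem

# Illustrative values only -- not the real Qwen3.5 config.
gib = kv_cache_bytes(tokens=262_144, layers=60, kv_heads=8, head_dim=128) / 2**30
print(f"{gib:.1f} GiB")  # prints: 60.0 GiB
```

Even with only ~17B parameters active per token, a full native-length context can add tens of GiB of cache on top of the weights, which is why long-context serving usually needs multi-GPU setups.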

Representative use cases

  1. Research & model analysis: Open weights enable reproducible research and community-driven evaluation.
  2. On‑premise multimodal services: Enterprises needing data residency can deploy and run vision+text workloads locally.
  3. RAG and long‑document pipelines: Native long‑context support aids single‑pass reasoning over large corpora.
  4. Code intelligence & agent tooling: Analyze monorepos, generate patches, and run agentic tool‑call loops in controlled environments.
  5. Multilingual applications: High‑coverage language support for global products.

How to access and integrate Qwen3.5-397B-A17B

Step 1: Sign Up for API Key

Log in to cometapi.com (register first if you do not have an account). In your CometAPI console, open the API token page in the personal center, click “Add Token”, and copy the generated key, which looks like sk-xxxxx.
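A common pattern is to export the key as an environment variable so the code samples below can read it without hard-coding secrets (the sk-xxxxx value is the placeholder from above, not a real key):

```shell
# Store the CometAPI key in an environment variable
# (placeholder value; substitute your real key).
export COMETAPI_KEY="sk-xxxxx"

# Sanity check: print only the prefix, never the full key.
echo "$COMETAPI_KEY" | cut -c1-3  # prints: sk-
```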

Step 2: Send Requests to Qwen3.5-397B-A17B API

Select the “Qwen3.5-397B-A17B” endpoint and set the request body; the request method and body schema are documented in the API doc on our website, which also provides an Apifox test page for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. The model is called in the chat format: insert your question or request into the content field; this is the text the model will respond to.

Step 3: Retrieve and Verify Results

The API responds with the task status and the output data; parse the response to extract the generated answer.
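As a minimal sketch of that verification step, the JSON body returned by /v1/chat/completions (the OpenAI-compatible shape is assumed here, with a mocked payload) can be checked and unpacked like this:

```python
# Mocked example of the assumed OpenAI-compatible response shape.
response = {
    "id": "chatcmpl-123",
    "model": "qwen3.5-397b-a17b",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}

def extract_answer(resp: dict) -> str:
    """Return the generated text, raising if the reply ended abnormally."""
    choice = resp["choices"][0]
    if choice.get("finish_reason") not in ("stop", None):
        raise RuntimeError(f"generation ended early: {choice['finish_reason']}")
    return choice["message"]["content"]

print(extract_answer(response))  # prints: Hello! How can I help?
```

Checking finish_reason catches truncated generations (e.g. a "length" stop) before the text is passed downstream.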

FAQ

Is Qwen3.5-397B-A17B available as open weights for local hosting and research?

Yes. The Qwen3.5-397B-A17B weights are released under Apache-2.0 on Hugging Face and ModelScope, and the project provides serving recipes for Transformers, vLLM, and SGLang.

What does the "A17B" suffix mean in Qwen3.5-397B-A17B?

A17B indicates the model's sparse routing design uses roughly 17 billion active parameters per token (active experts), while the global model capacity is ~397 billion parameters.

What is the native context window and can I extend it for very long documents?

The model ships with a native input sequence length of 262,144 tokens and includes documented methods to extend context to ~1,010,000 tokens via YaRN/RoPE scaling, depending on serving framework.
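As a sketch of the YaRN route, the override below follows the rope_scaling convention earlier Qwen releases have used in their Hugging Face config.json; the keys and the factor are assumptions to verify against the model card. Note that 262,144 × 4 = 1,048,576 positions, close to but not exactly the ~1,010,000 figure above, so the documented factor may differ.

```python
# Illustrative YaRN context-extension override for config.json
# (key names follow the convention of earlier Qwen releases; verify
# against the model card before deploying).
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262_144,
}

extended = rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
print(int(extended))  # prints: 1048576
```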

Which input modalities does Qwen3.5-397B-A17B support?

It is a unified vision-language model trained with early-fusion; supported inputs include text, images, and video tokens for multimodal reasoning and generation.

How does inference efficiency compare to a 17B dense model?

Per-token inference compute is similar to 17B dense-class models thanks to sparse MoE routing, but model artifacts and memory requirements are larger because full weights must be stored and distributed across devices.
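The compute gap can be sketched with the common rule of thumb of roughly 2 FLOPs per active parameter per forward-pass token (a rough estimate that ignores attention and routing overhead):

```python
def forward_flops_per_token(active_params: float) -> float:
    # Rule-of-thumb estimate: ~2 FLOPs per active parameter per token.
    return 2.0 * active_params

moe_a17b = forward_flops_per_token(17e9)     # ~17B parameters active per token
dense_397b = forward_flops_per_token(397e9)  # hypothetical dense model of equal capacity

print(f"{moe_a17b / dense_397b:.3f}")  # prints: 0.043
```

By this estimate, sparse routing cuts per-token compute to about 4% of an equally large dense model, while the full ~397B weights still have to be resident across devices.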

Features for qwen3.5-397b-a17b

Explore the key features of qwen3.5-397b-a17b, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for qwen3.5-397b-a17b

Explore competitive pricing for qwen3.5-397b-a17b, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how qwen3.5-397b-a17b can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input: $0.48/M, Output: $2.88/M | Input: $0.6/M, Output: $3.6/M | -20%
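The discount column can be sanity-checked directly from the per-million-token rates:

```python
comet = {"input": 0.48, "output": 2.88}     # CometAPI, USD per 1M tokens
official = {"input": 0.60, "output": 3.60}  # official list price

for kind in ("input", "output"):
    discount = 1 - comet[kind] / official[kind]
    print(f"{kind}: {discount:.0%} off")
# prints:
# input: 20% off
# output: 20% off
```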

Sample code and API for qwen3.5-397b-a17b

Access comprehensive sample code and API resources for qwen3.5-397b-a17b to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of qwen3.5-397b-a17b in your projects.
POST
/v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="qwen3.5-397b-a17b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const openai = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const completion = await openai.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" }
  ],
  model: "qwen3.5-397b-a17b",
});

console.log(completion.choices[0].message.content);

Curl Code Example

#!/bin/bash

# Get your CometAPI key from https://api.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"

curl https://api.cometapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "qwen3.5-397b-a17b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

More Models


Claude Opus 4.7

Input:$3/M
Output:$15/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model engineered for extremely complex logic and professional demands, representing the highest standard of deep reasoning and precise analytical capabilities.

GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.

GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.

GPT 5.5 ALL

Input:$4/M
Output:$24/M
GPT-5.5 excels in code writing, online research, data analysis, and cross-tool operations. The model not only improves its autonomy in handling complex multi-step tasks but also significantly improves reasoning capabilities and execution efficiency while maintaining the same latency as its predecessor, marking an important step toward AI-driven office automation.

Related Blog

Google Gemma 4: The Complete Guide to Google's Open-Source AI Model (2026)
Apr 5, 2026


Gemma 4 is Google DeepMind’s latest open model family, launched on March 31, 2026 and announced publicly on April 2, 2026. It is designed for advanced reasoning, agentic workflows, multimodal understanding, and efficient deployment across phones, laptops, workstations, and edge devices. Google says the family ships in four versions — E2B, E4B, 26B A4B, and 31B Dense — with up to 256K context, support for more than 140 languages, open weights, and an Apache 2.0 license.
What Is Qwen 3.5-Max? Makes a Stunning Debut: Jumps to Fifth Place in Global Ranking
Mar 22, 2026

Qwen 3.5-Max is a next-generation large language model (LLM) developed by Alibaba under the Qwen 3.5 family. It leverages Mixture-of-Experts (MoE) architecture, advanced reasoning capabilities, and agentic AI features to deliver state-of-the-art performance across coding, mathematics, multimodal reasoning, and autonomous task execution. Early benchmarks show it outperforming many competing models and ranking among the top global AI systems in 2026.
How to Use Qwen 3.5 API
Feb 18, 2026

On Lunar New Year’s Eve (Feb 16–17, 2026), Alibaba Group released its next-generation model, Qwen 3.5 — a multimodal, agent-capable model positioned for what the company calls an “agentic AI” era. Industry coverage highlighted claims of large gains in efficiency and cost, and rapid support from hardware and cloud vendors. CometAPI is an option for developers who want hosted API access or an OpenAI-compatible integration, while AMD announced Day-0 GPU support for the model on its Instinct line. ByteDance is one of the principal domestic competitors that released upgrades around the same holiday window. OpenAI remains a reference point for comparison in benchmarks and integration style.
Qwen 3.5 vs Minimax M2.5 vs GLM 5: Which is Better in 2026
Feb 17, 2026

Qwen 3.5 targets large-scale, low-cost agentic multimodal workloads with a sparse Mixture-of-Experts (MoE) design and massive activated capacity; Minimax M2.5 emphasizes cost-efficient, realtime agent throughput at low running costs; GLM-5 focuses on heavy reasoning, long-context agents and engineering workflows via a very large MoE-style architecture optimized for token efficiency. The “best” depends on whether you prioritize raw reasoning/coding quality, agent throughput and cost, or open-source flexibility and long-context engineering workflows.
Qwen-3.5 on Lunar New Year — does it beat the closed-source top tier in 2026
Feb 16, 2026

Alibaba’s new Qwen3.5 is a major step forward — it closes the gap with, and in some agentic / multimodal workloads claims parity or advantage over, certain frontier closed-source models on a number of public benchmarks and internal tests. However, “outperform” depends on the workload: on agentic tool-use, multimodal document/video understanding, and cost-per-inference Qwen3.5 is reported to be extremely competitive (and in some vendor charts ahead). The practical takeaway: Qwen3.5 appears to be a genuine frontier contender in early 2026 — for many enterprise agentic and multimodal use cases it is now viable as a primary option.