qwen3-vl-32b

Input:$0.24/M
Output:$0.96/M
Model name: Qwen3-VL-32B (Instruct / Thinking variants available).
Model family / architecture: Qwen3-VL — vision-language transformer; multimodal backbone with a ViT-style visual encoder plus LLM fusion layers.
Parameter count: "32B" class (public sources list a ~32–33B parameter scale for the dense 32B variant).
Variants: Dense: 2B / 4B / 8B / 32B; MoE: 30B-A3B and 235B-A22B (larger MoE variants also released).
Native context length: 256K tokens (native interleaved multimodal context), with engineered extension modes/techniques enabling up to ~1M tokens in some deployments.
Input modalities: Text, high-resolution images, long video (temporal modeling / timestamps), and multilingual OCR.
Output modalities: Natural-language text, structured extraction (OCR / table / chart), and timestamps / segment summaries for video; supports tool use / agent calls.

What Qwen3-VL-32B is

Qwen3-VL-32B is the 32-billion-parameter dense variant in Alibaba’s Qwen3 vision-language model family. It is a multimodal (vision + language + video) transformer designed for unified perception, long-context reasoning, robust OCR and visual grounding, and agentic/toolified workflows.

Main features

  1. Large multimodal context — Native support for 256K interleaved tokens (text + image references) and architectural hooks / tooling to extend effective context to ~1M tokens for long documents and long videos; enables cross-document cross-media retrieval and reasoning.
  2. Unified visual + language pretraining — Joint training from early stages improving language grounding to visual inputs, leading to stronger cross-modal representations (beneficial for VQA, OCR, and diagram reasoning).
  3. Video comprehension & temporal alignment — Native video handling with timestamped text alignment and the ability to summarize or index long video streams at fine temporal granularity.
  4. Multilingual OCR and document parsing — High-quality OCR across many languages and robust document/layout understanding for table and chart extraction use cases.
  5. Instruct vs Thinking variants — Separate builds optimized for instruction compliance (Instruct) vs. deep internal chain-of-thought / reasoning throughput (Thinking) to suit application needs (safety/conciseness vs. stepwise reasoning).
  6. MoE options for scaling — For extreme capacity/coverage there are MoE variants (30B-A3B, 235B-A22B) that increase representational capacity while attempting to control inference compute via expert routing.

Where Qwen3-VL-32B is well-suited

  1. Document and form extraction at scale — robust OCR across languages, table and chart extraction, and semantic summarization of long reports.
  2. Visual question answering for complex images — medical/engineering diagrams, annotated photos, or visual troubleshooting that require integrating visual evidence with stepwise textual reasoning.
  3. Long-video indexing and summarization — generating searchable transcripts, second-level indexing and summaries for hours-long recordings or surveillance/video archives.
  4. Multimodal agents / tool chains — orchestrating tool calls that require extracting visual payloads (e.g., OCR→search→action), suitable for agent frameworks that combine perception and action.
  5. STEM visual reasoning & tutoring tools — diagrammatic math and stepwise solutions that incorporate images/graphs and textual explanation (noting that outputs should be verified for correctness in educational settings).
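As a sketch of the document-extraction use case above: assuming CometAPI's /v1/chat/completions endpoint accepts OpenAI-style image_url content parts (the image URL, prompt, and helper function below are illustrative, not taken from CometAPI's docs), an OCR request could be assembled like this:

```python
import os

def build_ocr_messages(image_url: str) -> list:
    """Build an OpenAI-style multimodal message list asking the model
    to transcribe a document image, preserving tables as Markdown."""
    return [
        {"role": "system", "content": "You are a careful OCR assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Transcribe all text in this document, preserving tables as Markdown."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

# Placeholder URL; substitute a real, reachable document image.
messages = build_ocr_messages("https://example.com/invoice.png")

# Send only if a key is configured (requires `pip install openai`):
if os.environ.get("COMETAPI_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.cometapi.com/v1",
                    api_key=os.environ["COMETAPI_KEY"])
    completion = client.chat.completions.create(model="qwen3-vl-32b",
                                                messages=messages)
    print(completion.choices[0].message.content)
```

Keeping the message-building step separate from the network call makes the payload easy to inspect and unit-test before spending tokens.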

How to access the Qwen3-VL-32B API

Step 1: Sign Up for API Key

Log in to cometapi.com; if you don't have an account yet, register first. In your CometAPI console, open the API token page under the personal center, click "Add Token", and copy the generated key (it has the form sk-xxxxx).

Step 2: Send Requests to the Qwen3-VL-32B API

Select the "Qwen3-VL-32B" endpoint and set the request body; the request method and body format are described in our API documentation, and an Apifox test collection is provided for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. The base URL is https://api.cometapi.com/v1.

Insert your question or request into the content field; this is what the model will respond to.

Step 3: Retrieve and Verify Results

The API responds with the task status and the output data; parse the response to extract the generated answer.
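A minimal sketch of that parsing step, assuming an OpenAI-compatible response body (the sample values below are illustrative, not real API output):

```python
def extract_answer(resp: dict) -> tuple:
    """Pull the generated text and finish reason out of an
    OpenAI-compatible chat completion response body."""
    choice = resp["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

# Abbreviated, illustrative response shape:
sample = {
    "choices": [{
        "message": {"role": "assistant", "content": "Hello! How can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 18, "completion_tokens": 7, "total_tokens": 25},
}

text, reason = extract_answer(sample)
```

Checking finish_reason (e.g. "stop" vs. "length") is worthwhile in production, since a truncated answer looks like a normal one otherwise.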

Features for qwen3-vl-32b

Explore the key features of qwen3-vl-32b, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for qwen3-vl-32b

Explore competitive pricing for qwen3-vl-32b, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how qwen3-vl-32b can enhance your projects while keeping costs manageable.
CometAPI price: Input $0.24 / M tokens; Output $0.96 / M tokens
Official price: Input $0.30 / M tokens; Output $1.20 / M tokens
Discount: -20% vs. the official price
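At these rates, per-request cost is simple arithmetic; the helper below uses the CometAPI prices quoted above (the token counts in the example are hypothetical):

```python
# CometAPI rates for qwen3-vl-32b, in USD per 1M tokens.
INPUT_PER_M = 0.24
OUTPUT_PER_M = 0.96

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the rates above."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# e.g. a 50K-token document summarized into a 1K-token answer:
cost = request_cost(50_000, 1_000)  # about $0.013
```

For vision inputs, remember that images and video frames are billed as input tokens after encoding, so actual counts come from the usage field of the response rather than from text length alone.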

Sample code and API for qwen3-vl-32b

Access comprehensive sample code and API resources for qwen3-vl-32b to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of qwen3-vl-32b in your projects.
POST
/v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="qwen3-vl-32b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)
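For long outputs, the same call can be streamed so tokens print as they arrive. This assumes CometAPI supports OpenAI-style stream=True (a reasonable assumption for an OpenAI-compatible endpoint, but not confirmed on this page); the prompt is a placeholder:

```python
import os

# Request parameters; stream=True asks the server to send partial
# deltas instead of one final message.
params = dict(
    model="qwen3-vl-32b",
    messages=[{"role": "user", "content": "Summarize the attached report."}],
    stream=True,
)

# Run only if a key is configured (requires `pip install openai`):
if os.environ.get("COMETAPI_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url="https://api.cometapi.com/v1",
                    api_key=os.environ["COMETAPI_KEY"])
    for chunk in client.chat.completions.create(**params):
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```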

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const openai = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const completion = await openai.chat.completions.create({
  model: "qwen3-vl-32b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
});

console.log(completion.choices[0].message.content);

Curl Code Example

curl https://api.cometapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "qwen3-vl-32b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

More Models

Claude Opus 4.7

Input:$3/M
Output:$15/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model built for highly complex logic and demanding professional work, representing the highest standard of deep reasoning and precise analytical capability.

GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.

GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.

GPT 5.5 ALL

Input:$4/M
Output:$24/M
GPT-5.5 excels in code writing, online research, data analysis, and cross-tool operations. The model improves its autonomy on complex multi-step tasks and significantly raises reasoning capability and execution efficiency while maintaining the same latency as its predecessor, an important step toward automating office work with AI.