qwen3-vl-235b-a22b

Input: $0.24/M
Output: $0.96/M
Context: 2M
Max Output: 30K
qwen3-vl-235b-a22b is a multimodal model that combines strong text generation with visual understanding of images and video. The Instruct variant is tuned for instruction following on general multimodal tasks. Its strengths are recognition across real-world and synthetic categories, 2D and 3D spatial grounding, and comprehension of long-form visual content, with competitive results on multimodal benchmarks.
New
Commercial Use
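
As a rough illustration of how the listed input and output rates combine, the sketch below estimates the cost of a single request. It assumes "/M" means per one million tokens and that prices are in US dollars; the function name `estimate_cost_usd` and the example token counts are hypothetical. The listing does not say how images or video frames are counted as tokens, so the sketch takes token counts as given.

```python
# Rates copied from the listing above.
# Assumption: "/M" means per one million tokens, priced in US dollars.
INPUT_USD_PER_M = 0.24
OUTPUT_USD_PER_M = 0.96


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 50,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost_usd(50_000, 2_000):.4f}")
```

Because the output rate is four times the input rate, long generations dominate the bill faster than long prompts do.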