Hurry! 1M Free Tokens Waiting for You – Register Today!

  • Home
  • Models
    • Suno v4.5
    • GPT-image-1 API
    • GPT-4.1 API
    • Qwen 3 API
    • Grok-3-Mini
    • Llama 4 API
    • GPT-4o API
    • GPT-4.5 API
    • Claude 3.7-Sonnet API
    • Grok 3 API
    • DeepSeek R1 API
    • Gemini2.5 pro
    • Runway Gen-3 Alpha API
    • FLUX 1.1 API
    • Kling 1.6 Pro API
    • All Models
  • Enterprise
  • Pricing
  • API Docs
  • Blog
  • Contact
Sign Up
Log in
Technology

Alibaba Cloud releases Qwen‑VLo multimodal model,Image capability upgrade

2025-06-30 anna No comments yet

Alibaba Cloud’s AI division has officially launched Qwen‑VLo, the latest iteration in its Qwen multimodal model series, marking a significant advancement in unified vision‑and‑language capabilities. Announced on June 28, 2025, Qwen‑VLo offers both understanding and generation functionalities, extending well beyond its predecessors to include high‑resolution image creation and editing driven by natural‑language prompts and visual inputs.

Building on earlier releases such as Qwen‑VL and Qwen2.5‑VL, Qwen‑VLo represents what Alibaba describes as a “comprehensive upgrade” in multimodal AI. While Qwen‑VL focused primarily on interpreting visual information, and Qwen2.5‑VL enhanced long‑context comprehension, Qwen‑VLo integrates these strengths into a single framework capable of bidirectional vision‑language tasks. It accommodates open‑ended instructions, supports multiple languages—including Chinese and English—and refines its outputs to rival those of human artists .

Key Features

Progressive Image Generation

Qwen‑VLo constructs images in a stepwise fashion—from left to right and top to bottom—iteratively refining predicted content to ensure consistency and visual harmony. This mechanism enhances both generation efficiency and user control over the creative process.

Dynamic Resolution Support

Utilizing dynamic resolution training, the model can handle arbitrary input/output resolutions and aspect ratios. Users can generate content tailored for diverse scenarios—such as web banners, social media covers, or high-resolution posters—without being constrained by fixed formats.

Open-Ended Instruction Editing

Through natural language prompts, Qwen VLo can perform advanced edits such as style transfers (“Apply a Van Gogh style”), composite transformations (“Add a sunny sky”), and multi-faceted modifications in a single instruction. It also supports extracting and editing traditional visual signals like depth maps, segmentation masks, and edge outlines.

Multilingual Interaction

The model accepts commands in multiple languages—currently supporting Chinese and English—thereby catering to a global user base and breaking down linguistic barriers in creative workflows.

Availability and Access

Qwen‑VLo is currently available in preview via the Qwen Chat platform at chat.qwen.ai. Alibaba Cloud has noted that, as a preview release, users may encounter occasional inconsistencies or factual inaccuracies during generation. The development team is actively iterating to address these limitations before a broader rollout.

Under the hood, Alibaba’s AI engineers have optimized Qwen‑VLo for deployment across both cloud and edge environments. Leveraging mixed‑precision quantization and novel parameter‑efficient fine‑tuning techniques, the model maintains high performance on a compact compute footprint. Alibaba has also integrated adaptive inference pipelines to balance latency and quality, ensuring that Qwen‑VLo can serve latency‑sensitive applications—such as interactive design tools—while scaling to enterprise‑grade workloads on Alibaba Cloud.

Compare to Qwen-VL-Plus/Max

Function DimensionQwen-VL-Plus/MaxQwen VLo
Image UnderstandingBasic classification, descriptionMultidimensional structure recognition, enhanced contextual understanding
Image GenerationLimited style supportHigh precision, progressive generation, strong style control capabilities
Multitasking CapabilityRequires task-specific inputUnified multitasking, supports complex language instructions
Multilingual InteractionLimited supportNative support for Chinese and English, smoother natural language control
Detail Preservation AbilityPossible detail loss in generationAccurate identification and reconstruction of key structures and semantics

Getting Started

CometAPI is a unified API platform that aggregates over 500 AI models from leading providers—such as OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, Midjourney, Suno, and more—into a single, developer-friendly interface. By offering consistent authentication, request formatting, and response handling, CometAPI dramatically simplifies the integration of AI capabilities into your applications. Whether you’re building chatbots, image generators, music composers, or data‐driven analytics pipelines, CometAPI lets you iterate faster, control costs, and remain vendor-agnostic—all while tapping into the latest breakthroughs across the AI ecosystem.

To begin, explore models’s capabilities in the Playground and consult the API guide for detailed instructions. Before accessing, please make sure you have logged in to CometAPI and obtained the API key.

The latest integration Qwen‑VLo API will soon appear on CometAPI, so stay tuned!While we finalize Qwen‑VLo Model upload, explore our other models on the Models page or try them in the AI Playground. Qwen’s latest Model in CometAPI is Qwen 3 API(qwen3-235b-a22b;qwen3-30b-a3b;qwen3-8b) and qwen-vl-plus-latest.

  • Qwen
  • Qwen‑VLo
anna

Post navigation

Previous
Next

Search

Categories

  • AI Company (2)
  • AI Comparisons (48)
  • AI Model (83)
  • Model API (29)
  • Technology (347)

Tags

Alibaba Cloud Anthropic Black Forest Labs ChatGPT Claude Claude 3.7 Sonnet Claude 4 Claude Opus 4 Claude Sonnet 4 Codex cometapi DALL-E 3 deepseek DeepSeek R1 DeepSeek V3 FLUX Gemini Gemini 2.0 Gemini 2.0 Flash Gemini 2.5 Flash Gemini 2.5 Pro Google GPT-4.1 GPT-4o GPT -4o Image GPT-Image-1 GPT 4.5 gpt 4o grok 3 Midjourney Midjourney V7 Minimax o3 o4 mini OpenAI Qwen Qwen 2.5 Qwen3 sora Stable AI Stable Diffusion Suno Suno Music Veo 3 xAI

Related posts

Technology

Alibaba Cloud Unveils Qwen‑TTS: A High‑Fidelity, Streaming Speech Synthesis Model

2025-07-01 anna No comments yet

On June 26, 2025, Alibaba Cloud launched Qwen‑TTS, the latest addition to its Tongyi Qianwen (Qwen) family of large AI models. Designed for versatile, high‑quality text‑to‑speech applications, Qwen‑TTS supports Chinese, English, and mixed‑language input and offers both batch and streaming audio outputs, catering to diverse use cases from intelligent voice assistants to multimedia content production. […]

Technology

How Does Qwen3 Work?

2025-06-02 anna No comments yet

Qwen3 represents a significant leap forward in open-source large language models (LLMs), blending sophisticated reasoning capabilities with high efficiency and broad accessibility. Developed by Alibaba’s research and cloud computing teams, Qwen3 is positioned to rival leading proprietary systems such as OpenAI’s GPT-4x and Google’s PaLM, while remaining fully open under the Apache 2.0 license. This […]

Technology

How to access Qwen 2.5? 5 Ways!

2025-05-04 anna No comments yet

In the rapidly evolving landscape of artificial intelligence, Alibaba’s Qwen 2.5 has emerged as a formidable contender, challenging established models like OpenAI’s GPT-4o and Meta’s LLaMA 3.1. Released in January 2025, Qwen 2.5 boasts a suite of features that cater to a diverse range of applications, from software development to multilingual content creation. This article […]

500+ AI Model API,All In One API. Just In CometAPI

Models API
  • GPT API
  • Suno API
  • Luma API
  • Sora API
Developer
  • Sign Up
  • API DashBoard
  • Documentation
  • Quick Start
Resources
  • Pricing
  • Enterprise
  • Blog
  • AI Model API Articles
  • Discord Community
Get in touch
  • [email protected]

© CometAPI. All Rights Reserved.  

  • Terms & Service
  • Privacy Policy