
mimo-v2-omni

Input:$0.32/M
Output:$1.6/M
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. 256K context window.
New · Commercial Use

MiMo-V2-Omni Overview

MiMo-V2-Omni is Xiaomi MiMo’s omni foundation model for the API platform, built to see, hear, read, and act in the same workflow. Xiaomi positions it as a multimodal agent model that combines image, video, audio, and text understanding with structured tool calling, function execution, and UI grounding.
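
A minimal sketch of an image-input call, assuming CometAPI accepts standard OpenAI-style image_url content parts for mimo-v2-omni (the payload shape and the example image URL are assumptions, not confirmed on this page):

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>",
    base_url="https://api.cometapi.com/v1",
)

# Assumption: mimo-v2-omni accepts OpenAI-style image_url content parts.
completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What product is shown in this screenshot?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
            ],
        }
    ],
)
print(completion.choices[0].message.content)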

Technical specifications

Item | MiMo-V2-Omni
Provider | Xiaomi MiMo
Model family | MiMo-V2
Modality | Image, video, audio, text
Output type | Text
Native audio support | Yes
Native audio-video joint input | Yes
Structured tool calling | Yes
Function execution | Yes
UI grounding | Yes
Long audio handling | Over 10 hours of continuous audio understanding
Release date | 2026-03-18
Public numeric context length | Not stated on the official Omni page

What is MiMo-V2-Omni?

MiMo-V2-Omni is designed for agentic systems that need perception and action in one model. Xiaomi says the model fuses dedicated image, video, and audio encoders into one shared backbone, trained to anticipate what should happen next rather than only describe what is already visible.

Main features of MiMo-V2-Omni

  • Unified multimodal perception: image, video, audio, and text are handled as one perceptual stream rather than separate add-ons.
  • Agent-ready outputs: the model natively supports structured tool calling, function execution, and UI grounding for real agent frameworks.
  • Long-form audio understanding: Xiaomi claims it can handle continuous audio longer than 10 hours, which is unusually strong for a general omni model.
  • Native audio-video reasoning: the official page highlights joint audio-video input for video comprehension instead of a text-only transcript pipeline (see the frame-based sketch after this list).
  • Browser and workflow execution: Xiaomi demonstrates end-to-end browser shopping and TikTok upload flows using MiMo-V2-Omni plus OpenClaw.
  • Perception-to-action framing: the model is trained to connect what it sees with what it should do next, which is the core difference between a demo model and an agentic model.
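
As referenced in the audio-video bullet above, one common workaround with OpenAI-compatible endpoints is to pass sampled video frames as a sequence of image parts. Whether CometAPI exposes a native video or joint audio-video payload for mimo-v2-omni is not stated on this page, so treat this as an approximation; the frame files are hypothetical:

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>",
    base_url="https://api.cometapi.com/v1",
)

# Hypothetical frames sampled from a video (e.g., ffmpeg -i clip.mp4 -vf fps=1 frame_%03d.jpg)
frame_paths = ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"]

def to_data_url(path: str) -> str:
    # Inline each frame as a base64 data URL, a widely supported image_url form.
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

content = [{"type": "text", "text": "Describe what happens across these video frames."}]
content += [{"type": "image_url", "image_url": {"url": to_data_url(p)}} for p in frame_paths]

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[{"role": "user", "content": content}],
)
print(completion.choices[0].message.content)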

Benchmark performance

[Benchmark chart: mimo-v2-omni]

Xiaomi’s benchmark chart shows Omni exceeding Gemini 3 Pro on audio understanding, exceeding Claude Opus 4.6 on image understanding, and performing on par with the strongest reasoning models on agentic productivity benchmarks.

MiMo-V2-Omni vs MiMo-V2-Pro vs MiMo-V2-Flash

Model | Core strength | Context / scale | Best fit
MiMo-V2-Omni | Multimodal perception + agent action | Public context length not stated on the Omni page | Audio, image, video, UI, and browser agents
MiMo-V2-Pro | Largest flagship agent model | Up to 1M-token context; 1T+ params, 42B active | Heavy agent orchestration and long-horizon work
MiMo-V2-Flash | Fast reasoning and coding | 256K context; 309B total, 15B active | Efficient reasoning, coding, and high-throughput agent tasks

Best use cases

MiMo-V2-Omni is the right pick when your workflow depends on non-text inputs or outputs: screen understanding, voice and audio analysis, video review, browser automation, multimodal assistants, and robotics-style agent loops. If your workload is mostly text-only and you care more about raw speed or maximum context, the sibling Pro and Flash models are the more obvious alternatives.

FAQ

What can the MiMo-V2-Omni API understand besides text?

MiMo-V2-Omni is built to treat image, video, audio, and text as one unified perceptual system rather than separate modality add-ons, which makes it a better fit for multimodal agents than a text-only LLM.

Can MiMo-V2-Omni API process audio and video together?

Yes. The model supports native audio-video joint input for video comprehension, so it can reason over what is happening on screen and in the soundtrack at the same time.

How long of an audio file can MiMo-V2-Omni API handle?

MiMo-V2-Omni supports continuous audio understanding beyond 10 hours. That is a strong signal that it is meant for long-form audio analysis rather than short clip transcription only.
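
A minimal request sketch, assuming CometAPI mirrors OpenAI’s input_audio content part for this model (the payload shape and the local file name are assumptions; multi-hour recordings would presumably be chunked or referenced rather than inlined as base64):

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>",
    base_url="https://api.cometapi.com/v1",
)

# Hypothetical local clip; base64 inlining suits short audio, not 10-hour files.
with open("meeting_clip.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this audio clip."},
                {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }
    ],
)
print(completion.choices[0].message.content)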

When should I use MiMo-V2-Omni API instead of MiMo-V2-Pro?

Use MiMo-V2-Omni when the job depends on multimodal perception: screens, videos, voice, or audio-visual workflows. Use MiMo-V2-Pro when the workload is mostly agentic text work and you want the largest flagship context window, which Xiaomi says reaches 1M tokens.

Does MiMo-V2-Omni API support structured tool calling?

Yes. MiMo-V2-Omni natively supports structured tool calling, function execution, and UI grounding, which is exactly what you want for agent automation.
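
The samples on this page use a CometAPI-specific web_search tool type; assuming standard OpenAI-style function tools are accepted as well (not confirmed here), a minimal sketch with a hypothetical get_weather function:

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>",
    base_url="https://api.cometapi.com/v1",
)

# Hypothetical function exposed to the model as a structured tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model decides to call the tool, inspect the structured arguments.
for call in completion.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))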

Is MiMo-V2-Omni API good for browser automation and real-world agents?

Yes. Xiaomi’s demos show it browsing JD.com to give shopping advice and completing a TikTok upload workflow through OpenClaw. That makes it a strong fit for browser agents, workflow automation, and UI-driven tasks.

Features for mimo-v2-omni

Explore the key features of mimo-v2-omni, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for mimo-v2-omni

Explore competitive pricing for mimo-v2-omni, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how mimo-v2-omni can enhance your projects while keeping costs manageable.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input: $0.32/M, Output: $1.6/M | Input: $0.4/M, Output: $2/M | -20%
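
As a quick sanity check against the CometAPI rates above, per-request cost can be estimated from the usage object returned with each completion:

# CometAPI rates for mimo-v2-omni: $0.32 / M input tokens, $1.60 / M output tokens.
def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens / 1_000_000 * 0.32 + completion_tokens / 1_000_000 * 1.60

# e.g., values from completion.usage.prompt_tokens / completion.usage.completion_tokens
print(f"${estimate_cost(12_000, 1_500):.4f}")  # 12K in + 1.5K out ≈ $0.0062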

Sample code and API for mimo-v2-omni

Access comprehensive sample code and API resources for mimo-v2-omni to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of mimo-v2-omni in your projects.
POST /v1/chat/completions
POST /v1/messages

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"

client = OpenAI(api_key=COMETAPI_KEY, base_url="https://api.cometapi.com/v1")

# mimo-v2-omni: built-in web_search tool (pass as top-level tools param)
completion = client.chat.completions.create(
    model="mimo-v2-omni",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who is Lei Jun?"},
    ],
    tools=[{"type": "web_search", "force_search": True, "max_keyword": 3, "limit": 1}],
    tool_choice="auto",
    extra_body={"thinking": {"type": "disabled"}},
)

msg = completion.choices[0].message
if msg.content:
    print(msg.content)

# annotations are populated when web_search runs (content may be null on search-only responses)
raw = completion.model_dump()
annotations = raw["choices"][0]["message"].get("annotations") or []
if annotations:
    print("\n--- Sources ---")
    for ann in annotations:
        c = ann.get("url_citation") or {}
        print(f"[{c.get('title')}] {c.get('url')}")

JavaScript Code Example

// Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";

// mimo-v2-omni: use fetch for web_search (non-standard tool type unsupported by openai SDK)
const resp = await fetch("https://api.cometapi.com/v1/chat/completions", {
  method: "POST",
  headers: { Authorization: `Bearer ${api_key}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "mimo-v2-omni",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Who is Lei Jun?" },
    ],
    tools: [{ type: "web_search", force_search: true, max_keyword: 3, limit: 1 }],
    tool_choice: "auto",
    thinking: { type: "disabled" },
  }),
});

const data = await resp.json();
const msg = data.choices[0].message;
if (msg.content) console.log(msg.content);

const annotations = msg.annotations ?? [];
if (annotations.length) {
  console.log("\n--- Sources ---");
  for (const ann of annotations) {
    const c = ann.url_citation ?? {};
    console.log(`[${c.title}] ${c.url}`);
  }
}

Curl Code Example

# Get your CometAPI key from https://api.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"

curl https://api.cometapi.com/v1/chat/completions \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mimo-v2-omni",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who is Lei Jun?"}
    ],
    "tools": [{"type": "web_search", "force_search": true, "max_keyword": 3, "limit": 1}],
    "thinking": {"type": "disabled"}
  }'
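
The endpoint list above also shows POST /v1/messages. Assuming it mirrors the Anthropic-style Messages API and accepts the same Bearer key as the chat completions route (neither is confirmed on this page), a request sketch:

import os
import requests

resp = requests.post(
    "https://api.cometapi.com/v1/messages",
    headers={
        "Authorization": f"Bearer {os.environ.get('COMETAPI_KEY', '<YOUR_COMETAPI_KEY>')}",
        "Content-Type": "application/json",
    },
    json={
        "model": "mimo-v2-omni",
        "max_tokens": 1024,  # required by the Anthropic-style schema
        "messages": [{"role": "user", "content": "Who is Lei Jun?"}],
    },
)
print(resp.json())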

More Models

Claude Opus 4.7

Input:$4/M
Output:$20/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is Anthropic’s most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

Grok 4.3

Input:$1/M
Output:$2/M
Excels at agentic reasoning, knowledge work, and tool use.

GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model engineered for extremely complex logic and professional demands, representing the highest standard of deep reasoning and precise analytical capabilities.

GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.

GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI’s state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.

Related Blog

MiMo V2 Pro vs Omni vs Flash: How should I choose in 2026?
Mar 26, 2026

MiMo V2 Pro is the flagship choice for demanding agentic work, MiMo V2 Omni is the multimodal specialist for image, video, audio, and tool-using agents, and MiMo V2 Flash is the fast, low-cost open-source option for reasoning, coding, and everyday agent workflows.
How to Use MiMo V2 API for Free in 2026: Complete Guide (Pro, Omni & Flash)
Mar 25, 2026

To use MiMo V2 API for free, get free quota via CometAPI or self-host the open-source weights on Hugging Face. For Pro and Omni, leverage OpenRouter routing, CometAPI aggregation, or Puter.js user-pays proxies. All models use a standard OpenAI-compatible endpoint. Official Xiaomi pricing starts at $1/$3 per million tokens for Pro (cheaper than Claude Opus 4.6), but free tiers and aggregators make high-performance agentic AI accessible without upfront costs.