Xiaomi Models

Browse models from Xiaomi
Xiaomi MiMo-V2 Model Series: Towards the Agentic Era
The series integrates trillion-scale parameters, omni-modal perception, and human-like interaction, unifying understanding and action, from the present into the future.

MiMo-V2.5-Pro

Coming soon
Input: $60/M
Output: $240/M
MiMo-V2.5-Pro is Xiaomi's flagship model, excelling in general-purpose agent capabilities and complex software engineering.

MiMo-V2.5

Coming soon
Input: $60/M
Output: $240/M
MiMo-V2.5 is Xiaomi's native full-modal model. It achieves professional-grade agent performance at about half the inference cost, while outperforming MiMo-V2-Omni in multimodal perception on image and video understanding tasks.

mimo-v2-pro

Input: $0.8/M
Output: $2.4/M
MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like OpenClaw. It ranks among the global top tier in the standard PinchBench and ClawBench benchmarks, with perceived performance approaching that of Opus 4.6. MiMo-V2-Pro is designed to serve as the brain of agent systems, orchestrating complex workflows, driving production engineering tasks, and delivering results reliably.

mimo-v2-omni

Input: $0.32/M
Output: $1.6/M
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities (visual grounding, multi-step planning, tool use, and code execution), making it well suited for complex real-world tasks that span modalities. It has a 256K context window.
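
To make the listed prices easier to compare, here is a minimal sketch that estimates the cost of a single request for the two available models. It assumes that "/M" means per one million tokens, which the listing does not state explicitly. The helper name `estimate_cost` and the token counts in the usage lines are hypothetical and are not part of any CometAPI or Xiaomi SDK.

```python
from decimal import Decimal

# Prices copied from the listing, in USD per "M".
# Assumption: "/M" means per one million tokens.
PRICES_PER_MILLION = {
    "mimo-v2-pro": {"input": Decimal("0.8"), "output": Decimal("2.4")},
    "mimo-v2-omni": {"input": Decimal("0.32"), "output": Decimal("1.6")},
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    price = PRICES_PER_MILLION[model]  # raises KeyError for unlisted models
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Illustrative request: 200,000 input tokens and 10,000 output tokens.
    for name in PRICES_PER_MILLION:
        print(name, estimate_cost(name, 200_000, 10_000))
```

`Decimal` is used instead of `float` so that small per-token amounts add up without binary rounding noise. Actual billing may differ from this estimate, for example if tokens are counted or rounded differently.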
© 2026 CometAPI · All rights reserved