Xiaomi Models - CometAPI

X

MiMo-V2.5-Pro

Coming soon

Input:$60/M

Output:$240/M

MiMo-V2.5-Pro is Xiaomi's flagship model, excelling in general-purpose agent capabilities and complex software engineering.

X

MiMo-V2.5

Coming soon

Input:$60/M

Output:$240/M

MiMo-V2.5 is Xiaomi's native full-modal model. It achieves professional-grade agent performance at about half the cost of inference, while outperforming MiMo-V2-Omni in multimodal perception in image and video understanding tasks.

X

mimo-v2-pro

Input:$0.8/M

Output:$2.4/M

MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like OpenClaw. It ranks among the global top tier in the standard PinchBench and ClawBench benchmarks, with perceived performance approaching that of Opus 4.6. MiMo-V2-Pro is designed to serve as the brain of agent systems, orchestrating complex workflows, driving production engineering tasks, and delivering results reliably.

X

mimo-v2-omni

Input:$0.32/M

Output:$1.6/M

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. 256K context window.