
Kimi K2.5

Input: $0.48/M tokens
Output: $2.40/M tokens
Kimi K2.5 is Moonshot AI's most intelligent Kimi model to date, achieving open-source state-of-the-art (SoTA) performance in agentic tasks, coding, visual understanding, and a range of general intelligence tasks. It is also the most versatile Kimi model so far, featuring a native multimodal architecture that supports both visual and text input, thinking and non-thinking modes, and both dialogue and agentic tasks.

Technical Specifications of Kimi K2.5

| Item | Value / notes |
| --- | --- |
| Model name / vendor | Kimi-K2.5 (v1.0), Moonshot AI (open weights) |
| Architecture family | Mixture-of-Experts (MoE) hybrid reasoning model (DeepSeek-style MoE) |
| Parameters (total / active) | ≈1 trillion total; ~32B active per token (384 experts, 8 selected per token, as reported) |
| Modalities (input / output) | Input: text, images, video (multimodal). Output: primarily text (rich reasoning traces), optionally structured tool calls / multi-step outputs |
| Context window | 256K tokens |
| Training data | Continual pretraining on ~15 trillion mixed visual + text tokens (vendor reported); training labels / dataset composition undisclosed |
| Modes | Thinking mode (returns reasoning traces; recommended temperature 1.0) and Instant mode (no reasoning traces; recommended temperature 0.6) |
| Agent features | Agent Swarm / parallel sub-agents: an orchestrator can spawn up to 100 sub-agents and execute large numbers of tool calls (vendor claims up to ~1,500 per task); parallel execution reduces runtime |

What Is Kimi K2.5?

Kimi K2.5 is Moonshot AI’s open-weight flagship large language model, designed as a native multimodal and agent-oriented system rather than a text-only LLM with add-on components. It integrates language reasoning, vision understanding, and long-context processing into a single architecture, enabling complex multi-step tasks that involve documents, images, videos, tools, and agents.

It is designed for long-horizon, tool-augmented workflows (coding, multi-step search, document/video understanding) and ships with two interaction modes (Thinking and Instant) and native INT4 quantization for efficient inference.


Core Features of Kimi K2.5

  1. Native multimodal reasoning
    Vision and language are trained jointly from pretraining onward. Kimi K2.5 can reason across images, screenshots, diagrams, and video frames without relying on external vision adapters.
  2. Ultra-long context window (256K tokens)
    Enables persistent reasoning over entire codebases, long research papers, legal documents, or extended multi-hour conversations without context truncation.
  3. Agent Swarm execution model
    Supports dynamic creation and coordination of up to ~100 specialized sub-agents, allowing parallel planning, tool use, and task decomposition for complex workflows.
  4. Multiple inference modes (a hedged request sketch follows this list)
    • Instant mode for low-latency responses
    • Thinking mode for deep multi-step reasoning
    • Agent / Swarm mode for autonomous task execution and orchestration
  5. Strong vision-to-code capability
    Capable of converting UI mockups, screenshots, or video demonstrations into working front-end code, and debugging software using visual context.
  6. Efficient MoE scaling
    The MoE architecture activates only a subset of experts per token, allowing trillion-parameter capacity with manageable inference cost compared to dense models.
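
Because the two primary modes ship with different recommended sampling temperatures (1.0 for Thinking, 0.6 for Instant, per the spec table above), a request sketch helps make the split concrete. The snippet below assumes CometAPI's OpenAI-compatible Chat Completions endpoint (the same one used in the sample code later on this page); note that the "kimi-k2.5-thinking" model alias is a hypothetical illustration of mode selection, not a documented identifier.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key=os.environ["COMETAPI_KEY"],
)

# Instant mode: low latency, no reasoning traces.
# Vendor-recommended temperature for Instant mode is 0.6.
instant = client.chat.completions.create(
    model="kimi-k2.5",
    temperature=0.6,
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(instant.choices[0].message.content)

# Thinking mode: deep multi-step reasoning with traces.
# Vendor-recommended temperature is 1.0. NOTE: "kimi-k2.5-thinking" is a
# hypothetical alias used here for illustration; check the provider's docs
# for the actual mode-selection mechanism.
thinking = client.chat.completions.create(
    model="kimi-k2.5-thinking",
    temperature=1.0,
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(thinking.choices[0].message.content)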

Benchmark Performance of Kimi K2.5

Publicly reported benchmark results (primarily in reasoning-focused settings):

Reasoning & Knowledge Benchmarks

| Benchmark | Kimi K2.5 | GPT-5.2 (xhigh) | Claude Opus 4.5 | Gemini 3 Pro |
| --- | --- | --- | --- | --- |
| HLE-Full (with tools) | 50.2 | 45.5 | 43.2 | 45.8 |
| AIME 2025 | 96.1 | 100 | 92.8 | 95.0 |
| GPQA-Diamond | 87.6 | 92.4 | 87.0 | 91.9 |
| IMO-AnswerBench | 81.8 | 86.3 | 78.5 | 83.1 |

Vision & Video Benchmarks

| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
| --- | --- | --- | --- | --- |
| MMMU-Pro | 78.5 | 79.5* | 74.0 | 81.0 |
| MathVista (Mini) | 90.1 | 82.8* | 80.2* | 89.8* |
| VideoMMMU | 87.4 | 86.0 | — | 88.4 |

Scores marked with * reflect differences in evaluation setups reported by the original sources.

Overall, Kimi K2.5 demonstrates strong competitiveness in multimodal reasoning, long-context tasks, and agent-style workflows, especially when evaluated beyond short-form QA.


Kimi K2.5 vs Other Frontier Models

| Dimension | Kimi K2.5 | GPT-5.2 | Gemini 3 Pro |
| --- | --- | --- | --- |
| Multimodality | Native (vision + text) | Integrated modules | Integrated modules |
| Context length | 256K tokens | Long (exact limit undisclosed) | Long (<256K typical) |
| Agent orchestration | Multi-agent swarm | Single-agent focus | Single-agent focus |
| Model access | Open weights | Proprietary | Proprietary |
| Deployment | Local / cloud / custom | API only | API only |

Model selection guidance:

  • Choose Kimi K2.5 for open-weight deployment, research, long-context reasoning, or complex agent workflows.
  • Choose GPT-5.2 for production-grade general intelligence with strong tool ecosystems.
  • Choose Gemini 3 Pro for deep integration with Google’s productivity and search stack.

Representative Use Cases

  1. Large-scale document and code analysis
    Process entire repositories, legal corpora, or research archives in a single context window.
  2. Visual software engineering workflows
    Generate, refactor, or debug code using screenshots, UI designs, or recorded interactions (see the image-input sketch after this list).
  3. Autonomous agent pipelines
    Execute end-to-end workflows involving planning, retrieval, tool calls, and synthesis via agent swarms.
  4. Enterprise knowledge automation
    Analyze internal documents, spreadsheets, PDFs, and presentations to produce structured reports and insights.
  5. Research and model customization
    Fine-tuning, alignment research, and experimentation enabled by open model weights.
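
For the visual software engineering use case above, image input would go through the standard OpenAI-compatible multimodal message format. This is a minimal sketch under that assumption; whether the CometAPI endpoint accepts the image_url content part for this model should be verified against the API doc, and mockup.png is a placeholder file.

import base64
import os

from openai import OpenAI

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key=os.environ["COMETAPI_KEY"])

# Read a local UI mockup and send it as a base64 data URL, using the
# standard OpenAI-compatible multimodal message format (an assumption
# for this endpoint; verify against the API doc).
with open("mockup.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Generate a responsive HTML/CSS page matching this mockup."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)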

Limitations and Considerations

  • High hardware requirements: Full-precision deployment requires substantial GPU memory; production use typically relies on quantization (e.g., INT4).
  • Agent Swarm maturity: Advanced multi-agent behaviors are still evolving and may require careful orchestration design.
  • Inference complexity: Optimal performance depends on inference engine, quantization strategy, and routing configuration.

How to Access the Kimi K2.5 API via CometAPI

Step 1: Sign Up for API Key

Log in to cometapi.com; if you do not have an account yet, register first. In your CometAPI console, open the API token page in the personal center, click "Add Token", and copy the generated key (a string of the form sk-xxxxx).


Step 2: Send Requests to the Kimi K2.5 API

Set the model field to "kimi-k2.5" and build the request body as described in the API documentation on our website (an Apifox test workspace is also provided for convenience). Replace the placeholder with the actual CometAPI key from your account, and use the Chat Completions endpoint as the base URL.

Insert your question or request into the content field; this is the prompt the model will respond to.
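
For reference, here is a hedged raw-HTTP sketch of the same request using Python's requests library, assuming the standard OpenAI-compatible /v1/chat/completions path under the base URL shown in the sample code below.

import os
import requests

url = "https://api.cometapi.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['COMETAPI_KEY']}",
    "Content-Type": "application/json",
}
body = {
    "model": "kimi-k2.5",
    "messages": [
        {"role": "user", "content": "Explain the MoE architecture in one paragraph."}
    ],
}

# POST the Chat Completions request and print the first choice.
resp = requests.post(url, headers=headers, json=body, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])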

Step 3: Retrieve and Verify Results

Parse the API response to extract the generated answer. Along with the output text, the response reports the completion status and token-usage metadata.
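
Continuing the requests sketch above, a short example of pulling the answer and usage data out of the response; the field names follow the standard OpenAI-compatible response schema and should be treated as assumptions to verify against the API doc.

data = resp.json()  # continuing from the requests example in Step 2

answer = data["choices"][0]["message"]["content"]
finish_reason = data["choices"][0]["finish_reason"]  # e.g. "stop" when complete
usage = data.get("usage", {})  # prompt/completion token counts, useful for cost tracking

print(answer)
print(f"finish_reason={finish_reason}, "
      f"prompt_tokens={usage.get('prompt_tokens')}, "
      f"completion_tokens={usage.get('completion_tokens')}")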

FAQ

How many parameters does Kimi K2.5 have, and what architecture does it use?

Kimi K2.5 uses a Mixture-of-Experts (MoE) architecture with a total of ~1 trillion parameters, of which about 32 billion are active per token during inference.

What types of input can Kimi K2.5 handle?

Kimi K2.5 is a native multimodal model that processes both language and visual inputs (images and video) without add-on modules, using its built-in MoonViT vision encoder.

What is the context window size of Kimi K2.5 and why does it matter?

Kimi K2.5 supports an extended context window of up to 256,000 tokens, enabling it to maintain context over large documents, extensive codebases, or long conversations.

What are the main modes of operation in Kimi K2.5?

The model supports multiple modes including Instant (fast responses), Thinking (deep reasoning), and Agent/Agent Swarm modes for orchestrating complex multi-step tasks.

How does the Agent Swarm feature enhance performance?

Agent Swarm lets Kimi K2.5 dynamically generate and coordinate up to ~100 specialized sub-agents to work in parallel on complex objectives, reducing end-to-end runtime in multi-step workflows.
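
No public Agent Swarm API is documented on this page, so the following is only a conceptual client-side sketch of the fan-out/fan-in pattern the feature describes, built with asyncio; the sub_agent and swarm helpers are hypothetical names, not part of any vendor SDK.

import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="https://api.cometapi.com/v1", api_key=os.environ["COMETAPI_KEY"])

async def sub_agent(task: str) -> str:
    """One 'sub-agent': a focused completion for a single sub-task."""
    resp = await client.chat.completions.create(
        model="kimi-k2.5",
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

async def swarm(objective: str, subtasks: list[str]) -> str:
    # Fan out: run sub-tasks in parallel, mirroring the orchestrator/sub-agent split.
    results = await asyncio.gather(*(sub_agent(t) for t in subtasks))
    # Fan in: a final call synthesizes the partial results into one answer.
    return await sub_agent(
        f"Objective: {objective}\nPartial results:\n"
        + "\n---\n".join(results)
        + "\nSynthesize a final answer."
    )

# Example invocation:
# asyncio.run(swarm("Compare three MoE papers",
#                   ["Summarize paper A", "Summarize paper B", "Summarize paper C"]))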

Is Kimi K2.5 suitable for coding tasks involving visual specifications?

Yes. Kimi K2.5 can generate or debug code from visual inputs such as UI mockups or screenshots, because its vision and language reasoning are integrated at the core.

What are practical limitations to consider with Kimi K2.5?

Because of its size (1T parameters), full-weight local deployment requires substantial hardware (hundreds of GBs of RAM/VRAM), and its most advanced capabilities (like Agent Swarm) may be experimental or in beta.

Features for Kimi K2.5

Explore the key features of Kimi K2.5, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for Kimi K2.5

Explore competitive pricing for Kimi K2.5, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how Kimi K2.5 can enhance your projects while keeping costs manageable.
| | Comet Price (USD / M tokens) | Official Price (USD / M tokens) |
| --- | --- | --- |
| Input | $0.48 | $0.60 |
| Output | $2.40 | $3.00 |
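
As a quick sanity check on the table, per-request cost is simply token count times rate. A minimal sketch using the CometAPI rates listed above:

# Estimate request cost from token counts, using the CometAPI rates
# listed above (USD per million tokens).
INPUT_RATE = 0.48 / 1_000_000   # $0.48 / M input tokens
OUTPUT_RATE = 2.40 / 1_000_000  # $2.40 / M output tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE

# e.g. a 200K-token context with a 2K-token answer:
print(f"${estimate_cost(200_000, 2_000):.4f}")  # -> $0.1008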

Sample code and API for Kimi K2.5

Access comprehensive sample code and API resources for Kimi K2.5 to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of Kimi K2.5 in your projects.
Python
from openai import OpenAI
import os

# Get your CometAPI key from https://api.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message.content)
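
If the endpoint supports the OpenAI-compatible stream=True flag (an assumption worth verifying in the API doc), the same call can print tokens as they are generated:

# Streaming variant: print tokens as they arrive. Assumes the
# OpenAI-compatible `stream=True` flag is supported by CometAPI.
stream = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()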
