© 2026 CometAPI · All rights reserved

MiniMax-M3

Input: $0.48/M
Output: $1.92/M
A natively multimodal, 1M-context frontier coding model
New
Commercial Use

Technical Specifications of MiniMax M3

Item | MiniMax M3
Model family | MiniMax M3 frontier foundation model
Provider | MiniMax
Architecture | MiniMax Sparse Attention (MSA)
Input types | Text, Image, Video
Output types | Text
Context window | Up to 1,000,000 tokens (minimum guaranteed 512K)
Primary strengths | Coding, agentic workflows, multimodal reasoning, long-context processing
Reasoning mode | Thinking on/off modes
Tool use | Agent workflows, tool invocation, terminal-task execution
Deployment | API, MiniMax Code, Token Plan, upcoming open-weight release
Multimodal support | Native multimodal pretraining from step zero
Release date | June 2026

What is MiniMax M3?

MiniMax M3 is a frontier-scale AI model designed around three capabilities that have historically been limited to closed-source systems: advanced coding performance, million-token context processing, and native multimodal understanding. Unlike models that add vision as a later extension, M3 was trained as a multimodal model from the beginning, allowing deeper alignment between visual and textual reasoning.

The model is built on MiniMax Sparse Attention (MSA), a sparse-attention architecture designed to make million-token contexts computationally practical while preserving performance on coding, reasoning, and agentic tasks.

Main Features of MiniMax M3

  • 1M-token context window: Supports extremely large repositories, lengthy research corpora, multi-document analysis, and long-running agent sessions.
  • Agent-oriented architecture: Designed for autonomous task decomposition, tool calling, iterative planning, and multi-step execution.
  • Native multimodality: Processes text, images, diagrams, screenshots, and video inputs without relying on a separate vision stack.
  • Advanced coding capability: Strong performance on software-engineering benchmarks including SWE-Bench Pro, Terminal-Bench, and KernelBench.
  • Long-horizon execution: Demonstrated multi-hour autonomous workflows including research reproduction and CUDA optimization projects.
  • Configurable reasoning: Thinking mode can be enabled for deeper reasoning workloads or disabled for lower-latency interactions.

Benchmark Performance of MiniMax M3

MiniMax reports frontier-level benchmark results across coding, agentic execution, and multimodal evaluation tasks. Reported results include:

Benchmark | Score
SWE-Bench Pro | 59.0%
Terminal-Bench 2.1 | 66.0%
SWE-fficiency | 34.8%
KernelBench Hard | 28.8%
MCP Atlas | 74.2%
BrowseComp | 83.5
PostTrainBench | 37.1

The company also reports that M3 surpasses GPT-5.5 and Gemini 3.1 Pro on several coding-oriented benchmarks while approaching Claude Opus 4.7 performance in selected evaluations. These claims originate from MiniMax's internal benchmark disclosures and should be interpreted alongside independent third-party testing as it becomes available.

Long-Context Architecture and MSA

MiniMax Sparse Attention (MSA) is the architectural innovation behind M3's million-token context capability. Instead of applying full quadratic attention across the entire sequence, MSA performs block-level routing and sparse attention over selected regions of context.

According to MiniMax, this reduces compute requirements substantially at large context lengths and delivers:

  • More than 9× faster prefill performance at 1M context length
  • More than 15× faster decoding performance
  • Approximately 1/20 of previous-generation per-token compute at 1M context scale

These improvements are intended to make repository-scale coding and long-horizon agent workflows practical.
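As a rough illustration of why block-level routing reduces attention cost, the sketch below counts query-key score computations per query for dense attention and for a generic block-routed scheme. It is not MiniMax's MSA design: the block size (1024) and number of selected blocks (16) are hypothetical, causal masking is ignored, and the resulting ratio is not meant to reproduce the speedups MiniMax reports.

```python
def dense_scores_per_query(n_tokens: int) -> int:
    """Keys scored by one query under full attention (causal masking ignored)."""
    return n_tokens


def sparse_scores_per_query(n_tokens: int, block_size: int, top_k: int) -> int:
    """Scores computed by one query under a generic block-routed scheme.

    Step 1 (routing): score one summary vector per block.
    Step 2 (attention): score every key inside the top_k selected blocks.
    The result is an upper bound, since the last block may be partially filled.
    """
    n_blocks = -(-n_tokens // block_size)  # ceiling division
    selected = min(top_k, n_blocks)
    return n_blocks + selected * block_size


if __name__ == "__main__":
    # Hypothetical settings, chosen only for illustration.
    n, block, k = 1_000_000, 1024, 16
    dense = dense_scores_per_query(n)
    sparse = sparse_scores_per_query(n, block, k)
    print(f"dense:  {dense:,} scores per query")
    print(f"sparse: {sparse:,} scores per query")
    print(f"ratio:  {dense / sparse:.1f}x fewer score computations")
```

The dense cost grows linearly with context length per query (quadratically overall), while the sparse cost is dominated by the fixed number of selected blocks plus a much smaller routing term.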

MiniMax M3 vs Claude Opus 4.7 vs Gemini 3.1 Pro

Capability | MiniMax M3 | Claude Opus 4.7 | Gemini 3.1 Pro
Context Window | Up to 1M | Smaller publicly available context tiers | Large-context multimodal
Native Multimodal Training | Yes | Yes | Yes
Agentic Coding Focus | Very strong | Very strong | Strong
SWE-Bench Pro | 59.0% | Higher according to MiniMax reporting | Lower according to MiniMax reporting
Open-Weight Availability | Planned | No | No
Long-Horizon Agent Workflows | Major design focus | Strong | Strong

Known Limitations

  • Most benchmark disclosures currently come from MiniMax rather than independent evaluation labs.
  • Open-weight model files and the full technical report were announced but were not yet broadly released at launch.
  • Real-world reliability across production environments is still being validated by the developer community.
  • Million-token context workloads may incur higher operational costs and latency than standard inference workloads.

Representative Use Cases

Repository-Scale Software Engineering

Analyze large codebases, perform multi-file refactors, generate patches, review pull requests, and maintain long-term development context.

Autonomous Research Agents

Support literature review, document synthesis, benchmark analysis, and long-running research workflows requiring hundreds of thousands of tokens.

Multimodal Technical Analysis

Interpret screenshots, architecture diagrams, charts, technical documents, and video content within the same reasoning workflow.

Terminal and DevOps Automation

Execute complex engineering workflows involving testing, deployment orchestration, dependency management, and iterative debugging.

Enterprise Knowledge Systems

Search and reason over large collections of policies, contracts, technical documentation, and internal knowledge repositories.

Model Version and Availability

MiniMax M3 was officially introduced in June 2026 as the flagship successor within the MiniMax model lineup. The model is available through the MiniMax API ecosystem and CometAPI.

FAQ

Can MiniMax M3 process a full software repository in a single context window?

Yes, provided the repository fits within the window. MiniMax M3 supports a context window of up to 1,000,000 tokens, which allows large repositories, documentation sets, and long-running agent sessions to be analyzed within a single conversation.
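Whether a given repository fits can be checked before sending it. The sketch below is a rough estimate only: the 4-characters-per-token ratio is a common heuristic rather than the model's tokenizer, the file-suffix list and output reserve are illustrative, and reading "512K" as 512,000 tokens is an assumption.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4           # rough heuristic, NOT the model's tokenizer
MAX_CONTEXT = 1_000_000       # "up to" figure from the specification table
GUARANTEED_CONTEXT = 512_000  # assumes "512K" means 512,000 tokens
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java", ".md"}  # illustrative


def estimate_repo_tokens(root: str) -> int:
    """Approximate the token count of source files under root."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total_chars // CHARS_PER_TOKEN


def context_fit(tokens: int, reserve_for_output: int = 8_000) -> str:
    """Classify an estimated prompt size against the two context figures."""
    needed = tokens + reserve_for_output
    if needed <= GUARANTEED_CONTEXT:
        return "fits within the guaranteed window"
    if needed <= MAX_CONTEXT:
        return "fits only within the maximum window"
    return "does not fit; split or filter the repository"


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    print(f"~{tokens:,} tokens: {context_fit(tokens)}")
```

For a precise count, replace the character heuristic with the provider's own tokenizer once one is available.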

How does MiniMax M3 compare to Claude Opus 4.7 for coding tasks?

M3 approaches Claude Opus 4.7 on several coding and agent benchmarks while offering a 1M-token context window and planned open-weight availability. Independent third-party comparisons are still emerging.

What makes MiniMax M3 different from previous MiniMax models?

MiniMax M3 introduces the MiniMax Sparse Attention (MSA) architecture, native multimodal training, stronger agent capabilities, and significantly larger context support than previous M2-series models.

Does the MiniMax M3 API support multimodal inputs?

Yes. MiniMax M3 is natively multimodal and supports image and video understanding in addition to text-based inputs.
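A minimal sketch of building an image-bearing message follows. It uses the OpenAI-style `image_url` content part with a base64 data URL; that MiniMax-M3 accepts this exact shape through CometAPI is an assumption, and `build_image_message` is a hypothetical helper. The page does not describe a video input format, so none is shown.

```python
import base64
import mimetypes


def build_image_message(prompt: str, image_path: str) -> dict:
    """Build one user message containing text plus an inline image."""
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "application/octet-stream"  # fallback for unknown suffixes
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed in the `messages` list of a `client.chat.completions.create(...)` call like the one in the Python example on this page.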

What benchmark scores has MiniMax M3 achieved?

MiniMax reports 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 83.5 on BrowseComp, positioning M3 among leading coding and agent-focused models.

Is MiniMax M3 suitable for autonomous AI agents?

Yes. The model was specifically optimized for long-horizon agent workflows including planning, tool use, task decomposition, terminal execution, and multi-step problem solving.
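The tool-use step of such a workflow can be sketched as follows. The schema uses the OpenAI-style `tools` format; that MiniMax-M3 accepts it unchanged through CometAPI is an assumption, and `run_tests`, `HANDLERS`, and `execute_tool_call` are hypothetical names used only for illustration.

```python
import json

# Tool schema in the OpenAI-style "tools" format; run_tests is a hypothetical tool.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the project's test suite and report failures.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]


def run_tests(path: str) -> dict:
    """Stand-in implementation; a real agent would invoke a test runner here."""
    return {"path": path, "failed": 0}


HANDLERS = {"run_tests": run_tests}


def execute_tool_call(tool_call: dict) -> dict:
    """Run one assistant tool call and build the `tool` message sent back."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    handler = HANDLERS.get(name)
    result = handler(**args) if handler else {"error": f"unknown tool: {name}"}
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

In an agent loop, `TOOLS` would be passed as the `tools` argument of the chat completions call, each entry of `message.tool_calls` would be converted to a dictionary (for example with `model_dump()`), and the returned `tool` messages would be appended to the conversation before the next request.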

When should developers choose MiniMax M3 instead of Gemini 3.1 Pro?

MiniMax M3 is particularly attractive when extremely long context windows, coding-heavy workflows, or open-weight deployment options are priorities. Gemini 3.1 Pro may remain preferable for teams already standardized on Google's ecosystem.

Pricing for MiniMax-M3

MiniMax-M3 is billed per token, so you pay only for what you use and cost scales with usage. The table below compares CometAPI's rates with the official rates.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input: $0.48/M, Output: $1.92/M | Input: $0.6/M, Output: $2.4/M | -20%
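The per-million-token rates translate into a per-request cost as sketched below. The prices are copied from the table above; the token counts in the example are hypothetical, and `estimate_cost` is an illustrative helper rather than part of any SDK.

```python
# Per-million-token prices in USD, copied from the pricing table.
COMET_INPUT_PER_M = 0.48
COMET_OUTPUT_PER_M = 1.92
OFFICIAL_INPUT_PER_M = 0.60
OFFICIAL_OUTPUT_PER_M = 2.40


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_m: float,
    output_per_m: float,
) -> float:
    """Cost in USD of one request at the given per-million-token rates."""
    return (
        input_tokens / 1_000_000 * input_per_m
        + output_tokens / 1_000_000 * output_per_m
    )


if __name__ == "__main__":
    # Hypothetical request: a 200,000-token prompt with an 800-token reply.
    comet = estimate_cost(200_000, 800, COMET_INPUT_PER_M, COMET_OUTPUT_PER_M)
    official = estimate_cost(200_000, 800, OFFICIAL_INPUT_PER_M, OFFICIAL_OUTPUT_PER_M)
    print(f"CometAPI: ${comet:.4f}")
    print(f"Official: ${official:.4f}")
    print(f"Saving:   {1 - comet / official:.0%}")
```

Because both the input and output rates carry the same 20% discount, the saving is 20% regardless of the input/output mix.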

Sample code and API for MiniMax-M3

The examples below call MiniMax-M3 through CometAPI's OpenAI-compatible chat completions endpoint in Python, JavaScript, and curl.
POST /v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="minimax-m3",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a senior backend reviewer focused on correctness, "
                "reliability, and maintainability."
            ),
        },
        {
            "role": "user",
            "content": (
                "Task: review the API migration plan and identify the "
                "highest-impact improvements.\n\n"
                "Context: the team is moving a customer support workflow from "
                "blocking chat calls to an async job queue. Prioritize data "
                "safety, retry behavior, observability, and rollback.\n\n"
                "Output format:\n"
                "Return a table with columns: Area, Risk, Recommendation, "
                "Priority. Keep each recommendation actionable and under 40 words."
            ),
        },
    ],
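    max_completion_tokens=800,
    # reasoning_split is not a standard OpenAI parameter, so it is passed via
    # extra_body; the code below expects the thinking text in reasoning_details.
    extra_body={"reasoning_split": True},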
)

if not completion.choices:
    print(completion.model_dump_json(indent=2))
    raise SystemExit

message = completion.choices[0].message

reasoning_details = getattr(message, "reasoning_details", None)
if reasoning_details:
    print("Thinking:")
    print(reasoning_details[0]["text"])
    print()

print("Response:")
print(message.content)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const openai = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const completion = await openai.chat.completions.create({
  model: "minimax-m3",
  messages: [
    {
      role: "system",
      content:
        "You are a senior backend reviewer focused on correctness, reliability, and maintainability.",
    },
    {
      role: "user",
      content:
        "Task: review the API migration plan and identify the highest-impact improvements.\n\n" +
        "Context: the team is moving a customer support workflow from blocking chat calls " +
        "to an async job queue. Prioritize data safety, retry behavior, observability, and rollback.\n\n" +
        "Output format:\n" +
        "Return a table with columns: Area, Risk, Recommendation, Priority. " +
        "Keep each recommendation actionable and under 40 words.",
    },
  ],
  max_completion_tokens: 800,
  reasoning_split: true,
});

if (!completion.choices?.length) {
  console.log(JSON.stringify(completion, null, 2));
  process.exit(0);
}

const message = completion.choices[0].message;

if (message.reasoning_details?.length) {
  console.log("Thinking:");
  console.log(message.reasoning_details[0].text);
  console.log();
}

console.log("Response:");
console.log(message.content);

Curl Code Example

# Get your CometAPI key from https://www.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"
curl https://api.cometapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "minimax-m3",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior backend reviewer focused on correctness, reliability, and maintainability."
      },
      {
        "role": "user",
        "content": "Task: review the API migration plan and identify the highest-impact improvements.\n\nContext: the team is moving a customer support workflow from blocking chat calls to an async job queue. Prioritize data safety, retry behavior, observability, and rollback.\n\nOutput format:\nReturn a table with columns: Area, Risk, Recommendation, Priority. Keep each recommendation actionable and under 40 words."
      }
    ],
    "max_completion_tokens": 800,
    "reasoning_split": true
  }'

Versions of MiniMax-M3

MiniMax-M3 may be offered as multiple snapshots for several reasons: outputs can change after updates, so older snapshots are kept for consistency; developers get a transition period to adapt and migrate; and different snapshots may correspond to global or regional endpoints. For detailed differences between versions, refer to the official documentation.
version
minimax-m3