Can MiniMax M3 process a full software repository in a single context window?

Yes. MiniMax M3 supports up to a 1,000,000-token context window, allowing large repositories, documentation sets, and long-running agent sessions to be analyzed within a single conversation.

How does MiniMax M3 compare to Claude Opus 4.7 for coding tasks?

M3 approaches Claude Opus 4.7 on several coding and agent benchmarks while offering a 1M-token context window and planned open-weight availability. Independent third-party comparisons are still emerging.

What makes MiniMax M3 different from previous MiniMax models?

MiniMax M3 introduces the MiniMax Sparse Attention (MSA) architecture, native multimodal training, stronger agent capabilities, and significantly larger context support than previous M2-series models.

Does the MiniMax M3 API support multimodal inputs?

Yes. MiniMax M3 is natively multimodal and supports image and video understanding in addition to text-based inputs.

What benchmark scores has MiniMax M3 achieved?

MiniMax reports 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 83.5 on BrowseComp, positioning M3 among leading coding and agent-focused models.

Is MiniMax M3 suitable for autonomous AI agents?

Yes. The model was specifically optimized for long-horizon agent workflows including planning, tool use, task decomposition, terminal execution, and multi-step problem solving.

When should developers choose MiniMax M3 instead of Gemini 3.1 Pro?

MiniMax M3 is particularly attractive when extremely long context windows, coding-heavy workflows, or open-weight deployment options are priorities. Gemini 3.1 Pro may remain preferable for teams already standardized on Google's ecosystem.

Affordable MiniMax-M3 API | text-to-text

Playground for MiniMax-M3

Explore the MiniMax-M3 Playground, an interactive environment for testing models and running queries in real time. Try prompts, adjust parameters, and iterate quickly to accelerate development and validate use cases.

Technical Specifications of MiniMax M3

Item	MiniMax M3
Model family	MiniMax M3 frontier foundation model
Provider	MiniMax
Architecture	MiniMax Sparse Attention (MSA)
Input types	Text, Image, Video
Output types	Text
Context window	Up to 1,000,000 tokens (minimum guaranteed 512K)
Primary strengths	Coding, agentic workflows, multimodal reasoning, long-context processing
Reasoning mode	Thinking on/off modes
Tool use	Agent workflows, tool invocation, terminal-task execution
Deployment	API, MiniMax Code, Token Plan, upcoming open-weight release
Multimodal support	Native multimodal pretraining from step zero
Release date	June 2026

What is MiniMax M3?

MiniMax M3 is a frontier-scale AI model designed around three capabilities that have historically been limited to closed-source systems: advanced coding performance, million-token context processing, and native multimodal understanding. Unlike models that add vision as a later extension, M3 was trained as a multimodal model from the beginning, allowing deeper alignment between visual and textual reasoning.

The model is built on MiniMax Sparse Attention (MSA), a sparse-attention architecture designed to make million-token contexts computationally practical while preserving performance on coding, reasoning, and agentic tasks.

Main Features of MiniMax M3

1M-token context window: Supports extremely large repositories, lengthy research corpora, multi-document analysis, and long-running agent sessions.
Agent-oriented architecture: Designed for autonomous task decomposition, tool calling, iterative planning, and multi-step execution.
Native multimodality: Processes text, images, diagrams, screenshots, and video inputs without relying on a separate vision stack.
Advanced coding capability: Strong performance on software-engineering benchmarks including SWE-Bench Pro, Terminal-Bench, and KernelBench.
Long-horizon execution: Demonstrated multi-hour autonomous workflows including research reproduction and CUDA optimization projects.
Configurable reasoning: Thinking mode can be enabled for deeper reasoning workloads or disabled for lower-latency interactions.

Benchmark Performance of MiniMax M3

MiniMax reports frontier-level benchmark results across coding, agentic execution, and multimodal evaluation tasks. Reported results include:

Benchmark	Score
SWE-Bench Pro	59.0%
Terminal-Bench 2.1	66.0%
SWE-fficiency	34.8%
KernelBench Hard	28.8%
MCP Atlas	74.2%
BrowseComp	83.5
PostTrainBench	37.1

The company also reports that M3 surpasses GPT-5.5 and Gemini 3.1 Pro on several coding-oriented benchmarks while approaching Claude Opus 4.7 performance in selected evaluations. These claims originate from MiniMax's internal benchmark disclosures and should be interpreted alongside independent third-party testing as it becomes available.

Long-Context Architecture and MSA

MiniMax Sparse Attention (MSA) is the architectural innovation behind M3's million-token context capability. Instead of applying full quadratic attention across the entire sequence, MSA performs block-level routing and sparse attention over selected regions of context.

According to MiniMax, this reduces compute requirements substantially at large context lengths and delivers:

More than 9× faster prefill performance at 1M context length
More than 15× faster decoding performance
Approximately 1/20 of previous-generation per-token compute at 1M context scale

These improvements are intended to make repository-scale coding and long-horizon agent workflows practical.
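The general idea of block-level routing followed by sparse attention can be sketched in a few lines. This is a generic top-k block-selection illustration, not MiniMax's actual MSA implementation, whose details are not given on this page. The function name, the mean-key routing score, and the `block_size` and `top_k` parameters are all assumptions made for the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend from one query to only the top_k highest-scoring key blocks.

    Illustrative only: routing uses the mean key of each block, which is
    an assumption, not a documented property of MSA.
    Returns (output_vector, selected_block_indices).
    """
    dim = len(query)
    starts = list(range(0, len(keys), block_size))

    # Routing step: one cheap score per block instead of one per token.
    block_scores = []
    for start in starts:
        block = keys[start:start + block_size]
        mean_key = [sum(k[i] for k in block) / len(block) for i in range(dim)]
        block_scores.append(dot(query, mean_key))

    # Keep only the top_k blocks; everything else is never attended to.
    ranked = sorted(range(len(starts)), key=lambda b: block_scores[b], reverse=True)
    selected = sorted(ranked[:top_k])

    # Ordinary scaled dot-product attention, restricted to selected tokens.
    token_ids = [
        t
        for b in selected
        for t in range(starts[b], min(starts[b] + block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(dim)
    weights = softmax([dot(query, keys[t]) * scale for t in token_ids])
    out_dim = len(values[0])
    output = [
        sum(w * values[t][j] for w, t in zip(weights, token_ids))
        for j in range(out_dim)
    ]
    return output, selected
```

With a fixed `top_k`, the per-query attention cost in this sketch scales with the number of blocks plus `top_k * block_size` tokens rather than with the full sequence length, which is the kind of saving sparse attention aims for at long context.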

MiniMax M3 vs Claude Opus 4.7 vs Gemini 3.1 Pro

Capability	MiniMax M3	Claude Opus 4.7	Gemini 3.1 Pro
Context Window	Up to 1M	Smaller publicly available context tiers	Large-context multimodal
Native Multimodal Training	Yes	Yes	Yes
Agentic Coding Focus	Very strong	Very strong	Strong
SWE-Bench Pro	59.0%	Higher according to MiniMax reporting	Lower according to MiniMax reporting
Open-Weight Availability	Planned	No	No
Long-Horizon Agent Workflows	Major design focus	Strong	Strong

Known Limitations

Most benchmark disclosures currently come from MiniMax rather than independent evaluation labs.
Open-weight model files and the full technical report were announced but were not yet broadly released at launch.
Real-world reliability across production environments is still being validated by the developer community.
Million-token context workloads may incur higher operational costs and latency than standard inference workloads.

Representative Use Cases

Repository-Scale Software Engineering

Analyze large codebases, perform multi-file refactors, generate patches, review pull requests, and maintain long-term development context.
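A minimal sketch of how a repository might be packed into a single prompt under the 1,000,000-token limit listed in the specifications. The four-characters-per-token ratio is a rough heuristic and not M3's real tokenizer, and the `pack_repository` helper, the `### FILE:` separator, and the `reserve` headroom for the reply are hypothetical choices made for illustration.

```python
import os

# Assumption: ~4 characters per token. This is a rough heuristic only;
# the model's real tokenizer will give different counts.
CHARS_PER_TOKEN = 4
# Upper bound taken from the specification table on this page.
MAX_CONTEXT_TOKENS = 1_000_000


def estimate_tokens(text):
    """Rough token estimate from character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def pack_repository(root, extensions=(".py", ".md"),
                    budget=MAX_CONTEXT_TOKENS, reserve=50_000):
    """Concatenate matching files under `root` while the estimate fits.

    `reserve` keeps headroom for instructions and the model's reply.
    Returns (prompt_text, included_paths, skipped_paths).
    """
    limit = budget - reserve
    parts, included, skipped = [], [], []
    used = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git and walk in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if not name.endswith(tuple(extensions)):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                body = fh.read()
            rel = os.path.relpath(path, root)
            chunk = f"### FILE: {rel}\n{body}\n"
            cost = estimate_tokens(chunk)
            if used + cost > limit:
                skipped.append(rel)  # Report what did not fit.
                continue
            parts.append(chunk)
            included.append(rel)
            used += cost
    return "".join(parts), included, skipped
```

The returned text could then be sent as the `user` message content in a chat completion request. Checking `skipped_paths` before sending shows whether the repository actually fit or was truncated.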

Autonomous Research Agents

Support literature review, document synthesis, benchmark analysis, and long-running research workflows requiring hundreds of thousands of tokens.

Multimodal Technical Analysis

Interpret screenshots, architecture diagrams, charts, technical documents, and video content within the same reasoning workflow.
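As an illustration, a screenshot could be attached to a request using the OpenAI-style `image_url` content part with a base64 data URL. Whether the `minimax-m3` route on this endpoint accepts image parts in this shape is an assumption, since this listing is labeled text-to-text; the `image_message` helper is hypothetical and only builds the message without sending it.

```python
import base64


def image_message(prompt, image_bytes, mime="image/png"):
    """Build one user message carrying text plus an inline base64 image.

    Uses the OpenAI chat-completions content-part layout. Support for
    this layout by the MiniMax-M3 endpoint is assumed, not confirmed.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }
```

The resulting dictionary can be placed in the `messages` list of a chat completion request in place of a plain-text user message.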

Terminal and DevOps Automation

Execute complex engineering workflows involving testing, deployment orchestration, dependency management, and iterative debugging.
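A terminal-automation agent needs a local layer that executes the tool calls the model requests. The sketch below assumes the model returns tool calls in the OpenAI chat-completions shape (`id`, `function.name`, `function.arguments` as a JSON string); that M3 does so on this endpoint is an assumption. The `run_terminal` tool and its allow-list are hypothetical.

```python
import json
import shlex
import subprocess

# Hypothetical allow-list. A real agent would define its own policy.
ALLOWED_COMMANDS = {"echo", "ls", "pwd"}


def run_terminal(command):
    """Run an allow-listed command without a shell and return its output."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return f"refused: {command!r} is not on the allow-list"
    result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    return result.stdout + result.stderr


# Maps tool names the model may request to local Python handlers.
TOOLS = {"run_terminal": run_terminal}


def dispatch_tool_calls(tool_calls):
    """Execute each requested tool and build `tool` role reply messages."""
    replies = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"] or "{}")
        handler = TOOLS.get(name)
        content = handler(**args) if handler else f"unknown tool: {name}"
        replies.append(
            {"role": "tool", "tool_call_id": call["id"], "content": content}
        )
    return replies
```

In an agent loop, the replies would be appended to the conversation and sent back to the model until it stops requesting tools. Running commands as an argument list rather than through a shell, and refusing anything off the allow-list, limits what a mistaken tool call can do.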

Enterprise Knowledge Systems

Search and reason over large collections of policies, contracts, technical documentation, and internal knowledge repositories.

Model Version and Availability

MiniMax M3 was officially introduced in June 2026 as the flagship successor within the MiniMax model lineup. The model is available through the MiniMax API ecosystem and CometAPI.


Pricing for MiniMax-M3

Explore competitive pricing for MiniMax-M3, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how MiniMax-M3 can enhance your projects while keeping costs manageable.

Comet Price (USD / M Tokens)	Official Price (USD / M Tokens)	Discount
Input: $0.48/M, Output: $1.92/M	Input: $0.60/M, Output: $2.40/M	-20%
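To make the table concrete, the per-request cost can be estimated from token counts. The rates below are copied from the table above; the `request_cost` helper is illustrative, and it assumes flat per-token pricing at every context length, which this page does not confirm.

```python
# Rates in USD per million tokens, copied from the pricing table.
COMET_PRICE = {"input": 0.48, "output": 1.92}
OFFICIAL_PRICE = {"input": 0.60, "output": 2.40}


def request_cost(input_tokens, output_tokens, price):
    """Estimated USD cost of one request.

    Assumes a flat rate regardless of context length (an assumption).
    """
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: a full 1,000,000-token prompt with an 800-token reply.
comet = request_cost(1_000_000, 800, COMET_PRICE)
official = request_cost(1_000_000, 800, OFFICIAL_PRICE)
print(f"Comet: ${comet:.4f}  Official: ${official:.4f}")
```

Under these assumptions, a single full-context request costs roughly $0.48 at the Comet rate and roughly $0.60 at the official rate, so input tokens dominate the bill for long-context workloads.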

Sample code and API for MiniMax-M3

Access comprehensive sample code and API resources for MiniMax-M3 to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of MiniMax-M3 in your projects.

POST /v1/chat/completions

from openai import OpenAI
import os

# Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="minimax-m3",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a senior backend reviewer focused on correctness, "
                "reliability, and maintainability."
            ),
        },
        {
            "role": "user",
            "content": (
                "Task: review the API migration plan and identify the "
                "highest-impact improvements.\n\n"
                "Context: the team is moving a customer support workflow from "
                "blocking chat calls to an async job queue. Prioritize data "
                "safety, retry behavior, observability, and rollback.\n\n"
                "Output format:\n"
                "Return a table with columns: Area, Risk, Recommendation, "
                "Priority. Keep each recommendation actionable and under 40 words."
            ),
        },
    ],
    max_completion_tokens=800,
    extra_body={"reasoning_split": True},
)

if not completion.choices:
    print(completion.model_dump_json(indent=2))
    raise SystemExit

message = completion.choices[0].message

reasoning_details = getattr(message, "reasoning_details", None)
if reasoning_details:
    print("Thinking:")
    print(reasoning_details[0]["text"])
    print()

print("Response:")
print(message.content)

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="minimax-m3",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a senior backend reviewer focused on correctness, "
                "reliability, and maintainability."
            ),
        },
        {
            "role": "user",
            "content": (
                "Task: review the API migration plan and identify the "
                "highest-impact improvements.\n\n"
                "Context: the team is moving a customer support workflow from "
                "blocking chat calls to an async job queue. Prioritize data "
                "safety, retry behavior, observability, and rollback.\n\n"
                "Output format:\n"
                "Return a table with columns: Area, Risk, Recommendation, "
                "Priority. Keep each recommendation actionable and under 40 words."
            ),
        },
    ],
    max_completion_tokens=800,
    extra_body={"reasoning_split": True},
)

if not completion.choices:
    print(completion.model_dump_json(indent=2))
    raise SystemExit

message = completion.choices[0].message

reasoning_details = getattr(message, "reasoning_details", None)
if reasoning_details:
    print("Thinking:")
    print(reasoning_details[0]["text"])
    print()

print("Response:")
print(message.content)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const openai = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const completion = await openai.chat.completions.create({
  model: "minimax-m3",
  messages: [
    {
      role: "system",
      content:
        "You are a senior backend reviewer focused on correctness, reliability, and maintainability.",
    },
    {
      role: "user",
      content:
        "Task: review the API migration plan and identify the highest-impact improvements.\n\n" +
        "Context: the team is moving a customer support workflow from blocking chat calls " +
        "to an async job queue. Prioritize data safety, retry behavior, observability, and rollback.\n\n" +
        "Output format:\n" +
        "Return a table with columns: Area, Risk, Recommendation, Priority. " +
        "Keep each recommendation actionable and under 40 words.",
    },
  ],
  max_completion_tokens: 800,
  reasoning_split: true,
});

if (!completion.choices?.length) {
  console.log(JSON.stringify(completion, null, 2));
  process.exit(0);
}

const message = completion.choices[0].message;

if (message.reasoning_details?.length) {
  console.log("Thinking:");
  console.log(message.reasoning_details[0].text);
  console.log();
}

console.log("Response:");
console.log(message.content);

Curl Code Example

# Get your CometAPI key from https://www.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"
curl https://api.cometapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "minimax-m3",
    "messages": [
      {
        "role": "system",
        "content": "You are a senior backend reviewer focused on correctness, reliability, and maintainability."
      },
      {
        "role": "user",
        "content": "Task: review the API migration plan and identify the highest-impact improvements.\n\nContext: the team is moving a customer support workflow from blocking chat calls to an async job queue. Prioritize data safety, retry behavior, observability, and rollback.\n\nOutput format:\nReturn a table with columns: Area, Risk, Recommendation, Priority. Keep each recommendation actionable and under 40 words."
      }
    ],
    "max_completion_tokens": 800,
    "reasoning_split": true
  }'

Uptime

Request success rate over the last 30 days, reflecting the reliability of each model provider. CometAPI monitors all connected providers in real time, 24/7.

Avg. Response (live): 769 ms

Avg. Uptime (live): 100.0%

Versions of MiniMax-M3

MiniMax-M3 may be offered as multiple snapshots for several reasons: outputs can change after an update, so older snapshots are kept for consistency; developers need a transition period to adapt and migrate; and different snapshots may correspond to global or regional endpoints to optimize user experience. For detailed differences between versions, refer to the official documentation.

version
minimax-m3