Technical Specifications of MiniMax M3
| Item | MiniMax M3 |
|---|---|
| Model family | MiniMax M3 frontier foundation model |
| Provider | MiniMax |
| Architecture | MiniMax Sparse Attention (MSA) |
| Input types | Text, Image, Video |
| Output types | Text |
| Context window | Up to 1,000,000 tokens (minimum guaranteed 512K) |
| Primary strengths | Coding, agentic workflows, multimodal reasoning, long-context processing |
| Reasoning mode | Thinking on/off modes |
| Tool use | Agent workflows, tool invocation, terminal-task execution |
| Deployment | API, MiniMax Code, Token Plan, upcoming open-weight release |
| Multimodal support | Native multimodal pretraining from step zero |
| Release date | June 2026 |
What is MiniMax M3?
MiniMax M3 is a frontier-scale AI model built around three capabilities that have historically been limited to closed-source systems: advanced coding performance, million-token context processing, and native multimodal understanding. Unlike models that add vision as a later extension, M3 was trained as a multimodal model from the start, which MiniMax says yields tighter alignment between visual and textual reasoning.
The model is built on MiniMax Sparse Attention (MSA), a sparse-attention architecture designed to make million-token contexts computationally practical while preserving performance on coding, reasoning, and agentic tasks.
Main Features of MiniMax M3
- 1M-token context window: Supports extremely large repositories, lengthy research corpora, multi-document analysis, and long-running agent sessions.
- Agent-oriented architecture: Designed for autonomous task decomposition, tool calling, iterative planning, and multi-step execution.
- Native multimodality: Processes text, images, diagrams, screenshots, and video inputs without relying on a separate vision stack.
- Advanced coding capability: MiniMax reports 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, and 28.8% on KernelBench Hard.
- Long-horizon execution: Demonstrated multi-hour autonomous workflows including research reproduction and CUDA optimization projects.
- Configurable reasoning: Thinking mode can be enabled for deeper reasoning workloads or disabled for lower-latency interactions.
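Before relying on the long context window, it can help to estimate whether a workload fits. The sketch below is a rough budgeting aid, not a MiniMax tool: the 4-characters-per-token ratio is a common rule of thumb rather than a property of M3's tokenizer, "512K" is read here as 512,000 tokens, and the names `estimate_tokens`, `classify_budget`, and `reserve_for_output` are hypothetical.

```python
import math
from typing import Dict, Tuple

CONTEXT_LIMIT_TOKENS = 1_000_000  # "up to 1,000,000 tokens" from the spec table
GUARANTEED_TOKENS = 512_000       # "minimum guaranteed 512K", read as 512,000 (assumption)
CHARS_PER_TOKEN = 4.0             # rule-of-thumb ratio, NOT an M3 tokenizer figure


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate a token count from character length, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def classify_budget(documents: Dict[str, str],
                    reserve_for_output: int = 32_000) -> Tuple[int, str]:
    """Sum estimated input tokens plus an output reserve and classify the total."""
    total = sum(estimate_tokens(text) for text in documents.values())
    total += reserve_for_output
    if total <= GUARANTEED_TOKENS:
        status = "fits within guaranteed window"
    elif total <= CONTEXT_LIMIT_TOKENS:
        status = "fits only in extended window"
    else:
        status = "exceeds context window"
    return total, status


if __name__ == "__main__":
    repo = {"main.py": "x" * 400_000}  # stand-in for real file contents
    print(classify_budget(repo))
```

For real workloads, the estimate should be replaced with counts from the provider's actual tokenizer once available, since character-based ratios vary widely between prose, code, and non-English text.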
Benchmark Performance of MiniMax M3
MiniMax reports frontier-level benchmark results across coding, agentic execution, and multimodal evaluation tasks. Reported results include:
| Benchmark | Score |
|---|---|
| SWE-Bench Pro | 59.0% |
| Terminal-Bench 2.1 | 66.0% |
| SWE-fficiency | 34.8% |
| KernelBench Hard | 28.8% |
| MCP Atlas | 74.2% |
| BrowseComp | 83.5 |
| PostTrainBench | 37.1 |
The company also reports that M3 surpasses GPT-5.5 and Gemini 3.1 Pro on several coding-oriented benchmarks while approaching Claude Opus 4.7 performance in selected evaluations. These claims originate from MiniMax's internal benchmark disclosures and should be interpreted alongside independent third-party testing as it becomes available.
Long-Context Architecture and MSA
MiniMax Sparse Attention (MSA) is the architectural innovation behind M3's million-token context capability. Instead of applying full quadratic attention across the entire sequence, MSA performs block-level routing and sparse attention over selected regions of context.
According to MiniMax, this reduces compute requirements substantially at large context lengths and delivers:
- More than 9× faster prefill at 1M context length
- More than 15× faster decoding
- Approximately 1/20 of the previous generation's per-token compute at 1M context length
These improvements are intended to make repository-scale coding and long-horizon agent workflows practical.
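The block-routing idea described above can be illustrated with a generic top-k block-sparse attention sketch. This is not MiniMax's implementation: the text here does not describe MSA's routing rule, so the mean-key block summary, the `block_size` and `top_k` parameters, and the function names `select_blocks` and `sparse_attention` are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_blocks(query, keys, block_size, top_k):
    """Score each key block by its mean key; return the top_k block indices."""
    scored = []
    for start in range(0, len(keys), block_size):
        summary = mean_vector(keys[start:start + block_size])
        scored.append((dot(query, summary), start // block_size))
    scored.sort(reverse=True)
    return sorted(index for _, index in scored[:top_k])


def sparse_attention(query, keys, values, block_size, top_k):
    """Softmax attention for one query, restricted to the routed blocks."""
    blocks = select_blocks(query, keys, block_size, top_k)
    positions = [
        p
        for b in blocks
        for p in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[p]) * scale for p in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    output = [
        sum(w * values[p][d] for w, p in zip(weights, positions)) / total
        for d in range(dim)
    ]
    return output, positions
```

In this sketch, each query scores one summary per block plus the keys inside the selected blocks, instead of every key. With purely illustrative values of 1,000,000 keys, `block_size = 1000`, and `top_k = 16`, that is 1,000 summaries plus 16,000 keys, or 17,000 scored positions per query rather than 1,000,000. These numbers are not MiniMax figures.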
MiniMax M3 vs Claude Opus 4.7 vs Gemini 3.1 Pro
| Capability | MiniMax M3 | Claude Opus 4.7 | Gemini 3.1 Pro |
|---|---|---|---|
| Context Window | Up to 1M tokens | Smaller in publicly available tiers | Large context |
| Native Multimodal Training | Yes | Yes | Yes |
| Agentic Coding Focus | Very strong | Very strong | Strong |
| SWE-Bench Pro | 59.0% | Higher according to MiniMax reporting | Lower according to MiniMax reporting |
| Open-Weight Availability | Planned | No | No |
| Long-Horizon Agent Workflows | Major design focus | Strong | Strong |
Known Limitations
- Most benchmark disclosures currently come from MiniMax rather than independent evaluation labs.
- Open-weight model files and the full technical report had been announced but were not yet broadly available at launch.
- Real-world reliability across production environments is still being validated by the developer community.
- Million-token context workloads may incur higher operational costs and latency than standard inference workloads.
Representative Use Cases
Repository-Scale Software Engineering
Analyze large codebases, perform multi-file refactors, generate patches, review pull requests, and maintain long-term development context.
Autonomous Research Agents
Support literature review, document synthesis, benchmark analysis, and long-running research workflows requiring hundreds of thousands of tokens.
Multimodal Technical Analysis
Interpret screenshots, architecture diagrams, charts, technical documents, and video content within the same reasoning workflow.
Terminal and DevOps Automation
Execute complex engineering workflows involving testing, deployment orchestration, dependency management, and iterative debugging.
Enterprise Knowledge Systems
Search and reason over large collections of policies, contracts, technical documentation, and internal knowledge repositories.
Model Version and Availability
MiniMax M3 was introduced in June 2026 as the new flagship of the MiniMax model lineup. It is available through the MiniMax API ecosystem and through CometAPI.