MiniMax · Released March 20, 2026 · First model to participate in its own training evolution
During development, M2.7 was used to run its own RL experiments, build complex skills, and improve the training harness based on results.
Frontier-level closed model designed for AI agents, third-party harnesses, and tools like OpenClaw, Claude Code, and Kilo Code.
M2.5 (Feb 2026) focused on polyglot code mastery. M2.7 is built for real-world engineering – causal reasoning in live production systems.
Built to power agentic workflows – complex multi-step tasks, Agent Teams, dynamic tool search, and persistent memory at scale.
Goal: full autonomy in model training and inference architecture without human involvement.
3 trials × 24 hours each. Best run: 9 gold, 5 silver, and 1 bronze medal. Average medal rate of 66.6% – behind only Claude Opus 4.6 (75.7%) and GPT-5.4 (71.2%), and tied with Gemini 3.1.
M2.7 can now autonomously perform 30-50% of a reinforcement learning researcher's workflow – data pipelines, training, infrastructure, cross-team collaboration, persistent memory.
56.22% on SWE-Pro – matching GPT-5.3-Codex and the strongest global competitors. Focused on real-world production debugging, not toy benchmarks.
Elo 1495 on GDPval-AA document processing – highest among open-source-accessible models globally.
34% hallucination rate – significantly lower than Claude Sonnet 4.6 (46%) and Gemini 3.1 Pro Preview (50%). More reliable outputs.
66.6% medal rate on ML competitions – ties Gemini 3.1, behind only Claude Opus 4.6 (75.7%) and GPT-5.4 (71.2%). Runs on a single A30 GPU.
Massive leap: +1 on AA-Omniscience vs -40 for M2.5. Virtually eliminated factual errors that plagued the previous generation.
97% skill adherence rate with 40+ complex skills (each >2,000 tokens). Stays on task even with massive instruction sets.
Coordinates multiple specialized sub-agents working in parallel – complex multi-step tasks decomposed and executed autonomously.
Handles 40+ complex skills simultaneously with 97% adherence – each skill exceeding 2,000 tokens. Stays focused across massive instruction sets.
Discovers and uses tools dynamically based on task requirements – doesn't need pre-configured toolsets for every scenario.
Maintains context across long sessions – updates its own memory files to retain state across multi-day agentic workflows.
Strong character consistency and emotional intelligence – reliable persona maintenance across long conversations and role-based deployments.
Designed as backend for OpenClaw, Claude Code, Kilo Code, and other third-party agent harnesses – drop-in frontier model.
Log analysis for bug hunting in live systems – causal reasoning across distributed logs to find root causes fast.
Understands large codebases holistically – refactors with awareness of downstream impact, not just local changes.
Identifies security vulnerabilities, injection risks, and insecure patterns across real production codebases.
Writes and debugs ML training pipelines, data preprocessing, and model evaluation code – with deep understanding of the full ML workflow.
Full Android development capability – Kotlin, Jetpack Compose, architecture patterns, and platform-specific debugging.
Matches GPT-5.3-Codex on the most challenging software engineering benchmark. A significant leap over M2.5, which focused on polyglot code breadth.
M2.7's 34% hallucination rate vs Claude Sonnet 4.6's 46% – a significant reliability advantage for production agentic deployments.
Gemini 3.1 Pro Preview sits at 50% hallucination rate – M2.7 is substantially more reliable for factual tasks and document processing.
AA-Omniscience Index jumped from -40 (M2.5) to +1 (M2.7) – virtually eliminating the factual error problem that limited the previous generation.
Elo 1495 on GDPval-AA – highest among open-source-accessible models globally for professional document understanding and processing.
One of the most exciting Chinese AI startups – frontier-level LLMs with open-source licenses, competing directly with OpenAI and Anthropic.
Before M2.7, MiniMax built Hailuo – one of the best AI video generation models globally. The M2 series is their pivot to language/agent models.
M2.5 launched February 2026. M2.7 drops March 2026. MiniMax is shipping faster than almost any other frontier lab.
Unlike pure closed models, M2.7 is accessible via API and integrates into open ecosystems – OpenClaw, Kilo Code, Claude Code.
The end goal is clear: a model that trains itself without human involvement. M2.7 is step one. This is the most ambitious AI research agenda in 2026.
Frontier performance at a fraction of the cost of GPT-5.4 or Claude Opus 4.6 – democratizing access to top-tier model capabilities.
Better model → better RL experiments → better training → even better model. Each generation accelerates the next. The compounding effect is the breakthrough.
If a model can run 30-50% of its own RL research, each iteration cycle is dramatically faster. Human researchers focus on direction, not execution.
"The industry focus has shifted from simple chat interfaces to agentic workflows capable of executing complex, multi-step tasks without human intervention." – M2.7 is at the frontier of that shift.
Full autonomy in model training and inference architecture without human involvement. MiniMax is the only lab publicly pursuing this as an explicit product milestone.
minimax.io/models/text/m27
API access and model documentation
minimax.io/news/minimax-m27-en
"Early Echoes of Self-Evolution" – full technical writeup
"New MiniMax M2.7 proprietary AI model is self-evolving and can perform 30-50% of reinforcement learning research workflow"
M2.7 works as a drop-in backend for OpenClaw – configure via models.json for agent deployments on your existing infrastructure.
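As a sketch of what such a drop-in configuration might look like, a models.json entry could resemble the fragment below. This is illustrative only: the field names, base URL, and model identifier are assumptions, not documented OpenClaw configuration options.

```json
{
  "models": {
    "minimax-m2.7": {
      "provider": "minimax",
      "baseUrl": "https://api.minimax.io/v1",
      "apiKeyEnv": "MINIMAX_API_KEY",
      "defaultForAgents": true
    }
  }
}
```

Under this assumed schema, the harness reads the file at startup, resolves the API key from the named environment variable, and routes agent traffic to the configured backend without code changes.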
hailuoai.video
MiniMax's AI video generation product – world-class video from text prompts.