MiniMax M2.7

Early Echoes of Self-Evolution

56.22%
SWE-Pro Score
34%
Hallucination Rate
66.6%
MLE-Bench Medal Rate
30-50%
RL Research Automated

MiniMax · Released March 20, 2026 · First model to participate in its own training evolution

What is M2.7?

The First Model That Helped Build Itself

🧬 M2.7 is MiniMax's first model to deeply participate in its own evolution: updating its own memory, building its own skills, and improving its own training process.

🤖 Self-Evolving Architecture

During development, M2.7 was used to run its own RL experiments, build complex skills, and improve the training harness based on results.

⚙️ Proprietary LLM

Frontier-level closed model designed for AI agents, third-party harnesses, and tools like OpenClaw, Claude Code, and Kilo Code.

📈 Major Leap from M2.5

M2.5 (Feb 2026) focused on polyglot code mastery. M2.7 is built for real-world engineering: causal reasoning in live production systems.

🎯 Agent-First Design

Built to power agentic workflows: complex multi-step tasks, Agent Teams, dynamic tool search, and persistent memory at scale.

Goal: full autonomy in model training and inference architecture without human involvement.

How Self-Evolution Works

The Feedback Loop That Trained M2.7

1. Run Experiment: M2.7 runs a reinforcement learning experiment autonomously.
2. Short-Term Memory: It generates a markdown memory file capturing what happened.
3. Self-Criticism: It critiques its own results and identifies optimization directions.
4. Self-Optimize: The next round uses the full memory and feedback chain to improve.
5. Repeat: The cycle continues; models trained by M2.7 improve continuously over 24-hour runs.
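The five steps above amount to a simple loop: run, record, critique, adjust, repeat. As a toy illustration only (MiniMax has not published its harness code, so every function and file format here is hypothetical), the shape of that loop might look like:

```python
# Toy sketch of the five-step self-evolution loop. All names are illustrative
# stand-ins, not MiniMax's actual training harness.

def run_experiment(lr: float) -> float:
    """Step 1 stand-in for an RL run: score peaks at lr = 0.1 (made up)."""
    return 1.0 - abs(lr - 0.1)

def write_memory(step: int, lr: float, score: float) -> str:
    """Step 2: capture what happened as a short markdown note."""
    return f"## Run {step}\n- lr: {lr:.3f}\n- score: {score:.3f}\n"

def critique(history: list) -> float:
    """Step 3: compare the last two runs and pick a direction to move lr."""
    if len(history) < 2:
        return 0.01
    (prev_lr, prev_s), (lr, s) = history[-2], history[-1]
    step = lr - prev_lr
    return step if s > prev_s else -step  # keep going if improving, else reverse

memory_log = ""
history = []
lr = 0.01
for step in range(1, 21):                        # Step 5: repeat
    score = run_experiment(lr)                   # Step 1: run experiment
    memory_log += write_memory(step, lr, score)  # Step 2: short-term memory
    history.append((lr, score))
    lr += critique(history)                      # Steps 3-4: criticism feeds next run
```

Each pass appends to the memory log and nudges the next experiment; in this toy version the final run scores higher than the first, which is the whole point of the flywheel.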

๐Ÿ… MLE-Bench Results

3 trials ร— 24 hours each. Best run: 9 gold, 5 silver, 1 bronze medals. Average 66.6% medal rate โ€” second only to Claude Opus 4.6 (75.7%) and GPT-5.4 (71.2%). Ties with Gemini 3.1.

๐Ÿ“Š 30-50% Automation

M2.7 can now autonomously perform 30-50% of a reinforcement learning researcher's workflow โ€” data pipelines, training, infra, cross-team collab, persistent memory.

Benchmark Performance

Where M2.7 Stands Against the World's Best

56.22%
SWE-Pro (ties GPT-5.3-Codex)
1495
GDPval-AA Elo (Best OSS)
+1
AA-Omniscience (vs -40 for M2.5)
34%
Hallucination Rate

💻 Software Engineering

56.22% on SWE-Pro matches GPT-5.3-Codex, the strongest of the global competitors. The focus is real-world production debugging, not toy benchmarks.

📄 Professional Office

Elo 1495 on GDPval-AA document processing, the highest among open-source-accessible models globally.

🧠 Hallucination Rate

34% hallucination rate, significantly lower than Claude Sonnet 4.6 (46%) and Gemini 3.1 Pro Preview (50%). More reliable outputs.

🏆 MLE-Bench

66.6% medal rate on ML competitions ties Gemini 3.1, behind only Claude Opus 4.6 (75.7%) and GPT-5.4 (71.2%). Runs on a single A30 GPU.

📝 Omniscience Index

A massive leap: +1 on AA-Omniscience vs -40 for M2.5, virtually eliminating the factual errors that plagued the previous generation.

🤖 System Prompt Adherence

97% skill adherence rate with 40+ complex skills (each >2,000 tokens). Stays on task even with massive instruction sets.

Agent Capabilities

Built for Real-World Agentic Workflows

🤖 M2.7 is purpose-built as an agent backbone, not just a chatbot with tool use bolted on.

👥 Agent Teams

Coordinates multiple specialized sub-agents working in parallel; complex multi-step tasks are decomposed and executed autonomously.

🛠️ Complex Skills (40+)

Handles 40+ complex skills simultaneously with 97% adherence, each skill exceeding 2,000 tokens. Stays focused across massive instruction sets.

🔍 Dynamic Tool Search

Discovers and uses tools dynamically based on task requirements; it doesn't need pre-configured toolsets for every scenario.
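Conceptually, dynamic tool search replaces a hard-coded tool list with a lookup over tool descriptions at task time. This is not MiniMax's published mechanism; the registry and the word-overlap scoring below are purely illustrative:

```python
# Hypothetical sketch of dynamic tool search: rank a registry of tool
# descriptions against the task text instead of fixing the toolset up front.
# The tool names and descriptions are invented for illustration.

TOOLS = {
    "grep_logs": "search production log files for error patterns",
    "run_tests": "execute the project's unit test suite",
    "query_db": "run read-only SQL queries against the metrics database",
}

def search_tools(task: str, top_k: int = 1) -> list:
    """Score each tool by word overlap between the task and its description."""
    task_words = set(task.lower().split())
    scored = sorted(
        TOOLS,
        key=lambda name: len(task_words & set(TOOLS[name].split())),
        reverse=True,
    )
    return scored[:top_k]

print(search_tools("find error patterns in the production log files"))  # ['grep_logs']
```

A production system would use embeddings or a tool index rather than word overlap, but the contract is the same: the agent asks "which tools fit this task?" instead of being handed a fixed list.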

🧠 Persistent Memory

Maintains context across long sessions, updating its own memory files to retain state across multi-day agentic workflows.
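The mechanics of file-backed memory are straightforward to sketch. Assuming nothing about M2.7's actual format (the file name and JSON schema below are invented for illustration):

```python
# Minimal sketch of persistent agent memory as a JSON file on disk.
# File name and schema are illustrative assumptions, not M2.7's real format.
import json
import os

MEMORY_PATH = "agent_memory.json"

def load_memory() -> dict:
    """Read prior state if it exists; otherwise start a fresh memory."""
    if os.path.exists(MEMORY_PATH):
        with open(MEMORY_PATH) as f:
            return json.load(f)
    return {"sessions": []}

def save_session(note: str) -> None:
    """Append one session note and write the memory back to disk."""
    memory = load_memory()
    memory["sessions"].append(note)
    with open(MEMORY_PATH, "w") as f:
        json.dump(memory, f)

save_session("day 1: investigated flaky test in CI")
save_session("day 2: root cause traced to race in setup fixture")
print(len(load_memory()["sessions"]))  # 2 on a fresh run
```

Because state lives in a plain file rather than the context window, a workflow can be resumed days later by reloading the file at session start.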

🎭 Character Consistency

Strong character consistency and emotional intelligence: reliable persona maintenance across long conversations and role-based deployments.

🔗 Harness Integration

Designed as a backend for OpenClaw, Claude Code, Kilo Code, and other third-party agent harnesses: a drop-in frontier model.

Software Engineering Capabilities

Real-World Production Engineering, Not Toy Benchmarks

🐛 Production Debugging

Log analysis for bug hunting in live systems: causal reasoning across distributed logs to find root causes fast.

🔄 Refactoring

Understands large codebases holistically and refactors with awareness of downstream impact, not just local changes.

🔒 Code Security

Identifies security vulnerabilities, injection risks, and insecure patterns across real production codebases.

🤖 Machine Learning Code

Writes and debugs ML training pipelines, data preprocessing, and model evaluation code, with deep understanding of the full ML workflow.

📱 Android Development

Full Android development capability: Kotlin, Jetpack Compose, architecture patterns, and platform-specific debugging.

⚡ SWE-Pro: 56.22%

Matches GPT-5.3-Codex on the most challenging software engineering benchmark. A significant leap over M2.5, which focused on polyglot code breadth.

Hallucination & Reliability

The Most Reliable Frontier Model for Production Use

34%
M2.7 Hallucination Rate
46%
Claude Sonnet 4.6
50%
Gemini 3.1 Pro Preview
+1
AA-Omniscience (was -40)

📉 26% Lower Than Claude Sonnet

M2.7's 34% hallucination rate vs Claude Sonnet 4.6's 46% is a 12-point gap (a 26% relative reduction): a significant reliability advantage for production agentic deployments.

📉 32% Lower Than Gemini 3.1

Gemini 3.1 Pro Preview sits at a 50% hallucination rate; M2.7 is substantially more reliable for factual tasks and document processing.

🎯 Omniscience Leap

The AA-Omniscience Index jumped from -40 (M2.5) to +1 (M2.7), virtually eliminating the factual-error problem that limited the previous generation.

🏆 Best OSS Document Processing

Elo 1495 on GDPval-AA: the highest among open-source-accessible models globally for professional document understanding and processing.

M2.7 vs The Competition

Where MiniMax Fits in the Frontier Model Landscape

✅ M2.7 Wins On

  • Hallucination rate (34% vs 46-50%)
  • Document processing (best OSS Elo)
  • Self-evolving architecture (unique)
  • Agent harness integration
  • 97% skill adherence at 40+ skills
  • Cost efficiency vs closed models

⚖️ Ties With

  • Gemini 3.1: MLE-Bench 66.6%
  • GPT-5.3-Codex: SWE-Pro 56.22%
  • Frontier-level reasoning tasks

โŒ Behind On

  • MLE-Bench: Claude Opus 4.6 (75.7%) and GPT-5.4 (71.2%) still lead
  • Multimodal: vision/audio less mature than GPT-5.4
  • Brand recognition vs OpenAI/Anthropic
M2.7 is the most compelling frontier alternative for agent deployments โ€” lower hallucination, strong coding, open-accessible, and getting smarter by training itself.

Why MiniMax Matters

The Chinese AI Lab Punching Above Its Weight

🇨🇳 Chinese Frontier Lab

One of the most exciting Chinese AI startups: frontier-level LLMs with open-source licenses, competing directly with OpenAI and Anthropic.

🎬 Hailuo Video

Before M2.7, MiniMax built Hailuo, one of the best AI video generation models globally. The M2 series is their pivot to language/agent models.

⚡ Rapid Iteration

M2.5 launched in February 2026; M2.7 drops in March 2026. MiniMax is shipping faster than almost any other frontier lab.

🔓 Open-Accessible

Unlike pure closed models, M2.7 is accessible via API and integrates into open ecosystems: OpenClaw, Kilo Code, Claude Code.

🧬 Self-Evolution Vision

The end goal is clear: a model that trains itself without human involvement. M2.7 is step one. This is the most ambitious AI research agenda of 2026.

💰 Cost Efficiency

Frontier performance at a fraction of the cost of GPT-5.4 or Claude Opus 4.6, democratizing access to top-tier model capabilities.

The Road to Full Autonomy

What Self-Evolution Means for AI Development

M2.5 (Feb 2026): Polyglot code mastery; human-led fine-tuning, open weights.
M2.7 (Mar 2026): 30-50% of the RL workflow automated; the model participates in its own training.
M3.x (2026?): Full autonomy; the model trains itself end-to-end without human involvement.

🔄 The Flywheel

Better model → better RL experiments → better training → even better model. Each generation accelerates the next. The compounding effect is the breakthrough.
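To see why compounding matters, here is a back-of-envelope model built on an assumption that is ours, not MiniMax's: suppose each generation automates enough research work to cut the next iteration's wall-clock time by 30%.

```python
# Hypothetical flywheel arithmetic: each generation's cycle time shrinks
# by a fixed fraction (the 30% figure is assumed purely for illustration).
speedup_per_gen = 0.30
days = 30.0                      # assumed length of the first training cycle
timeline = []
for gen in range(5):
    timeline.append(round(days, 1))
    days *= 1 - speedup_per_gen  # next cycle is 30% shorter

print(timeline)  # [30.0, 21.0, 14.7, 10.3, 7.2]
```

Under this toy assumption, five generations compress a month-long cycle to about a week; that geometric shrinkage, not any single speedup, is what the flywheel argument rests on.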

⚡ Speed Advantage

If a model can run 30-50% of its own RL research, each iteration cycle is dramatically faster. Human researchers focus on direction, not execution.

🌐 Industry Implication

"The industry focus has shifted from simple chat interfaces to agentic workflows capable of executing complex, multi-step tasks without human intervention." M2.7 is at the frontier of that shift.

🎯 The Goal

Full autonomy in model training and inference architecture without human involvement. MiniMax is the only lab publicly pursuing this as an explicit product milestone.

Get Started with M2.7

Links & Resources

🌐 MiniMax M2.7

minimax.io/models/text/m27
API access and model documentation

📖 Official Blog

minimax.io/news/minimax-m27-en
"Early Echoes of Self-Evolution": the full technical writeup

📰 VentureBeat Coverage

"New MiniMax M2.7 proprietary AI model is self-evolving and can perform 30-50% of reinforcement learning research workflow"

🦞 OpenClaw Integration

M2.7 works as a drop-in backend for OpenClaw; configure via models.json for agent deployments on your existing infrastructure.
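As a concrete starting point, a minimal models.json could be generated like this. The field names, endpoint URL, and schema below are assumptions for illustration only; consult your harness's own documentation for its real configuration format:

```python
# Writes an illustrative models.json entry for an agent harness.
# CAUTION: the schema and URL here are invented examples, not a documented API.
import json

config = {
    "models": [
        {
            "name": "minimax-m2.7",
            "provider": "minimax",
            "api_base": "https://api.minimax.example/v1",  # placeholder endpoint
            "api_key_env": "MINIMAX_API_KEY",              # key read from an env var
        }
    ]
}

with open("models.json", "w") as f:
    json.dump(config, f, indent=2)
```

Keeping the API key in an environment variable rather than the config file keeps the file safe to commit alongside the rest of the deployment.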

🎬 Hailuo Video

hailuoai.video
MiniMax's AI video generation product: world-class video from text prompts.

boxmining.com | AI Tricks That Actually Work