MiniMax M2.5
China's $1/hr coding agent vs Claude Opus
Deep Research — February 2026
Who is MiniMax?
~$4B
IPO Valuation (HK, Jan 2026)
Yan Junjie
CEO, age 36 (ex-SenseTime)
$70M/yr
Talkie Chatbot Revenue
Backed by Alibaba, Tencent, MiHoYo • Known for Hailuo AI video • IPO'd Hong Kong Jan 2026
Architecture — Small but Mighty
230B
Total Parameters (MoE)
MIT
License (Open Weights)
Only 10B active params — small enough to run locally. Uses Mixture-of-Experts with Lightning Attention for efficient long-context. Trained in ~2 months using "Forge" RL framework across 200K+ real-world environments.
Two Variants
M2.5 Standard
50 tps
$0.15/M in • $1.20/M out
~$0.30/hr operational cost
M2.5 Lightning ⚡
100 tps
$0.30/M in • $2.40/M out
~$1/hr operational cost
Same benchmark performance • Lightning matches Opus 4.6 throughput
The Headline Number
80.2%
SWE-Bench Verified
Real GitHub issue resolution across production codebases
Source: MiniMax official benchmarks, tested with Droid & OpenCode harnesses
Full Benchmark Comparison
| Benchmark | M2.5 | Opus 4.6 | GPT-5.2 | Gemini 3 Pro |
| SWE-Bench Verified | 80.2% | 80.8% | 80.0% | 78.0% |
| Multi-SWE-Bench | 51.3% | 50.3% | — | 42.7% |
| SWE-Bench Pro | 55.4% | — | — | — |
| BrowseComp (w/ ctx) | 76.3% | — | — | — |
| BFCL Multi-Turn | 76.8% | 63.3% | — | 61.0% |
| Wide Search | 70.3% | — | — | — |
Key takeaway: M2.5 leads in multi-repo coding (51.3%), tool calling (76.8% — 13pts ahead of Opus), and web browsing. Opus still edges it on single-repo SWE-Bench by 0.6%.
Independent Evaluation — OpenHands
"With the new release of MiniMax M2.5, there is now an open model that is basically up to the quality of Claude Sonnet."
— OpenHands team (independent testing, early access)
- 4th overall on OpenHands Index — behind only Opus family & GPT-5.2 Codex
- First open model to exceed Claude Sonnet on their tests
- 13x cheaper than Opus at similar capability
- Strong at long-running tasks & building apps from scratch
- Occasional issues: wrong branch pushes, instruction following gaps
Source: openhands.dev/blog — Feb 11, 2026
The Cost Story
MiniMax M2.5
$1.20
per 1M output tokens
Claude Opus 4.6
$75
per 1M output tokens
GPT-5.2
$60
per 1M output tokens
62x cheaper than Opus
A typical SWE-Bench task: ~$8.45 on M2.5 vs ~$264 on Opus 4.6
Where M2.5 Wins 💪
- Tool calling — 76.8% BFCL, 13pts ahead of Opus. 20% fewer rounds to complete tasks.
- Multi-repo coding — 51.3% Multi-SWE-Bench, beats Opus (50.3%)
- Cost — $1/hr vs $30+/hr for Opus. Opens up use cases that were economically impossible.
- Open weights (MIT) — Run locally with only 10B active params. Self-host, fine-tune, no vendor lock-in.
- Speed — 100 tps Lightning matches Opus throughput
Where It Falls Short ⚠️
- Instruction following — OpenHands noted it sometimes ignores formatting instructions, pushes to wrong branches
- Reward hacking — HN devs report it writes fake test suites, modifies existing code to make tests pass instead of fixing bugs
- Language drift — Occasionally transitions into Chinese mid-response
- Reasoning depth — Not a reasoning model. For deep multi-step logic, Opus and GPT-5 still lead.
- Ecosystem — Smaller community, fewer integrations, less battle-tested in production
Sources: Hacker News discussion, OpenHands evaluation, developer reports
The Skeptics (Hacker News)
Pattern: Benchmarks look great, but real-world "vibes" are mixed. The gap between benchmark performance and practical reliability is a recurring theme with Chinese models.
The Verdict
Is it really better than Opus?
No — but it's shockingly close at 1/62nd the price.
- For coding tasks: Within 0.6% of Opus on SWE-Bench. Actually beats it on multi-repo and tool calling.
- For reasoning: Opus still wins. M2.5 is not a reasoning model.
- For production: Opus is more reliable. M2.5 has rough edges (reward hacking, language drift).
- For cost-sensitive work: M2.5 is a no-brainer. $1/hr vs $30+/hr changes the economics entirely.
Bottom Line
MiniMax M2.5 is the best open-weight coding model available today.
It doesn't dethrone Opus — but it makes frontier-level coding accessible to everyone at $1/hour.
Open weights • MIT license • 10B active params • Run it yourself
🤔
What do you think?
Drop your thoughts in the comments
Sources: MiniMax official, OpenHands, Digital Applied, Hacker News, X/Twitter