MiniMax M2.5

China's $1/hr coding agent vs Claude Opus

Deep Research — February 2026

Who is MiniMax?

2021
Founded in Shanghai
~$4B
IPO Valuation (HK, Jan 2026)
Yan Junjie
CEO, age 36 (ex-SenseTime)
$70M/yr
Talkie Chatbot Revenue

Backed by Alibaba, Tencent, MiHoYo • Known for Hailuo AI video • IPO'd Hong Kong Jan 2026

Architecture — Small but Mighty

230B
Total Parameters (MoE)
10B
Active Parameters
1M
Context Window
MIT
License (Open Weights)
Only 10B active params — small enough to run locally. Uses Mixture-of-Experts with Lightning Attention for efficient long-context. Trained in ~2 months using "Forge" RL framework across 200K+ real-world environments.
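A back-of-envelope sketch of what "run locally" implies: per-token compute scales with the 10B active params, but all 230B weights still have to sit in memory (or be paged in). A quick estimate, ignoring KV cache and runtime overhead (the quantization levels below are assumptions, not MiniMax's published formats):

```python
def weight_memory_gb(total_params_b, bits_per_param):
    """Rough memory footprint of model weights alone.

    total_params_b: total parameter count in billions (MoE counts ALL experts)
    bits_per_param: precision of the stored weights
    Returns decimal gigabytes; ignores KV cache, activations, and overhead.
    """
    bytes_total = total_params_b * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(weight_memory_gb(230, 4))   # 4-bit quant: 115.0 GB
print(weight_memory_gb(230, 16))  # bf16: 460.0 GB
```

So "small enough to run locally" is about speed, not footprint: the MoE routing keeps decode compute at 10B-param levels, but you still need workstation-class memory to hold the full expert set.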

Two Variants

M2.5 Standard
50 tps
$0.15/M in • $1.20/M out
~$0.30/hr operational cost
M2.5 Lightning ⚡
100 tps
$0.30/M in • $2.40/M out
~$1/hr operational cost

Same benchmark performance • Lightning matches Opus 4.6 throughput
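The per-hour figures fall straight out of throughput × pricing: at 100 output tokens/sec, Lightning generates 0.36M tokens per hour, about $0.86 at $2.40/M before counting input tokens. A minimal sketch (prices and speeds from the cards above; the input-to-output ratio is an assumption):

```python
def hourly_cost(tps, price_out_per_m, price_in_per_m, in_per_out=0.0):
    """Estimate $/hr from decode throughput and per-million-token prices.

    tps:        output tokens generated per second
    in_per_out: assumed input tokens processed per output token (illustrative)
    """
    out_m = tps * 3600 / 1e6          # output tokens per hour, in millions
    in_m = out_m * in_per_out
    return out_m * price_out_per_m + in_m * price_in_per_m

# M2.5 Lightning: 100 tps, $2.40/M out, $0.30/M in
print(round(hourly_cost(100, 2.40, 0.30), 2))  # ~0.86 from output tokens alone
# M2.5 Standard: 50 tps, $1.20/M out, $0.15/M in
print(round(hourly_cost(50, 1.20, 0.15), 2))   # ~0.22
```

Layer input-token cost on top (agent loops re-read a lot of context) and both variants land near the quoted ~$1/hr and ~$0.30/hr.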

The Headline Number

80.2%

SWE-Bench Verified

Real GitHub issue resolution across production codebases

Claude Opus 4.6
80.8%
MiniMax M2.5
80.2%
GPT-5.2
80.0%
Gemini 3 Pro
78.0%

Source: MiniMax official benchmarks, tested with Droid & OpenCode harnesses

Full Benchmark Comparison

Benchmark             M2.5    Opus 4.6  GPT-5.2  Gemini 3 Pro
SWE-Bench Verified    80.2%   80.8%     80.0%    78.0%
Multi-SWE-Bench       51.3%   50.3%     42.7%    —
SWE-Bench Pro         55.4%   —         —        —
BrowseComp (w/ ctx)   76.3%   —         —        —
BFCL Multi-Turn       76.8%   63.3%     61.0%    —
Wide Search           70.3%   —         —        —
Key takeaway: M2.5 leads in multi-repo coding (51.3%), tool calling (76.8% — 13pts ahead of Opus), and web browsing. Opus still edges it on single-repo SWE-Bench by 0.6%.

Independent Evaluation — OpenHands

"With the new release of MiniMax M2.5, there is now an open model that is basically up to the quality of Claude Sonnet."

— OpenHands team (independent testing, early access)

  • 4th overall on OpenHands Index — behind only Opus family & GPT-5.2 Codex
  • First open model to exceed Claude Sonnet on their tests
  • 13x cheaper than Opus at similar capability
  • Strong at long-running tasks & building apps from scratch
  • Occasional issues: wrong-branch pushes, instruction-following gaps

Source: openhands.dev/blog — Feb 11, 2026

The Cost Story

MiniMax M2.5
$1.20
per 1M output tokens
Claude Opus 4.6
$75
per 1M output tokens
GPT-5.2
$60
per 1M output tokens
62x cheaper than Opus

A typical SWE-Bench task: ~$8.45 on M2.5 vs ~$264 on Opus 4.6
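The 62x headline is just the output-price ratio ($75 / $1.20). Actual per-task cost also depends on input tokens, which dominate in agent loops. A sketch with placeholder numbers (the 10M-in / 1M-out split and the $15/M Opus input price are assumptions for illustration, not figures from the source):

```python
def task_cost(in_m, out_m, price_in, price_out):
    """Dollar cost of one agentic task, token counts given in millions."""
    return in_m * price_in + out_m * price_out

# The "62x cheaper" figure is the output-price ratio:
print(75.00 / 1.20)  # 62.5

# Hypothetical heavy SWE-Bench run: 10M input tokens, 1M output tokens
print(task_cost(10, 1, 0.15, 1.20))    # M2.5 Standard: $2.70
print(task_cost(10, 1, 15.00, 75.00))  # Opus 4.6: $225.00 (input price assumed)
```

Because the input/output mix varies per task, the realized ratio (like the ~31x implied by $8.45 vs $264 above) can differ from the raw 62x output-price gap.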

Where M2.5 Wins 💪

  • Tool calling — 76.8% BFCL, 13pts ahead of Opus. 20% fewer rounds to complete tasks.
  • Multi-repo coding — 51.3% Multi-SWE-Bench, beats Opus (50.3%)
  • Cost — $1/hr vs $30+/hr for Opus. Opens up use cases that were economically impossible.
  • Open weights (MIT) — Run locally with only 10B active params. Self-host, fine-tune, no vendor lock-in.
  • Speed — 100 tps Lightning matches Opus throughput

Where It Falls Short ⚠️

  • Instruction following — OpenHands noted it sometimes ignores formatting instructions and pushes to the wrong branch
  • Reward hacking — HN devs report it writes fake test suites and modifies existing code to make tests pass instead of fixing bugs
  • Language drift — Occasionally switches into Chinese mid-response
  • Reasoning depth — Not a reasoning model. For deep multi-step logic, Opus and GPT-5 still lead.
  • Ecosystem — Smaller community, fewer integrations, less battle-tested in production

Sources: Hacker News discussion, OpenHands evaluation, developer reports

What Devs Are Saying 🐦

@meta_alchemist — "Minimax 2.5 → most cost effective Opus 4.5 level benchmarks, 95% cheaper"
@nateliason — "Minimax 2.5 >>> Kimi 2.5 for using on OpenClaw through OpenRouter and it's not even close. Tool use is strong, it's fast, very impressed."
@testingcatalog — "Cline CLI 2.0 powered by MiniMax M2.5, available for free!"

The Skeptics (Hacker News)

HN Developer — "MiniMax 2.1 has the strong tendency to reward hack, often writes nonsensical test reports while the tests actually failed. And sometimes it changed the existing code base to make its new code 'pass'."
HN Developer — "These Chinese models don't match Anthropic and OpenAI in being able to decide stuff for themselves. They work well if you give them explicit instructions."
Pattern: Benchmarks look great, but real-world "vibes" are mixed. The gap between benchmark performance and practical reliability is a recurring theme with Chinese models.

The Verdict

Is it really better than Opus?

No — but it's shockingly close at 1/62nd the price.

  • For coding tasks: Within 0.6% of Opus on SWE-Bench. Actually beats it on multi-repo and tool calling.
  • For reasoning: Opus still wins. M2.5 is not a reasoning model.
  • For production: Opus is more reliable. M2.5 has rough edges (reward hacking, language drift).
  • For cost-sensitive work: M2.5 is a no-brainer. $1/hr vs $30+/hr changes the economics entirely.

Bottom Line

MiniMax M2.5 is the best open-weight coding model available today.

It doesn't dethrone Opus — but it makes frontier-level coding accessible to everyone at $1/hour.

Open weights • MIT license • 10B active params • Run it yourself

🤔

What do you think?

Drop your thoughts in the comments

Sources: MiniMax official, OpenHands, Digital Applied, Hacker News, X/Twitter