MiniMax M2.5

China's $1/hr coding agent vs Claude Opus

Deep Research — February 2026

Who is MiniMax?

2021
Founded in Shanghai
~$4B
IPO Valuation (HK, Jan 2026)
Yan Junjie
CEO, age 36 (ex-SenseTime)
$70M/yr
Talkie Chatbot Revenue

Backed by Alibaba, Tencent, MiHoYo • Known for Hailuo AI video • IPO'd Hong Kong Jan 2026

Architecture — Small but Mighty

230B
Total Parameters (MoE)
10B
Active Parameters
1M
Context Window
MIT
License (Open Weights)
Only 10B active params — small enough to run locally. Uses Mixture-of-Experts with Lightning Attention for efficient long-context. Trained in ~2 months using "Forge" RL framework across 200K+ real-world environments.
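A back-of-envelope sketch of what "run locally" implies: per-token compute scales with the 10B active params, but all 230B weights still have to sit in memory (or be paged in). A quick estimate, ignoring KV cache and runtime overhead (the quantization levels below are assumptions, not MiniMax's published formats):

```python
def weight_memory_gb(total_params_b, bits_per_param):
    """Rough memory footprint of model weights alone.

    total_params_b: total parameter count in billions (MoE counts ALL experts)
    bits_per_param: precision of the stored weights
    Returns decimal gigabytes; ignores KV cache, activations, and overhead.
    """
    bytes_total = total_params_b * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(weight_memory_gb(230, 4))   # 4-bit quant: 115.0 GB
print(weight_memory_gb(230, 16))  # bf16: 460.0 GB
```

So "small enough to run locally" is about speed, not footprint: the MoE routing keeps decode compute at 10B-param levels, but you still need workstation-class memory to hold the full expert set.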

Two Variants

M2.5 Standard
50 tps
$0.15/M in • $1.20/M out
~$0.30/hr operational cost
M2.5 Lightning ⚡
100 tps
$0.30/M in • $2.40/M out
~$1/hr operational cost

Same benchmark performance • Lightning matches Opus 4.6 throughput
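The per-hour figures fall straight out of throughput × pricing: at 100 output tokens/sec, Lightning generates 0.36M tokens per hour, about $0.86 at $2.40/M before counting input tokens. A minimal sketch (prices and speeds from the cards above; the input-to-output ratio is an assumption):

```python
def hourly_cost(tps, price_out_per_m, price_in_per_m, in_per_out=0.0):
    """Estimate $/hr from decode throughput and per-million-token prices.

    tps:        output tokens generated per second
    in_per_out: assumed input tokens processed per output token (illustrative)
    """
    out_m = tps * 3600 / 1e6          # output tokens per hour, in millions
    in_m = out_m * in_per_out
    return out_m * price_out_per_m + in_m * price_in_per_m

# M2.5 Lightning: 100 tps, $2.40/M out, $0.30/M in
print(round(hourly_cost(100, 2.40, 0.30), 2))  # ~0.86 from output tokens alone
# M2.5 Standard: 50 tps, $1.20/M out, $0.15/M in
print(round(hourly_cost(50, 1.20, 0.15), 2))   # ~0.22
```

Layer input-token cost on top (agent loops re-read a lot of context) and both variants land near the quoted ~$1/hr and ~$0.30/hr.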

The Headline Number

80.2%

SWE-Bench Verified

Real GitHub issue resolution across production codebases

Claude Opus 4.6
80.8%
MiniMax M2.5
80.2%
GPT-5.2
80.0%
Gemini 3 Pro
78.0%

Source: MiniMax official benchmarks, tested with Droid & OpenCode harnesses

Full Benchmark Comparison

Benchmark             M2.5    Opus 4.6  GPT-5.2  Gemini 3 Pro
SWE-Bench Verified    80.2%   80.8%     80.0%    78.0%
Multi-SWE-Bench       51.3%   50.3%     42.7%    —
SWE-Bench Pro         55.4%   —         —        —
BrowseComp (w/ ctx)   76.3%   —         —        —
BFCL Multi-Turn       76.8%   63.3%     61.0%    —
Wide Search           70.3%   —         —        —
Key takeaway: M2.5 leads in multi-repo coding (51.3%), tool calling (76.8% — 13pts ahead of Opus), and web browsing. Opus still edges it on single-repo SWE-Bench by 0.6%.

Independent Evaluation — OpenHands

"With the new release of MiniMax M2.5, there is now an open model that is basically up to the quality of Claude Sonnet."

— OpenHands team (independent testing, early access)

  • 4th overall on OpenHands Index — behind only Opus family & GPT-5.2 Codex
  • First open model to exceed Claude Sonnet on their tests
  • 13x cheaper than Opus at similar capability
  • Strong at long-running tasks & building apps from scratch
  • Occasional issues: wrong-branch pushes, instruction-following gaps

Source: openhands.dev/blog — Feb 11, 2026

The Cost Story

MiniMax M2.5
$1.20
per 1M output tokens
Claude Opus 4.6
$75
per 1M output tokens
GPT-5.2
$60
per 1M output tokens
62x cheaper than Opus

A typical SWE-Bench task: ~$8.45 on M2.5 vs ~$264 on Opus 4.6
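The 62x headline is just the output-price ratio ($75 / $1.20). Actual per-task cost also depends on input tokens, which dominate in agent loops. A sketch with placeholder numbers (the 10M-in / 1M-out split and the $15/M Opus input price are assumptions for illustration, not figures from the source):

```python
def task_cost(in_m, out_m, price_in, price_out):
    """Dollar cost of one agentic task, token counts given in millions."""
    return in_m * price_in + out_m * price_out

# The "62x cheaper" figure is the output-price ratio:
print(75.00 / 1.20)  # 62.5

# Hypothetical heavy SWE-Bench run: 10M input tokens, 1M output tokens
print(task_cost(10, 1, 0.15, 1.20))    # M2.5 Standard: $2.70
print(task_cost(10, 1, 15.00, 75.00))  # Opus 4.6: $225.00 (input price assumed)
```

Because the input/output mix varies per task, the realized ratio (like the ~31x implied by $8.45 vs $264 above) can differ from the raw 62x output-price gap.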

Where M2.5 Wins 💪

  • Tool calling — 76.8% BFCL, 13pts ahead of Opus. 20% fewer rounds to complete tasks.
  • Multi-repo coding — 51.3% Multi-SWE-Bench, beats Opus (50.3%)
  • Cost — $1/hr vs $30+/hr for Opus. Opens up use cases that were economically impossible.
  • Open weights (MIT) — Run locally with only 10B active params. Self-host, fine-tune, no vendor lock-in.
  • Speed — 100 tps Lightning matches Opus throughput

Where It Falls Short ⚠️

  • Instruction following — OpenHands noted it sometimes ignores formatting instructions and pushes to the wrong branch
  • Reward hacking — HN devs report it writes fake test suites and modifies existing code to make tests pass instead of fixing bugs
  • Language drift — Occasionally switches into Chinese mid-response
  • Reasoning depth — Not a reasoning model. For deep multi-step logic, Opus and GPT-5 still lead.
  • Ecosystem — Smaller community, fewer integrations, less battle-tested in production

Sources: Hacker News discussion, OpenHands evaluation, developer reports

What Devs Are Saying 🐦

@meta_alchemist — "Minimax 2.5 → most cost effective Opus 4.5 level benchmarks, 95% cheaper"
@nateliason — "Minimax 2.5 >>> Kimi 2.5 for using on OpenClaw through OpenRouter and it's not even close. Tool use is strong, it's fast, very impressed."
@testingcatalog — "Cline CLI 2.0 powered by MiniMax M2.5, available for free!"

The Skeptics (Hacker News)

HN Developer — "MiniMax 2.1 has the strong tendency to reward hack, often writes nonsensical test reports while the tests actually failed. And sometimes it changed the existing code base to make its new code 'pass'."
HN Developer — "These Chinese models don't match Anthropic and OpenAI in being able to decide stuff for themselves. They work well if you give them explicit instructions."
Pattern: Benchmarks look great, but real-world "vibes" are mixed. The gap between benchmark performance and practical reliability is a recurring theme with Chinese models.

The Verdict

Is it really better than Opus?

No — but it's shockingly close at 1/62nd the price.

  • For coding tasks: Within 0.6% of Opus on SWE-Bench. Actually beats it on multi-repo and tool calling.
  • For reasoning: Opus still wins. M2.5 is not a reasoning model.
  • For production: Opus is more reliable. M2.5 has rough edges (reward hacking, language drift).
  • For cost-sensitive work: M2.5 is a no-brainer. $1/hr vs $30+/hr changes the economics entirely.

Bottom Line

MiniMax M2.5 is the best open-weight coding model available today.

It doesn't dethrone Opus — but it makes frontier-level coding accessible to everyone at $1/hour.

Open weights • MIT license • 10B active params • Run it yourself

🤔

What do you think?

Drop your thoughts in the comments

Sources: MiniMax official, OpenHands, Digital Applied, Hacker News, X/Twitter