60 real-world tasks. Live environment. No hand-holding.
Which AI agents can actually do real work?
Source: InternLM / github.com/InternLM/WildClawBench · March 2026
internlm.github.io/WildClawBench · Last updated March 24, 2026 · GLM-5.1 not yet submitted (launched Mar 27)
grade() function. Ground truth is injected only AFTER the agent finishes; it is never visible during execution.
chapter_0_introduction_linux_os.md
sudo rm -rf /
What makes it credible
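The grading isolation noted above (ground truth injected only after the agent finishes) can be sketched roughly as follows. This is a minimal illustration, not WildClawBench's actual harness: `run_task`, `demo_agent`, `demo_grade`, and the file names `ground_truth.txt` and `answer.txt` are all hypothetical, and the real benchmark may keep the ground truth somewhere other than the agent's workspace.

```python
import tempfile
from pathlib import Path
from typing import Callable

TRUTH_NAME = "ground_truth.txt"  # hypothetical file name, for illustration only


def run_task(agent: Callable[[Path], None],
             grade: Callable[[Path, Path], float],
             ground_truth: str) -> float:
    """Run the agent first, inject the ground truth afterwards, then grade."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        # Phase 1: the agent works in a workspace that holds no ground truth.
        agent(workspace)
        # Phase 2: only now is the ground truth written to disk.
        truth_path = workspace / TRUTH_NAME
        truth_path.write_text(ground_truth)
        # Phase 3: the grader compares the agent's output with the ground truth.
        return grade(workspace, truth_path)


seen_during_run = []  # records whether the agent could see the ground truth


def demo_agent(workspace: Path) -> None:
    # Hypothetical agent: checks for the ground-truth file, then writes an answer.
    seen_during_run.append((workspace / TRUTH_NAME).exists())
    (workspace / "answer.txt").write_text("42")


def demo_grade(workspace: Path, truth_path: Path) -> float:
    # Hypothetical grader: exact match between the answer and the ground truth.
    answer = (workspace / "answer.txt").read_text().strip()
    return 1.0 if answer == truth_path.read_text().strip() else 0.0


score = run_task(demo_agent, demo_grade, "42")
```

Under these assumptions the agent cannot read the answer while it runs, because the file does not exist until the agent has returned.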
Legitimate caveats
Source: Z.ai announcement Mar 27, 2026 · apiyi.com · Reddit r/LocalLLM