60 real-world tasks. Live environment. No hand-holding.
Which AI agents can actually do real work?
Source: InternLM / github.com/InternLM/WildClawBench · March 2026
internlm.github.io/WildClawBench · Last updated: March 24, 2026 · GLM-5.1 not yet submitted (launched Mar 27)
grade() function. Ground truth is injected only AFTER the agent finishes; it is never visible during execution.
chapter_0_introduction_linux_os.md
sudo rm -rf /
What is credible
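The hidden-ground-truth rule is the main reason the scores are hard to game, so a minimal sketch of that flow may help. It is not WildClawBench's actual harness: `run_task`, `toy_agent`, `toy_grade`, and the file names `ground_truth.json` and `answer.txt` are hypothetical, and the real `grade()` signature is not given in the source.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical harness: names and file layout are assumptions for illustration.
def run_task(
    agent: Callable[[Path], None],
    ground_truth: dict,
    grade: Callable[[Path, dict], float],
) -> float:
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        # Phase 1: the agent runs in a workspace that contains no ground truth.
        agent(workspace)
        # Phase 2: only after the agent returns is the ground truth written in.
        truth_path = workspace / "ground_truth.json"
        truth_path.write_text(json.dumps(ground_truth))
        # Phase 3: grading compares the agent's output with the injected truth.
        return grade(workspace, json.loads(truth_path.read_text()))


# Records whether the agent could see the ground truth while it was running.
seen_during_run: list[bool] = []

def toy_agent(workspace: Path) -> None:
    seen_during_run.append((workspace / "ground_truth.json").exists())
    (workspace / "answer.txt").write_text("42")

def toy_grade(workspace: Path, truth: dict) -> float:
    answer = (workspace / "answer.txt").read_text().strip()
    return 1.0 if answer == truth["answer"] else 0.0

score = run_task(toy_agent, {"answer": "42"}, toy_grade)
```

In this toy setup the agent cannot read the expected answer because the file does not exist until its run has ended; a real harness would also need to isolate the process and its network access, which this sketch does not attempt.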
Reasonable caveats
Source: Z.ai announcement Mar 27, 2026 · apiyi.com · Reddit r/LocalLLM