The ranking

| # | Model | SWE-bench Pro | Best for |
|---|-------|---------------|----------|
| 1 | Claude Opus 4.8 (Anthropic) | 69.2% | The hardest agentic refactors and long, autonomous multi-file tasks where every point of accuracy saves a human review cycle. |
| 2 | Claude Sonnet 5 (Anthropic) | 63.2% | The best closed-model value: near-Opus scores at roughly 2.5× lower cost, and the default daily driver for most developers. |
| 3 | Qwen3.7 Max (Alibaba) | 60.6% | The best non-Claude score on the hardest benchmark (60.6% on SWE-bench Pro), built for long-horizon coding agents. |
| 4 | MiniMax M3 (MiniMax, open weights) | 59.0% | Open weights with 1M context, multimodal input, and computer use; beats GPT-5.5 on SWE-bench Pro at 5–10% of the cost. |
| 5 | GPT-5.5 (OpenAI) | 58.6% | OpenAI's strongest agentic coder, with the deepest tooling and ecosystem breadth of the closed labs. |
| 6 | Kimi K2.6 (Moonshot AI, open weights) | 58.6% | A top-three open coder whose 58.6% on SWE-bench Pro beats several closed flagships. |

From our full AI Coding Leaderboard (2026-07-02). We only rank scores confirmed against primary sources.