The ranking
| # | Model | SWE-bench Pro | Best for |
|---|---|---|---|
| 1 | Claude Opus 4.8 (Anthropic) | 69.2% | The hardest agentic refactors and long, autonomous multi-file tasks where every point of accuracy saves a human review cycle. |
| 2 | Claude Sonnet 5 (Anthropic) | 63.2% | The best closed-model value: near-Opus scores at roughly 2.5× lower cost, and the default daily driver for most developers. |
| 3 | Qwen3.7 Max (Alibaba) | 60.6% | The best non-Claude score on the hardest benchmark, from a model built for long-horizon coding agents. |
| 4 | MiniMax M3 (MiniMax, open weights) | 59.0% | Open weights with 1M context, multimodal input, and computer use; beats GPT-5.5 on SWE-bench Pro at 5–10% of the cost. |
| 5 | GPT-5.5 (OpenAI) | 58.6% | OpenAI's strongest agentic coder, with the deepest tooling and ecosystem breadth of the closed labs. |
| 6 | Kimi K2.6 (Moonshot AI, open weights) | 58.6% | A top-three open coder whose 58.6% SWE-bench Pro score beats several closed flagships. |
From our full AI Coding Leaderboard (2026-07-02). We only rank scores confirmed against primary sources.