The ranking

| # | Model | SWE-bench Verified | Best for |
|---|---|---|---|
| 1 | Claude Opus 4.8 (Anthropic) | ~86% | The hardest agentic refactors and long, autonomous multi-file tasks where every point of accuracy saves a human review cycle. |
| 2 | Claude Sonnet 5 (Anthropic) | 85.2% | The best closed-model value: near-Opus scores at roughly 2.5× lower cost, and the default daily driver for most developers. |
| 3 | GPT-5.5 (OpenAI) | 82.6% | OpenAI's strongest agentic coder, with the deepest tooling and ecosystem breadth of the closed labs. |
| 4 | DeepSeek V4 Pro (DeepSeek, open weights) | 80.6% | The cheapest frontier-class coder: the top open-weights score at roughly 11× lower cost than Opus. Best pick when cost or self-hosting is the deciding factor. |
| 5 | Gemini 3.1 Pro (Google DeepMind) | 80.6% | Google's strongest coding model today, with deep Workspace/Cloud integration. (A 3.5 Pro is expected but has not shipped.) |
| 6 | MiniMax M3 (MiniMax, open weights) | 80.5% | Open weights with 1M context, multimodal input, and computer use. Beats GPT-5.5 on SWE-bench Pro at 5–10% of the cost. |
From our full AI Coding Leaderboard (2026-07-02). We only rank scores confirmed against primary sources.