The ranking

| # | Model | SWE-bench Verified | Best for |
|---|---|---|---|
| 1 | Claude Opus 4.8 (Anthropic) | ~86% | The hardest agentic refactors and long, autonomous multi-file tasks, where every point of accuracy saves a human review cycle. |
| 2 | Claude Sonnet 5 (Anthropic) | 85.2% | The best closed-model value: near-Opus scores at roughly 2.5× lower cost, and the default daily driver for most developers. |
| 3 | GPT-5.5 (OpenAI) | 82.6% | OpenAI's strongest agentic coder, with the deepest tooling and ecosystem breadth of the closed labs. |
| 4 | DeepSeek V4 Pro (DeepSeek, open weights) | 80.6% | The cheapest frontier-class coder: the top open-weights score at roughly 11× lower cost than Opus. The best pick when cost or self-hosting is the deciding factor. |
| 5 | Gemini 3.1 Pro (Google DeepMind) | 80.6% | Google's strongest coding model today, with deep Workspace/Cloud integration. (A 3.5 Pro is expected but has not shipped.) |
| 6 | MiniMax M3 (MiniMax, open weights) | 80.5% | Open weights with 1M context, multimodal input, and computer use; beats GPT-5.5 on SWE-bench Pro at 5–10% of the cost. |

From our full AI Coding Leaderboard (2026-07-02). We rank only scores that have been confirmed against primary sources.