head-to-head
| Metric | Claude Sonnet 5 | GPT-5.5 |
|---|---|---|
| SWE-bench Verified | 85.2% | 82.6% |
| SWE-bench Pro | 63.2% | 58.6% |
| Terminal-Bench | 80.4% (TB2.1) | 82.7% (TB2.0) |
| Input $ / 1M | $2 | — |
| Context | 1M | — |
| Open weights | No | No |
| Maker | Anthropic | OpenAI |
when to pick each
The best closed-model value — near-Opus scores at ~2.5× less, and the default daily driver for most developers.
OpenAI's strongest agentic coder, with the deepest tooling and ecosystem breadth of the closed labs.
Full reviewsClaude Sonnet 5, decoded
Ranked on our AI Coding Leaderboard, updated 2026-07-02. Scores are confirmed against primary sources; prices are per 1M input tokens and can change.
- AnthropicGENZ TECH — Claude Sonnet 5, decoded — Anthropic-reported. Intro pricing $2/$10 per 1M through Aug 31, 2026, then $3/$15.
- OpenAIvals.ai — SWE-bench Verified (independent) — Verified score from vals.ai independent eval; Pro is OpenAI-reported (rivals flag possible memorization on Pro).
- BenchmarkSWE-bench — the real-GitHub-issue benchmark