head-to-head
| Metric | Claude Fable 5 | GPT-5.5 |
|---|---|---|
| SWE-bench Verified | 95.0% | 82.6% |
| SWE-bench Pro | 80.3% | 58.6% |
| Terminal-Bench | — | 82.7% (TB2.0) |
| Input $ / 1M | $10 | — |
| Context | 1M | — |
| Open weights | No | No |
| Maker | Anthropic | OpenAI |
when to pick each
Claude Fable 5: Mythos-class flagship for long-horizon agentic runs. It is the model to reach for when a task spans hours and hundreds of tool calls and has to actually finish.
GPT-5.5: OpenAI's strongest agentic coder, with the deepest tooling and ecosystem breadth of the closed labs.
Full reviews: Claude Fable 5, decoded
Ranked on our AI Coding Leaderboard, updated 2026-07-03. Scores are confirmed against primary sources; prices are per 1M input tokens and can change.
- Anthropic (GENZ TECH, "Claude Fable 5 returns"): SWE-bench Verified 95.0% (vals.ai independent eval) is the highest confirmed score of any model. SWE-bench Pro 80.3% uses Anthropic's own scaffolding and is contested. Restored Jul 1, 2026 after a 20-day export-control suspension. Pricing is $10/$50 per 1M tokens (input/output).
- OpenAI (vals.ai, "SWE-bench Verified (independent)"): The Verified score comes from the vals.ai independent eval; the Pro score is OpenAI-reported, and rivals flag possible memorization on Pro.
- Benchmark: SWE-bench, the real-GitHub-issue benchmark
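
To make the per-1M-token pricing concrete, here is a minimal sketch of a cost estimate for a long agentic run on Claude Fable 5. It assumes the quoted "$10/$50" means input/output prices in that order. The function name `run_cost_usd` and the token counts are hypothetical, chosen only for illustration, and prices can change.

```python
from decimal import Decimal

# Prices quoted above for Claude Fable 5, in USD per 1M tokens.
# Assumption: "$10/$50" means input/output, in that order.
INPUT_USD_PER_M = Decimal("10")
OUTPUT_USD_PER_M = Decimal("50")
TOKENS_PER_M = Decimal(1_000_000)


def run_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in USD of one run at the listed prices."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_USD_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_USD_PER_M
    return input_cost + output_cost


# Hypothetical workload: 300 tool calls, each sending 20,000 input tokens
# and producing 1,000 output tokens. These counts are illustrative only.
calls = 300
total_input = calls * 20_000
total_output = calls * 1_000
print(run_cost_usd(total_input, total_output))
```

Under these assumed token counts the run costs about 75 USD, with input tokens accounting for 60 USD of it. For runs with hundreds of tool calls, the input price tends to dominate because each call typically resends accumulated context.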