head-to-head
| Metric | GPT-5.5 | DeepSeek V4 Pro |
|---|---|---|
| SWE-bench Verified | 82.6% | 80.6% |
| SWE-bench Pro | 58.6% | 55.4% |
| Terminal-Bench | 82.7% (TB2.0) | 67.9% (TB2.0) |
| Input $ / 1M | — | $0.435 |
| Context | — | 1M |
| Open weights | No | Yes |
| Maker | OpenAI | DeepSeek |
when to pick each
GPT-5.5: OpenAI's strongest agentic coder, with the deepest tooling and ecosystem breadth of the closed labs.
DeepSeek V4 Pro: The cheapest frontier-class coder, with the top open-weights score at roughly 11× lower cost than Opus. Best pick when cost or self-hosting is the deciding factor.
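To make the cost argument concrete, here is a minimal Python sketch of the input-side arithmetic. It assumes the $0.435 per 1M input tokens figure from the table; the function name `input_cost_usd` and the workload numbers are hypothetical, chosen only for illustration. Output-token pricing and the GPT-5.5 price are not listed here, so both are left out.

```python
# Input price for DeepSeek V4 Pro from the table above (USD per 1M input tokens).
# Prices can change; treat this constant as a snapshot, not a guarantee.
DEEPSEEK_V4_PRO_INPUT_USD_PER_1M = 0.435


def input_cost_usd(input_tokens: int, usd_per_1m: float) -> float:
    """Return the input-side cost in USD for a given number of input tokens."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * usd_per_1m


# Hypothetical workload: 200 agent runs, each sending 150,000 input tokens.
runs = 200
tokens_per_run = 150_000

total = input_cost_usd(runs * tokens_per_run, DEEPSEEK_V4_PRO_INPUT_USD_PER_1M)
print(f"Estimated input cost: ${total:.2f}")
```

Swapping in another model's per-1M input price gives a like-for-like comparison, as long as both figures cover input tokens only.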
Ranked on our AI Coding Leaderboard, updated 2026-07-02. Scores are confirmed against primary sources; prices are per 1M input tokens and can change.
- OpenAI: vals.ai, SWE-bench Verified (independent). The Verified score is from the vals.ai independent eval; the Pro score is OpenAI-reported (rivals flag possible memorization on Pro).
- DeepSeek: DeepSeek V4, specs & benchmarks. Independent tracker (llm-stats, June 2026); tied with Gemini 3.1 Pro on Verified, ahead on Pro.
- Benchmark: SWE-bench, the real-GitHub-issue benchmark.