Claude Opus 4
by Anthropic·USA·Released
Anthropic's frontier model with extended thinking, leading SWE-bench Verified.
About this model
Claude Opus 4 is Anthropic's frontier model, released in May 2025. It introduced 'extended thinking' — an optional mode where the model spends additional compute on internal reasoning before responding. On Anthropic's own SWE-bench Verified evaluation it scored 72.5%, putting it at the top of the leaderboard for autonomous coding agents at launch.
Opus 4 is designed for long-horizon work where reliability matters more than per-token cost. The model is the default backbone for Claude Code (Anthropic's CLI coding agent) and is widely used inside Cursor, Cline, Aider, and Windsurf via the Model Context Protocol — Anthropic's open spec for tool use that has been adopted across most major coding assistants.
The trade-off is price: at $15/M input and $75/M output, Opus 4 is the most expensive widely-available frontier model. Most production teams reserve it for the hardest queries and route the rest to Sonnet 4.
Strengths
- •Top-tier SWE-bench Verified at launch (72.5%)
- •Extended thinking mode for hard reasoning problems
- •Native MCP tool calls supported across most coding assistants
- •Computer-use API (browser/desktop control) — most reliable in the industry
- •200K context with strong recall through the full window
Limitations
- •Most expensive widely-available frontier model ($75/M output)
- •No image or video generation — text output only
- •Closed weights; no fine-tuning or on-prem option
- •Extended thinking adds noticeable latency for queries that don't need it
When to use it
- →Long-horizon autonomous coding (Claude Code, Cline, Cursor agent mode)
- →Whole-repo refactors and architectural changes
- →High-stakes legal / financial document analysis
- →Computer-use agents driving browsers or desktop apps
- →Research and writing tasks where accuracy beats throughput
Architecture & training
Anthropic has not disclosed parameter count or architecture details beyond 'transformer-based.' The published model card emphasises Constitutional AI post-training — Anthropic's RLAIF technique where a critique model rewrites outputs against a written constitution — alongside standard RLHF from human preferences. Training data is described as 'a diverse mix of publicly available internet data, licensed data from third parties, and data provided by users or contractors,' with opt-out enabled by default for API customers.
Benchmarks
| Benchmark | Score | Bar |
|---|---|---|
| GPQA | 79.6 | |
| MMLU | 88.8 | |
| SWE-bench Verified | 72.5 |