Claude Opus 4

by Anthropic·USA·Released May 22, 2025

Anthropic's frontier model with extended thinking, leading SWE-bench Verified.

textvisioncodechatreasoningagentstoolslong-contextcomputer-use

Vendor site

— · 0 reviews

About this model

Claude Opus 4 is Anthropic's frontier model, released in May 2025. It introduced 'extended thinking' — an optional mode where the model spends additional compute on internal reasoning before responding. On Anthropic's own SWE-bench Verified evaluation it scored 72.5%, putting it at the top of the leaderboard for autonomous coding agents at launch.

Opus 4 is designed for long-horizon work where reliability matters more than per-token cost. The model is the default backbone for Claude Code (Anthropic's CLI coding agent) and is widely used inside Cursor, Cline, Aider, and Windsurf via the Model Context Protocol — Anthropic's open spec for tool use that has been adopted across most major coding assistants.

The trade-off is price: at $15/M input and $75/M output, Opus 4 is the most expensive widely-available frontier model. Most production teams reserve it for the hardest queries and route the rest to Sonnet 4.

Strengths

•Top-tier SWE-bench Verified at launch (72.5%)
•Extended thinking mode for hard reasoning problems
•Native MCP tool calls supported across most coding assistants
•Computer-use API (browser/desktop control) — most reliable in the industry
•200K context with strong recall through the full window

Limitations

•Most expensive widely-available frontier model ($75/M output)
•No image or video generation — text output only
•Closed weights; no fine-tuning or on-prem option
•Extended thinking adds noticeable latency for queries that don't need it

When to use it

→Long-horizon autonomous coding (Claude Code, Cline, Cursor agent mode)
→Whole-repo refactors and architectural changes
→High-stakes legal / financial document analysis
→Computer-use agents driving browsers or desktop apps
→Research and writing tasks where accuracy beats throughput

Architecture & training

Anthropic has not disclosed parameter count or architecture details beyond 'transformer-based.' The published model card emphasises Constitutional AI post-training — Anthropic's RLAIF technique where a critique model rewrites outputs against a written constitution — alongside standard RLHF from human preferences. Training data is described as 'a diverse mix of publicly available internet data, licensed data from third parties, and data provided by users or contractors,' with opt-out enabled by default for API customers.

Benchmarks

Benchmark	Score	Bar
GPQA	79.6
MMLU	88.8
SWE-bench Verified	72.5

Claude Opus 4

About this model

Strengths

Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about Claude Opus 4

Anthropic Details Claude Fable 5 Cyber Safeguards and Jailbreak Framework

Anthropic rolls out Claude Code Artifacts to Pro and Max subscribers

Report: Anthropic in Talks With Samsung to Develop Custom AI Chip

Anthropic announces $100K 'Built with Claude: Life Sciences' hackathon

Compare against

Claude Sonnet 4

Claude 3.5 Haiku

GLM-4.5

Qwen3-Coder

About this model

✓ Strengths

× Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about Claude Opus 4

Anthropic Details Claude Fable 5 Cyber Safeguards and Jailbreak Framework

Anthropic rolls out Claude Code Artifacts to Pro and Max subscribers

Report: Anthropic in Talks With Samsung to Develop Custom AI Chip

Anthropic announces $100K 'Built with Claude: Life Sciences' hackathon

Compare against

Claude Sonnet 4

Claude 3.5 Haiku

GLM-4.5

Qwen3-Coder

Strengths

Limitations