o1

by OpenAI · USA · Released Dec 5, 2024

OpenAI's reasoning flagship, trained with large-scale RL to reason via chain of thought.

text · vision · reasoning · math · code

About this model

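o1 (December 2024) is the full release of OpenAI's first reasoning-model line, which debuted as o1-preview and o1-mini in September 2024. It is trained via large-scale reinforcement learning to spend more time 'thinking' before responding: the model generates a long internal chain of thought (hidden from the user) and then produces a final answer. On competition math (AIME 2024), OpenAI reported 83.3% for o1 using consensus voting over 64 samples (about 74% with a single sample), versus roughly 13% for GPT-4o. That gap is evidence that spending more compute at inference time improves accuracy.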
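
OpenAI's launch post reported the 83.3% AIME 2024 figure as a consensus over 64 sampled answers per problem, one simple way of spending extra test-time compute on top of o1's long internal reasoning. A minimal sketch of consensus (majority-vote) scoring follows. The function names and the toy answers are hypothetical, and this is not a description of OpenAI's actual evaluation harness.

```python
from collections import Counter


def consensus_answer(samples: list[str]) -> str:
    """Return the most frequent final answer among the sampled answers.

    Ties go to the answer seen first, because Counter.most_common keeps
    first-encountered order for equal counts.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return Counter(samples).most_common(1)[0][0]


def consensus_accuracy(problems: list[tuple[list[str], str]]) -> float:
    """Fraction of problems whose majority-vote answer matches the reference.

    Each problem is (sampled_answers, reference_answer); with 64 samples
    per problem this is cons@64-style scoring.
    """
    correct = sum(consensus_answer(samples) == ref for samples, ref in problems)
    return correct / len(problems)


# Hypothetical toy data: three problems, five sampled answers each.
toy = [
    (["204", "204", "113", "204", "97"], "204"),  # majority is right
    (["33", "18", "18", "33", "33"], "33"),       # majority is right
    (["5", "7", "7", "7", "5"], "5"),             # majority is wrong
]
print(consensus_accuracy(toy))  # 2 of 3 problems correct
```

Voting only helps when the correct answer is the most common one among the samples, which is why single-sample and consensus scores are reported separately.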

o1's reasoning trace is not visible to the user. OpenAI hides it both for product clarity and to discourage distillation of the chain of thought into competitor models. The hidden tokens are still billed at the output-token rate, which can make o1 surprisingly expensive on hard problems.
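
Because hidden reasoning tokens are billed as output tokens, the cost of a request depends mostly on how long the model thinks. A rough estimator follows. The default prices ($15 per million input tokens, $60 per million output tokens) are an assumption based on o1's launch list price and should be checked against current pricing; the token counts in the usage line are made up for illustration.

```python
def estimate_cost_usd(
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int,
    input_price_per_m: float = 15.0,   # assumed USD per 1M input tokens
    output_price_per_m: float = 60.0,  # assumed USD per 1M output tokens
) -> float:
    """Estimate the USD cost of one request.

    Hidden reasoning tokens are never shown to the user but are billed at
    the output rate, so they are added to the visible output tokens.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m + billed_output * output_price_per_m) / 1_000_000


# Hypothetical hard prompt: short question, short answer, long hidden reasoning.
cost = estimate_cost_usd(input_tokens=1_000, visible_output_tokens=500, reasoning_tokens=10_000)
print(f"${cost:.3f}")  # $0.645
```

In this example the 10,000 hidden reasoning tokens account for $0.60 of the $0.645 total, even though the user sees only 500 output tokens.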

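o1 is a 'think, then answer' model rather than a conversational agent. The September o1-preview release lacked function calling and streaming; the December API release added function calling and Structured Outputs, but each reply still arrives only after the hidden reasoning finishes. For agent workloads, o3-mini or GPT-4.1 are usually better choices.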

Strengths

•Massive leap on competition math (AIME, MATH benchmarks)
•Strong on PhD-level science (GPQA Diamond at 78%)
•Part of the first publicly released model family to show that scaling test-time compute improves accuracy

Limitations

•Not built for agent workflows: o1-preview shipped without function calling or streaming, and every reply waits on the full hidden reasoning pass
•Hidden reasoning tokens still billed at output rate; hard queries get expensive
•Slow: typical responses take 10-60 seconds
•Matched or beaten by o3-mini (at high reasoning effort) on most math and coding benchmarks, at a fraction of the cost

When to use it

→Hard math and competition-style problems
→PhD-level scientific reasoning
→Code generation requiring careful step-by-step thinking
→One-shot answers where latency doesn't matter

Architecture & training

OpenAI's o1 system card describes the o1 series as trained with large-scale reinforcement learning to reason using chain of thought. The key innovation is large-scale RL on chain-of-thought generation: the model learns to produce longer, more useful internal reasoning traces. Architecture details are not disclosed. The o1 family was followed in 2025 by o3-mini (January) and o3 (April), which extended the approach with significantly improved math and code performance.
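
OpenAI has not published o1's training algorithm, so the sketch below is not its method. It is a generic toy illustration of how outcome-based RL can push a policy toward longer reasoning: a REINFORCE learner picks how many hypothetical "reasoning steps" to take, and, by assumption, longer reasoning succeeds more often. The step counts, success probabilities, learning rate, and function names are all hypothetical.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reasoning_length_policy(
    episodes: int = 3000, lr: float = 0.1, seed: int = 0
) -> list[float]:
    """REINFORCE on a toy bandit: action k means 'reason for k+1 steps'.

    Assumption for illustration: success probability grows with reasoning
    length (0.2, 0.4, ..., 1.0), and the reward is 1 for a correct final
    answer, 0 otherwise. Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    success_prob = [0.2, 0.4, 0.6, 0.8, 1.0]
    logits = [0.0] * len(success_prob)
    baseline = 0.0  # running mean reward, reduces gradient variance
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < success_prob[action] else 0.0
        advantage = reward - baseline
        # d log softmax(action) / d logit_i = 1[i == action] - probs[i]
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
        baseline += 0.01 * (reward - baseline)
    return softmax(logits)


probs = train_reasoning_length_policy()
print([round(p, 3) for p in probs])  # mass should concentrate on the longest option
```

The only training signal is whether the final answer was correct; the preference for longer reasoning emerges because, in this toy setup, longer reasoning earns that reward more often.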

Benchmarks

Benchmark	Score (%)
AIME 2024 (consensus@64)	83.3
GPQA Diamond	78.0
MATH	94.8

Stories about o1

OpenAI reportedly weighed giving US government 5% stake in company

OpenAI Proposes Giving U.S. Government 5% Stake Worth $42.6 Billion

OpenAI Used Population-Level Core-Dump Analysis to Trace Crashes to a Bad Azure Host and an 18-Year-Old libunwind Bug

OpenAI previews GPT-5.6 Sol, a next-generation model

Compare against

o3-mini

GLM-4.5

Qwen3-Coder

Kimi K2
