GPT-4o

by OpenAI·USA·Released May 13, 2024

OpenAI's multimodal flagship — text, vision, audio, and image in one model.

textvisionaudiocodechattoolslong-contextvoice

Vendor site

— · 0 reviews

About this model

GPT-4o (released May 2024) was OpenAI's first model with native voice, vision, and text in a single network. The 'o' stands for 'omni' — it can accept any combination of those modalities as input and respond in voice or text in real time. ChatGPT's Advanced Voice Mode is powered by GPT-4o.

On text tasks GPT-4o is broadly comparable to Claude 3.5 Sonnet — 88.7% MMLU, 90.2% HumanEval — with the multimodal capability as the main differentiator. Pricing has dropped multiple times since launch and currently sits at $2.50/M input, $10/M output as of the most recent revision.

For text-only workloads, newer specialist models (GPT-4.1 for long context, o1/o3-mini for reasoning) often outperform GPT-4o. But for any workflow that needs vision + voice + text in one model, GPT-4o remains the default OpenAI choice.

Strengths

•Native multimodal: voice, vision, text, image generation in one model
•Real-time Advanced Voice Mode in ChatGPT
•Aggressive price drops since launch — now $2.50/M input
•Mature ecosystem of fine-tuning and tool support

Limitations

•Beaten by GPT-4.1 on coding and long-context tasks
•Beaten by o1/o3-mini on reasoning-heavy tasks
•128K context — smaller than GPT-4.1 (1M) or Gemini 2.5 Pro (2M)
•No native video generation (Sora is a separate model)

When to use it

→Voice-enabled assistants (ChatGPT Advanced Voice Mode)
→Multimodal chat with image upload and analysis
→Customer support combining voice + vision
→General-purpose default when one model needs all modalities

Architecture & training

OpenAI has not disclosed parameter count for GPT-4o. The technical innovation is the unified end-to-end architecture: voice input is tokenised directly into the same model rather than routed through a separate ASR system, which is why response latency is much lower than the original GPT-4-with-Whisper voice pipeline. Post-training uses RLHF and follows OpenAI's standard model spec.

Benchmarks

Benchmark	Score	Bar
MATH	76.6
MMLU	88.7
HumanEval	90.2

GPT-4o

About this model

Strengths

Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about GPT-4o

OpenAI reportedly weighed giving US government 5% stake in company

OpenAI Proposes Giving U.S. Government 5% Stake Worth $42.6 Billion

OpenAI Used Population-Level Core-Dump Analysis to Trace Crashes to a Bad Azure Host and an 18-Year-Old libunwind Bug

OpenAI previews GPT-5.6 Sol, a next-generation model

Compare against

GLM-4.5

Qwen3-Coder

Kimi K2

MiniMax-M1

About this model

✓ Strengths

× Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about GPT-4o

OpenAI reportedly weighed giving US government 5% stake in company

OpenAI Proposes Giving U.S. Government 5% Stake Worth $42.6 Billion

OpenAI Used Population-Level Core-Dump Analysis to Trace Crashes to a Bad Azure Host and an 18-Year-Old libunwind Bug

OpenAI previews GPT-5.6 Sol, a next-generation model

Compare against

GLM-4.5

Qwen3-Coder

Kimi K2

MiniMax-M1

Strengths

Limitations