GPT-4o
by OpenAI·USA·Released
OpenAI's multimodal flagship — text, vision, audio, and image in one model.
About this model
GPT-4o (released May 2024) was OpenAI's first model with native voice, vision, and text in a single network. The 'o' stands for 'omni' — it can accept any combination of those modalities as input and respond in voice or text in real time. ChatGPT's Advanced Voice Mode is powered by GPT-4o.
On text tasks GPT-4o is broadly comparable to Claude 3.5 Sonnet — 88.7% MMLU, 90.2% HumanEval — with the multimodal capability as the main differentiator. Pricing has dropped multiple times since launch and currently sits at $2.50/M input, $10/M output as of the most recent revision.
For text-only workloads, newer specialist models (GPT-4.1 for long context, o1/o3-mini for reasoning) often outperform GPT-4o. But for any workflow that needs vision + voice + text in one model, GPT-4o remains the default OpenAI choice.
Strengths
- •Native multimodal: voice, vision, text, image generation in one model
- •Real-time Advanced Voice Mode in ChatGPT
- •Aggressive price drops since launch — now $2.50/M input
- •Mature ecosystem of fine-tuning and tool support
Limitations
- •Beaten by GPT-4.1 on coding and long-context tasks
- •Beaten by o1/o3-mini on reasoning-heavy tasks
- •128K context — smaller than GPT-4.1 (1M) or Gemini 2.5 Pro (2M)
- •No native video generation (Sora is a separate model)
When to use it
- →Voice-enabled assistants (ChatGPT Advanced Voice Mode)
- →Multimodal chat with image upload and analysis
- →Customer support combining voice + vision
- →General-purpose default when one model needs all modalities
Architecture & training
OpenAI has not disclosed parameter count for GPT-4o. The technical innovation is the unified end-to-end architecture: voice input is tokenised directly into the same model rather than routed through a separate ASR system, which is why response latency is much lower than the original GPT-4-with-Whisper voice pipeline. Post-training uses RLHF and follows OpenAI's standard model spec.
Benchmarks
| Benchmark | Score | Bar |
|---|---|---|
| MATH | 76.6 | |
| MMLU | 88.7 | |
| HumanEval | 90.2 |