Gemini 2.0 Flash
by Google DeepMind · USA · Released
Fast, cheap, multimodal: a 1M-token context at some of the lowest prices among the major providers.
About this model
Gemini 2.0 Flash (December 2024) is Google's fast, cheap, multimodal tier: $0.10 per million input tokens (text, image or video; audio input is billed at a higher rate) and $0.40 per million output tokens, with a 1M-token context window as the default rather than a premium feature. That combination of price and context length is rare; few other models from the major labs offer 1M tokens at a comparable price.
Flash does not match Claude Sonnet 4 on hard reasoning, but it is competitive enough for many production workloads, and the per-token price gap is large: at list prices Sonnet 4 costs $3 per million input tokens and $15 per million output tokens, roughly 30x and 37x more than Flash. For high-volume RAG pipelines, Workspace copilots or batch processing, Flash is often the right choice.
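To put the list prices in concrete terms, here is a small Python sketch that estimates the cost of a workload at $0.10 per million input tokens and $0.40 per million output tokens. It assumes a single flat input rate and ignores modality-specific rates, context caching, batch discounts and free-tier quotas; the workload figures are hypothetical.

```python
# Rough cost estimate at the list prices quoted above (USD per 1M tokens).
# Assumption: one flat input rate; modality-specific rates, context caching,
# batch discounts and free-tier quotas are ignored.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated list-price cost in USD for the given token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload: 10,000 requests, each sending a full 1M-token
# context and receiving 2,000 output tokens.
requests = 10_000
per_request = estimate_cost_usd(1_000_000, 2_000)
print(f"per request: ${per_request:.4f}")             # per request: $0.1008
print(f"total:       ${per_request * requests:,.2f}")  # total:       $1,008.00
```

At these rates, a single request that fills the whole 1M-token window costs about $0.10 in input tokens before any output is generated.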
Strengths
- Among the cheapest 1M-context models available
- Multimodal: vision, audio and video input, not text-only
- Sub-second latency for short prompts
- Generous free tier via Google AI Studio
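To make the multimodal point concrete, here is a minimal sketch of a `generateContent` request body that pairs a text prompt with an inline image, built with the Python standard library only. It assumes the public Gemini REST API shape (`contents` → `parts`, with `inline_data` carrying a `mime_type` and base64 `data`) and the model ID `gemini-2.0-flash`; it builds the JSON but does not send it, and the image bytes are a placeholder.

```python
import base64
import json

# Assumption: public Gemini REST endpoint and model ID; check current docs.
MODEL = "gemini-2.0-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_image_request(prompt: str, image_bytes: bytes,
                        mime_type: str = "image/jpeg") -> dict:
    """Build a request body with one text part and one inline image part."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # Inline media travels base64-encoded inside the JSON body.
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }


# Placeholder bytes stand in for a real JPEG file read from disk.
body = build_image_request("Tag the objects in this image.",
                           b"\xff\xd8\xff placeholder")
print(ENDPOINT)
print(json.dumps(body, indent=2))
```

Sending it would need an HTTP client and an API key (for example in the `x-goog-api-key` header). Audio and video parts follow the same `inline_data` pattern with their own MIME types.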
Limitations
- Quality gap vs Gemini 2.5 Pro on hard reasoning tasks
- Tool-call format is Google-specific
- Audio/video input quotas can be restrictive on the free tier
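In practice the tool-call difference is mostly a matter of envelope. The sketch below converts an OpenAI Chat Completions-style tool definition into the `tools`/`functionDeclarations` shape used by the Gemini REST API. It assumes both formats as publicly documented, and the `get_weather` tool is hypothetical. Gemini accepts an OpenAPI-style subset of JSON Schema for `parameters`, so schemas that use less common keywords may need further adjustment.

```python
import json


def openai_tools_to_gemini(openai_tools: list[dict]) -> list[dict]:
    """Map OpenAI-style {"type": "function", "function": {...}} entries to a
    single Gemini tools entry holding functionDeclarations."""
    declarations = []
    for tool in openai_tools:
        if tool.get("type") != "function":
            raise ValueError(f"unsupported tool type: {tool.get('type')!r}")
        fn = tool["function"]
        decl = {"name": fn["name"], "description": fn.get("description", "")}
        if "parameters" in fn:
            # Passed through unchanged; Gemini supports only a schema subset.
            decl["parameters"] = fn["parameters"]
        declarations.append(decl)
    return [{"functionDeclarations": declarations}]


# Hypothetical tool definition in OpenAI Chat Completions format.
openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

print(json.dumps({"tools": openai_tools_to_gemini(openai_tools)}, indent=2))
```

Responses differ as well: Gemini returns a `functionCall` part with a name and an `args` object, whereas OpenAI returns a `tool_calls` array whose arguments are a JSON string, so response parsing needs a matching adapter.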
When to use it
- High-volume RAG pipelines
- Batch document processing at low cost
- Real-time chat where latency budget is tight
- Multimodal classification (image / video / audio tagging)
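For batch document processing, the large window means many documents can share a single request. Below is a minimal greedy packing sketch. It assumes a crude 4-characters-per-token estimate (real counts should come from the API's `countTokens` method) and a hypothetical headroom reserved for task instructions; a document larger than the budget is placed alone and would need chunking in practice.

```python
CONTEXT_TOKENS = 1_000_000   # 1M-token input window (rounded)
RESERVED_TOKENS = 50_000     # hypothetical headroom for task instructions


def estimate_tokens(text: str) -> int:
    """Crude heuristic: ~4 characters per token (assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_documents(docs: list[str],
                   budget: int = CONTEXT_TOKENS - RESERVED_TOKENS) -> list[list[int]]:
    """Greedily group document indices into batches whose estimated size
    fits the budget; an oversized document ends up in a batch of its own."""
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for i, doc in enumerate(docs):
        n = estimate_tokens(doc)
        if current and used + n > budget:
            batches.append(current)   # close the batch before it overflows
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        batches.append(current)
    return batches


# Hypothetical documents of roughly 500k, 500k and 100k estimated tokens.
docs = ["a" * 2_000_000, "b" * 2_000_000, "c" * 400_000]
print(pack_documents(docs))  # [[0], [1, 2]]
```

Greedy packing keeps documents in their original order, which simplifies mapping results back to inputs; a bin-packing heuristic could use fewer requests at the cost of reordering.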
Architecture & training
Google has disclosed few architectural details for 2.0 Flash. It is reportedly part of the same Gemini 2.x family as 2.5 Pro (a sparse mixture-of-experts design with native multimodal training) but with a smaller activated parameter count, and it is optimised aggressively for tokens per second on Google's TPU serving infrastructure.
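As a generic illustration of what a "smaller activated parameter count" means in a sparse mixture-of-experts layer, here is a toy top-k routing sketch: a router scores every expert, only the k highest-scoring experts run for a given token, and their outputs are mixed with softmax weights renormalised over the chosen k. This is not Flash's actual configuration; the expert functions, sizes and k are hypothetical.

```python
import math
import random


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token: list[float], router: list[list[float]],
              experts: list, k: int = 2) -> tuple[list[float], list[int]]:
    """Route one token vector to its top-k experts and mix their outputs.

    router[e] is expert e's routing weight vector; experts[e] is a callable
    mapping a vector to a vector of the same length.
    """
    # Router logits: one score per expert (dot product with the token).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    # Keep the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    gates = softmax([logits[e] for e in top])  # renormalised over the top k
    out = [0.0] * len(token)
    for gate, e in zip(gates, top):
        y = experts[e](token)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


# Hypothetical toy setup: 8 experts, each just scaling its input.
random.seed(0)
dim, num_experts = 4, 8
router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [lambda v, s=e + 1: [s * x for x in v] for e in range(num_experts)]
out, active = moe_layer([0.5, -1.0, 0.25, 2.0], router, experts, k=2)
print(f"active experts: {active} ({len(active)} of {num_experts} evaluated)")
```

Total parameters grow with the number of experts, but per-token compute grows only with k, which is the trade-off a serving-optimised tier would lean on.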
Benchmarks
| Benchmark | Score (%) |
|---|---|
| MATH | 75.6 |
| MMLU | 78.3 |