WED, 03 JUN 2026 · 18:33:40 UTC

Gemini 2.0 Flash

by Google DeepMind · USA · Released

Fast, cheap, multimodal — 1M context at the lowest tier-1 pricing.

text · vision · audio · video · chat · tools · long-context

About this model

Gemini 2.0 Flash (December 2024) is Google's fast, cheap, multimodal tier: $0.10 per million input tokens and $0.40 per million output tokens, with a 1M-token context window as a default rather than a premium feature. The combination of price and context length is its main differentiator: no other frontier-class model offers 1M tokens at this price point.
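To make the pricing concrete, here is a minimal cost-estimate sketch using only the two per-million-token prices quoted above. The workload figures (10,000 queries, 8,000 input and 500 output tokens each) are hypothetical and chosen purely for illustration.

```python
# Prices quoted on this page, in USD per one million tokens.
INPUT_USD_PER_M = 0.10
OUTPUT_USD_PER_M = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD for a given number of input and output tokens."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical workload: 10,000 RAG queries,
# each with 8,000 input tokens and 500 output tokens.
queries = 10_000
total = estimate_cost_usd(queries * 8_000, queries * 500)
print(f"${total:.2f}")  # prints $10.00
```

By the same arithmetic, filling the full 1M-token context once costs about $0.10 in input tokens at the quoted rate.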

Flash isn't quite at Claude Sonnet 4 quality on hard reasoning, but it is competitive enough for most production use cases, and it costs far less per token. For high-volume RAG pipelines, Workspace copilots, or batch processing, Flash is often the right choice.

Strengths

  • Cheapest 1M-context model on the market
  • Multimodal: vision, audio, video input — not text-only
  • Sub-second latency for short prompts
  • Generous free tier via Google AI Studio

Limitations

  • Quality gap vs Gemini 2.5 Pro on hard reasoning tasks
  • Tool-call format is Google-specific
  • Audio/video input quotas can be restrictive on the free tier
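To illustrate the tool-call format limitation listed above: teams that already hold OpenAI-style tool definitions usually need a small adapter. The sketch below assumes the publicly documented shapes as I understand them (OpenAI-style `{"type": "function", "function": {...}}` entries, and Gemini REST-style `functionDeclarations` grouped inside a `tools` entry). The helper names are hypothetical, and the shapes should be checked against the current vendor documentation before use.

```python
def openai_tool_to_gemini(tool: dict) -> dict:
    """Convert one OpenAI-style tool definition into a Gemini-style declaration.

    Assumed input shape: {"type": "function", "function": {name, description, parameters}}
    """
    fn = tool["function"]
    declaration = {
        "name": fn["name"],
        "description": fn.get("description", ""),
    }
    # Copy the parameter schema only when one is provided.
    if "parameters" in fn:
        declaration["parameters"] = fn["parameters"]
    return declaration


def to_gemini_tools(openai_tools: list) -> list:
    """Wrap all converted declarations in a single Gemini-style tools entry."""
    return [{"functionDeclarations": [openai_tool_to_gemini(t) for t in openai_tools]}]


# Hypothetical tool definition used only for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

gemini_tools = to_gemini_tools([weather_tool])
```

This only re-wraps the definition. Some JSON Schema keywords accepted elsewhere may not be accepted in Gemini parameter schemas, so a real adapter would likely need to prune or translate them.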

When to use it

  • High-volume RAG pipelines
  • Batch document processing at low cost
  • Real-time chat where latency budget is tight
  • Multimodal classification (image / video / audio tagging)
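For the batch document processing case, one way to use a large context window is to pack several documents into each request. The sketch below is a greedy packer under stated assumptions: the token count is a crude heuristic of about four characters per token (not the model's real tokenizer), and the 50,000-token reserve for instructions and output is an arbitrary illustrative value.

```python
from typing import Iterable

CONTEXT_LIMIT = 1_000_000  # context size stated on this page
RESERVED = 50_000          # assumption: headroom for instructions and output


def rough_token_count(text: str) -> int:
    """Crude estimate (about 4 characters per token); not the real tokenizer."""
    return max(1, len(text) // 4)


def pack_documents(docs: Iterable[str],
                   budget: int = CONTEXT_LIMIT - RESERVED) -> list:
    """Greedily group documents so each group's estimated tokens fit the budget."""
    batches = []
    current = []
    used = 0
    for doc in docs:
        cost = rough_token_count(doc)
        # Start a new batch when adding this document would exceed the budget.
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        # A document larger than the budget still gets a batch of its own.
        current.append(doc)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each returned batch can then be sent as one request. Fewer, larger requests reduce per-request overhead, but a real pipeline should count tokens with the vendor's tokenizer rather than this heuristic.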

Architecture & training

Same Gemini 2.x architectural family as 2.5 Pro — sparse MoE with native multimodal training — but with a smaller activated parameter count. Optimised aggressively for tokens-per-second on Google's TPU serving infrastructure.

Benchmarks

Benchmark    Score
MATH         75.6
MMLU         78.3
