Gemini 3.1 Flash-Lite launches at scale pricing
New model delivers 2.5X faster time to first answer token than 2.5 Flash at $0.25/1M input tokens and $1.50/1M output tokens, targeting high-volume inference workloads with selectable reasoning depth.
May 20, 2026
Summary
Gemini 3.1 Flash-Lite reduces inference cost and latency for production translation, moderation, and UI generation pipelines. Thinking levels let you raise or lower reasoning depth per request, so you can manage cost-quality tradeoffs at scale.
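To make the cost-quality tradeoff concrete, here is a rough estimator built on the two prices quoted in this brief. It is a sketch under stated assumptions, not a billing reference: it assumes thinking tokens are billed at the output rate, and the level names and per-level thinking-token counts in THINKING_TOKENS are hypothetical placeholders you would replace with numbers measured on your own traffic.

```python
# Prices quoted in this brief, in USD per 1M tokens.
INPUT_USD_PER_M = 0.25
OUTPUT_USD_PER_M = 1.50

# Hypothetical average thinking tokens per request at each level.
# These labels and numbers are placeholders, not published figures.
THINKING_TOKENS = {"low": 0, "medium": 500, "high": 2000}


def request_cost(input_tokens, output_tokens, thinking_tokens=0):
    """Estimate the USD cost of one request.

    Assumption: thinking tokens are billed at the output-token rate.
    """
    billed_output = output_tokens + thinking_tokens
    return (input_tokens * INPUT_USD_PER_M
            + billed_output * OUTPUT_USD_PER_M) / 1_000_000


def daily_cost(requests_per_day, input_tokens, output_tokens, level):
    """Estimate daily USD spend for a uniform workload at one thinking level."""
    per_request = request_cost(input_tokens, output_tokens, THINKING_TOKENS[level])
    return requests_per_day * per_request


if __name__ == "__main__":
    # Example workload: 1M requests/day, 1,000 input and 200 output tokens each.
    for level in THINKING_TOKENS:
        usd = daily_cost(1_000_000, input_tokens=1_000, output_tokens=200, level=level)
        print(f"{level:>6}: ${usd:,.2f} per day")
```

Because output tokens cost six times as much as input tokens at these prices, any extra reasoning tokens are likely to dominate spend on short-prompt workloads, which is why a per-request setting matters.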
Implementation verdict
A candidate replacement for 2.5 Flash on latency-sensitive, high-volume tasks. Adopting it means migrating inference calls to the Gemini API or Vertex AI, and because the model is in preview, its production readiness is still to be determined. Worth benchmarking against your current model on your actual workloads now.
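One way to run that benchmark is a small harness that times the first streamed chunk and the full response for each candidate. The sketch below is provider-agnostic and makes no claims about any SDK: each candidate is a function you write that takes a prompt and yields text chunks, and fake_stream is a hypothetical stand-in so the harness runs without network access.

```python
import time
from statistics import median


def measure_stream(stream_fn, prompt):
    """Time one streamed response: seconds to first chunk and to completion."""
    start = time.perf_counter()
    ttft = None
    chunks = 0
    for _ in stream_fn(prompt):
        if ttft is None:
            ttft = time.perf_counter() - start
        chunks += 1
    total = time.perf_counter() - start
    return {"ttft_s": ttft, "total_s": total, "chunks": chunks}


def compare(candidates, prompts):
    """Run every prompt through every candidate and report median timings."""
    if not prompts:
        raise ValueError("need at least one prompt")
    results = {}
    for name, stream_fn in candidates.items():
        runs = [measure_stream(stream_fn, p) for p in prompts]
        ttfts = [r["ttft_s"] for r in runs if r["ttft_s"] is not None]
        results[name] = {
            "median_ttft_s": median(ttfts) if ttfts else None,
            "median_total_s": median([r["total_s"] for r in runs]),
        }
    return results


def fake_stream(delay_s):
    """Hypothetical stand-in for a model call: waits, then yields words."""
    def _stream(prompt):
        time.sleep(delay_s)
        for word in prompt.split():
            yield word
    return _stream


if __name__ == "__main__":
    report = compare(
        {"current": fake_stream(0.05), "candidate": fake_stream(0.02)},
        ["translate this sentence", "moderate this comment"],
    )
    for name, stats in report.items():
        print(name, stats)
```

To use it for real, replace each fake_stream entry with a wrapper around your provider's streaming call, and feed it prompts sampled from production rather than synthetic ones. Medians are reported instead of means so a single slow request does not skew the comparison.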
Sources
- 1. Priced at just $0.25/1M input tokens and $1.50/1M output tokens
- 2. 2.5X faster Time to First Answer Token and 45% increase in output speed, according to the Artificial Analysis benchmark
- 3. comes standard with thinking levels in AI Studio and Vertex AI, giving developers the control and flexibility to select how much the model "thinks" for a task
- 4. 3.1 Flash-Lite achieves an impressive Elo score of 1432 on the Arena.ai Leaderboard
Dev Signal