gpt-5.6 pricing prompt-caching model-tiers

GPT-5.6 launches with three tiered models

Summary

Terra matches GPT-5.5 performance at half the cost, Luna is the lowest-cost tier, and prompt caching now supports explicit breakpoints with a 30-minute minimum cache life.

Why it matters

Per-token cost drops across tiers: Terra is quoted as 2x cheaper than GPT-5.5 at comparable performance, and Luna is cheaper still. That directly affects the feasibility of production inference workloads. Predictable cache behavior with explicit breakpoints reduces latency variance for queries that reuse a cached prompt.

Implementation verdict

Terra replaces GPT-5.5 for cost-sensitive deployments. Luna targets budget-constrained batch inference. Adoption requires recalculating token budgets, deciding where to place cache breakpoints, and updating pricing models. Limited preview access blocks immediate adoption; watch for the general availability rollout.
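For the token budget recalculation, a rough comparison across tiers can be sketched as below. The per-1M-token prices are the ones quoted under Sources; the function name `workload_cost` and the workload size are hypothetical and only illustrate the arithmetic. Caching is ignored here.

```python
# Prices in USD per 1M tokens, as quoted in the Sources section.
PRICES = {
    "Sol": {"input": 5.00, "output": 30.00},
    "Terra": {"input": 2.50, "output": 15.00},
    "Luna": {"input": 1.00, "output": 6.00},
}


def workload_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the uncached cost in USD for the given token volumes."""
    price = PRICES[tier]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 200M input tokens, 20M output tokens.
    for tier in PRICES:
        cost = workload_cost(tier, 200_000_000, 20_000_000)
        print(f"{tier}: ${cost:,.2f}")
```

Because Terra's quoted rates are exactly half of Sol's on both input and output, the Terra figure is half the Sol figure for any workload mix; the gap to Luna is what the budget recalculation needs to weigh against capability.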

Sources

1. Terra has competitive performance to GPT‑5.5 while being 2x cheaper and Luna brings strong capability at our lowest cost
2. GPT‑5.6 is priced per 1M tokens across three model sizes: Sol is $5 input / $30 output; Terra is $2.50 input / $15 output; and Luna is $1 input / $6 output
3. GPT‑5.6 also introduces more predictable prompt caching, including support for explicit cache breakpoints and a 30-minute minimum cache life
4. cache writes are billed at 1.25x the model's uncached input rate, while cache reads continue to receive the 90% cached-input discount
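The cache billing quoted in source 4 can be turned into a quick break-even check. This is a minimal sketch, not a billing specification. It assumes that the first request pays the 1.25x write rate on the shared prefix in place of the normal input rate, that every later request is a cache read at 10% of the input rate, and that all requests land within the cache lifetime. The function names and the example numbers are hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache write: 1.25x the uncached input rate
CACHE_READ_MULTIPLIER = 0.10   # cache read: 90% cached-input discount


def cached_prefix_cost(input_rate: float, prefix_tokens: int, reads: int) -> float:
    """USD cost of one cache write plus `reads` cache hits on a shared prefix.

    `input_rate` is the uncached input price in USD per 1M tokens.
    """
    write = prefix_tokens * input_rate * CACHE_WRITE_MULTIPLIER
    read = prefix_tokens * input_rate * CACHE_READ_MULTIPLIER
    return (write + reads * read) / 1_000_000


def uncached_prefix_cost(input_rate: float, prefix_tokens: int, requests: int) -> float:
    """USD cost of sending the same prefix uncached on every request."""
    return requests * prefix_tokens * input_rate / 1_000_000


if __name__ == "__main__":
    # Hypothetical case: Terra input rate, 10k-token prefix, 10 requests total.
    rate, prefix, requests = 2.50, 10_000, 10
    with_cache = cached_prefix_cost(rate, prefix, reads=requests - 1)
    without_cache = uncached_prefix_cost(rate, prefix, requests)
    print(f"cached: ${with_cache:.5f}  uncached: ${without_cache:.5f}")
```

Under these assumptions a single cache read already recovers the write premium: one write plus one read costs 1.35x the input rate for the prefix, against 2x for two uncached requests. A prefix that is written but never read costs 25% more than sending it uncached.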

Dev Signal
