Terra matches GPT-5.5 performance at half cost; Luna offers lowest-cost baseline; prompt caching now supports explicit breakpoints and 30-minute minimums.
Summary
Cost-per-inference drops significantly across tiers, directly impacting feasibility of production inference workloads. Predictable cache behavior with explicit breakpoints reduces latency variance in cached query patterns.
Why it matters
Cost-per-inference drops significantly across tiers, directly impacting feasibility of production inference workloads. Predictable cache behavior with explicit breakpoints reduces latency variance in cached query patterns.
Implementation verdict
Terra replaces GPT-5.5 for cost-sensitive deployments. Luna targets budget-constrained batch inference. Requires: token budget recalculation, cache-breakpoint architecture decisions, pricing model updates. Limited preview access blocks immediate adoption—watch for general availability rollout.
Sources
Dev Signal
Get briefs like this in your inbox — free, every weekday.
100+ sources compressed into one 4-minute read. Ranked, cited, implementation-ready.