[STATUS] headers and task-tier model routing eliminate redundant re-parsing and overprovisioning, cutting session tokens from 12,400 to 5,100 with zero quality loss.
Summary
Agent loops burn tokens re-reading unchanged history and running expensive models on cheap tasks. These patterns shift cost from per-token overhead to architecture, freeing budget for longer reasoning where it matters.
Why it matters
Agent loops burn tokens re-reading unchanged history and running expensive models on cheap tasks. These patterns shift cost from per-token overhead to architecture, freeing budget for longer reasoning where it matters.
Implementation verdict
Replaces full-context-recap patterns with append-only [STATUS] blocks and task-based model selection. Requires prompt restructuring and router config; no new infrastructure. Worth trying immediately—[STATUS] pattern alone saves ~15% per session with zero risk.
Sources
Dev Signal
Get briefs like this in your inbox — free, 3x a week.
100+ sources compressed into one 4-minute read. Ranked, cited, implementation-ready.