Routing prompts offline, durable agents, Gemini consolidation

Tool of the Week

Route prompts offline by complexity, skip model calls

Wayfinder scores prompt structure deterministically (a complexity score from 0 to 1.0) without an API call, then routes cheap queries to local models and hard ones to expensive tiers. It replaces learned routers, which add latency and cost to the routing decision itself.

Eliminates the cost and latency tax of calling a classifier or LLM to decide which model to use: the routing decision becomes free and reproducible. Developers pay top-tier prices only for prompts that actually need them, not for summarization or typo fixes.

Replaces RouteLLM and hosted routers (NotDiamond, OpenRouter Auto) with a local-first alternative. Requires two tiers (local + cloud), OpenAI-compatible endpoints, and a single TOML config. Ready now, with a zero-install CLI demo available. Honest tradeoff: it wins on structural complexity (length, code, lists) and loses on hard cases that are purely semantic ('what is the 100th prime number?'). Worth trying if you're already multi-tier; skip if your prompts are semantically subtle.

“deterministic, sub-millisecond, and entirely offline — no API key, no network, no model call to make it”
“Cheap prompts stay local and hard ones go to the expensive model, so you stop paying top-tier prices for "summarize this" and "fix my typo."”
“Most routers decide by calling a model: a trained classifier, an LLM judge, or a hosted API. That adds latency, cost, and randomness to the exact step meant to save you money.”
“a double-blind test showed the lift doesn't generalize (it caught ~20% of unseen hard prompts and lost to a plain word-count baseline)”
“pip install wayfinder-router”
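To make the idea concrete, here is a minimal sketch of structure-only routing in Python. It is not Wayfinder's code: the features (word count, list items, code markers), the weights, and the 0.3 threshold are all assumptions chosen for illustration.

```python
import re

# Markdown code-fence marker, built indirectly so this example stays fence-safe.
FENCE = "`" * 3

# Hypothetical weights and caps; the real tool's features and tuning are not shown in the source.
LENGTH_WEIGHT, LIST_WEIGHT, CODE_WEIGHT = 0.5, 0.2, 0.3


def complexity(prompt: str) -> float:
    """Score structural complexity in [0.0, 1.0] from string features only."""
    words = len(prompt.split())
    # Count bullet or numbered list items at the start of a line.
    list_items = len(
        re.findall(r"^\s*(?:[-*]|\d+[.)])\s+", prompt, flags=re.MULTILINE)
    )
    # Treat a code fence or a line starting with a common keyword as code.
    has_code = FENCE in prompt or bool(
        re.search(r"^\s*(?:def|class|import|from)\s+\w+", prompt, flags=re.MULTILINE)
    )
    score = (
        min(words / 400, 1.0) * LENGTH_WEIGHT
        + min(list_items / 8, 1.0) * LIST_WEIGHT
        + (CODE_WEIGHT if has_code else 0.0)
    )
    return round(score, 3)


def route(prompt: str, threshold: float = 0.3) -> str:
    """Return the tier name: no network, no model call, same answer every time."""
    return "cloud" if complexity(prompt) >= threshold else "local"
```

With these assumed weights, `route("fix my typo")` returns `"local"`, and so does `route("what is the 100th prime number?")`. The second result is the tradeoff named above: a short prompt that is semantically hard looks cheap to a purely structural scorer.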

routing · model-selection · cost-optimization · inference-gateway · offline-scoring

Dev Signal

Quick Signals

Agents SDK adds durable background runs, unified turn entry

Detached sub-agents now survive deploys and evictions via a durable backbone, eliminating fire-and-forget patterns. A unified runTurn() entry point consolidates three admission modes into one.

Background work is no longer abandoned on deploy or reconnect, which cuts boilerplate around completion callbacks and recovery plumbing. A single turn-admission path removes the deadlock risk from nested blocking calls.

Replaces manual onFinish wiring and turn-admission branching (saveMessages/continueLastTurn/chat). To adopt it, update runAgentTool calls to use the detached config and route all admission through runTurn(). Worth adopting now if you run long-lived agent workflows; the release is backward-compatible with existing code.

“first-class detached (background) sub-agent runs with live progress and durable milestones”
“Durable, exactly-once-on-the-happy-path completion via a warm fast path plus a self-scheduling reconcile backbone that survives eviction and deploys”
“all entry points now route through a shared internal admission path that throws a clear error on nested blocking admissions that previously could deadlock”
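The nested-admission behavior in the last quote can be sketched generically. The Python below is not the Agents SDK API: `Agent`, `run_turn`, and `NestedTurnError` are hypothetical names. It only illustrates why a single admission path can fail fast where a nested blocking call would otherwise wait on a lock its own caller holds.

```python
import asyncio
import contextvars

# Hypothetical names throughout; this mirrors the idea, not the SDK's implementation.
_in_turn = contextvars.ContextVar("in_turn", default=False)


class NestedTurnError(RuntimeError):
    """Raised when a turn is admitted from inside a running turn."""


class Agent:
    def __init__(self) -> None:
        self._turn_lock = asyncio.Lock()  # admits one turn at a time

    async def run_turn(self, handler):
        # Without this check, a nested call would wait forever on the lock
        # that the enclosing turn already holds.
        if _in_turn.get():
            raise NestedTurnError("run_turn called from inside a running turn")
        async with self._turn_lock:
            token = _in_turn.set(True)
            try:
                return await handler()
            finally:
                _in_turn.reset(token)


async def demo():
    agent = Agent()

    async def inner():
        return "inner"

    async def outer():
        try:
            await agent.run_turn(inner)  # nested admission
        except NestedTurnError:
            return "rejected"
        return "nested ok"

    # The nested call is rejected; a later top-level turn still runs normally.
    return await agent.run_turn(outer), await agent.run_turn(inner)
```

Running `asyncio.run(demo())` exercises both paths: the nested call raises instead of hanging, and the lock is released for the next turn.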

agents-sdk · background-jobs · durability · cloudflare-workers · ai-agents

Vercel adds observability dashboard for eve agents

The Agent Runs tab in the Vercel dashboard now surfaces trigger, duration, token usage, and per-step execution traces, with dual views for developers and auditors.

Debugging agent failures moves from parsing function logs to correlated step-by-step execution traces. Dual modes (raw JSON for engineers, plain-English summaries for reviewers) reduce context switching between technical and non-technical stakeholders.

Data Point

MLLMs fail hidden social norms in embodied planning

State-of-the-art multimodal models achieve explicit goals 67.3% of the time but comply with hidden social norms only 26.4% of the time. The gap stems from context grounding, not a lack of social knowledge.

If you're building embodied AI agents or egocentric planners, this benchmark exposes a critical failure mode: your model may solve the stated task while violating implicit constraints that users expect. NormPerceptor's cue-generation approach offers a concrete pattern for injecting norm-detection into planning pipelines.

The NormAct benchmark replaces hand-coded norm checks with systematic evaluation; NormPerceptor requires a separate norm-inference stage before action planning. The improvement in Task Success from 24.2% to 46.7% (22.5 percentage points) suggests this is worth integrating now, but only if your deployment context involves repeated human interaction where norm violations compound.

“models achieve explicit goals in 67.3% of cases, but comply with hidden norms in only 26.4%”
“a context-conditioned cue generator that infers scene-relevant norms prior to planning, increasing Task Success from 24.2% to 46.7%”
“challenges in activating and grounding relevant norms in context”
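The two-stage pattern described above (infer scene-relevant norms first, then plan) might look like the sketch below. Everything here is a hypothetical stand-in: the paper's cue generator is a context-conditioned model, whereas this example uses a lookup table and a stub planner purely to show where the norm-inference stage sits in a pipeline.

```python
from dataclasses import dataclass, field

# Hypothetical cue table; a real system would infer cues with a model, not a dict.
NORM_CUES = {
    "sleeping person": "keep noise low",
    "wet floor sign": "avoid the marked area",
    "queue": "wait your turn",
}


@dataclass
class Plan:
    goal: str
    constraints: list = field(default_factory=list)
    steps: list = field(default_factory=list)


def infer_norms(scene_objects):
    """Stage 1: turn observed context into explicit norm cues."""
    return [NORM_CUES[obj] for obj in scene_objects if obj in NORM_CUES]


def plan_actions(goal, norms):
    """Stage 2: plan with the cues attached as constraints (planner is a stub)."""
    steps = [f"check: {norm}" for norm in norms] + [f"do: {goal}"]
    return Plan(goal=goal, constraints=list(norms), steps=steps)


def plan_with_norms(goal, scene_objects):
    # Norm inference runs before planning, so the planner sees the hidden
    # constraint instead of having to activate it on its own.
    return plan_actions(goal, infer_norms(scene_objects))
```

For example, `plan_with_norms("vacuum the living room", ["sofa", "sleeping person"])` yields a plan whose constraints include "keep noise low", a norm the stated goal never mentions.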

embodied-ai · mllm-planning · norm-compliance · benchmark · social-constraints

GPT-5.6 launches with three tiered models

Vercel releases Eve agent framework with durable execution

Google consolidates Gemini behind one agent API