Route prompts offline by complexity, skip model calls
Wayfinder scores prompt structure deterministically, producing a complexity score from 0 to 1.0 without an API call. It routes cheap queries to local models and hard ones to expensive tiers, replacing learned routers that add latency and cost to the routing decision itself.
Eliminates the cost and latency tax of calling a classifier or LLM to decide which model to use—the routing decision is now free and reproducible. Developers pay top-tier prices only for prompts that actually need them, not for summarization or typo fixes.
Replaces RouteLLM and hosted routers (NotDiamond, OpenRouter Auto) with a local-first alternative. Requires two tiers (local + cloud), OpenAI-compatible endpoints, and a single TOML config. Ready now, with a zero-install CLI demo available. Honest tradeoff: it wins on structural complexity (length, code, lists) and loses on prompts that are short but semantically hard ('what is the 100th prime number?'). Worth trying if you're already multi-tier; skip if your hard prompts are semantically subtle rather than structurally complex.
“deterministic, sub-millisecond, and entirely offline — no API key, no network, no model call to make it”
“Cheap prompts stay local and hard ones go to the expensive model, so you stop paying top-tier prices for "summarize this" and "fix my typo."”
“Most routers decide by calling a model: a trained classifier, an LLM judge, or a hosted API. That adds latency, cost, and randomness to the exact step meant to save you money.”
“a double-blind test showed the lift doesn't generalize (it caught ~20% of unseen hard prompts and lost to a plain word-count baseline)”
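To make the idea of structural, offline routing concrete, here is a minimal sketch in Python. It is not Wayfinder's scoring function: the features (word count, code markers, list items), the weights, the 0.35 threshold, and the names `complexity_score` and `route` are all assumptions chosen for illustration.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

def complexity_score(prompt: str) -> float:
    """Return a deterministic score in [0.0, 1.0] from structural features only.

    Weights are hypothetical: length 0.5, code 0.3, list items 0.2.
    """
    words = len(prompt.split())
    length_part = min(words / 400, 1.0) * 0.5

    # Crude code detection: a fence or a common keyword. Prose containing
    # words like "class" will also match, which is a known weakness.
    has_code = FENCE in prompt or re.search(
        r"\b(def|class|function|import)\b", prompt
    ) is not None
    code_part = 0.3 if has_code else 0.0

    # Count bullet or numbered list items at the start of a line.
    list_items = len(
        re.findall(r"^\s*(?:[-*]|\d+\.)\s+", prompt, flags=re.MULTILINE)
    )
    list_part = min(list_items / 10, 1.0) * 0.2

    return round(length_part + code_part + list_part, 4)

def route(prompt: str, threshold: float = 0.35) -> str:
    """Pick a tier name; no network or model call is involved."""
    return "cloud" if complexity_score(prompt) >= threshold else "local"

if __name__ == "__main__":
    print(route("fix my typo"))
    print(route("what is the 100th prime number?"))
```

Under these assumed weights, both prompts in the usage lines go to the local tier. The second one is the failure mode named in the tradeoff above: it is short and structurally simple, so a purely structural score cannot see that it is hard.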
Detached sub-agents now survive deploys and evictions via a durable backbone, removing the need for fire-and-forget patterns; a unified runTurn() entry point consolidates three admission modes into one.
Background work is no longer abandoned on deploy or reconnect, which cuts boilerplate around completion callbacks and recovery plumbing. A single turn-admission path removes the deadlock risk from nested blocking calls.
Replaces manual onFinish wiring and turn-admission branching (saveMessages/continueLastTurn/chat). Requires updating runAgentTool calls to use detached config and runTurn() for all admission. Worth adopting now if running long-lived agent workflows; backward-compatible for existing code.
“first-class detached (background) sub-agent runs with live progress and durable milestones”
“Durable, exactly-once-on-the-happy-path completion via a warm fast path plus a self-scheduling reconcile backbone that survives eviction and deploys”
“all entry points now route through a shared internal admission path that throws a clear error on nested blocking admissions that previously could deadlock”
Vercel adds observability dashboard for eve agents
Agent Runs tab in Vercel dashboard now surfaces trigger, duration, token usage, and per-step execution traces with dual views for developers and auditors.
Debugging agent failures moves from parsing function logs to correlated step-by-step execution traces. Dual modes (raw JSON for engineers, plain-English summaries for reviewers) reduce context switching between technical and non-technical stakeholders.
Data Point
MLLMs fail hidden social norms in embodied planning
State-of-the-art multimodal models achieve explicit goals 67.3% of the time but comply with hidden social norms only 26.4% of the time. The gap stems from grounding norms in context, not from missing social knowledge.
If you're building embodied AI agents or egocentric planners, this benchmark exposes a critical failure mode: your model may solve the stated task while violating implicit constraints that users expect. NormPerceptor's cue-generation approach offers a concrete pattern for injecting norm-detection into planning pipelines.
NormAct benchmark replaces hand-coded norm checks with systematic evaluation; NormPerceptor requires a separate norm-inference stage before action planning. The 24.2% → 46.7% improvement on Task Success suggests this is worth integrating now, but only if your deployment context involves repeated human interaction where norm violations compound.
“models achieve explicit goals in 67.3% of cases, but comply with hidden norms in only 26.4%”
“a context-conditioned cue generator that infers scene-relevant norms prior to planning, increasing Task Success from 24.2% to 46.7%”
“challenges in activating and grounding relevant norms in context”
The Agent Runs tab replaces custom logging within eve projects for production monitoring. Requires no setup: it appears automatically for all eve projects on Vercel. Retention is limited (12 hours on Hobby, 1 day on Pro, 3 days on Enterprise), so compliance-heavy workflows need the Observability Plus upgrade. Ready now if you're already on Vercel; check the retention limit for your plan before adopting.
“Agent Runs tab appears automatically for every eve project, surfacing trigger, duration, and token usage for each session at a glance”
“Drill into any run to inspect every turn, model call, and tool call in the conversation”
“Runtime errors that previously vanished into function logs now correlate to the failing step”
“Run data is encrypted by default”
“Teams using a custom OpenTelemetry backend can still export AI SDK spans from agent/instrumentation.ts to any destination, such as Braintrust or Datadog”
Terra matches GPT-5.5 performance at half the cost; Luna offers the lowest-cost baseline; prompt caching now supports explicit breakpoints and a 30-minute minimum cache life.
Cost-per-inference drops significantly across tiers, directly impacting feasibility of production inference workloads. Predictable cache behavior with explicit breakpoints reduces latency variance in cached query patterns.
Terra replaces GPT-5.5 for cost-sensitive deployments. Luna targets budget-constrained batch inference. Requires: token budget recalculation, cache-breakpoint architecture decisions, pricing model updates. Limited preview access blocks immediate adoption—watch for general availability rollout.
“Terra has competitive performance to GPT‑5.5 while being 2x cheaper and Luna brings strong capability at our lowest cost”
“GPT‑5.6 is priced per 1M tokens across three model sizes: Sol is $5 input / $30 output; Terra is $2.50 input / $15 output; and Luna is $1 input / $6 output”
“GPT‑5.6 also introduces more predictable prompt caching, including support for explicit cache breakpoints and a 30-minute minimum cache life”
“cache writes are billed at 1.25x the model's uncached input rate, while cache reads continue to receive the 90% cached-input discount”
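The quoted cache pricing can be turned into a quick break-even check. The sketch below uses Terra's quoted $2.50 per 1M input tokens, the 1.25x write multiplier, and the 90% read discount. It assumes the first call writes the shared prefix to the cache and every later call reads it within the cache life; the function name and the 100,000-token prefix are hypothetical, and output tokens are ignored.

```python
TERRA_INPUT_PER_M = 2.50    # USD per 1M uncached input tokens (quoted price)
CACHE_WRITE_MULT = 1.25     # cache writes billed at 1.25x the uncached rate
CACHE_READ_DISCOUNT = 0.90  # cache reads get a 90% discount

def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Input cost in USD of sending the same prefix on `calls` requests.

    Assumption: with caching, call 1 is a cache write and calls 2..n are
    cache reads that all land inside the cache lifetime.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_token = TERRA_INPUT_PER_M / 1_000_000
    if not cached:
        return prefix_tokens * per_token * calls
    write = prefix_tokens * per_token * CACHE_WRITE_MULT
    reads = prefix_tokens * per_token * (1 - CACHE_READ_DISCOUNT) * (calls - 1)
    return write + reads

if __name__ == "__main__":
    for n in (1, 2, 10):
        plain = prefix_cost(100_000, n, cached=False)
        cached = prefix_cost(100_000, n, cached=True)
        print(f"{n:>2} calls: uncached ${plain:.4f}, cached ${cached:.4f}")
```

Under these assumptions a single call costs more with caching ($0.3125 versus $0.25 for the prefix), and caching is already cheaper by the second call ($0.3375 versus $0.50). At ten calls the prefix costs about $0.54 cached versus $2.50 uncached.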
gpt-5.6 · pricing · prompt-caching · model-tiers
Vercel releases Eve agent framework with durable execution
Eliminates boilerplate for production agent patterns: pause/resume across failures, human-in-the-loop approvals, integrated observability, and code sandboxing now built-in rather than bolted together. Accelerates shipping agents from prototype to production deployment.
Eve replaces manual orchestration layers (LangGraph + separate durable execution + approval systems + observability glue). Requires TypeScript, Vercel deployment, and filesystem discipline. Ready now in public preview, but vendor lock-in risk remains until portability is proven outside the Vercel ecosystem.
“filesystem-based project structure that organizes an agent into directories for instructions, tools, skills, subagents, communication channels, and scheduled tasks”
“Each conversation is stored as a durable workflow that can pause, survive failures or deployments, and resume from the last completed step”
“Agent-generated code executes inside isolated sandboxes that can run locally using Docker or other adapters, or in production using Vercel Sandbox”
“already used internally to operate more than one hundred production agents supporting functions such as analytics, customer support, sales operations, and content review”
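The "resume from the last completed step" behavior can be illustrated with a generic checkpointing pattern. This is a sketch of the general technique in Python, not Eve's implementation (Eve is a TypeScript framework and its internals are not described in the source). The `run_durable` helper, the JSON checkpoint file, and the step names are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable

Step = tuple[str, Callable[[dict], object]]

def run_durable(steps: list[Step], checkpoint: Path) -> dict:
    """Run named steps in order, persisting each result as it completes.

    On a rerun after a crash or redeploy, steps already recorded in the
    checkpoint are skipped, so work resumes at the first unfinished step.
    Step results must be JSON-serializable in this sketch.
    """
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, fn in steps:
        if name in state:
            continue  # completed in an earlier run; do not repeat it
        state[name] = fn(state)
        checkpoint.write_text(json.dumps(state))  # persist before moving on
    return state
```

If a step raises, everything before it is already on disk, and calling `run_durable` again with the same checkpoint path picks up at the failed step. A production system would also need atomic writes and idempotent steps, which this sketch leaves out.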
Interactions API moves agent state, routing, and background execution server-side—the plumbing that closes coordination gaps in multi-step AI pipelines now belongs inside one endpoint instead of spread across four frameworks.
End-to-end reliability collapses in the seams between models and tools (a 6-step pipeline at 97% per step is about 83% reliable overall). This API removes the roughly three months of scaffolding work teams spend stitching together state stores, queues, and routing layers before shipping product logic.
Replaces LangGraph orchestration patterns and custom agent state management for Gemini teams. Requires migration from chat-completions style (resend full context) to server-side state model (interact by ID). Ready now—GA shipped June 26, 2026 with stable schema and all docs defaulting to it. Worth adopting if you're on Gemini; cost model and quota limits not disclosed in source.
“a single unified endpoint for Gemini models and agents with server-side state, background execution, tool combination and multimodal generation”
“now our primary API”
“A six-step agentic pipeline where each step is 97% reliable is only about 83% reliable end-to-end”
“Whether you're calling a model or running an agent, the Interactions API gets you there in a few lines of code”
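The 97%-per-step figure compounds as a simple product, which a few lines of Python can confirm. The sketch assumes step failures are independent and every step must succeed; the function names are illustrative.

```python
def end_to_end(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps

def required_per_step(target: float, steps: int) -> float:
    """Per-step reliability needed to hit an end-to-end target."""
    return target ** (1 / steps)

if __name__ == "__main__":
    print(f"{end_to_end(0.97, 6):.3f}")         # 0.833
    print(f"{required_per_step(0.95, 6):.4f}")  # 0.9915
```

Read the other way, a six-step pipeline needs each step at roughly 99.15% to reach 95% end to end, which is why failures at the seams between steps dominate.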