AI Gateway routing rules block and redirect models
Apply firewall-style model rewrites and denials at the gateway level, with no code changes when a model fails or is retired.
Model switching no longer needs a deploy cycle. When a model goes down or you need to enforce a team policy, a single CLI command pushes a rule that applies instantly to every request made with those credentials.
Replaces in-app fallback logic and manual code deploys for model substitution. Requires Vercel AI Gateway and CLI access; supports Rewrite (swap one model for another) and Deny (block the request with a 403). Available now in beta, and low friction if you are already on the Vercel stack.
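To make the Rewrite and Deny semantics concrete, here is a minimal sketch of how a gateway might evaluate such rules per request. The rule shape, the first-match-wins ordering, and the model names are assumptions for illustration; this is not the Vercel AI Gateway rule format or its CLI.

```typescript
// Hypothetical rule shape; not the Vercel AI Gateway configuration format.
type RoutingRule =
  | { action: "rewrite"; match: string; target: string }
  | { action: "deny"; match: string };

type RoutingDecision =
  | { status: 200; model: string }
  | { status: 403; reason: string };

// Assumption: rules are checked in order and the first matching rule wins.
function route(requestedModel: string, rules: RoutingRule[]): RoutingDecision {
  for (const rule of rules) {
    if (rule.match !== requestedModel) continue;
    if (rule.action === "deny") {
      // Deny: the request never reaches a provider.
      return { status: 403, reason: `model ${requestedModel} is blocked by team policy` };
    }
    // Rewrite: the app asked for one model, the gateway serves another.
    return { status: 200, model: rule.target };
  }
  // No rule matched: pass the request through unchanged.
  return { status: 200, model: requestedModel };
}
```

Because the decision happens at the gateway, swapping `vendor/retired-model` for a replacement in this sketch is a change to the rule list, not to application code.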
“Routing rules are firewall-style rules that control which models your team can use, applied at the gateway level instead of in your application code”
“When a model goes down or gets retired, you usually have to ship a code change to move off it. With routing rules, you push one rule and every request reroutes instantly”
“Rules apply to every request made with your team's AI Gateway credentials”
“Routing rules are in beta”
Tags: ai-gateway, routing, model-fallback, vercel, deployments
Dev Signal
Quick Signals
Nano Banana 2 Lite replaces first-gen image model
Drop-in image model generates 1K-resolution images in about 4 seconds at $0.034 per 1K image; pair it with Gemini Omni Flash ($0.10 per second of video) to chain image-to-video workflows.
Reduces latency for interactive prototyping and cuts per-image costs by optimizing for speed-first pipelines. Omni Flash adds video generation and natural-language editing to the same API surface, enabling end-to-end multimedia workflows without external tools.
Nano Banana 2 Lite is a drop-in replacement for gemini-2.5-flash-image; use it now for drafting and ideation. Omni Flash (gemini-omni-flash-preview) can generate video today, but outputs are capped at 10 seconds and the API does not yet support audio or scene extension. Start with image generation, then test Omni Flash if your workflow needs video.
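Before chaining the two models, it can help to estimate what a run costs. The sketch below uses the prices quoted in this item and the current 10-second clip limit; treating $0.034 as the price of one 1K-resolution image is an assumption about how that figure is meant, and the `PipelineRun` shape is hypothetical.

```typescript
// Assumption: $0.034 is charged per 1K-resolution image.
const IMAGE_PRICE_USD = 0.034;
const VIDEO_PRICE_PER_SECOND_USD = 0.1; // Omni Flash video output
const MAX_CLIP_SECONDS = 10; // current Omni Flash limit per generation

// Hypothetical description of one image-to-video run.
interface PipelineRun {
  images: number; // draft images generated with Nano Banana 2 Lite
  clipSeconds: number[]; // requested length of each Omni Flash clip
}

function estimateCostUsd(run: PipelineRun): number {
  for (const s of run.clipSeconds) {
    if (s <= 0 || s > MAX_CLIP_SECONDS) {
      throw new RangeError(`clip length ${s}s must be > 0 and <= ${MAX_CLIP_SECONDS}s`);
    }
  }
  const imageCost = run.images * IMAGE_PRICE_USD;
  const videoSeconds = run.clipSeconds.reduce((sum, s) => sum + s, 0);
  const videoCost = videoSeconds * VIDEO_PRICE_PER_SECOND_USD;
  // Round to cents to hide floating-point noise.
  return Math.round((imageCost + videoCost) * 100) / 100;
}
```

For example, 20 draft images plus three clips of 10, 10, and 5 seconds come to $0.68 + $2.50 = $3.18 under these assumptions; the video seconds dominate the bill.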
“Delivers text-to-image outputs in 4 seconds”
“Cost-efficient choice for developers focused on drafting, ideating, managing operational budgets or low-bandwidth usage”
“priced competitively at $0.10 per second of video output”
“it's our recommended replacement for developers currently using our first version of Nano Banana (gemini-2.5-flash-image), you can swap it out now for immediate benefits”
“Omni offers 10-second video generations currently, with longer durations coming soon”
Data Point
Agents fail to teach users what they want
Interactive recommender agents top out at 56% accuracy because the conversation does little to expand what users know; the bottleneck is preference formation, not item search.
If you're building agentic recommendation or preference-elicitation systems, this paper quantifies a hard constraint: clarifying questions alone don't work when users lack domain knowledge. Your agent needs explicit teaching mechanisms, such as examples and explanations, to help users specify the task.
This is a diagnostic, not something you can adopt yet. The CoShop benchmark shows that five turns of interaction with frontier models do little to educate users about their own preferences. If your agent relies on users already knowing what they want, invest in knowledge-building dialog actions before optimizing search.
“no agent exceeds 56% accuracy on CoShop despite five turns of interaction”
“Failures stem not from agents' ability to find items, but from how little the interaction expands what users know about what they want”
“Users often lack the domain knowledge to have completely specified preferences”
Ornith-1.0 open-source coding agents ship in four sizes
MIT-licensed agentic models (9B–397B) trained with RL to optimize both solution rollouts and search scaffolds, available in dense and MoE variants with 256K context and OpenAI-compatible serving.
Replaces closed-model dependencies in agentic coding workflows: the dense 9B runs on a single 80GB GPU, while the MoE variants shard across multi-GPU nodes. Benchmark numbers show performance competitive with or better than comparable open baselines on SWE-bench, Terminal-Bench, and NL2Repo.
Production-ready for self-hosted deployment. Requires transformers ≥5.8.1, vLLM ≥0.19.1, or SGLang ≥0.5.9. The model emits reasoning traces in <think> blocks; the serving parsers return tool_calls and reasoning_content as separate fields. The dense 9B is the lowest-friction entry point; the 35B and 397B MoE variants need multi-GPU infrastructure. Worth trying now if you control your serving infrastructure.
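Clients that talk to an OpenAI-compatible endpoint may receive the reasoning either pre-split into `reasoning_content` or inline as `<think>` text, depending on whether the server runs a reasoning parser. A minimal client-side sketch that handles both, assuming both the opening and closing tags appear in the returned text; the `ChatMessage` and `splitReasoning` names are hypothetical.

```typescript
// Shape of one choice's message from an OpenAI-compatible chat completion.
// `reasoning_content` is only present when the server runs a reasoning parser.
interface ChatMessage {
  content: string | null;
  reasoning_content?: string | null;
}

interface SplitOutput {
  reasoning: string;
  answer: string;
}

// Non-greedy so multiple <think> blocks are matched separately.
const THINK_BLOCK = /<think>([\s\S]*?)<\/think>/g;

function splitReasoning(msg: ChatMessage): SplitOutput {
  const content = msg.content ?? "";
  // Prefer the server-side split when the parser already did the work.
  if (msg.reasoning_content != null) {
    return { reasoning: msg.reasoning_content.trim(), answer: content.trim() };
  }
  // Otherwise pull every <think>...</think> span out of the raw text.
  const traces: string[] = [];
  const answer = content.replace(THINK_BLOCK, (_match: string, inner: string) => {
    traces.push(inner.trim());
    return "";
  });
  return { reasoning: traces.join("\n"), answer: answer.trim() };
}
```

Keeping the fallback on the client means the same UI code works whether or not the serving stack was started with a reasoning parser.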
“Available in 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE”
“MIT licensed, globally accessible, and free from regional limitations”
“the model discovers better search trajectories and generates higher-quality solutions”
Vercel Private Blob is generally available
Private blob storage is now GA with OIDC token auth and scoped signed URLs, replacing static credential management for sensitive file access.
Eliminates long-lived credentials from environment variables and enables temporary, operation-scoped access tokens for client-side uploads without exposing server credentials. Critical for handling user data, invoices, and agent memory with fine-grained access control.
Ready now. Adoption is a one-parameter API change (`access: 'private'`). OIDC auto-rotation requires the Vercel runtime; the CLI supports OIDC for local workflows. Signed URLs take the place of presigned S3 patterns. Worth adopting immediately if you store sensitive files, since it reduces credential sprawl and audit surface.
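The scoping model for signed URLs (one operation, one pathname, an expiry of at most 7 days) can be sketched as a validation step you would run before asking the platform for a URL. Everything here is hypothetical: the field names, the operation list, and the `scopeSignedUrl` helper are illustrations, not the `@vercel/blob` API.

```typescript
// Hypothetical request shape; field names are illustrative, not @vercel/blob.
type BlobOperation = "get" | "put" | "delete";

interface SignedUrlRequest {
  operation: BlobOperation;
  pathname: string;
  expiresAt: Date;
}

const MAX_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day ceiling quoted above

function scopeSignedUrl(
  operation: BlobOperation,
  pathname: string,
  ttlMs: number,
  now: Date = new Date(),
): SignedUrlRequest {
  if (!Number.isFinite(ttlMs) || ttlMs <= 0 || ttlMs > MAX_TTL_MS) {
    throw new RangeError("expiration must be in the future and at most 7 days out");
  }
  // Illustrative check: one URL covers one concrete pathname, not a pattern.
  if (pathname.length === 0 || pathname.includes("*")) {
    throw new Error("signed URLs are scoped to a single concrete pathname");
  }
  return { operation, pathname, expiresAt: new Date(now.getTime() + ttlMs) };
}
```

Picking the shortest TTL that covers the client's upload or download keeps a leaked URL's blast radius small.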
“Vercel Private Blob is now generally available for all plans”
“Functions running on Vercel now authenticate to Vercel Private Blob with a short-lived, auto-rotating OIDC token scoped to the project, with no static read-write token in your environment”
“Mint a URL scoped to a single operation, pathname, and an expiration date you choose for up to 7 days”
“Specify `access: 'private'` when uploading a blob”
Mistral ships Voxtral TTS for low-latency voice agents
4B-parameter multilingual TTS model reaches 70ms model latency (10-second reference sample, 500 characters) and zero-shot voice adaptation from 3-second samples, priced at $0.016 per 1k characters via API.
Replaces ElevenLabs for cost-sensitive voice agent deployments where latency matters; zero-shot cross-lingual voice adaptation enables speech-to-speech translation pipelines without separate retraining. Available now via the API and as open weights.
Production-ready and slots into existing STT+LLM stacks. Voice adaptation needs a 3-5 s reference sample. In human evaluation it is on par with ElevenLabs v3 on quality and beats ElevenLabs Flash v2.5 on naturalness at similar time-to-first-audio. Worth adopting now to cut costs if you need multilingual support; otherwise ElevenLabs remains the faster iteration path.
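To check the per-character pricing against your own traffic, the sketch below turns the quoted $0.016 per 1k characters into a monthly estimate and checks the 3-second minimum reference length. The helper names and the 30-day default are assumptions for illustration.

```typescript
// Figures quoted in this item.
const PRICE_PER_1K_CHARS_USD = 0.016;
const MIN_REFERENCE_SECONDS = 3;

function ttsCostUsd(characters: number): number {
  if (!Number.isFinite(characters) || characters < 0) {
    throw new RangeError("character count must be a non-negative number");
  }
  return (characters / 1000) * PRICE_PER_1K_CHARS_USD;
}

// Hypothetical helper: assumes a flat number of calls and characters per day.
function monthlyTtsCostUsd(callsPerDay: number, charsPerCall: number, days = 30): number {
  return ttsCostUsd(callsPerDay * charsPerCall * days);
}

// Zero-shot adaptation needs a reference clip of at least 3 seconds.
function canAdaptVoice(referenceSeconds: number): boolean {
  return referenceSeconds >= MIN_REFERENCE_SECONDS;
}
```

For example, 2,000 calls a day at 500 characters each is 30 million characters over 30 days, or about $480 at the quoted rate.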
“4B parameters, making Voxtral-powered agents natural, reliable, and cost-effective at scale”
“achieves a model latency of 70ms for a typical input voice sample of 10 seconds and 500 characters”
“Voxtral TTS achieves superior naturalness compared to ElevenLabs Flash v2.5 while maintaining similar Time-to-First-Audio (TTFA)”
“the model can adapt to a custom voice with a reference as little as 3s”
“available now via API at $0.016 per 1k characters”
Mistral releases connectors API for enterprise tool integration
Register an integration once as an MCP connector and expose it as a native tool across all Mistral APIs, without rewriting auth or duplicating code.
Eliminates the integration scaffolding that teams keep rebuilding: OAuth setup, token refresh, and pagination now live on the platform side, which reduces security drift and maintenance burden. Direct tool calling and human-in-the-loop approval give you deterministic control over agent execution.
Replaces ad-hoc tool function maintenance and scattered OAuth implementations. Requires an MCP server (local or remote) and adoption of the SDK; connectors work across the Conversation API, the Completions API, and the Agent SDK. Available now in public preview, with documented cookbook patterns for GitHub, web search, and custom MCPs.
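The idea that `tool_configuration` narrows what a call can use without touching the connector can be modeled locally. This is a sketch of the concept only: the `include`, `exclude`, and `requireApproval` fields, the `resolveTools` helper, and the tool names are hypothetical, not Mistral's actual schema.

```typescript
// Hypothetical shapes; not Mistral's tool_configuration schema.
interface ConnectorTool {
  name: string;
  description: string;
}
interface Connector {
  id: string;
  tools: ConnectorTool[];
}
interface ToolConfiguration {
  include?: string[]; // if set, only these tools are exposed
  exclude?: string[]; // removed after include is applied
  requireApproval?: string[]; // tools that pause for a human before running
}
interface ExposedTool extends ConnectorTool {
  needsApproval: boolean;
}

// Builds a per-call view of the connector; the connector itself is never mutated.
function resolveTools(connector: Connector, config: ToolConfiguration = {}): ExposedTool[] {
  const include = config.include ? new Set(config.include) : null;
  const exclude = new Set(config.exclude ?? []);
  const approval = new Set(config.requireApproval ?? []);
  return connector.tools
    .filter((t) => (include === null || include.has(t.name)) && !exclude.has(t.name))
    .map((t) => ({ ...t, needsApproval: approval.has(t.name) }));
}
```

One shared connector can then back a read-only agent and a write-capable agent, with approval gates applied per call rather than baked into the integration.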
“All built-in connectors, as well as custom MCPs, are now available via API/SDK to be used with all model and agent calls.”
“Set up once, run it all the time, everywhere.”
“teams keep rebuilding the same integration layer”
“A connector solves this by packaging an integration into a single, reusable entity using the MCP protocol.”
“tool_configuration controls the tool availability without modifying the connector itself”