Sonnet 5 launches: Opus performance at lower cost — Dev Signal
Tool of the Week
Claude Sonnet 5 launches on Vercel AI Gateway
Sonnet 5 reaches outcomes on many coding and agentic tasks that previously needed an Opus model, at Sonnet pricing; set `model` to `anthropic/claude-sonnet-5` in the AI SDK to access it.
Reduces model-selection friction for agentic workloads: you can now skip Opus for many tasks, cutting inference costs by 50–67% through August 31. Stronger document parsing and long-context handling directly improve RAG and multi-turn workflows.
Replaces Sonnet 4.6 for coding and agentic work. Requires updating the model identifier in the AI SDK; no breaking changes. Launch pricing ($2 input / $10 output per million tokens) runs through August 31, 2026, then rises to $3/$15, so migrate proofs of concept now while the discount applies. Available immediately.
“Sonnet 5 improves on Sonnet 4.6 across coding and agentic work, reaching outcomes on many tasks that previously needed an Opus model, at Sonnet pricing.”
“Launch pricing of $2 per million input tokens and $10 per million output tokens runs through August 31, 2026.”
“The model is more agentic and follows instructions more closely. Document parsing and long-context memory use are also stronger.”
“set `model` to `anthropic/claude-sonnet-5` in the AI SDK”
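To see what the price step means for a budget, here is a minimal sketch of the arithmetic using the two price points quoted above. The token volumes are hypothetical and chosen only for illustration; the function and constant names are not from any SDK.

```typescript
// Per-million-token prices in USD, as quoted in this issue.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

const LAUNCH_PRICING: Pricing = { inputPerMillion: 2, outputPerMillion: 10 };
const STANDARD_PRICING: Pricing = { inputPerMillion: 3, outputPerMillion: 15 };

// Cost in USD for a workload with the given input and output token counts.
function workloadCost(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * p.inputPerMillion +
    (outputTokens / 1_000_000) * p.outputPerMillion
  );
}

// Hypothetical monthly volume (an assumption, not a figure from the source).
const monthlyInputTokens = 50_000_000;
const monthlyOutputTokens = 10_000_000;

const launchCost = workloadCost(LAUNCH_PRICING, monthlyInputTokens, monthlyOutputTokens);
const standardCost = workloadCost(STANDARD_PRICING, monthlyInputTokens, monthlyOutputTokens);

console.log(`launch: $${launchCost}, after August 31: $${standardCost}`);
```

Because both the input and output rates rise by the same factor (2 to 3, 10 to 15), the bill for an unchanged workload grows by 50% whatever the input/output mix.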
Quick Signals
Sonnet 5 closes Opus gap at lower cost
Claude Sonnet 5 comes close to Opus 4.8 performance on agentic tasks (planning, tool use, coding) at $2 input / $10 output per million tokens, replacing Sonnet 4.6 as the default model on Free and Pro plans.
Developers can now deploy multi-step autonomous workflows (bug fixes, data exploration, form automation) without paying Opus prices. Early testers report tasks that previously stalled midway now complete end-to-end, reducing manual intervention in agent loops.
Drop-in replacement for Sonnet 4.6 via the `claude-sonnet-5` model identifier. Requires no other integration changes; launch pricing runs through August 31, 2026, then steps up to $3/$15. Worth migrating existing agents immediately if you're hitting Sonnet 4.6 limits on brownfield code, tool use, or multi-step reasoning. Start with a staging deployment to verify your cost-per-task improvement.
“its performance is close to that of Opus 4.8, but at lower prices”
“It's a substantial improvement over its predecessor, Sonnet 4.6, on important aspects of agentic performance like reasoning, tool use, coding, and knowledge work”
“it is the default model for Free and Pro plans”
“Sonnet 5 is much more agentic than its predecessors”
“Claude Sonnet 5 was never able to develop a full working exploit”
GitHub Copilot moves from an ACP Registry plugin to a native agent in JetBrains IDEs: no ACP configuration is needed, but an active GitHub Copilot subscription is required.
Data Point
AI agents fail framework migration despite code generation wins
ScarfBench benchmark reveals frontier agents achieve less than 10% behavioral success on Java framework migrations, exposing that compilation success masks deployment and runtime failures.
Before deploying AI-assisted modernization to production, you need realistic benchmarks. ScarfBench shows that agents are overconfident in their own success: Claude Code reported successful builds for 29 of 30 applications when only 22 of those actually built. The real work is dependency resolution across config, infrastructure, and runtime layers, not source translation.
This doesn't replace your modernization strategy yet. Agents solve portions of migration but cannot independently validate outcomes. Use ScarfBench to benchmark your own tools before production deployment; expect to own build validation, configuration tuning, and environmental troubleshooting regardless of agent success rates.
“Even the strongest current agents achieve less than 10% behavioral success”
“Claude Code reported successful builds for 29 out of 30 whole applications. Only 22 of those applications actually built successfully”
“agents repeatedly returned to configuration-related artifacts while resolving framework differences and dependency issues”
“Migration difficulty depends strongly on the target framework, with Jakarta EE proving particularly challenging”
“The biggest challenge in framework modernization is not translating Java code. It is managing the web of dependencies across configuration, infrastructure, and runtime environments”
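Owning build validation can start with something as simple as comparing what the agent claims against what an independent build produces. The sketch below is one way to tally that gap; the record shape and function name are hypothetical, and the data is synthetic, shaped only to match the 29-claimed, 22-built figures quoted above.

```typescript
// One record per migrated application.
interface BuildReport {
  app: string;
  agentClaimedSuccess: boolean; // what the agent reported
  verifiedSuccess: boolean; // what an independent build actually produced
}

interface AuditSummary {
  claimed: number; // builds the agent reported as successful
  verified: number; // claimed builds that really succeeded
  falseClaims: number; // claimed builds that did not succeed
  falseClaimRate: number; // falseClaims / claimed
}

// Counts how many of the agent's success claims survive independent verification.
function auditBuildReports(reports: BuildReport[]): AuditSummary {
  const claimed = reports.filter((r) => r.agentClaimedSuccess);
  const verified = claimed.filter((r) => r.verifiedSuccess);
  const falseClaims = claimed.length - verified.length;
  return {
    claimed: claimed.length,
    verified: verified.length,
    falseClaims,
    falseClaimRate: claimed.length === 0 ? 0 : falseClaims / claimed.length,
  };
}

// Synthetic data: 30 apps, 29 claimed successes, 22 of those verified.
const reports: BuildReport[] = Array.from({ length: 30 }, (_, i) => ({
  app: `app-${i + 1}`,
  agentClaimedSuccess: i < 29,
  verifiedSuccess: i < 22,
}));

const summary = auditBuildReports(reports);
console.log(summary);
```

On these numbers, 7 of the 29 claimed successes (roughly 24%) would not have survived a real build, which is the gap a self-reported success rate hides.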
The native Copilot agent eliminates ACP configuration overhead and improves reliability for developers already in JetBrains IDEs. Copilot CLI commands like /remote and /chronicle now work directly in IDE chat.
Replaces the ACP Registry integration path. Requires a JetBrains IDE update, an active GitHub Copilot subscription (separate from JetBrains AI), and OAuth login via your GitHub account. Available now: select Copilot from the agent picker and authenticate.
“Copilot was previously accessible via the ACP Registry, but this integration takes things further”
“greater stability and availability”
“JetBrains AI users will need an active GitHub Copilot subscription to use this integration. It is not included with your JetBrains AI subscription”
“Copilot authenticates exclusively via OAuth through your GitHub account”
Gemini 3.1 Flash Lite Image is now available via Vercel's unified API at $0.034 per 1K image, about half the cost of Nano Banana 2, generating text and images together in a single call in under 4 seconds.
Reduces image generation costs and latency for production deployments while consolidating billing/retry logic through a single SDK call. Cuts iteration time on multimodal workflows that previously required separate model calls.
Replaces Nano Banana 2 for cost-sensitive image tasks. Requires setting `model` to `google/gemini-3.1-flash-lite-image` and adding `responseModalities: ['TEXT', 'IMAGE']` to provider options. Ready now—native AI SDK support, no breaking changes.
“generates images alongside text responses”
“generates images alongside text in <4s”
“1K images at $0.034 each, about half the cost of Nano Banana 2”
“model to `google/gemini-3.1-flash-lite-image`”
“AI Gateway reflects provider pricing with no markup and does not charge a platform fee on inference”
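As a rough sketch of the configuration described above, the helper below assembles the request options and estimates spend at the quoted per-image price. The `model` string and `responseModalities` values come from the announcement; nesting them under a `google` key in `providerOptions` is an assumption based on how the AI SDK usually scopes provider options, and the helper names are hypothetical.

```typescript
// Shape of the options object for a combined text and image request.
interface ImageTextRequest {
  model: string;
  prompt: string;
  // Assumption: provider options are namespaced under the provider name.
  providerOptions: { google: { responseModalities: string[] } };
}

// Builds the options; in a real app this object would be passed to the SDK call.
function buildImageRequest(prompt: string): ImageTextRequest {
  return {
    model: "google/gemini-3.1-flash-lite-image",
    prompt,
    providerOptions: { google: { responseModalities: ["TEXT", "IMAGE"] } },
  };
}

// Quoted price for one 1K image, in USD.
const PRICE_PER_1K_IMAGE_USD = 0.034;

// Estimated image spend in USD for a given number of 1K images.
function estimateImageCost(imageCount: number): number {
  return imageCount * PRICE_PER_1K_IMAGE_USD;
}

const request = buildImageRequest("A product hero shot with a one-line caption");
console.log(request.model, estimateImageCost(1000).toFixed(2));
```

At the quoted rate, a thousand 1K images comes to about $34 before any text-token charges, which this estimate does not cover.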
Claude Sonnet 5 completes all GitLab benchmark tasks
Claude Sonnet 5 finishes multi-step agent workflows without stalling; GitLab reports 8.8% more issues resolved than Sonnet 4.6, reducing cost per completed task.
Agent task failures mid-execution waste time on diagnosis and re-prompting. A model that finishes reliably shifts developer effort from restarting runs to code review, so agents become something you delegate to rather than supervise.
Replaces Sonnet 4.6 as the default model tier on GitLab Duo Agent Platform for everyday dev work. Requires GitLab Premium/Ultimate or paid credits; available now across all deployment models. Worth switching today if you're running agents in production—reliability improvements compound with cost efficiency.
“the first model in GitLab's evaluation suite to complete all of our benchmark tasks”
“Sonnet 4.6, its predecessor, completed 93.8% of them”
“It's the first model in our evaluation suite to finish every benchmark task”
“8.8% more issues resolved”
“The most expensive agent failure is often the one that stops halfway”
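The link between completion rate and cost per completed task can be made concrete with a small model. This sketch assumes a failed run costs as much as a successful one and is simply retried until it finishes, and it treats the benchmark completion rates above as per-attempt success probabilities. Both are simplifying assumptions rather than anything GitLab reported, and the per-attempt cost is hypothetical.

```typescript
// Expected cost to get one completed task when each attempt succeeds with
// probability `completionRate` and failures are retried (geometric model).
function costPerCompletedTask(costPerAttempt: number, completionRate: number): number {
  if (completionRate <= 0 || completionRate > 1) {
    throw new RangeError("completionRate must be in (0, 1]");
  }
  return costPerAttempt / completionRate;
}

// Hypothetical cost of a single agent run, in USD.
const costPerAttempt = 1.0;

// Completion rates quoted above: 93.8% for Sonnet 4.6, 100% for Sonnet 5.
const previousModelCost = costPerCompletedTask(costPerAttempt, 0.938);
const currentModelCost = costPerCompletedTask(costPerAttempt, 1.0);

console.log(previousModelCost.toFixed(3), currentModelCost.toFixed(3));
```

Under these assumptions the retry overhead alone adds about 6.6% per completed task at a 93.8% completion rate, before counting the developer time spent diagnosing each stalled run.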
Graph-first orchestration in the Go ADK replaces ad-hoc control flow for multi-agent apps, with built-in human-in-the-loop and durable state that survives process restarts.
Developers can now express complex agent branching, fan-out, approval gates, and retry logic as declarative graphs instead of brittle imperative code. State persists across restarts and runtimes, eliminating manual orchestration plumbing.
Replaces manual agent composition and custom orchestration layers. Requires a Go version that supports `iter.Seq2` (the standard `iter` package arrived in Go 1.23) and adoption of the graph construction API. Ready now for new projects; existing ADK 1.x agents are compatible. Worth migrating if you're building multi-step workflows with branches or HITL.
“Production agents must classify, branch, fan out, ask a human to approve something, retry on failure, and loop until done”
“A graph is an agent. That wf is just an agent.Agent. It runs in the same runner, launcher, and console you already use — no special harness, no new server”
“the workflow durably waits for the answer”
“a workflow can resume after a process restart, or even across different runtimes, because the interrupt format is shared with Python ADK”
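To illustrate the general idea of a graph with an approval gate and resumable state, here is a generic sketch. It is not the ADK API: every type, node, and function name below is hypothetical, and it is written in TypeScript purely to show the pattern of pausing at an interrupt, persisting a snapshot, and resuming later.

```typescript
// Data carried through the workflow; must stay JSON-serializable.
interface TaskData {
  request: string;
  risk?: "high" | "low";
  approved?: boolean;
  result?: string;
}

// Serializable state: the node to run next plus the accumulated data.
interface WorkflowState {
  node: string;
  data: TaskData;
}

type StepResult =
  | { kind: "next"; node: string }
  | { kind: "interrupt"; resumeAt: string } // pause and wait for a human
  | { kind: "done" };

type NodeFn = (data: TaskData) => StepResult;

// The graph: each node decides which node runs next.
const graph: Record<string, NodeFn> = {
  classify: (data) => {
    data.risk = data.request.includes("delete") ? "high" : "low";
    return { kind: "next", node: data.risk === "high" ? "approval" : "execute" };
  },
  approval: (data) =>
    data.approved === undefined
      ? { kind: "interrupt", resumeAt: "approval" }
      : { kind: "next", node: data.approved ? "execute" : "reject" },
  execute: (data) => {
    data.result = "executed";
    return { kind: "done" };
  },
  reject: (data) => {
    data.result = "rejected";
    return { kind: "done" };
  },
};

// Runs until the graph finishes or hits an interrupt; the returned snapshot
// is plain JSON, so it could be stored and reloaded in another process.
function run(state: WorkflowState): { snapshot: string; status: "done" | "waiting" } {
  for (;;) {
    const fn = graph[state.node];
    if (!fn) throw new Error(`unknown node: ${state.node}`);
    const result = fn(state.data);
    if (result.kind === "done") return { snapshot: JSON.stringify(state), status: "done" };
    if (result.kind === "interrupt") {
      state.node = result.resumeAt;
      return { snapshot: JSON.stringify(state), status: "waiting" };
    }
    state.node = result.node;
  }
}

// First run pauses at the approval gate.
const first = run({ node: "classify", data: { request: "delete stale branches" } });

// Later, possibly after a restart: reload the snapshot, record the answer, resume.
const restored: WorkflowState = JSON.parse(first.snapshot);
restored.data.approved = true;
const second = run(restored);
console.log(first.status, second.status);
```

The design point is that nothing about the paused workflow lives in memory: the snapshot alone is enough to pick up at the approval node, which is what makes restarts survivable.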