Agents SDK: Durable execution + new AI security tools — Dev Signal
June 24, 2026
Tool of the Week
Agents SDK adds durable browser and code execution
Browser Run exposes Chrome DevTools Protocol directly to models; Code Mode adds durable execution logs and approval gates without custom pause-resume logic.
Eliminates fixed action lists and custom orchestration code for browser automation and external system integration. Agents now recover across deploys and connection churn while pausing for human approval on sensitive actions.
Replaces hand-rolled browser tool wrappers and approval-gate plumbing. Requires Agents SDK update and integration with Cloudflare Workers/Durable Objects. Worth trying now if building production agents that need browser interaction or approval workflows.
“Agents can now browse websites through Browser Run, write code against external tools through Code Mode, use client-provided tools when delegating to Think sub-agents, and recover more reliably from deploys, Durable Object evictions, and connection churn”
“the model writes code against the Chrome DevTools Protocol (CDP) and can inspect pages, capture screenshots, read rendered content, debug frontend behavior, and interact with live browser sessions”
“When the code reaches an approval-gated action, the runtime pauses execution and returns a pending approval. After approval, completed calls replay from the durable log, the approved action runs, and the same code continues”
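The pause-and-replay behavior quoted above can be illustrated with a small sketch. This is not the Agents SDK API: `DurableRun`, `PendingApproval`, and the sample `program` are hypothetical names, and the "durable log" here is an in-memory dict standing in for persistent storage. The idea it shows is that completed calls are recorded by position, a gated call raises until approved, and a rerun returns recorded results instead of executing them again.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class PendingApproval(Exception):
    """Raised to pause a run when a gated action has not been approved yet."""

    def __init__(self, call_id: int, name: str):
        super().__init__(f"approval needed for call {call_id}: {name}")
        self.call_id = call_id
        self.name = name


@dataclass
class DurableRun:
    # Hypothetical stand-in for a durable log: call index -> recorded result.
    log: dict = field(default_factory=dict)
    approved: set = field(default_factory=set)    # call indexes a human approved
    executed: list = field(default_factory=list)  # names of calls that really ran
    _cursor: int = 0

    def call(self, name: str, fn: Callable[[], Any], gated: bool = False) -> Any:
        call_id = self._cursor
        self._cursor += 1
        if call_id in self.log:
            # Already completed in an earlier attempt: replay the recorded result.
            return self.log[call_id]
        if gated and call_id not in self.approved:
            raise PendingApproval(call_id, name)
        result = fn()
        self.executed.append(name)
        self.log[call_id] = result
        return result

    def run(self, program: Callable[["DurableRun"], Any]) -> Any:
        self._cursor = 0  # every attempt re-enters the same code from the top
        return program(self)


def program(run: DurableRun) -> str:
    order = run.call("fetch_order", lambda: {"id": 7, "total": 120})
    run.call("issue_refund", lambda: "refunded", gated=True)
    return run.call("notify", lambda: f"order {order['id']} refunded")


run = DurableRun()
try:
    run.run(program)
except PendingApproval as pending:
    paused_at = pending.call_id       # the run stops at the gated call
    run.approved.add(pending.call_id)  # a human approves it
final = run.run(program)               # replay, then continue past the gate
```

In this sketch `fetch_order` executes only once even though `program` runs twice, which is the property that makes replay safe for calls with side effects.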
Quick Signals
Mozilla releases open-source AI security scanner
0DIN Scanner bundles 179 real-world jailbreak probes from Mozilla's bug bounty program into a runnable test suite built on NVIDIA's GARAK framework, with a graphical UI and cross-model comparative analysis.
Developers deploying AI systems now have access to adversarial test cases derived from production attacks rather than textbook examples, reducing the gap between security assumptions and actual threat surface. The free tier eliminates the excuse of "no red team bandwidth" for organizations doing AI in production.
Replaces manual prompt-injection testing and generic benchmarking suites. Requires model API access and a few minutes to configure scans. Ready now: code is open-source on GitHub, free assessments available, and six novel attack techniques are publicly named for the first time. Start with the free assessment if you're shipping AI without adversarial testing.
“179 community probes covering 35 vulnerability families”
“Built on NVIDIA's GARAK open-source framework”
“six specialty probes drawn exclusively from our bug bounty library”
“probes drawn directly from our bug bounty program, where security researchers compete to find novel techniques to manipulate, extract data from, and subvert AI systems”
“Security teams can see attack success rates, a vulnerability breakdown, and a comparison against the frontier models”
Claw Patrol intercepts agent tool calls at the network layer, parsing and filtering by protocol semantics (SQL verbs, K8s resources, HTTP paths) before credentials are injected, eliminating the trust problem of giving agents production access.
Agents need real production system access to be useful, but credential theft via prompt injection or hallucination is one tool call away. Moving credential injection and request filtering outside the agent process takes the credentials out of the agent's reach: a compromised agent never holds the keys.
Replaces homegrown credential proxies and LLM gateways for non-HTTP protocols. Requires WireGuard/Tailscale tunnel setup, HCL rule authoring, and protocol support (currently K8s, SQL, HTTP; others require custom parsing). Alpha software; the docs describe a five-minute setup. Worth adopting now if you run agents against Postgres, Kubernetes, or multi-protocol backends; skip if agents only call REST APIs.
“An agent cannot be trusted to police itself. The agent process holds tools (psql, kubectl, gh, curl) and the credentials those tools need.”
“Credentials live on the gateway, not the agent. The agent sends a placeholder like {{github_pat}} and the gateway swaps in the real token on the wire.”
“Rules match on parsed protocol facets: HTTP method, path, and body; SQL verb, tables, and functions; Kubernetes verb, resource, and namespace.”
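The gateway pattern in the quotes above (filter on parsed request facets, then swap the placeholder for the real secret) can be sketched roughly as follows. This is an illustration only: Claw Patrol's rules are written in HCL, and `HttpRule`, `decide`, `inject`, `forward`, and the token value are hypothetical names invented for this example, covering just the HTTP method and path facets.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical secret store; in the design described above it lives on the gateway only.
GATEWAY_SECRETS = {"github_pat": "ghp_example_not_a_real_token"}

# Matches placeholders such as {{github_pat}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


@dataclass(frozen=True)
class HttpRule:
    method: str
    path_prefix: str
    allow: bool


# Assumed policy for the example: reads are allowed, deletes are blocked.
RULES = [
    HttpRule("GET", "/repos/", allow=True),
    HttpRule("DELETE", "/repos/", allow=False),
]


def decide(method: str, path: str) -> bool:
    """First matching rule wins; anything unmatched is denied."""
    for rule in RULES:
        if rule.method == method.upper() and path.startswith(rule.path_prefix):
            return rule.allow
    return False


def inject(headers: dict) -> dict:
    """Swap {{name}} placeholders for real secrets. Unknown names raise KeyError."""
    return {
        key: PLACEHOLDER.sub(lambda m: GATEWAY_SECRETS[m.group(1)], value)
        for key, value in headers.items()
    }


def forward(method: str, path: str, headers: dict) -> Optional[dict]:
    """Return the headers that would go on the wire, or None if the request is blocked."""
    if not decide(method, path):
        return None  # filtered before any credential is injected
    return inject(headers)


agent_headers = {"Authorization": "Bearer {{github_pat}}"}
allowed = forward("GET", "/repos/acme/api", agent_headers)
blocked = forward("DELETE", "/repos/acme/api", agent_headers)
```

The ordering is the point of the sketch: the filter runs first, so a blocked request never has the real token attached, and the agent side only ever sees the placeholder string.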
Injected errors turn AI agents into code executors
Attackers plant commands in Sentry error reports via publicly-exposed DSNs; agents execute them as trusted guidance through MCP, bypassing all standard security controls.
If your team routes Sentry issues to coding agents (Claude Code, Cursor, Codex), a single crafted error report can execute arbitrary code on developer machines with full access to credentials, CI/CD tokens, and cloud keys. This bypasses EDR, firewalls, and IAM because every step is authorized.
No patch exists yet—this is a fundamental model-layer flaw where agents cannot distinguish data from instructions. Immediate: audit Sentry DSN exposure in your codebase (Censys queries, GitHub searches). Rotate any exposed DSNs. Longer term: isolate AI agents in sandboxed runtimes with runtime controls that gate external command execution. Not safe to ignore if you use agent-based code fixing today.
“a single fake error report can turn an AI coding agent into a code-execution engine on a developer's own machine”
“The agent cannot tell the data it reads from an instruction to act.”
“Claude Code, Cursor, and Codex all acted on the injected errors, and the team logged more than 100 confirmed executions across separate organizations”
“EDR, WAF, IAM, VPNs, and firewalls register nothing worth flagging”
“Tenet reported 2,388 organizations with injectable DSNs found through passive reconnaissance, of which 71 rank in the Tranco top-1M list of busiest sites”
“Ron Bobrov, a Tenet researcher, reported an 85% success rate across the controlled validation waves”
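For the codebase audit mentioned above, a first pass could be a simple pattern scan. The sketch below assumes DSNs have the shape `https://<32 hex characters>@<host>/<numeric project id>`, optionally with a legacy `:<secret>` part; that pattern, the function names, and the file-suffix list are assumptions to adjust for your own setup, and a match only means a DSN is present in the file, not that it is publicly reachable.

```python
import re
from pathlib import Path

# Assumed DSN shape: https://<32 hex>[:<32 hex>]@<host>/<project id>.
DSN_PATTERN = re.compile(r"https://[0-9a-f]{32}(?::[0-9a-f]{32})?@[\w.-]+/\d+")


def find_dsns(text: str) -> list:
    """Return every DSN-looking string found in the given text."""
    return DSN_PATTERN.findall(text)


def scan_tree(root: str, suffixes=(".js", ".ts", ".html", ".py")) -> dict:
    """Map file path -> DSN-looking strings for files under root with the given suffixes."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_dsns(path.read_text(errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits


# Example on an in-memory snippet; the key below is a made-up placeholder.
sample = 'Sentry.init({ dsn: "https://' + "a" * 32 + '@o123.ingest.sentry.io/456" })'
sample_hits = find_dsns(sample)
```

Calling `scan_tree(".")` from a repository root would list candidate files to review, which can then feed the rotation step described above.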
Vercel Eve separates agents from communication channels
Eve's filesystem-first architecture decouples agent reasoning from platform transport—write once, expose via HTTP, Slack, Discord, or custom webhooks without conditional logic.
Eliminates boilerplate for multi-channel agent deployment and persists sessions durably by default, so crash recovery no longer depends on hand-built state management.
Replaces custom agent scaffolding and session handling. Requires Node.js runtime and pluggable backend selection (local files → Postgres/Redis/Vercel Workflow). Ready to try now for greenfield agents; integration complexity depends on existing channel requirements.
“Eve draws a hard line between what the agent is and how it communicates”
“Sessions survive server restarts and support reconnection at any event index (?startIndex=N)”
“Eve checkpoints progress at each step boundary (one model call + its tool calls = one step)”
“You can: Start a conversation, ask the agent something / Kill the server (Ctrl+C) / Restart it / Continue the conversation with the same sessionId and continuationToken”
“The World interface has three responsibilities: Storage (persisting runs, steps, hooks via an append-only event log), Queue (dispatching workflow/step invocations with at-least-once delivery), and Streams (real-time event delivery to clients)”
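The session behavior quoted above (an append-only event log, a checkpoint at each step boundary, and reconnection from a chosen event index) can be sketched in a few lines. This is not Eve's API: `SessionLog`, `run_step`, and the event shapes are hypothetical, and an in-memory list stands in for a real storage backend.

```python
from dataclasses import dataclass, field


@dataclass
class SessionLog:
    """Append-only list of events for one session; existing indexes never change."""
    events: list = field(default_factory=list)

    def append(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1  # index of the event just written

    def read_from(self, start_index: int = 0) -> list:
        """Return (index, event) pairs from start_index onward, as a reconnecting client would."""
        return list(enumerate(self.events))[start_index:]


def run_step(log: SessionLog, model_call: str, tool_calls: list) -> int:
    """Record one step (a model call plus its tool calls), then a checkpoint marker."""
    log.append({"type": "model", "value": model_call})
    for tool in tool_calls:
        log.append({"type": "tool", "value": tool})
    return log.append({"type": "checkpoint"})


log = SessionLog()
first_checkpoint = run_step(log, "plan", ["search"])  # events 0, 1, 2
second_checkpoint = run_step(log, "answer", [])       # events 3, 4

# A client that already saw events 0..2 reconnects and asks for index 3 onward.
missed = log.read_from(3)
```

Because indexes are stable, a client only needs to remember the last index it processed to resume without gaps or duplicates.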
Microsoft packages poisoned to steal developer credentials
73 compromised Microsoft packages executed a 28 KB payload harvesting AWS/Azure/GCP credentials and OIDC tokens when opened in AI coding agents; assume systems are compromised if you used them.
AI agents automatically fetching and executing packages bypass manual code review. Compromised credentials across cloud providers and Kubernetes spread laterally through your infrastructure, not just your local machine.
This doesn't replace anything; it's a detection failure. Required action: audit all recent AI agent package fetches against the 73 flagged Microsoft packages, and if you used any of them, immediately rotate credentials for AWS, Azure, GCP, Kubernetes, and password managers. The attack exploited stolen OIDC tokens to bypass build pipelines, so signature verification alone won't catch it. Worth taking action now if you run AI agents against untrusted package sources.
“73 packages were flagged as malicious”
“The compromise packages executed a 28 KB payload that steals credentials from AWS, Azure, GCP, Kubernetes, password managers, and over 90 developer tool configurations”
“It then spreads laterally through cloud infrastructures to infect other developer machines”
“the one last week made use of the functionality to steal a legitimate Microsoft OIDC token”