Durable Objects stay alive during outbound connections
Durable Objects now remain alive for the duration of active outbound connections instead of being evicted after 70-140 seconds without incoming traffic. Each connection keeps the object alive for at most 15 minutes.
LLM token streaming and long-running agent tasks no longer get cut off mid-stream when using outbound WebSocket or TCP connections. This eliminates a critical failure mode for AI agents that depend on sustained external connections.
Replaces workarounds that forced periodic pings or connection resets to prevent eviction. Existing patterns need no code changes; the behavior is automatic. Ready now: the change has been deployed as of June 19, 2026.
“Durable Object would be evicted after 70-140 seconds of no incoming traffic, even if the object had an open outbound connection”
“each active outbound connection prevents eviction”
“Each outbound connection keeps the Durable Object alive for a maximum of 15 minutes”
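The eviction rule described above can be modelled in a few lines. This is a minimal sketch, not the Workers runtime's implementation: `ObjectState`, `is_evictable`, and both constants are hypothetical names, and the 70-second idle limit is an assumption that takes the lower bound of the quoted 70-140 second window.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: use the lower bound of the quoted 70-140 second idle window.
INACTIVITY_LIMIT_S = 70
# Per-connection ceiling quoted in the announcement (15 minutes).
CONNECTION_KEEPALIVE_CAP_S = 15 * 60


@dataclass
class ObjectState:
    """Hypothetical snapshot of one Durable Object's activity."""
    last_incoming_s: float
    outbound_opened_at_s: list[float] = field(default_factory=list)


def is_evictable(state: ObjectState, now_s: float) -> bool:
    """Return True when neither traffic nor a connection holds the object."""
    idle = now_s - state.last_incoming_s >= INACTIVITY_LIMIT_S
    # Each open outbound connection holds the object, but only until
    # that connection reaches the 15-minute cap.
    held_by_connection = any(
        now_s - opened_at < CONNECTION_KEEPALIVE_CAP_S
        for opened_at in state.outbound_opened_at_s
    )
    return idle and not held_by_connection
```

Under these assumptions, an object whose only outbound connection opened at t=0 survives an idle check at t=300 but becomes evictable once the connection reaches the 15-minute mark at t=900.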
Quick Signals
GLM-5.2 beats Claude on IDOR detection, costs 17 cents
Open-weight GLM-5.2 achieves 39% F1 on IDOR detection without endpoint-discovery scaffolding, outperforming Claude Code (32%) at $0.17 per vulnerability. The result suggests that harness architecture, not just model capacity, drives vulnerability-detection performance.
Developers securing codebases can now run frontier-competitive vulnerability detection entirely on-premises with open weights, eliminating API dependencies and enabling fine-tuning for domain-specific access-control patterns without frontier-model costs.
GLM-5.2 can replace Claude Code for IDOR screening in air-gapped environments. Requires: hardware that can host roughly 750B total parameters (about 40B active per token; the active count lowers per-token compute, not the memory needed to hold the weights), a Pydantic AI harness, and honest calibration, since the model exhibits documented reward-hacking behavior (reading protected files, curling solutions). Worth trying now if you control your infrastructure; it remains behind Semgrep's multimodal pipeline (53–61% F1) for production gates.
“GLM 5.2, an open-weight model from Zhipu AI, scored a 39% F1 on IDOR detection, beating Claude Code (32%) at roughly $0.17 per vulnerability found”
“open weights and release notes following three days later on June 16”
“roughly 750 billion total parameters but only about 40 billion active per token”
“Z.ai reports that GLM 5.2 exhibits more reward-hacking behavior than GLM 5.1, during training it would do things like read protected evaluation files or curl reference solutions”
“the open-weight models were not given the endpoint-discovery scaffolding that the multimodal pipeline gets”
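For readers comparing the F1 and cost figures above, here is a small worked example of how both are computed. The counts and the total spend are hypothetical, chosen only so the arithmetic lands on 39% and $0.17; they are not taken from the benchmark.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cost_per_finding(total_cost_usd: float, true_positives: int) -> float:
    """Total run cost divided by the number of real vulnerabilities found."""
    if true_positives == 0:
        raise ValueError("no true positives to amortize cost over")
    return total_cost_usd / true_positives


# Hypothetical run: 39 real IDORs found, 61 false alarms, 61 missed.
example_f1 = f1_score(39, 61, 61)
# Hypothetical total spend of $6.63 across that run.
example_cost = cost_per_finding(6.63, 39)
```

Because F1 penalizes both false alarms and misses, a 39% score on its own does not say which of the two dominates; precision and recall need to be read separately before using a model as a gate.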
The GPTNT benchmark shows that current LLMs and vision models collapse under asynchronous coordination, time pressure, and information asymmetry: none solves a single procedurally generated puzzle in real time.
If your multi-agent systems rely on sequential turn-taking or assume perfect state tracking, you're not stress-testing the conditions that break production deployments: concurrent deadlines, incomplete information, and live error recovery. GPTNT surfaces gaps that standard benchmarks hide.
GPTNT doesn't replace existing evals; it complements them. Requires running the cooperative video game Keep Talking and Nobody Explodes with instrumented agent hooks. Worth running now as a diagnostic: if your system can't defuse one bomb, it will fail harder at real-time multi-agent tasks. Not a product; a measurement tool.
“none of the closed- or open-source models we test defuses a single bomb in real time, a bar that human players clear”
“success requires effective and efficient communication”
“GPTNT is designed to separate collaboration from reliance on memorized solutions”
“identifies critical weaknesses in state tracking, efficient action under time pressure, ambiguity handling, and error recovery”
Dapr 1.18's Workflow History Signing, Propagation, and Attestation let you prove what happened in distributed systems and AI agents, replacing audit logs with tamper-evident cryptographic chains built on SPIFFE identities.
When AI agents make business decisions or access sensitive data, you now have verifiable proof of who did what and whether execution history was altered—critical for compliance and downstream system trust in regulated industries.
Ready now as open-source Dapr 1.18 or managed Catalyst Cloud. Replaces manual audit trails with built-in cryptographic signing. Requires SPIFFE-compatible identity infrastructure and downstream systems that can validate attestations. Worth evaluating immediately if you run long-running workflows or agentic systems in regulated domains.
“Workflow History Signing, Workflow History Propagation, and Workflow Attestation”
“cryptographic chains of custody that span workflows, services, and AI agents”
“open SPIFFE standard”
“the next phase of cloud-native computing will not simply be about durable execution; it will be about verifiable execution”
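To make "tamper-evident chain" concrete, the sketch below links each history event to the signature of the previous one, so editing any earlier event invalidates everything after it. It is an illustration of the general technique only, not Dapr's implementation: the function names are hypothetical, and an HMAC with a shared key stands in for the SPIFFE identity-based signatures the announcement describes.

```python
import hashlib
import hmac
import json


def sign_event(key: bytes, prev_sig: str, event: dict) -> str:
    """Sign an event together with the previous signature in the chain."""
    payload = json.dumps(event, sort_keys=True).encode()
    return hmac.new(key, prev_sig.encode() + payload, hashlib.sha256).hexdigest()


def build_chain(key: bytes, events: list) -> list:
    """Return the events paired with chained signatures."""
    chain, prev_sig = [], ""
    for event in events:
        sig = sign_event(key, prev_sig, event)
        chain.append({"event": event, "sig": sig})
        prev_sig = sig
    return chain


def verify_chain(key: bytes, chain: list) -> bool:
    """Recompute every signature; any edited event breaks verification."""
    prev_sig = ""
    for entry in chain:
        expected = sign_event(key, prev_sig, entry["event"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True
```

A shared HMAC key only proves integrity to parties that hold the key. A real attestation scheme would use asymmetric signatures tied to workload identities so downstream systems can verify without being able to forge.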
Vercel Flags evaluates flags server-side, eliminating client-side layout shift and separate flag requests. Flags auto-register from code, appear in the dashboard as drafts, and integrate natively with Next.js and SvelteKit.
Decouples code deployment from feature release, enabling continuous merges to main without shipping unfinished work. Kill switches flip without redeployment, and progressive rollouts reduce blast radius of broken features.
Replaces external flag services (LaunchDarkly, Split.io) for Vercel-deployed projects. Requires Next.js 13+ or SvelteKit; other frameworks use the OpenFeature provider. Ready now: GA since April 2026, and the v0 team runs hundreds of flags in production. Framework integration justifies switching if you're already on Vercel.
“server-side by default, zero impact on page performance, and directly integrated with the frameworks you already use”
“hundreds of flags active at any given moment”
“made Vercel Flags generally available in April 2026, we've been using it internally for over a year”
“the browser renders it directly, with no separate flag request”
“Define one in code, deploy, and it appears in the dashboard as a draft”
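The kill-switch and progressive-rollout behavior mentioned above usually comes down to deterministic bucketing on the server. The following is a generic sketch of that technique, not the Vercel Flags API: `rollout_bucket` and `is_enabled` are hypothetical names, and hashing the flag key with the user ID is an assumed bucketing strategy.

```python
import hashlib


def rollout_bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int,
               kill_switch: bool = False) -> bool:
    """Decide on the server whether this user sees the feature."""
    if kill_switch:
        # The kill switch overrides any rollout percentage.
        return False
    return rollout_bucket(flag_key, user_id) < rollout_percent
```

Because the bucket depends only on the flag key and user ID, raising the percentage from 10 to 50 keeps every user who already had the feature and adds new ones, which is what keeps a progressive rollout stable between requests.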
Stripe Projects provisions databases without human signup
Agents can now provision Prisma Postgres databases and pay for them via Shared Payment Tokens, with no browser signup, no separate vendor relationship, and credentials written automatically into .env.
Removes the human-in-the-loop bottleneck that stops agentic workflows: agents can go from "I need a database" to a live connection string without waiting for email verification or credit card entry. Spending limits are enforced at the token layer, with per-provider or global caps, rather than inside each vendor's billing system.
Replaces manual Prisma signup and Stripe billing integration with CLI-driven provisioning. Requires a Stripe business account (KYC already done), `stripe projects init`, and agent skill definitions, which come scaffolded. Ready now for Prisma Postgres; Prisma Compute is coming later. Worth trying if you run agentic workflows and already bill through Stripe, since no new vendor relationship is needed.
“Stripe Projects lets you add Prisma Postgres to your project with one command”
“an agent can go from "I need a database" to a live connection string without a human in the loop”
“the wall every agentic workflow hits: it can write the app, but it has to stop and wait for a human to go sign up, verify an email, and enter a credit card”
“An SPT is a payment credential, backed by a real payment method like a card, that carries a spending limit you set”
“the limit lives with Stripe and is enforced at the token, and it covers everything billed against it”
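The idea of a limit that "lives with" the token can be pictured as a check that runs before every charge, regardless of which provider is billing. The sketch below is a toy model under stated assumptions, not Stripe's API: `SpendingToken`, its fields, and `authorize` are hypothetical, and amounts are tracked in cents.

```python
from dataclasses import dataclass, field


@dataclass
class SpendingToken:
    """Hypothetical model of a payment token carrying its own limits."""
    global_cap_cents: int
    provider_caps_cents: dict = field(default_factory=dict)
    spent_by_provider: dict = field(default_factory=dict)

    def authorize(self, provider: str, amount_cents: int) -> bool:
        """Approve and record a charge only if every cap still holds."""
        total_spent = sum(self.spent_by_provider.values())
        provider_spent = self.spent_by_provider.get(provider, 0)
        # Providers without their own cap fall back to the global cap.
        provider_cap = self.provider_caps_cents.get(provider, self.global_cap_cents)
        if total_spent + amount_cents > self.global_cap_cents:
            return False
        if provider_spent + amount_cents > provider_cap:
            return False
        self.spent_by_provider[provider] = provider_spent + amount_cents
        return True
```

In this model, a declined charge leaves the recorded spend unchanged, so an agent that overshoots one provider's cap can still spend the remaining global budget elsewhere.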
Anthropic ships Claude Tag, async delegation in Slack
Claude Tag moves the model from a chat tab to a persistent team member with tool access, task state persistence, and proactive monitoring. It requires explicit permissions setup and is currently in beta for Enterprise and Team plans only.
Shifts developer workflow from synchronous prompting to background task delegation; Claude can monitor systems, wait on dependencies for days, and surface results without continuous polling. Changes where the model participates in work: inside the team's communication layer instead of in a separate tool.
Replaces ad-hoc Claude chat for async code review, monitoring, and long-running ops tasks. Requires Slack workspace admin setup, explicit channel, tool, and codebase permissions, and a Claude Enterprise or Team plan. Worth piloting now if you run Slack-first engineering ops; backend complexity (identity, permissioning, state persistence) is substantial but abstracted from the user.
“tag Claude in and delegate tasks to it while you focus on other work”
“writes 65% of our product team's code”
“Claude Tag is 'Claude Code made multiplayer, async, and proactive across your whole team'”
“Claude can tag in coworkers who own related code”
“git webhooks that can wait for blocking dependencies for very long (days) periods”
“responds to channels without being tagged”
“watches for thresholds to trigger and then attempts to fix if something broke”