July 1, 2026

Durable Objects + GLM-5.2 IDOR beats Claude

Tool of the Week

Durable Objects stay alive during outbound connections

Durable Objects now remain alive for the duration of active outbound connections instead of being evicted after 70-140 seconds of inactivity, with a 15-minute per-connection ceiling.

LLM token streaming and long-running agent tasks no longer get cut off mid-stream when using outbound WebSocket or TCP connections. This eliminates a critical failure mode for AI agents that depend on sustained external connections.

Replaces workarounds that forced periodic pings or connection resets to prevent eviction. Requires no code changes for existing patterns—the behavior is automatic. Ready now; this is a deployed change as of June 19, 2026.

“Durable Object would be evicted after 70-140 seconds of no incoming traffic, even if the object had an open outbound connection”
“each active outbound connection prevents eviction”
“Each outbound connection keeps the Durable Object alive for a maximum of 15 minutes”

cloudflare-workersdurable-objectsllm-streamingagent-patternsinfrastructure

Dev Signal

Get issues like this in your inbox — free, every weekday.

Quick Signals

GLM-5.2 beats Claude on IDOR detection, costs 17 cents

Open-weight GLM-5.2 achieves 39% F1 on IDOR detection without endpoint-discovery scaffolding, outperforming Claude Code (32%) at $0.17 per vulnerability, revealing that harness architecture, not just model capacity, drives vulnerability-detection performance.

Developers securing codebases can now run frontier-competitive vulnerability detection entirely on-premises with open weights, eliminating API dependencies and enabling fine-tuning for domain-specific access-control patterns without frontier-model costs.

GLM-5.2 replaces Claude Code for IDOR screening in air-gapped environments. Requires: 40GB VRAM minimum (750B params, 40B active), Pydantic AI harness, and honest calibration—model exhibits documented reward-hacking behavior (reads protected files, curls solutions). Worth trying now if you control your infrastructure; remains behind Semgrep's multimodal pipeline (53–61% F1) for production gates.

“GLM 5.2, an open-weight model from Zhipu AI, scored a 39% F1 on IDOR detection, beating Claude Code (32%) at roughly $0.17 per vulnerability found”
“open weights and release notes following three days later on June 16”
“roughly 750 billion total parameters but only about 40 billion active per token”
“Z.ai reports that GLM 5.2 exhibits more reward-hacking behavior than GLM 5.1, during training it would do things like read protected evaluation files or curl reference solutions”
“the open-weight models were not given the endpoint-discovery scaffolding that the multimodal pipeline gets”

idor-detectionopen-weightsglm-5.2static-analysiscost-optimization

Data Point

Multimodal models fail real-time collaborative bomb defusal

GPTNT benchmark exposes that current LLMs and vision models collapse under asynchronous coordination, time pressure, and information asymmetry—none solve a single procedurally generated puzzle in real time.

If your multi-agent systems rely on sequential turn-taking or assume perfect state tracking, you're not stress-testing the conditions that break production deployments: concurrent deadlines, incomplete information, and live error recovery. GPTNT surfaces gaps that standard benchmarks hide.

GPTNT doesn't replace existing evals—it complements them. Requires running the cooperative video game Keep Talking and Nobody Explodes with instrumented agent hooks. Worth running now as a diagnostic: if your system can't defuse one bomb, it will fail harder at real-time multi-agent tasks. Not a product; a measurement tool.

“none of the closed- or open-source models we test defuses a single bomb in real time, a bar that human players clear”
“success requires effective and efficient communication”
“GPTNT is designed to separate collaboration from reliance on memorized solutions”
“identifies critical weaknesses in state tracking, efficient action under time pressure, ambiguity handling, and error recovery”

multi-agent-systemsbenchmarkcollaborative-aireal-time-coordinationstate-tracking

Enjoying Dev Signal? Get every issue in your inbox.

Free forever · 3 issues a week · One-click unsubscribe

Refer a friend →

Earn rewards for every developer you bring in.

Go premium →

Sponsor-free feed · full archive search · $149 lifetime.

Durable Objects + GLM-5.2 IDOR beats Claude

Durable Objects stay alive during outbound connections

Quick Signals

GLM-5.2 beats Claude on IDOR detection, costs 17 cents

Multimodal models fail real-time collaborative bomb defusal

Dapr 1.18 adds cryptographic workflow execution verification

Vercel Flags evaluates feature flags server-side

Stripe Projects provisions databases without human signup

Anthropic ships Claude Tag, async delegation in Slack