May 22, 2026

SQL agents, Copilot billing changes, Node.js patches

Tool of the Week

Datasette Agent ships conversational SQL interface

Extensible AI assistant for Datasette that converts natural-language questions into SQLite queries and, through its plugin system, charts; runs on Gemini 3.1 Flash-Lite or local models like gemma-4-26b.

Removes the need to hand-write SQL for routine data exploration. The plugin architecture lets you add domain-specific tools (image generation, code execution, charting) without forking core, which matters for teams building on Datasette infrastructure.

Replaces manual SQL and charting workflows for Datasette users. Requires a Datasette instance and an LLM with reliable tool calling, either hosted (Gemini, Claude, OpenAI) or local. Ready now for exploration; production viability depends on how reliably it writes queries against your schema. Start with the live demo at agent.datasette.io to validate behavior.
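One way to gauge query reliability against your own schema is to compile each model-written query with SQLite before trusting it. The sketch below is not part of Datasette Agent; the function name `check_generated_sql`, the toy `orders` table, and the two candidate queries are all hypothetical. It relies only on Python's standard `sqlite3` module and SQLite's `EXPLAIN` prefix, which prepares a statement without running the query itself.

```python
import sqlite3


def check_generated_sql(conn: sqlite3.Connection, sql: str) -> tuple[bool, str]:
    """Compile a candidate query against the live schema without running it."""
    try:
        # EXPLAIN makes SQLite prepare the statement, so unknown tables or
        # columns raise here, but the query itself is not executed.
        conn.execute(f"EXPLAIN {sql}").fetchall()
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)


# Toy schema standing in for your own database (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")

# Hypothetical model outputs: one valid, one using a column that does not exist.
candidates = [
    "SELECT region, SUM(total) FROM orders GROUP BY region",
    "SELECT region, SUM(amount) FROM orders GROUP BY region",
]
results = [check_generated_sql(conn, sql) for sql in candidates]
for sql, (ok, message) in zip(candidates, results):
    print("PASS" if ok else "FAIL", "|", sql, "|", message)
```

A check like this only confirms that a query compiles against the schema; it says nothing about whether the query answers the question that was asked, so it complements rather than replaces a manual review of the demo's output.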

“Datasette Agent provides a conversational interface for asking questions of the data you have stored in Datasette”
“Add the datasette-agent-charts plugin and it can generate charts of your data as well”
“The live demo runs on Gemini 3.1 Flash-Lite—it's cheap, fast and has no trouble writing SQLite queries”
“like the rest of Datasette, it's extensible using plugins”
“Claude Code and OpenAI Codex are both proving excellent at writing plugins”

datasette · llm-tools · sql-generation · plugin-system · local-models

Quick Signals

GitHub Copilot shifts to usage-based billing June 1st

Pro gets $15/month included usage ($10 base + $5 flex), Pro+ gets $70 ($39 base + $31 flex), new Max tier adds $200/month for sustained agent work; code completions remain unlimited.

If you run long agent chains or multi-step workflows, the flex allotment cushions overage risk without manual credit management. Base credits stay fixed 1:1 with the subscription price; only flex varies as model costs shift, so you always know the minimum included usage you can count on.

This replaces flat-rate Pro/Pro+ plans with metered billing. Monthly subscribers need to take no action; plans auto-migrate on June 1st. Upgrading is worth it now only if you already hit usage ceilings; otherwise audit your actual consumption before moving to Max. The flex buffer buys runway, but there are no public benchmarks yet on whether $15 or $70 covers real agent workloads.
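The allotment arithmetic above can be sketched in a few lines for a quick self-audit. The dollar figures come from this issue; the `Plan` class, the `smallest_covering_plan` helper, and the sample usage estimates are hypothetical. Max is left out because its included usage is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    base_usd: float  # matched 1:1 with the subscription price; fixed
    flex_usd: float  # additional allotment; may vary over time

    @property
    def included_usd(self) -> float:
        return self.base_usd + self.flex_usd


# Figures as reported in this issue.
PLANS = [Plan("Pro", 10, 5), Plan("Pro+", 39, 31)]


def smallest_covering_plan(monthly_usage_usd: float, plans=PLANS):
    """Return the plan with the least included usage that still covers the estimate."""
    for plan in sorted(plans, key=lambda p: p.included_usd):
        if monthly_usage_usd <= plan.included_usd:
            return plan
    return None  # estimate exceeds every listed plan


# Hypothetical monthly usage estimates in USD.
for estimate in (12, 40, 120):
    plan = smallest_covering_plan(estimate)
    print(estimate, "->", plan.name if plan else "exceeds listed plans")
```

Because flex can change, a more conservative audit would compare the estimate against `base_usd` alone.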

“Longer agent runs, multi-step work, and more capable models will all put pressure on the usage amounts”
“Base credits: matched 1:1 with your subscription price. These never change.”
“Flex allotment: variable additional usage on top of your base. Flex allotments will vary over time.”
“Code completions and next edit suggestions remain unlimited on paid plans and don't consume credits.”

copilot · usage-based-billing · pricing · agents · cost-control

Supabase adds PrivateLink, Claude connector, Postgres rules

PrivateLink routes AWS traffic through VPC without internet exposure; Claude connector enables direct database management via natural language; 30-rule Postgres ruleset teaches AI agents correct SQL patterns.

Data Point

HealthCraft measures LLM safety collapse under clinical pressure

RL environment with FHIR R4 state and dual-layer safety rubric exposes that frontier models fail multi-step workflows (Claude 1.0%, GPT-5.4 0.0%) despite partial single-step competence.

Static QA benchmarks miss the failure modes that matter in production medical workflows: trajectory-level safety collapse and tool misuse under sustained pressure. Developers deploying clinical LLMs now have a measurement harness aimed at the failures that would reach real patients rather than at abstract accuracy.

Replaces toy medical QA evals with realistic multi-step task chains (195 tasks, 2,255 binary criteria, 515 of them safety-critical). Requires FHIR R4 integration, MCP tool support (24 tools exposed), and a deterministic LLM-judge overlay to control evaluator noise. Ready to pilot now, with code, tasks, and a Docker bundle released under Apache 2.0, but the reward signal is not yet safe to train on: by the authors' own finding, restraint criteria pass at 0.929 prevalence, which makes them gameable. Use it for benchmarking before deployment; training ablations are pending.
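A rough way to see why multi-step results can collapse even when individual steps look competent is to compound per-criterion pass rates. The sketch below is an illustration only, not HealthCraft's scoring method: it assumes a task passes only if every criterion passes and that criteria pass independently, and the per-criterion rates are hypothetical. Only the task and criteria counts come from this issue.

```python
TASKS = 195      # from the issue
CRITERIA = 2255  # from the issue


def all_pass_probability(per_criterion_pass: float, n_criteria: float) -> float:
    """Chance that every one of n independent binary criteria passes."""
    if not 0.0 <= per_criterion_pass <= 1.0:
        raise ValueError("per_criterion_pass must be in [0, 1]")
    return per_criterion_pass ** n_criteria


# Average number of criteria attached to a task.
avg_criteria_per_task = CRITERIA / TASKS

# Hypothetical per-criterion pass rates, for illustration.
for p in (0.95, 0.90, 0.80):
    chance = all_pass_probability(p, avg_criteria_per_task)
    print(f"per-criterion {p:.2f} -> all criteria pass {chance:.3f}")
```

Under these assumptions, even high per-criterion rates leave a much lower chance of a clean task, which is the general shape of the gap between single-step and trajectory-level results.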

“the first public reinforcement-learning environment that rewards trajectory-level safety under realistic emergency-medicine conditions”
“performance collapses to near zero (Claude 1.0%, GPT-5.4 0.0%) despite partial competence on individual steps”
“safety-failure rates of 27.5% and 34.0%”
“the reward signal is not drop-in training-safe: restraint criteria pass at 0.929 prevalence, a gameability an eval harness can tolerate but a training reward cannot”
“Environment, tasks, rubrics, and harness are released under Apache 2.0”

medical-ai · safety-eval · rl-environment · benchmark · llm-robustness

Gemini Omni Flash generates video from multimodal input

Node.js patches nine vulnerabilities across active releases

Lock down AI agents with token and filesystem isolation