June 2, 2026

Ollama 0.30, Claude 4.6 context expansion, npm worm alert

Share:

Tool of the Week

Ollama 0.30 expands hardware support via llama.cpp

llama.cpp backend replaces MLX-only Apple Silicon constraint, adds NVIDIA perf gains and GGUF model support across wider hardware range.

Developers can now run fine-tuned GGUF models and Hugging Face variants on more hardware without reimplementing inference pipelines. Faster NVIDIA execution reduces iteration cycles.

Replaces prior Ollama versions; requires 0.30 upgrade. Ready now for GGUF workflows on Apple/NVIDIA. Avoid laguna-xs.2 and llama3.2-vision until next patch. Breaking change: nomic-embed-text now lowercase-converts inputs—audit existing inference if you depend on case preservation.

  • improved compatibility and performance using llama.cpp
  • support for a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models along with faster performance on NVIDIA hardware
  • nomic-embed-text now converts inputs to lowercase per the model card where prior Ollama versions incorrectly preserved mixed case
ollamallama-cppggufinferencehardware-support

Dev Signal

Get issues like this in your inbox — free, 3x a week.

Quick Signals

Bonsai Image 4B runs diffusion inference on iPhones

Binary and ternary quantization reduce FLUX.2 Klein 4B diffusion transformer from 7.75 GB to 0.93–1.21 GB while retaining 88–95% quality, enabling local generation on Apple Silicon devices.

Eliminates cloud round-trip latency for iterative image generation workflows and keeps prompts/assets local. Developers can embed high-quality image generation in apps on hardware users already own, removing per-request costs and enabling faster creative loops.

Replaces cloud-only FLUX.2 Klein deployment for on-device use cases. Requires MLX (Apple Silicon) or Gemlite (CUDA) support; both variants ship as open weights. Ready now for iOS/macOS apps—9.4s per 512×512 on iPhone 17 Pro Max is practical for most UX patterns. Ternary variant recommended for quality; 1-bit for extreme memory pressure.

  • 1.125 effective bits per weight
  • 1.71 effective bits per weight
  • the first image model in its parameter class to run directly on an iPhone
  • mean-active memory is 1.5 GB and 1.96 GB, for the binary and ternary models, compared to 11.74 GB for the original FLUX.2 Klein 4B
  • retains 95% of the FLUX.2 Klein 4B accuracy across GenEval, HPSv3, and DPG-Bench, while reducing the diffusion transformer footprint by 6.4x
  • generation can sit directly inside the product experience
quantizationon-device-inferenceimage-generationapple-siliconopen-weights

Red Hat npm packages carry self-propagating credential worm

Malicious preinstall scripts in @redhat-cloud-services packages harvest credentials and spread via compromised maintainer accounts; treat as active incident if installed.

These are build-time dependencies for enterprise infrastructure, installed on developer workstations and CI runners where long-lived cloud credentials and registry tokens live. The worm replicates across any packages you can publish, turning a single install into organizational blast radius.

Immediate: audit lockfiles for @redhat-cloud-services versions <=7.7.2 (check Snyk advisories per package), pin away, reinstall with npm install --ignore-scripts, rotate every credential reachable from affected machines. This is not optional—assume any secrets touched those environments are exposed. Run Snyk to flag all affected projects; hunt for orphan repos with description 'Miasma: The Spreading Blight' and unexpected workflows requesting id-token: write.

  • malicious code embedded in at least 32 package releases published under the @redhat-cloud-services npm namespace
  • The affected packages average roughly 80,000 downloads per week combined
  • a preinstall script that runs an obfuscated payload the moment a package is installed, harvesting developer and cloud credentials and attempting to spread itself to other packages the victim can publish
  • Evidence indicates a Red Hat employee's GitHub account was compromised and used to push malicious orphan commits directly into two RedHatInsights repositories, bypassing code review
  • the payload: Harvests secrets and credentials from the local environment and CI context: environment variables, ~/.npmrc tokens, SSH keys, GitHub tokens, and CI/CD secrets
  • Snyk rates the lead advisory at 9.3 (Critical, CVSS v4.0) with an exploit maturity of Attacked
  • Most malicious versions had been revoked from npm within hours of disclosure
supply-chain-securitynpm-registrycredential-theftworm-propagationincident-response

Claude Sonnet 4.6 ships with 1M context window

Same API pricing as 4.5, but 80.2% on SWE-bench (up from 77.2%), 1M token context, and human-level computer use—single string change to migrate.

Existing Sonnet 4.5 deployments gain coding accuracy and long-context reasoning without cost increases. Agentic workflows and document processing unlock materially better performance at parity pricing.

Replaces Sonnet 4.5 for production agents, coding, and document work. Requires one model string update. Migration is ready now—zero API changes, identical rate limits. Start all new projects on 4.6.

  • Sonnet 4.6 ships with a 1M token context window in beta
  • early users are reporting human-level capability on tasks like navigating complex spreadsheets and completing multi-step web forms
  • The headline SWE-bench number for 4.6 is 80.2% with a prompt modification, up from 77.2% on 4.5
  • the API pricing is identical, and the capability uplift is real
  • for existing projects: migrate
claude-sonnet-4-6llm-benchmarksagentic-workflowscode-generationcontext-window

ChatGPT Google Sheets extension bypasses human approval

Indirect prompt injection in untrusted data sources lets attackers exfiltrate workbooks and run scripts even when auto-edit is disabled.

If you use ChatGPT for Google Sheets with imported data or connectors, attackers can steal all accessible spreadsheets and credentials via hidden injection payloads—approval toggles don't block it. OpenAI disabled Apps Script generation in response, but you need to audit connector sources and data imports now.

Disables the ability to use external scripts via ChatGPT for Google Sheets entirely (AppScript generation removed). Requires: disable the extension in Workspace settings > Permissions & roles until you audit data sources, or switch to manual spreadsheet workflows. Not ready for production use with untrusted data until sandboxing is redesigned.

  • A single indirect prompt injection attack triggered by a single benign user query can trigger all of the following effects at once: Exfiltration of many workbooks from across the victim's account
  • this attack succeeds even when the user has explicitly disabled automatic edits
  • over 185,000 downloads since its launch less than a month ago
  • we've taken immediate steps to protect users against potential attacks in this area by removing the model's ability to generate Apps Script code
prompt-injectiongoogle-sheetssecurityai-extensionsdata-exfiltration

Claude Code generates multi-agent workflows on demand

Dynamic Workflows lets Claude orchestrate parallel subagents for large tasks without manual configuration—trade: substantially higher token consumption.

Eliminates manual agent coordination for complex tasks like migrations, audits, and bug investigations. Formalizes workflows developers already assemble manually, reducing orchestration overhead.

Replaces manual multi-step prompting and agent setup. Requires Max/Team/Enterprise plan or API access; start on scoped tasks before large projects due to token cost. Research preview status means behavior may shift—worth testing now on non-critical work.

  • Available in research preview
  • can consume substantially more tokens than a typical Claude Code session
  • Claude to dynamically create orchestration scripts, break work into subtasks, run them in parallel, and validate results
  • Progress is saved throughout execution, allowing interrupted runs to resume without starting over
agent-orchestrationclaude-apiworkflow-automationmulti-agentcost-trade-off

Data Point

Interactive reasoning benchmark exposes LLM query efficiency gaps

474-game benchmark measures not just success rate but interaction efficiency and robustness under contextual perturbations—LLMs fail harder on counterfactual revision than baseline tasks.

Reveals whether your LLM can actually acquire evidence iteratively and adapt reasoning when assumptions break. Standard benchmarks hide interaction patterns that matter in production agentic systems.

Replaces single-shot eval frameworks with multi-turn reasoning assessment. Requires ability to run executable games and parse LLM query sequences. Not production-ready yet—preprint under review, no public benchmark release confirmed. Worth monitoring for agentic eval methodology.

  • multi-turn interactive framework for reasoning evaluation that treats reasoning as active evidence acquisition and belief updating
  • contextual perturbations cause moderate but consistent declines, whereas counterfactual revision and necessity judgment lead to much larger drops
  • benchmark of 474 executable games, each evaluated under five fixed configuration search spaces corresponding to five difficulty levels
reasoning-evalsmulti-turn-reasoningbenchmarkllm-robustnessinteractive-queries

Enjoying Dev Signal? Get every issue in your inbox.

Free forever · 3 issues a week · One-click unsubscribe