Gemma 4 ships local multimodal • TS 7.0 10x faster in Go

Tool of the Week

Gemma 4 12B runs multimodal, locally, encoder-free

Encoder-free architecture eliminates separate vision/audio encoders, feeding raw pixels and 16kHz audio directly to LLM backbone—cuts multimodal latency and runs on 16GB VRAM laptops.

Developers can now build local agentic agents with audio, vision, and text in a single 12B model without juggling frozen encoders or managing separate parameter sets. Fine-tuning the entire multimodal stack happens in one pass via LoRA or full tuning.

Replaces bloated encoder-decoder stacks (550M vision + 300M audio encoders) with 35M vision embedder and raw audio projection. Requires 16GB VRAM minimum for local inference, or cloud deployment via Cloud Run/GKE. Ready now: download from HuggingFace, run via llama.cpp/Ollama/LM Studio, or spin OpenAI-compatible server with `litert-lm serve`. Worth trying immediately if you need local multimodal agents.

“Multimodal data is fed straight into the LLM backbone, reducing multimodal latency”
“Small enough to run locally on dedicated GPU laptops with 16GB VRAM or unified memory”
“Raw 16 kHz audio signals are sliced into 40ms frames (640 floats each) and projected linearly to the LLM input space”
“because vision, audio, and text inputs share the exact same weights, you no longer have to co-tune separate frozen encoders”

gemma-4local-inferencemultimodalencoder-freeagentic-ai

Dev Signal

Get issues like this in your inbox — free, 3x a week.

Quick Signals

Mastra scope takeover injects stealer across 142 packages

A revoked maintainer credential republished the entire @mastra npm scope with a postinstall dropper that disables TLS validation, fetches a second-stage payload, and exfiltrates cryptocurrency wallets and credentials.

If you ran npm install on any @mastra package after June 17, 2026, your build environment and developer machines are credential and wallet exposure events. Lockfiles are the deciding factor—regenerated or absent lockfiles pulled the armed easy-day-js@1.11.22.

Audit your node_modules and lockfiles immediately for easy-day-js (it should never legitimately appear). If present, treat the host as compromised: rotate credentials, check browser wallet extensions, and scan for persistence artifacts (LaunchAgent on macOS, systemd service on Linux, PowerShell staging on Windows). Upgrade @mastra packages to versions forward-rolled after June 17, 2026. This is active supply chain incident, not a source-code vulnerability—npm publish hygiene (not zero-day) is the root cause. Do this now.

“entire @mastra scope, republished on June 17, 2026”
“@mastra/core alone pulls about 4 million downloads a month”
“essentially the entire scope was hit”
“Disables TLS certificate validation by setting NODE_TLS_REJECT_UNAUTHORIZED='0'”
“reads Chrome, Brave, and Edge profiles looking for a hardcoded list of cryptocurrency wallet browser extensions”
“The foothold was a stale maintainer credential”
“npm does not expire scope publish permissions on inactivity”
“The payload executes at install time, CI runners, ephemeral build agents, and developer laptops are all in scope”

supply-chainnpm-securitymalwaredependency-injectionincident-response

Data Point

Harness design outperforms model upgrades on SWE-Bench

A well-engineered adapter layer can deliver 54-point Pass@1 gains on the same model, matching or exceeding the impact of swapping LLMs entirely.

Most teams chase larger models while leaving harness architecture as fixed plumbing. Optimizing patch extraction, workspace contracts, and diff adapters is cheaper and faster than model scaling, and directly controls agent reliability on code tasks.

Replace your baseline harness with a modular, cost-aware adapter before buying a bigger LLM. Requires systematic testing of workspace contracts and patch-extraction strategies. Worth implementing now—the gains are large and the work is localized to your agent layer, not model training.

“a well‑engineered adapter lifts Pass@1 by over 50 percentage points while keeping the same model”
“a minimal direct‑diff adapter scores 19.1 % Pass@1, but the full adapter reaches 73.4 %, a 54.3‑point improvement generated solely by harness tweaks”
“model choice adds 29.4 pp whereas harness choice adds 27.4 pp”
“teams should prioritize a modular, cost‑aware adapter layer before investing in larger LLMs”

swe-benchagent-harnesscode-generationbenchmarkllm-architecture

Enjoying Dev Signal? Get every issue in your inbox.

Free forever · 3 issues a week · One-click unsubscribe

Refer a friend →

Earn rewards for every developer you bring in.

Go premium →

Sponsor-free feed · full archive search · $149 lifetime.

Gemma 4 ships local multimodal • TS 7.0 10x faster in Go

Gemma 4 12B runs multimodal, locally, encoder-free

Quick Signals

Mastra scope takeover injects stealer across 142 packages

Harness design outperforms model upgrades on SWE-Bench

JetBrains AI coding agent leaves beta

TypeScript 7.0 rewritten in Go ships 10x faster

Fifteen malicious JetBrains plugins stole API keys

Claude Code auto mode lands on Bedrock, Vertex, Azure