Gemma 4 multimodal agents + Claude Fable 5 GA

Tool of the Week

Gemma 4 12B runs multimodal agents on laptops

Encoder-free architecture projects audio and vision directly into LLM backbone, cutting memory footprint to 16GB VRAM while matching 26B model reasoning performance.

Developers can now deploy agentic multimodal workflows locally without separate vision/audio encoders, reducing latency and infrastructure costs. Native audio support and sub-26B performance unlock edge deployment patterns previously requiring cloud.

Replaces cloud-dependent multimodal inference and larger models for local workflows. Requires 16GB VRAM minimum; supports Ollama, LM Studio, llama.cpp, vLLM, Hugging Face Transformers. Ready now—Apache 2.0 licensed, weights on HuggingFace/Kaggle, official Skills Repository for agentic patterns included.

“Gemma 4 12B packages powerful capabilities inside a reduced memory footprint”
“Small enough to run locally with just 16GB of VRAM or unified memory”
“performance nearing our 26B MoE model on standard benchmarks, but at less than half the total memory footprint”
“we trained Gemma 4 12B with an encoder-free architecture to integrate audio and vision input directly”
“Gemma 4 models have now crossed 150 million downloads”

multimodal-llmedge-inferencelocal-deploymentagentic-aigemma

Dev Signal

Get issues like this in your inbox — free, every weekday.

Quick Signals

Claude Fable 5 launches on AI Gateway today

Set `anthropic/claude-fable-5` in AI SDK to access a Mythos-class model that sustains multi-day autonomous work with adaptive thinking and higher first-shot correctness on complex problems.

Reduces human check-in overhead on long-running tasks like code review and repository investigation. Parallel sub-agent dispatch and adaptive effort settings let you shift resource allocation from supervision to exception handling.

Replaces prior Claude models for multi-step autonomous work. Requires AI SDK update and Anthropic API key. Ready now—30-day retention policy (no ZDR) is a hard constraint; blocking classifiers on cybersecurity/biology tasks narrow the surface. Worth testing on bug-finding and performance debugging workflows immediately.

“a notable step up over prior Claude models on long-running, ambiguous, multi-step tasks, executing end-to-end on work that previously required frequent human check-ins”
“The model sustains productive output across multi-day runs and dependably dispatches parallel sub-agents”
“Prompts and completions are retained for 30 days and are not used to train Claude”
“set model to `anthropic/claude-fable-5` in the AI SDK”

claudeai-gatewaylong-contextautonomous-agentscode-intelligence

Gemini 3.5 Live Translate ships speech-to-speech translation

Streaming speech-to-speech model detects 70+ languages, generates continuous translated audio with <5 second latency via Gemini Live API, handles noise-robust inputs without manual language config.

Data Point

Run gpt-oss evals locally with LM Studio uv

Execute OpenAI's AIME 2025 eval suite against gpt-oss-20b running locally via LM Studio using uv for dependency management, yielding detailed HTML/JSON results with 45.4% accuracy on 240 prompts.

Developers can now benchmark reasoning models offline without API calls, capturing full prompt/response traces for debugging. Local eval iteration replaces cloud-dependent testing workflows.

Replaces manual OpenAI API eval runs with self-hosted benchmarking. Requires LM Studio, Python 3.13, uv, and 4+ hour runtime for full 240-prompt suite. Worth trying now if you need local model introspection; increase context length from default 4096 to avoid mid-run failures.

“uv run for the benchmark. This means I get all of the dependencies installed automatically without having to worry about setting up a virtual environment myself”
“the eval suite needs an OpenAI-compatible API to talk to. LM Studio runs one on port 1234”
“the above command runs 240 prompts and can take several hours”
“score is the most important number - the eval suite assigns a 1 for each correct answer and a 0 for incorrect answers and then displays the average”
“Reached context length of 4096 tokens with model (arch: gpt-oss) that does not currently support mid-generation context overflow”

eval-suitelm-studiogpt-osslocal-inferencebenchmark

Enjoying Dev Signal? Get every issue in your inbox.

Free forever · 3 issues a week · One-click unsubscribe

Refer a friend →

Earn rewards for every developer you bring in.

Go premium →

Sponsor-free feed · full archive search · $149 lifetime.

Gemma 4 multimodal agents + Claude Fable 5 GA

Gemma 4 12B runs multimodal agents on laptops

Quick Signals

Claude Fable 5 launches on AI Gateway today

Gemini 3.5 Live Translate ships speech-to-speech translation

Run gpt-oss evals locally with LM Studio uv

Claude Fable 5 reaches general availability on AWS

Ghostty 1.0 ships December 2024 as open-source

Astral secures CI/CD with hash-pinned actions