llama.cpp fixes Gemma 4 multimodal projector crash

Bump to b9509 resolves n_head=0 divide-by-zero crash affecting Gemma 4 12B multimodal inference on x86/CUDA across Linux and Windows.

June 5, 2026

Summary

llama.cpp build b9509 fixes a divide-by-zero crash triggered by n_head=0 in the Gemma 4 12B multimodal projector. The crash affected multimodal inference on x86 and CUDA, on both Linux and Windows.

Why it matters

If you run Gemma 4 12B multimodal locally through llama.cpp, this update unblocks inference that was crashing on common hardware and OS combinations. It removes a blocker for x86 and CUDA-based deployments.

Implementation verdict

Direct replacement: update llama.cpp to b9509 or later. No config changes required. Ready now: this is a critical crash fix, not an experimental change.
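
A minimal sketch for checking whether a given build includes the fix, assuming you already have the build tag as a string in the bNNNN form used above. The helper names parse_build_number and has_projector_fix are hypothetical and are not part of llama.cpp.

```python
import re

# Build named in this brief as the first one containing the fix.
FIXED_BUILD = 9509


def parse_build_number(tag: str) -> int:
    """Extract the integer from a build tag such as 'b9509' (hypothetical helper)."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized build tag: {tag!r}")
    return int(match.group(1))


def has_projector_fix(tag: str) -> bool:
    """Return True if the build is b9509 or later (hypothetical helper)."""
    return parse_build_number(tag) >= FIXED_BUILD
```

For example, has_projector_fix("b9508") is False and has_projector_fix("b9509") is True. A tag that does not match the bNNNN pattern raises ValueError, so an unexpected format is not silently treated as safe.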

Sources

1. Upstream Gemma 4 12B multimodal projector fixes for the n_head=0 divide-by-zero crash
2. x86/CUDA/Linux/Windows

Dev Signal
