llama.cpp fixes Gemma 4 multimodal projector crash
Bump to b9509 resolves n_head=0 divide-by-zero crash affecting Gemma 4 12B multimodal inference on x86/CUDA across Linux and Windows.
June 5, 2026
Summary
llama.cpp build b9509 pulls in the upstream Gemma 4 12B multimodal projector fixes for a divide-by-zero crash triggered when n_head is 0. The crash affected multimodal inference on x86 and CUDA builds, on both Linux and Windows.
Why it matters
If you're running Gemma 4 12B multimodal locally via llama.cpp, this unblocks inference that was crashing on common hardware/OS combinations. It removes a blocker for x86 and CUDA-based deployments.
Implementation verdict
Direct replacement: update llama.cpp to b9509 or later. No config changes required. Ready now: this is a critical crash fix, not an experimental feature.
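To make the "b9509 or later" rule concrete, here is a minimal sketch of a version gate you could drop into a deployment script. It assumes build tags follow the `b<number>` form used in this brief; the helper names `build_number` and `has_projector_fix` are hypothetical and not part of llama.cpp.

```python
import re

# Minimum build named in this brief as containing the projector fix.
MIN_FIXED_BUILD = 9509


def build_number(tag: str) -> int:
    """Parse a build tag such as 'b9509' into its integer build number.

    Assumes the 'b<digits>' tag format; anything else is rejected.
    """
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized build tag: {tag!r}")
    return int(match.group(1))


def has_projector_fix(tag: str, minimum: int = MIN_FIXED_BUILD) -> bool:
    """Return True if the given build is at or past the fixed build."""
    return build_number(tag) >= minimum


if __name__ == "__main__":
    for tag in ("b9508", "b9509", "b9600"):
        status = "ok" if has_projector_fix(tag) else "update needed"
        print(f"{tag}: {status}")
```

Comparing the parsed integers rather than the raw strings avoids the ordering mistake where "b10000" would sort before "b9509" lexically.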
Sources
1. Upstream Gemma 4 12B multimodal projector fixes for the n_head=0 divide-by-zero crash
2. x86/CUDA/Linux/Windows
Dev Signal