Ollama switches to llama.cpp backend, adds GGUF support
Ollama 0.30.0-rc28 replaces its GGML foundation with direct llama.cpp integration and GGUF compatibility, with MLX acceleration on Apple Silicon.
May 28, 2026
Why it matters
The direct llama.cpp backend removes an abstraction layer, which may improve performance and compatibility with the broader inference ecosystem. Developers can now use GGUF files directly, which standardizes model format interchange.
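Before pointing any runtime at a model file, it can help to confirm the file really is GGUF. The sketch below is not part of Ollama; it reads the fixed-size header described in the public GGUF specification (magic bytes, version, tensor count, metadata key/value count) and assumes the version 2 or later layout. The function names are hypothetical.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first bytes of a file.

    Assumes the version 2+ layout, where both counts are 64-bit integers.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("missing GGUF magic bytes")
    # All header fields are little-endian.
    (version,) = struct.unpack_from("<I", data, 4)
    if version < 2:
        raise ValueError("GGUF version 1 uses a different header layout")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def read_gguf_header_from_file(path: str) -> dict:
    """Convenience wrapper (hypothetical helper): read only the header bytes."""
    with open(path, "rb") as f:
        return read_gguf_header(f.read(HEADER_SIZE))
```

A file that fails this check is either not GGUF or is truncated. Passing it says nothing about whether a given runtime supports the model architecture inside.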
Implementation verdict
Replaces the GGML stack with llama.cpp; test performance and memory use on your own hardware before production use. Two known gaps: laguna-xs.2 and llama3.2-vision are not yet supported in this pre-release. Worth trying rc28 if you run models on Mac, Linux, or Windows, but wait for 0.30.0 stable if you rely on either of those models.
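One rough way to compare throughput before and after upgrading is to time a generation through Ollama's local REST API, which reports `eval_count` and `eval_duration` (in nanoseconds) in its response. This is a minimal sketch, assuming the pre-release keeps the existing `/api/generate` endpoint on the default port; the model name and prompt are placeholders, and memory use is not measured here.

```python
import json
import urllib.request

# Default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation throughput from an /api/generate response.

    eval_duration is reported in nanoseconds, so convert to seconds first.
    """
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


def run_once(model: str, prompt: str) -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "llama3.2" is a placeholder; use a model you have already pulled.
    result = run_once("llama3.2", "Explain GGUF in one sentence.")
    print(f"{tokens_per_second(result):.1f} tokens/s")
```

Run the same model and prompt several times on each version and compare the spread, since a single run includes warm-up effects such as model loading.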
Sources
- 1. directly support llama.cpp instead of building on top of GGML
- 2. allows for compatibility with GGUF file format
- 3. MLX is used to accelerate model inference on Apple Silicon
- 4. laguna-xs.2 is not supported yet on this pre-release
- 5. llama3.2-vision is not supported yet on this pre-release
Dev Signal