ollama llama-cpp gguf inference apple-silicon

Ollama switches to llama.cpp backend, adds GGUF support

Ollama 0.30.0-rc28 replaces its GGML-based foundation with direct llama.cpp integration, adds GGUF compatibility, and uses MLX to accelerate inference on Apple Silicon.

May 28, 2026

Summary

Supporting llama.cpp directly, instead of building on top of GGML, removes an abstraction layer and may improve both performance and compatibility with the broader inference ecosystem. Developers can now use GGUF files directly, which standardizes model format interchange.
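Because GGUF files can now be used directly, it can help to sanity-check a file before handing it to any loader. The following is a minimal sketch, not Ollama code: it assumes the little-endian GGUF v2+ header layout (4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata counts). The function names `parse_gguf_header` and `read_gguf_header` are hypothetical helpers introduced for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first bytes of a file.

    Assumes a little-endian file using the v2+ layout, where both
    counts are 64-bit unsigned integers.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("missing GGUF magic bytes; not a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", data[4:HEADER_SIZE])
    if version < 2:
        # v1 stored the counts as 32-bit values, so this layout would misread them.
        raise ValueError(f"unsupported GGUF version for this sketch: {version}")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


def read_gguf_header(path: str) -> dict:
    """Read only the header bytes from disk and parse them."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))
```

Calling `read_gguf_header("model.gguf")` (the path is a placeholder) returns the version and counts, or raises `ValueError` if the file does not look like GGUF. This checks only the header, not whether a given runtime supports the model's architecture.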

Implementation verdict

This release candidate replaces the GGML stack with llama.cpp, so test performance and memory use on your own hardware before production use. Two known gaps: laguna-xs.2 and llama3.2-vision are not yet supported. rc28 is worth trying if you run models on macOS, Linux, or Windows, but wait for the stable 0.30.0 release if you rely on either of those models.
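One way to run the hardware test suggested above is to compare decode throughput before and after upgrading. This is a rough sketch that assumes the release candidate still exposes Ollama's usual local REST endpoint (`/api/generate` on port 11434) and still reports `eval_count` and `eval_duration` (in nanoseconds) in non-streaming responses; that has not been verified against rc28. The helper names `tokens_per_second` and `run_once` are hypothetical.

```python
import json
import urllib.request

# Assumed default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Compute decode throughput from a non-streaming generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (duration_ns / 1e9)


def run_once(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With a server running, `tokens_per_second(run_once("your-model", "Explain GGUF in one sentence."))` gives a single sample, where `"your-model"` stands in for a model you have pulled locally. Averaging several runs on the old and new versions, while watching memory use with your operating system's tools, gives a fairer comparison than one request.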

Sources

1. directly support llama.cpp instead of building on top of GGML
2. allows for compatibility with GGUF file format
3. MLX is used to accelerate model inference on Apple Silicon
4. laguna-xs.2 is not supported yet on this pre-release
5. llama3.2-vision is not supported yet on this pre-release

Dev Signal
