Ollama 0.30 expands hardware support via llama.cpp
The llama.cpp backend replaces the MLX-only constraint on Apple Silicon, adds NVIDIA performance gains, and brings GGUF model support to a wider range of hardware.
June 2, 2026
Summary
Ollama 0.30 moves to a llama.cpp backend for improved compatibility and performance. The release supports a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models, and runs faster on NVIDIA hardware.
Why it matters
Developers can now run fine-tuned GGUF models and Hugging Face variants on more hardware without reimplementing inference pipelines. Faster NVIDIA execution reduces iteration cycles.
Implementation verdict
Supersedes prior Ollama versions; upgrade to 0.30 to get these changes. Ready now for GGUF workflows on Apple Silicon and NVIDIA hardware. Avoid laguna-xs.2 and llama3.2-vision until the next patch. Breaking change: nomic-embed-text now converts inputs to lowercase, so audit existing embedding workloads if you depend on case preservation.
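One way to start the audit is to scan the texts you embed for entries whose meaning may depend on letter case. The sketch below is a minimal, assumption-laden example: the helper names are hypothetical, it does not call Ollama at all, and it uses Python's str.lower() as a stand-in for whatever lowercasing the model applies, which may differ for some non-ASCII text.

```python
from collections import defaultdict


def find_case_collisions(texts):
    """Group inputs that differ only by letter case.

    Assumption: str.lower() approximates the lowercasing described in the
    brief; the exact normalization used by Ollama is not specified there.
    """
    groups = defaultdict(list)
    for text in texts:
        bucket = groups[text.lower()]
        if text not in bucket:  # skip exact duplicates
            bucket.append(text)
    # Keep only groups where two or more distinct inputs collapse together.
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


def case_sensitive_inputs(texts):
    """Return inputs that would be altered by lowercasing."""
    return [text for text in texts if text != text.lower()]


# Hypothetical sample corpus for illustration only.
sample = ["US policy", "us policy", "Apple", "apple", "banana"]
collisions = find_case_collisions(sample)
affected = case_sensitive_inputs(sample)
```

Inputs flagged by either helper are candidates for re-embedding after the upgrade, since vectors stored under earlier versions may no longer match what 0.30 produces for the same text.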
Sources
- 1. improved compatibility and performance using llama.cpp
- 2. support for a wider range of models, including GGUF-based models from Hugging Face and your own fine-tuned models along with faster performance on NVIDIA hardware
- 3. nomic-embed-text now converts inputs to lowercase per the model card where prior Ollama versions incorrectly preserved mixed case
Dev Signal