Bonsai Image 4B runs diffusion inference on iPhones
Binary and ternary quantization shrink the FLUX.2 Klein 4B diffusion transformer from 7.75 GB to 0.93–1.21 GB while retaining 88–95% of the original model's benchmark accuracy, enabling local generation on Apple Silicon devices.
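The brief does not describe Bonsai's actual quantization recipe. As a rough illustration only, the sketch below shows one common way to binarize and ternarize weights: sign binarization with a mean-absolute scale, and absmean ternary rounding similar to BitNet b1.58, both applied per group here. The group size of 128 and every function name are hypothetical assumptions, not Bonsai's method.

```python
# A minimal sketch of group-wise binary and ternary weight quantization.
# ASSUMPTIONS (not from the source): per-group mean-|w| scaling, a hypothetical
# GROUP_SIZE of 128, and plain Python lists instead of real tensors.
from typing import List, Tuple

GROUP_SIZE = 128  # hypothetical group size


def quantize_group_binary(w: List[float]) -> Tuple[float, List[int]]:
    """Map each weight to {-1, +1}; one scale (mean |w|) per group."""
    scale = sum(abs(x) for x in w) / len(w)
    codes = [1 if x >= 0 else -1 for x in w]
    return scale, codes


def quantize_group_ternary(w: List[float], eps: float = 1e-8) -> Tuple[float, List[int]]:
    """Map each weight to {-1, 0, +1} by rounding w / mean|w| and clipping."""
    scale = sum(abs(x) for x in w) / len(w) + eps  # eps avoids division by zero
    codes = [max(-1, min(1, round(x / scale))) for x in w]
    return scale, codes


def dequantize(scale: float, codes: List[int]) -> List[float]:
    """Reconstruct approximate weights as scale * code."""
    return [scale * c for c in codes]


def quantize(weights: List[float], ternary: bool = True) -> List[Tuple[float, List[int]]]:
    """Split a flat weight vector into groups and quantize each group."""
    fn = quantize_group_ternary if ternary else quantize_group_binary
    return [fn(weights[i:i + GROUP_SIZE]) for i in range(0, len(weights), GROUP_SIZE)]
```

For example, `quantize_group_ternary([0.4, -0.1, 0.05, -0.6])` gives a scale of about 0.2875 and codes `[1, 0, 0, -1]`: small weights collapse to zero, which is what lets ternary keep more detail than pure sign binarization. Note that Python's `round` rounds exact halves to even, which only matters for ties.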
June 2, 2026
Why it matters
Running on-device eliminates cloud round-trip latency in iterative image-generation workflows and keeps prompts and assets local. Developers can embed high-quality image generation in apps running on hardware users already own, removing per-request costs and shortening creative loops.
Implementation verdict
Replaces cloud-only FLUX.2 Klein deployments in on-device use cases. Requires MLX (Apple Silicon) or Gemlite (CUDA) support; both variants ship as open weights. Ready now for iOS/macOS apps: 9.4 s per 512×512 image on an iPhone 17 Pro Max is practical for most UX patterns. The ternary variant is recommended for quality; the binary (1-bit) variant for extreme memory pressure.
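The verdict's recommendation can be written as a small selection rule. This is a hypothetical helper: `pick_variant` and its `headroom` safety margin are assumptions, and the only real inputs are the mean-active memory figures quoted in the sources (1.5 GB binary, 1.96 GB ternary). Because those figures are averages rather than peaks, the sketch keeps a margin for transient spikes.

```python
# Hypothetical helper encoding the verdict: prefer ternary for quality,
# fall back to binary (1-bit) under tighter memory budgets.
from typing import Optional

# Mean-active memory figures quoted in the brief's sources (GB).
MEAN_ACTIVE_GB = {
    "ternary": 1.96,
    "binary": 1.5,
}


def pick_variant(budget_gb: float, headroom: float = 1.2) -> Optional[str]:
    """Return the highest-quality variant whose mean-active memory, scaled by
    a safety headroom factor (an assumption), fits in budget_gb.

    Returns None if neither variant fits.
    """
    for name in ("ternary", "binary"):  # ordered by quality, best first
        if MEAN_ACTIVE_GB[name] * headroom <= budget_gb:
            return name
    return None
```

With the default 1.2× margin, a 2 GB budget selects `binary` and a 4 GB budget selects `ternary`; the margin value itself is a tuning choice, not something the source specifies.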
Sources
- 1. "1.125 effective bits per weight"
- 2. "1.71 effective bits per weight"
- 3. "the first image model in its parameter class to run directly on an iPhone"
- 4. "mean-active memory is 1.5 GB and 1.96 GB, for the binary and ternary models, compared to 11.74 GB for the original FLUX.2 Klein 4B"
- 5. "retains 95% of the FLUX.2 Klein 4B accuracy across GenEval, HPSv3, and DPG-Bench, while reducing the diffusion transformer footprint by 6.4x"
- 6. "generation can sit directly inside the product experience"
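The ratios quoted in the sources can be checked directly from the brief's own numbers. The sketch also tries one possible decomposition of the effective-bits figures, assuming (the source does not say this) one 16-bit scale per 128 weights: 1 bit + 16/128 = 1.125 for binary, and log2(3) ≈ 1.585 bits + 0.125 ≈ 1.71 for ternary. Practical ternary packings (for example 5 trits per byte, or 1.6 bits) may differ, so treat the match as suggestive only.

```python
# Checks the footprint ratios quoted in the sources and tests a HYPOTHETICAL
# decomposition of the effective-bits figures (one 16-bit scale per 128 weights).
import math

# Figures quoted in the brief (GB).
DIT_GB = {"original": 7.75, "binary": 0.93, "ternary": 1.21}
ACTIVE_GB = {"original": 11.74, "binary": 1.5, "ternary": 1.96}


def reduction(table: dict, variant: str) -> float:
    """How many times smaller a variant is than the original model."""
    return table["original"] / table[variant]


def effective_bits(code_bits: float, scale_bits: int = 16, group: int = 128) -> float:
    """Bits per weight for the codes plus the amortized per-group scale.

    scale_bits and group are assumptions, not figures from the source.
    """
    return code_bits + scale_bits / group


for v in ("binary", "ternary"):
    print(f"{v}: DiT {reduction(DIT_GB, v):.1f}x smaller, "
          f"mean-active memory {reduction(ACTIVE_GB, v):.1f}x smaller")
# binary: DiT 8.3x smaller, mean-active memory 7.8x smaller
# ternary: DiT 6.4x smaller, mean-active memory 6.0x smaller

print(f"{effective_bits(1):.3f}")             # 1.125 (binary: 1 bit per weight)
print(f"{effective_bits(math.log2(3)):.2f}")  # 1.71 (ternary at the log2(3) limit)
```

The 6.4x figure in source 5 matches the ternary transformer (7.75 / 1.21); the binary transformer works out to roughly 8.3x.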
Dev Signal