Apple Silicon costs three times more than OpenRouter
An M5 MacBook Pro runs Gemma 4 31b at $1.50–$4.79 per million tokens; OpenRouter serves the same model for $0.38–$0.50 per million tokens, with 3–7× higher throughput.
May 19, 2026
Summary
If you're evaluating local inference for production agents, hardware amortization dominates the cost, not electricity. Speed matters more than either: cloud providers deliver 60–70 tokens/sec versus 10–20 tokens/sec locally. For salaried developers, token cost is noise; latency is what kills productivity.
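The cost split above can be sketched as a small calculation. This is a minimal sketch, not the original author's spreadsheet: the function names are hypothetical, and the hardware price, lifetime, and utilization in the usage lines are placeholder assumptions that do not come from the source. Only the wattage, electricity price, and tokens/sec ranges are taken from the quoted figures.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 365 * 24


def electricity_cost_per_million_tokens(watts, usd_per_kwh, tokens_per_sec):
    """Electricity cost (USD) to generate one million tokens."""
    usd_per_hour = (watts / 1000.0) * usd_per_kwh
    tokens_per_hour = tokens_per_sec * SECONDS_PER_HOUR
    return usd_per_hour / tokens_per_hour * 1_000_000


def amortized_hardware_cost_per_million_tokens(
    hardware_usd, lifetime_years, utilization, tokens_per_sec
):
    """Hardware price spread over every token generated in its lifetime.

    utilization is the fraction of the lifetime spent generating tokens
    (1.0 means the machine runs inference around the clock).
    """
    active_seconds = lifetime_years * HOURS_PER_YEAR * SECONDS_PER_HOUR * utilization
    total_tokens = active_seconds * tokens_per_sec
    return hardware_usd / total_tokens * 1_000_000


if __name__ == "__main__":
    # Quoted ranges: 50-100 W, ~$0.20/kWh, 10-20 tokens/sec.
    best = electricity_cost_per_million_tokens(50, 0.20, 20)
    worst = electricity_cost_per_million_tokens(100, 0.20, 10)
    print(f"electricity: ${best:.2f} to ${worst:.2f} per million tokens")

    # ASSUMPTION: $4000 laptop, 3-year life, 100% utilization (placeholders).
    hw = amortized_hardware_cost_per_million_tokens(4000, 3, 1.0, 20)
    print(f"hardware (assumed inputs): ${hw:.2f} per million tokens")
```

With the quoted ranges, the electricity term comes to roughly $0.14 to $0.56 per million tokens. The hardware term depends heavily on utilization: halving the hours the machine spends generating tokens doubles the amortized cost per token.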
Implementation verdict
Don't replace cloud APIs with local M5 inference for latency-sensitive work. Local inference remains viable only for offline-first or air-gapped deployments where you have no choice. If you already own the hardware and have CPU cycles to spare, run it, but don't buy a MacBook Pro for this.
Sources
1. ~50-100 watts under load, and ~$0.20 per kWh, my M5 MacBook Pro will cost a few cents per hour
2. amortized costs of ~$1.50 per million tokens
3. OpenRouter for comparable models is 1/3rd the price and ~2x the speed
4. 10-40 tokens per second range for a serious model like Gemma4:31b
5. OpenRouter has Gemma4 31b at ~38-50 cents per million tokens
6. Local inference is slower than cloud inference. Some of the Gemma 4 providers on OpenRouter get up to 60-70 tokens per second, which is 3-7 times faster than what I'm seeing with the pro max (~10-20 tokens per second)
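The 3-7 times figure in the last excerpt follows directly from the two quoted throughput ranges. A minimal sketch of that arithmetic is below; the function names and the 1,000-token response length are illustrative assumptions, not values from the source.

```python
def speedup_range(local_tps, cloud_tps):
    """Return (min, max) speedup of cloud over local.

    Each argument is a (low, high) tokens-per-second range.
    The smallest speedup pairs the slowest cloud with the fastest local;
    the largest pairs the fastest cloud with the slowest local.
    """
    lowest = min(cloud_tps) / max(local_tps)
    highest = max(cloud_tps) / min(local_tps)
    return lowest, highest


def generation_seconds(num_tokens, tokens_per_sec):
    """Wall-clock seconds to stream num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    low, high = speedup_range(local_tps=(10, 20), cloud_tps=(60, 70))
    print(f"cloud is {low:.0f}x to {high:.0f}x faster")

    # ASSUMPTION: a 1,000-token response, chosen only for illustration.
    for label, tps in [("local, slow", 10), ("local, fast", 20), ("cloud, fast", 70)]:
        print(f"{label}: {generation_seconds(1000, tps):.1f} s")
```

This counts decode time only; it ignores network round trips and prompt processing, which the source does not quantify.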
Dev Signal