AI code gen doesn't fix slow review queues
Code-gen tools compress keystroke time (already the bottleneck) but leave review latency, CI duration, and deploy windows untouched—the rows that own most wall-clock time.
Adopting Copilot or Claude Code without addressing review SLA, CI duration, and PR size creates more WIP and longer cycle times despite faster individual writing. The productivity gain is local, not systemic.
Don't adopt AI coding tools expecting cycle-time improvement until you've fixed review latency (target: four-hour SLA), CI flakes, and PR size (200-line limit). AI works for cold-start boilerplate and solo drafts. For team throughput, cut review queue and CI duration first—those changes compound faster than keystroke savings.
- “Faster code generation without faster review increases work-in-progress, not throughput.”
- “cycle time = WIP / throughput. Adding code without adding review capacity raises WIP and stretches every PR's wall-clock time.”
- “The breakdown looks roughly like this: Specifying what to build: hours to days, Writing the code: hours, Code review wait: hours to days, CI runs and flaky retries: 20-90 minutes per push”
- “Teams that enforce 200-line-or-less PRs (with rare exceptions) review faster, ship faster, and find more bugs per PR.”
- “Set a service level on first review — four hours during business hours is achievable for most teams.”
code-gencycle-timepr-reviewbottleneck-analysis
Bun 1.2 replaces tooling, not Node runtime
Bun consolidates install, test, build, and script execution into one binary; the real win is CI speedup and cold-start latency, not production throughput parity.
Developers running heavy CI pipelines and TypeScript toolchains can eliminate 800ms+ per script invocation and dramatically faster installs without rewriting application code. The productivity gain is measurable in dev and CI, not in deployed services.
Bun replaces tsx, ts-node, jest/vitest, esbuild, and npm/pnpm for new projects with zero native dependencies. It requires vetting native modules (node-gyp bindings fail first) and avoiding runtime deployment to Lambda/Vercel. Start with bun install --frozen-lockfile in CI; defer full migration until you hit tsx or install bottlenecks. Node 22+ has absorbed most UX wins; the gap is real but smaller than 2023.
- “bun install on a cold cache finishes in roughly the time npm install spends just resolving the dependency graph”
- “Bun's built-in Bun.serve outperforms Node's http module and most Node frameworks in synthetic benchmarks”
- “If tsx adds 800ms to every script invocation, bun run gives that back”
- “Bun targets Node.js API compatibility as a feature”
- “Packages with node-gyp bindings may fail to build or run”
- “Bun is most useful for local dev, CI, and self-hosted containers”
runtime-performanceci-toolingpackage-managertypescriptmigration-guide
Apple Silicon costs three times more than OpenRouter
M5 MacBook Pro runs Gemma 4 31b at $1.50–$4.79 per million tokens; OpenRouter's same model costs $0.38–$0.50, plus 3–7× faster throughput.
If you're evaluating local inference for production agents, hardware amortization dominates cost—not electricity. Speed matters more: cloud providers deliver 60–70 tokens/sec vs. your 10–20 local. For salary-bearing developers, token cost is noise; latency kills productivity.
Don't replace cloud APIs with local M5 inference for latency-sensitive work. Local remains viable only for offline-first or air-gapped deployments where you have no choice. If you already own the hardware and have CPU cycles to spare, run it—but don't buy a MacBook Pro for this.
- “~50-100 watts under load, and ~$0.20 per kWh, my M5 MacbookPro will cost a few cents per hour”
- “ammortized costs of ~$1.50 per million tokens”
- “Openrouter for comparable models is 1/3rd the price and ~2x the speed”
- “10-40 tokens per second range for a serious model like Gemma4:31b”
- “OpenRouter has Gemma4 31b at ~38-50 cents per million tokens”
- “Local inference is slower than cloud inference. Some of the gemma 4 providers on openrouter get up to 60-70 tokens per second, which is 3-7 times faster than what I'm seeing with the pro max (~10-20 tokens per second)”
local-inferencecost-analysisapple-siliconllm-opstokenomics
AI becomes table stakes, not differentiation
Build products that survive if the AI gets cheaper or breaks; use AI to improve them, not replace the underlying value.
Most ChatGPT wrappers are collapsing because they lack defensibility when model vendors ship competing features or API costs drop. Your workflow depends on knowing whether you're building a durable product or a temporary arbitrage on model availability.
Replaces the "AI-first" positioning that dominated 2024–2025 pitch decks. Requires you to define your product's value without the AI layer first. Apply this test now to any AI feature roadmap: if the model swapped to a competitor tomorrow, would users stay? If no, stop building.
- “when a capability becomes table stakes, branding around the capability becomes incoherent”
- “a thin UI sits on top of someone else's model, charging a subscription for prompts the user could write themselves”
- “The model vendor undercuts you”
- “The switching cost is zero”
- “build something that would still be useful if the AI got much cheaper or much worse, and then use AI to make it better”
- “if the underlying model swapped to a competitor tomorrow, would it still be the best thing for the job?”
ai-productsmarket-dynamicsstrategywrappersdefensibility
LLMs stay blank, you absorb the learning
Models don't retain context between sessions—the burden of prompting skill accumulates entirely on you, not the tool.
Understanding that LLMs are stateless changes how you architect workflows. You're not building persistent agent memory; you're optimizing your own prompt patterns and context-passing strategy.
This replaces the assumption that fine-tuning or continued learning happens server-side. It requires you to externalize everything—store conversation context, version your prompt templates, treat the model as a pure stateless function. Worth acknowledging now before you design for the wrong mental model.
- “An LLM doesn't work that way. It learns nothing about me between sessions.”
- “The chisel didn't change. I did.”
- “You become the patina.”
- “The tool is patient and unchanged.”
stateless-modelsprompt-engineeringllm-workflowcontext-managementmental-models