May 19, 2026

Cut token use by 98%, Apple Silicon, and LLMs

Tool of the Week

Semble indexes codebases, cuts agent token use 98%

Natural-language code search library that returns only the relevant snippets to agents, via MCP or a bash tool, in place of grep+read workflows. Indexing an average repo takes ~250 ms and queries take ~1.5 ms, all on CPU.

Agents waste tokens reading full files to find code; Semble returns only matched chunks, reducing context window pressure and latency on every retrieval step. Replaces manual grep exploration with semantic search agents can call directly.
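To make the grep+read versus snippet-retrieval difference concrete, here is a toy sketch in Python. It is not Semble's algorithm or API, and Semble's real interface is not shown. The sketch splits files into fixed windows of lines, scores each chunk by how many distinct query words it contains, and returns only the best chunk instead of whole files. Token counts are approximated as word counts. `chunk_file`, `search`, and the sample repo are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int  # 1-based line number where the chunk begins
    text: str


def tokenize(text: str) -> list[str]:
    # Crude proxy for model tokens: lowercase alphanumeric runs,
    # so "parse_config" becomes ["parse", "config"].
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_file(path: str, source: str, lines_per_chunk: int = 5) -> list[Chunk]:
    # Hypothetical chunker: fixed windows of N lines (real tools may split on syntax instead).
    lines = source.splitlines()
    return [
        Chunk(path, i + 1, "\n".join(lines[i:i + lines_per_chunk]))
        for i in range(0, len(lines), lines_per_chunk)
    ]


def search(chunks: list[Chunk], query: str, k: int = 1) -> list[Chunk]:
    # Toy scorer: number of distinct query terms that appear in each chunk.
    terms = set(tokenize(query))
    scored = [(len(terms & set(tokenize(c.text))), c) for c in chunks]
    hits = [(score, c) for score, c in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep file order
    return [c for _, c in hits[:k]]


if __name__ == "__main__":
    repo = {
        "app/config.py": "import json\n\ndef parse_config(path):\n"
                         "    with open(path) as f:\n        return json.load(f)\n\n"
                         "def unrelated():\n    return 42\n",
        "app/server.py": "def serve():\n    pass\n",
    }
    chunks = [c for path, src in repo.items() for c in chunk_file(path, src)]
    hits = search(chunks, "parse config file")
    full_read = sum(len(tokenize(src)) for src in repo.values())
    snippet = sum(len(tokenize(c.text)) for c in hits)
    print(f"top hit: {hits[0].path}:{hits[0].start_line}")
    print(f"tokens if every file is read: {full_read}; tokens for the top chunk: {snippet}")
```

On a two-file toy repo the saving is modest. The gap widens as files grow, because grep+read pays for whole files while retrieval pays only for the chunks it returns.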

Ready now. Works as a drop-in MCP server (Claude Code, Cursor, Codex, OpenCode) or as a bash tool. The CLI installs with `pip install semble`; the MCP server additionally requires uv. Worth testing immediately if you run agents against large codebases.

“returns the exact code snippets they need instantly, using ~98% fewer tokens than grep+read”
“indexes an average repo in ~250 ms and answers queries in ~1.5 ms, all on CPU”
“NDCG@10 of 0.854 on our benchmarks, on par with code-specialized transformer models”
“Everything runs on CPU with no API keys, GPU, or external services”
“~200x faster indexing and ~10x faster queries than a code-specialized transformer”
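NDCG@10 is a standard ranking metric. Each result's graded relevance is discounted by the logarithm of its rank, the discounted values for the top 10 are summed, and that sum is divided by the same sum for the ideal ordering, so 1.0 means a perfect ranking. The sketch below uses the linear-gain form DCG@k = Σ rel_r / log2(r + 1). That choice is an assumption: the quoted benchmark does not say which formula it uses, and a common alternative gain is 2^rel − 1. Benchmark scores are usually averaged over many queries.

```python
import math


def dcg_at_k(gains: list[float], k: int = 10) -> float:
    # Rank r (1-based) is discounted by log2(r + 1); enumerate is 0-based, hence i + 2.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranking: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """ranking: result ids in returned order; relevance: judged grade per id (missing = 0)."""
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranking]
    # Ideal DCG: every judged grade sorted best-first, then truncated to k.
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    judged = {"config.py:1": 2.0, "loader.py:40": 1.0}  # hypothetical judgments
    print(ndcg_at_k(["config.py:1", "loader.py:40", "server.py:1"], judged))  # ideal order
    print(ndcg_at_k(["server.py:1", "loader.py:40", "config.py:1"], judged))  # best hit ranked last
```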

code-search · agents · mcp · token-efficiency · local-first

Dev Signal


Quick Signals

AI code gen doesn't fix slow review queues

Code-gen tools compress keystroke time, which is rarely the bottleneck. They leave review latency, CI duration, and deploy windows untouched, and those stages own most of the wall-clock time.

Adopting Copilot or Claude Code without addressing review SLA, CI duration, and PR size creates more WIP and longer cycle times despite faster individual writing. The productivity gain is local, not systemic.

Don't adopt AI coding tools expecting cycle-time improvement until you've fixed review latency (target: a four-hour first-review SLA), CI flakes, and PR size (200-line limit). AI helps with cold-start boilerplate and solo drafts. For team throughput, cut review-queue wait and CI duration first; those changes compound faster than keystroke savings.
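Both targets are easy to check mechanically. The Python sketch below makes three assumptions. "PR size" means added plus deleted lines as reported by `git diff --numstat`, and binary files, which numstat reports as `-`, are skipped. "Business hours" means 09:00 to 17:00, Monday to Friday, in a single time zone with no holidays. The function names and constants are illustrative only and are not part of any tool mentioned here.

```python
from datetime import date, datetime, time, timedelta

PR_LINE_LIMIT = 200          # target from the text: 200-line PRs
FIRST_REVIEW_SLA_HOURS = 4   # target from the text: four business hours to first review


def changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output ("added<TAB>deleted<TAB>path")."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":  # binary file: numstat gives no line counts
            continue
        total += int(added) + int(deleted)
    return total


def business_hours_between(start: datetime, end: datetime,
                           open_at: time = time(9), close_at: time = time(17)) -> float:
    """Hours of overlap between [start, end] and weekday office hours (naive datetimes, no holidays)."""
    total = timedelta()
    day: date = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = max(start, datetime.combine(day, open_at))
            window_end = min(end, datetime.combine(day, close_at))
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


def check_pr(numstat: str, opened: datetime, first_review: datetime) -> list[str]:
    """Return human-readable violations; an empty list means both targets are met."""
    problems = []
    size = changed_lines(numstat)
    if size > PR_LINE_LIMIT:
        problems.append(f"PR changes {size} lines (limit {PR_LINE_LIMIT})")
    waited = business_hours_between(opened, first_review)
    if waited > FIRST_REVIEW_SLA_HOURS:
        problems.append(f"first review after {waited:.1f} business hours (SLA {FIRST_REVIEW_SLA_HOURS})")
    return problems
```

In CI, the numstat text could come from something like `git diff --numstat origin/main...HEAD`, where the branch name is an assumption, and the two timestamps from your code host's API. Counting business hours rather than wall-clock hours keeps a PR opened late on Friday from breaching the SLA over the weekend.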

“Faster code generation without faster review increases work-in-progress, not throughput.”
“cycle time = WIP / throughput. Adding code without adding review capacity raises WIP and stretches every PR's wall-clock time.”
“The breakdown looks roughly like this: Specifying what to build: hours to days, Writing the code: hours, Code review wait: hours to days, CI runs and flaky retries: 20-90 minutes per push”
“Teams that enforce 200-line-or-less PRs (with rare exceptions) review faster, ship faster, and find more bugs per PR.”
“Set a service level on first review — four hours during business hours is achievable for most teams.”
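The quoted relation is Little's Law (average WIP = throughput × average cycle time), rearranged. The toy Python simulation below makes the argument concrete. PRs open at a fixed daily rate, reviewers can merge at most a fixed number per day, and cycle time is read off the averages. The rates, the 20-day horizon, and the one-day granularity are assumptions for illustration, not measurements from the article.

```python
def cycle_time_days(avg_wip: float, throughput_per_day: float) -> float:
    # Little's Law rearranged: cycle time = WIP / throughput.
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_day


def simulate_review_queue(opened_per_day: int, reviews_per_day: int, days: int) -> tuple[float, float]:
    """Toy model: each day new PRs join the queue, then reviewers merge up to their capacity.
    Returns (average WIP sampled after arrivals, average merged PRs per day)."""
    queue = 0
    wip_sum = 0
    merged_sum = 0
    for _ in range(days):
        queue += opened_per_day               # faster code gen means more PRs opened
        wip_sum += queue                      # sample WIP once per day, after arrivals
        merged = min(queue, reviews_per_day)  # review capacity caps what can ship
        queue -= merged
        merged_sum += merged
    return wip_sum / days, merged_sum / days


if __name__ == "__main__":
    for opened in (4, 8):  # hypothetical: code gen doubles PRs opened per day
        wip, throughput = simulate_review_queue(opened, reviews_per_day=5, days=20)
        print(f"opened/day={opened}: WIP={wip:.1f}, throughput={throughput:.1f}/day, "
              f"cycle time={cycle_time_days(wip, throughput):.1f} days")
```

With review capacity fixed at five PRs a day, doubling the number of PRs opened lifts throughput only from 4 to 5 per day. Over the same run, average WIP jumps from 4 to 36.5 and cycle time from 1 to about 7.3 days. Because arrivals exceed capacity, the queue never settles, so that cycle time keeps growing the longer the run lasts. Little's Law is strictly a long-run, steady-state result, so treat this as an illustration of the direction of the effect.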

code-gen · cycle-time · pr-review · bottleneck-analysis

Bun 1.2 replaces tooling, not Node runtime

Bun consolidates install, test, build, and script execution into one binary; the real win is CI speedup and cold-start latency, not production throughput parity.

