Gate every AI request, not sessions alone
Inference theft scales because a per-session check lets attackers amortize one bypass across thousands of proxied calls. Verify every request, not every session, with invisible bot detection that runs server-side before inference.
June 5, 2026
Summary
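A single agent prompt on a frontier model can cost about $2, while the HTTP request that triggers it costs a fraction of a cent (Vercel charges roughly $2 per million requests). Attackers who proxy your endpoint resell the stolen inference at a 5-10% discount and keep the proceeds as pure margin. Without per-request gates, one attack can push your inference run rate past ten thousand dollars per day.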
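The cost asymmetry can be worked through with the figures quoted in this brief. The sketch below is illustrative arithmetic only: the constant and function names are made up for the example, and the dollar amounts are the rough figures cited here, not measurements.

```typescript
// Rough figures quoted in this brief; treat them as approximate.
const inferenceCostPerCallUsd = 2;   // one agent prompt on a frontier model
const requestCostPerMillionUsd = 2;  // HTTP request pricing, per million calls

// How many times more the inference costs than the request that triggers it.
const costRatio =
  (inferenceCostPerCallUsd * 1_000_000) / requestCostPerMillionUsd;

// Number of stolen calls per day implied by a given daily inference bill.
function callsPerDay(dailyBillUsd: number, costPerCallUsd: number): number {
  return dailyBillUsd / costPerCallUsd;
}

console.log(costRatio); // 1000000
console.log(callsPerDay(10_000, inferenceCostPerCallUsd)); // 5000
```

At these figures, about 5,000 stolen calls a day is enough to reach a ten-thousand-dollar daily run rate, a volume that is trivial to generate at request-level prices.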
Implementation verdict
This pattern replaces session-layer rate limits and IP blocks with per-request bot classification. It requires a Vercel BotID client and server setup (about 15 lines of code) or an equivalent invisible CAPTCHA. It is production-ready now: on Vercel's own docs endpoint, BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of a traffic spike.
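A minimal sketch of the per-request gate follows. It is not the BotID integration itself: `BotCheck`, `InferenceRequest`, `GateResult`, and `gateEveryRequest` are hypothetical names for illustration, and the bot classifier is injected as a function so the example stays self-contained. In a BotID setup that slot would presumably be filled by the library's server-side check (documented as `checkBotId` from `botid/server`); consult the BotID docs for the exact call and return shape.

```typescript
// Hypothetical shapes for illustration; these are NOT the BotID API.
type InferenceRequest = { headers: Record<string, string>; prompt: string };
type BotVerdict = { isBot: boolean };
type BotCheck = (req: InferenceRequest) => Promise<BotVerdict>;
type GateResult =
  | { status: 200; completion: string }
  | { status: 403; error: string };

// Wrap an inference function so the bot check runs on every call,
// before any model tokens are spent.
function gateEveryRequest(
  checkBot: BotCheck,
  runInference: (prompt: string) => Promise<string>,
): (req: InferenceRequest) => Promise<GateResult> {
  return async (req) => {
    // The verdict is computed fresh for each request and never cached
    // per session, so a single bypass buys only a single call.
    const verdict = await checkBot(req);
    if (verdict.isBot) {
      return { status: 403, error: "Access denied" };
    }
    return { status: 200, completion: await runInference(req.prompt) };
  };
}
```

The design choice is ordering: the classifier runs first and the model call happens only on the allowed branch, so a blocked request costs the price of an HTTP call rather than the price of inference.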
Sources
1. "a single prompt to an agent on a frontier model can cost $2"
2. "Vercel charges ~$2/million, a fraction of a cent per call"
3. "verification has to run on every AI request"
4. "Any check that runs per session amortizes the attacker's bypass cost across every subsequent inference call"
5. "BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of the spike"
6. "inference cost run rate of over ten thousand dollars per day"
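The amortization point in the sources above can be made concrete with a small calculation. This is a sketch with hypothetical inputs: the $1 bypass cost and the 1,000 calls per session are placeholders chosen for illustration, not figures from the sources.

```typescript
// Bypass cost the attacker effectively pays per inference call when one
// successful bypass unlocks `callsPerBypass` calls. Inputs are hypothetical.
function bypassCostPerCall(bypassCostUsd: number, callsPerBypass: number): number {
  if (callsPerBypass < 1) {
    throw new RangeError("callsPerBypass must be at least 1");
  }
  return bypassCostUsd / callsPerBypass;
}

// Per-session check: one bypass covers every call in the session.
const perSession = bypassCostPerCall(1, 1000);

// Per-request check: every call needs its own bypass.
const perRequest = bypassCostPerCall(1, 1);

console.log(perSession, perRequest); // 0.001 1
```

Under these placeholder numbers, moving the check from the session to the request raises the attacker's per-call cost by the number of calls a session would otherwise have covered.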
Dev Signal