Gate every AI request, not sessions alone

Inference theft scales because an attacker who gets past an auth or bot check once can amortize that bypass across thousands of proxied calls. Verify every request, not every session, using invisible bot detection that runs server-side before inference.

June 5, 2026

Summary

A single stolen frontier-model call can cost about $2, while a plain HTTP request costs a fraction of a cent (roughly $2 per million); attackers resell the stolen inference at a 5-10% discount for pure margin. Without per-request gates, an attack can push your inference run rate past ten thousand dollars per day.
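The asymmetry is easier to see as arithmetic. The sketch below uses the two price points quoted in the sources ($2 per agent prompt, about $2 per million HTTP requests); the bypass cost and calls-per-session values passed in are hypothetical inputs, not measured figures.

```typescript
// Price points quoted in the brief's sources; treat them as rough figures.
const INFERENCE_COST_PER_CALL_USD = 2; // one agent prompt on a frontier model
const HTTP_COST_PER_MILLION_USD = 2;   // plain HTTP requests, per million

// How many plain HTTP requests cost the same as one inference call.
function costRatio(): number {
  return (INFERENCE_COST_PER_CALL_USD / HTTP_COST_PER_MILLION_USD) * 1_000_000;
}

// Attacker's effective bypass cost per inference call.
// With a per-session check the bypass is paid once and spread over every call;
// with a per-request check it is paid on each call.
// bypassCostUsd and callsPerSession are hypothetical inputs.
function bypassCostPerCall(
  bypassCostUsd: number,
  callsPerSession: number,
  checkPerRequest: boolean,
): number {
  return checkPerRequest ? bypassCostUsd : bypassCostUsd / callsPerSession;
}

// Number of stolen calls per day needed to reach a given daily spend.
function callsForDailySpend(dailySpendUsd: number): number {
  return Math.ceil(dailySpendUsd / INFERENCE_COST_PER_CALL_USD);
}
```

At these prices, about 5,000 stolen calls a day is enough to reach a ten-thousand-dollar daily run rate, and a per-session check lets the attacker divide a one-time bypass cost by however many calls the session carries.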

Implementation verdict

Replaces session-layer rate limits and IP blocks with per-request bot classification. Requires Vercel BotID client and server setup (~15 lines of code) or an equivalent invisible CAPTCHA. Production-ready now: on Vercel's own docs endpoint, BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of a traffic spike.
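A minimal, vendor-neutral sketch of the per-request gate follows. All type and function names here (`GateRequest`, `GateResponse`, `Classifier`, `gateInference`) are hypothetical and exist only for illustration. With Vercel BotID, the injected `classify` function would presumably wrap the library's server-side check (to our knowledge `checkBotId()` from `botid/server`, paired with the client-side setup); confirm against the current BotID docs before relying on that.

```typescript
// Minimal request/response shapes so the sketch has no framework dependency.
type GateRequest = { headers: Record<string, string>; body: string };
type GateResponse = { status: number; body: string };

// Verdict from whichever bot classifier is in use (hypothetical shape).
type BotVerdict = { isBot: boolean };
type Classifier = (req: GateRequest) => Promise<BotVerdict>;
type InferenceHandler = (req: GateRequest) => Promise<GateResponse>;

// Wraps an inference handler so classification runs on every request,
// server-side, before any model call is made. Nothing is cached per session.
function gateInference(
  classify: Classifier,
  handler: InferenceHandler,
): InferenceHandler {
  return async (req) => {
    const verdict = await classify(req);
    if (verdict.isBot) {
      // Reject before spending any inference budget.
      return { status: 403, body: JSON.stringify({ error: "Access denied" }) };
    }
    return handler(req);
  };
}
```

Injecting the classifier keeps the gate independent of any one vendor and makes it easy to confirm that a blocked request never reaches the model.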

Sources

  1. a single prompt to an agent on a frontier model can cost $2
  2. Vercel charges ~$2/million, a fraction of a cent per call
  3. verification has to run on every AI request
  4. Any check that runs per session amortizes the attacker's bypass cost across every subsequent inference call
  5. BotID deep analysis detected and blocked more than ten thousand bot requests in the first minutes of the spike
  6. inference cost run rate of over ten thousand dollars per day

Dev Signal
