Stream LLM tokens to the browser with fetch, not EventSource

Consume the model stream on the server, re-emit it as SSE events, and read it in the browser with fetch and response.body.getReader(). Unlike EventSource, fetch gives you POST support, custom headers, and AbortController cancellation.

June 5, 2026

Summary

Streaming tokens as they arrive makes a 15-40 second generation feel responsive instead of leaving users with a frozen spinner. Proper cancellation prevents orphaned GPU jobs and duplicate billing. The anti-buffering headers (no-transform and X-Accel-Buffering: no) tell proxies to flush tokens immediately instead of batching them until the end.
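
A minimal sketch of the server side of this, assuming the model output is already available as an async iterable. The name sseResponse and the event shape are illustrative and do not come from the original tool. Adding no-cache alongside no-transform is a common SSE convention, not something the brief states.

```typescript
// Anti-buffering headers named in the brief, plus the SSE content type.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache, no-transform", // no-cache is an added assumption
  "X-Accel-Buffering": "no", // disables nginx response buffering
};

// Hypothetical helper: wraps any async iterable of events in an SSE response.
function sseResponse(events: AsyncIterable<unknown>): Response {
  const encoder = new TextEncoder();
  const iterator = events[Symbol.asyncIterator]();

  const body = new ReadableStream<Uint8Array>({
    // pull() runs whenever the consumer wants more data, so each event
    // is framed and handed over as soon as the model produces it.
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(`data: ${JSON.stringify(value)}\n\n`));
    },
    // cancel() runs when the client goes away; stop the producer too.
    async cancel() {
      await iterator.return?.();
    },
  });

  return new Response(body, { headers: SSE_HEADERS });
}
```

Each event is written as one data: line followed by a blank line, which is the framing an SSE consumer splits on.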

Implementation verdict

Replaces naive wait-for-the-full-response patterns and EventSource for LLM endpoints. Requires a Next.js 15 Route Handler, AbortController wiring through the streamModel generator, TextDecoder buffering so that events and multi-byte characters split across network chunks are reassembled, and maxDuration tuning on Vercel. Ready now: this is production code from spectr-ai's security report tool.
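
The client half could look roughly like the sketch below. It is not the original tool's code: the /api/report path, the StreamEvent shape, and the function names are placeholders. The parser keeps a text buffer because a network chunk can end in the middle of an event or even in the middle of a multi-byte character.

```typescript
// Hypothetical event shape; the original tool's payload is not shown.
type StreamEvent = { text: string };

// Returns a function that accepts raw chunks and yields only complete events.
function createSSEParser() {
  const decoder = new TextDecoder();
  let buffer = "";
  return (chunk: Uint8Array): StreamEvent[] => {
    // stream: true holds back a partial multi-byte character until the next chunk.
    buffer += decoder.decode(chunk, { stream: true });
    const frames = buffer.split("\n\n");
    buffer = frames.pop() ?? ""; // the last piece may be an incomplete event
    const events: StreamEvent[] = [];
    for (const frame of frames) {
      for (const line of frame.split("\n")) {
        if (line.startsWith("data: ")) events.push(JSON.parse(line.slice(6)));
      }
    }
    return events;
  };
}

// POST with a JSON body and a cancellation signal, which EventSource cannot do.
async function streamTokens(
  prompt: string,
  signal: AbortSignal,
  onEvent: (event: StreamEvent) => void,
): Promise<void> {
  const response = await fetch("/api/report", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
    signal,
  });
  if (!response.ok || !response.body) throw new Error(`HTTP ${response.status}`);

  const reader = response.body.getReader();
  const parse = createSSEParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const event of parse(value)) onEvent(event);
  }
}
```

A caller would create an AbortController, pass controller.signal to streamTokens, and call controller.abort() from a Stop button. The pending read then rejects with an AbortError that the caller can catch and ignore.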

Sources

  1. the server forwards hundreds of text fragments coming out of the model in real time
  2. EventSource is the obvious tool for SSE, and it handles reconnection for free. But it only does GET requests
  3. controller.enqueue(encoder.encode(`data: ${JSON.stringify(event)}\n\n`))
  4. no-transform and X-Accel-Buffering: no. These are the anti-buffering headers. no-transform tells proxies not to gzip-buffer the body, and X-Accel-Buffering: no disables nginx's response buffer
  5. When the browser aborts, Next.js aborts request.signal, which I pass into streamModel, which passes it to the model fetch
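
The cancellation chain in the last quote can be sketched as follows. Only the name streamModel comes from the source. Its signature, the MODEL_URL endpoint, and the injectable fetchImpl parameter (added so the function can be exercised without a network) are assumptions.

```typescript
// Hypothetical upstream endpoint; the brief does not name the model API.
const MODEL_URL = "https://model.example/v1/stream";

type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// Yields decoded text fragments until the upstream ends or the signal aborts.
async function* streamModel(
  prompt: string,
  signal: AbortSignal,
  fetchImpl: FetchLike = fetch,
): AsyncGenerator<string> {
  const upstream = await fetchImpl(MODEL_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
    signal, // aborting this signal cancels the upstream model request
  });
  if (!upstream.ok || !upstream.body) {
    throw new Error(`Upstream HTTP ${upstream.status}`);
  }

  const reader = upstream.body.getReader();
  const decoder = new TextDecoder();
  try {
    // Re-check the signal between reads so the loop stops promptly.
    while (!signal.aborted) {
      const { done, value } = await reader.read();
      if (done) break;
      yield decoder.decode(value, { stream: true });
    }
  } finally {
    // Release the upstream connection however the loop ended.
    await reader.cancel().catch(() => {});
  }
}
```

Inside a Route Handler this would be called as streamModel(prompt, request.signal), so a browser abort reaches the model request without any extra bookkeeping.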

Dev Signal
