OpenAI models switch endpoints for interleaved reasoning

GPT-5 class models now route through /v1/responses instead of /v1/chat/completions, exposing summarized reasoning tokens in CLI output with optional suppression flags.

May 15, 2026

Summary

Developers can inspect model reasoning steps during tool-use interactions without parsing hidden state. The -R flag lets you suppress noise in production workflows where reasoning visibility isn't needed.

Why it matters

Developers can inspect model reasoning steps during tool-use interactions without parsing hidden state. The -R flag lets you suppress noise in production workflows where reasoning visibility isn't needed.

Implementation verdict

Replaces /v1/chat/completions routing for reasoning-capable models. Requires updating llm CLI to 0.32a2+. Ready now as alpha—test against your reasoning-heavy prompts before production dependency, but no blockers identified.

Sources

  1. 1.Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions
  2. 2.This enables interleaved reasoning across tool calls for GPT-5 class models
  3. 3.Use the -R or --hide-reasoning flags if you don't want to see that

Dev Signal

Get briefs like this in your inbox — free, 3x a week.

100+ sources compressed into one 4-minute read. Ranked, cited, implementation-ready.