OpenAI models switch endpoints for interleaved reasoning
GPT-5 class models now route through /v1/responses instead of /v1/chat/completions, exposing summarized reasoning tokens in CLI output with optional suppression flags.
May 15, 2026
Summary
Developers can inspect model reasoning steps during tool-use interactions without parsing hidden state. The -R flag lets you suppress noise in production workflows where reasoning visibility isn't needed.
Why it matters
Developers can inspect model reasoning steps during tool-use interactions without parsing hidden state. The -R flag lets you suppress noise in production workflows where reasoning visibility isn't needed.
Implementation verdict
Replaces /v1/chat/completions routing for reasoning-capable models. Requires updating llm CLI to 0.32a2+. Ready now as alpha—test against your reasoning-heavy prompts before production dependency, but no blockers identified.
Sources
- 1.Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions
- 2.This enables interleaved reasoning across tool calls for GPT-5 class models
- 3.Use the -R or --hide-reasoning flags if you don't want to see that
Dev Signal
Get briefs like this in your inbox — free, 3x a week.
100+ sources compressed into one 4-minute read. Ranked, cited, implementation-ready.