github-copilot byok-custom-api openai-compatible local-inference cost-routing

Point GitHub Copilot Chat at any OpenAI-compatible API

BYOK support lets Copilot Chat and CLI use Claude, Gemini, or local vLLM via environment variables or UI form—inline completions still use GitHub's infra.

May 27, 2026

Summary

Developers can escape Copilot's model roster without leaving the editor, route inference spend to preferred providers, and test proprietary or self-hosted models in real workflows. Inline completions remain unaffected, so code ghosttext latency budgets stay met.

Why it matters

Implementation verdict

BYOK replaces the need to context-switch to other IDEs for model choice. Requires: valid OpenAI-compatible endpoint, API key, 30 seconds of configuration (UI form in VS Code stable; three env vars for CLI). Ready now—GA confirmed April 2026 changelog. Static credentials only; telemetry still flows to GitHub; rate limiting is your responsibility. Inline code completions do not participate—this is chat and agents only.

Sources

1.GitHub Copilot now lets you point Chat (VS Code) and the Copilot CLI at any OpenAI-compatible endpoint
2.Inline completions are unaffected — they still run on Copilot's own infra
3.The split exists because completions need single-digit-millisecond latency budgets that arbitrary endpoints can't promise
4.GA was confirmed in the April 2026 GitHub changelog
5.Telemetry still flows to GitHub. BYOK changes where the inference happens, not where the usage telemetry goes

Dev Signal

Get briefs like this in your inbox — free, every weekday.

100+ sources compressed into one 4-minute read. Ranked, cited, implementation-ready.

Read the full issue →All briefs