Add models command, formalize /responses support, harden defaults #263
Open
RazonIn4K wants to merge 9 commits into
added 9 commits
June 12, 2026 00:50
Server is now local-only unless --host is explicitly passed. Docker entrypoint passes --host 0.0.0.0 so published ports keep working.
Covers images, tool_result ordering, mixed text/tool streams, invalid tool JSON, and cache token accounting.
- Skip malformed or non-JSON SSE events instead of crashing the stream
- Warn when tool definitions/calls are dropped for /responses-only models
Drives the real Hono route and asserts the claude (1.15) and grok (1.03) multipliers, the 346/480-token tool overhead, the mcp__/claude-code beta exemption, and the invalid-JSON fallback.
Map chat-completions tools/tool_choice into the Responses API shape, convert assistant tool_calls + tool results into function_call/function_call_output input items, and translate function_call output items and function_call_arguments deltas back into chat-completions tool_calls (non-stream and streaming). Replaces the previous drop-with-warning behavior.
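A minimal sketch of the request-side mapping described in the last commit message, not the PR's actual code. The type and function names (`ChatTool`, `toResponsesTools`, `toResponsesInput`) are hypothetical, and the Responses item shapes (`function_call` / `function_call_output` keyed by `call_id`) are assumed from the documented OpenAI Responses API; Copilot's variant may use different field names.

```typescript
// Chat Completions shapes (subset).
interface ChatTool {
  type: "function"
  function: { name: string; description?: string; parameters?: Record<string, unknown> }
}
interface ChatToolCall {
  id: string
  type: "function"
  function: { name: string; arguments: string }
}
type ChatMessage =
  | { role: "user" | "system"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: Array<ChatToolCall> }
  | { role: "tool"; tool_call_id: string; content: string }

// Responses API shapes (subset, assumed from the OpenAI documentation).
interface ResponsesTool {
  type: "function"
  name: string
  description?: string
  parameters?: Record<string, unknown>
}
type ResponsesInputItem =
  | { role: "user" | "system" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string }

// Flatten the nested `function` object into the top-level tool fields.
function toResponsesTools(tools: Array<ChatTool>): Array<ResponsesTool> {
  return tools.map((tool): ResponsesTool => ({
    type: "function",
    name: tool.function.name,
    description: tool.function.description,
    parameters: tool.function.parameters,
  }))
}

// Turn assistant tool_calls and tool results into Responses input items.
function toResponsesInput(messages: Array<ChatMessage>): Array<ResponsesInputItem> {
  const items: Array<ResponsesInputItem> = []
  for (const message of messages) {
    if (message.role === "tool") {
      items.push({
        type: "function_call_output",
        call_id: message.tool_call_id,
        output: message.content,
      })
      continue
    }
    if (message.role === "assistant") {
      if (message.content) items.push({ role: "assistant", content: message.content })
      for (const call of message.tool_calls ?? []) {
        items.push({
          type: "function_call",
          call_id: call.id,
          name: call.function.name,
          arguments: call.function.arguments,
        })
      }
      continue
    }
    items.push({ role: message.role, content: message.content })
  }
  return items
}
```

Keeping the tool call id as `call_id` on both the call and its output is what lets the upstream model pair a result with the call that produced it.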
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds support for routing certain models through the Copilot Responses endpoint while preserving an OpenAI-compatible chat completions surface, and improves CLI/deployment ergonomics and edge-case handling.
Changes:
- Add Responses API adapter (payload translation + streaming event → chat chunk conversion) and route selection based on model supported endpoints.
- Add models CLI subcommand and enhance server startup options with --host (plus Docker binding + docs updates).
- Add tests covering Responses adapter, token counting logic, and Anthropic translation edge-cases (tools, images, cached tokens, streaming behavior).
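The streaming conversion with malformed-event skipping could look roughly like the sketch below. It is an illustration under assumptions, not the adapter in this PR: `toChatDeltas`, `parseEvent`, and `ChatChunkDelta` are hypothetical names, only text deltas are modeled, and the `response.output_text.delta` event type is taken from the OpenAI Responses streaming documentation.

```typescript
// Simplified chat-completions delta: only the text content is modeled.
interface ChatChunkDelta {
  content: string
}

interface TextDeltaEvent {
  type: "response.output_text.delta"
  delta: string
}

// Parse one SSE data payload; return undefined instead of throwing on bad JSON.
function parseEvent(data: string): unknown {
  try {
    return JSON.parse(data)
  } catch {
    return undefined
  }
}

function isTextDelta(value: unknown): value is TextDeltaEvent {
  if (typeof value !== "object" || value === null) return false
  const candidate = value as { type?: unknown; delta?: unknown }
  return candidate.type === "response.output_text.delta" && typeof candidate.delta === "string"
}

// Convert raw SSE data payloads into chat deltas, skipping anything that is
// not valid JSON (keepalives, empty lines) or not a text delta event.
function toChatDeltas(rawEvents: Array<string>): Array<ChatChunkDelta> {
  const deltas: Array<ChatChunkDelta> = []
  for (const raw of rawEvents) {
    const event = parseEvent(raw)
    if (!isTextDelta(event)) continue
    deltas.push({ content: event.delta })
  }
  return deltas
}
```

Skipping rather than throwing means a single keepalive or truncated payload does not end an otherwise healthy stream.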
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/create-responses.test.ts | Adds unit tests for Responses→ChatCompletion conversion and streaming chunk mapping (text + tool_calls). |
| tests/count-tokens-handler.test.ts | Adds tests for /count_tokens multiplication/overhead rules and invalid JSON fallback. |
| tests/anthropic-edge-cases.test.ts | Adds edge-case tests for Anthropic↔OpenAI translations (images, tool_result ordering, cached tokens, stream block boundaries). |
| src/services/copilot/create-responses.ts | Implements Responses endpoint adapter, including payload conversion, tool forwarding, and SSE event mapping. |
| src/routes/chat-completions/handler.ts | Switches to Responses endpoint for models that only support /responses, including streaming bridging. |
| src/services/copilot/get-models.ts | Extends model shape with supported_endpoints to drive endpoint selection. |
| src/routes/messages/non-stream-translation.ts | Prevents crashes on malformed tool-call JSON by safely parsing tool arguments. |
| src/start.ts | Adds --host bind option, updates server URL display, and warns on non-local binding. |
| src/models.ts | Adds models subcommand to list available Copilot models (optionally JSON). |
| src/main.ts | Registers new models subcommand. |
| package.json | Adds bun run models script. |
| Dockerfile / entrypoint.sh | Runs as bun user; binds server to 0.0.0.0 inside container to support published ports. |
| README.md / AGENTS.md | Updates docs for Docker paths, CLI usage, models command, and build/test notes. |
| .github/workflows/deploy-pages.yml | Changes Pages deploy trigger branch to master. |
| src/services/get-vscode-version.ts | Removes stray top-level invocation. |
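As a rough illustration of the endpoint selection summarized in the table (the `supported_endpoints` field driving a switch to `/responses`), a sketch might look like this. `selectEndpoint` and the `Model` interface are hypothetical, and falling back to `/chat/completions` when the field is missing is an assumption rather than confirmed behavior.

```typescript
interface Model {
  id: string
  // Optional here on the assumption that older model entries may omit it.
  supported_endpoints?: Array<string>
}

type Endpoint = "/chat/completions" | "/responses"

// Route to /responses only when the model advertises it and does not
// advertise /chat/completions; otherwise keep the existing chat route.
function selectEndpoint(model: Model | undefined): Endpoint {
  const endpoints = model?.supported_endpoints ?? []
  const responsesOnly =
    endpoints.includes("/responses") && !endpoints.includes("/chat/completions")
  return responsesOnly ? "/responses" : "/chat/completions"
}
```

Defaulting to the chat route keeps behavior unchanged for every model that does not explicitly opt out of it.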
Comment on lines +68 to +75
    const displayHost = options.host === "0.0.0.0" ? "localhost" : options.host
    const serverUrl = `http://${displayHost}:${options.port}`

    if (options.host !== "127.0.0.1" && options.host !== "localhost") {
      consola.warn(
        `Server will listen on ${options.host} and may be reachable from other machines. Use the default host (127.0.0.1) for local-only access.`,
      )
    }
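One way to make the display-host and warning logic above easy to unit test is to pull it into a pure function. The sketch below is an assumption about how that could be done; `describeBinding` and `LOCAL_HOSTS` are hypothetical names, and the function returns the warning text instead of calling consola.

```typescript
// Hosts treated as local-only, matching the two literals checked in the snippet.
const LOCAL_HOSTS = new Set(["127.0.0.1", "localhost"])

interface BindingInfo {
  serverUrl: string
  warning?: string
}

function describeBinding(host: string, port: number): BindingInfo {
  // 0.0.0.0 is not a browsable address, so show localhost in the printed URL.
  const displayHost = host === "0.0.0.0" ? "localhost" : host
  const serverUrl = `http://${displayHost}:${port}`
  if (LOCAL_HOSTS.has(host)) return { serverUrl }
  return {
    serverUrl,
    warning: `Server will listen on ${host} and may be reachable from other machines. Use the default host (127.0.0.1) for local-only access.`,
  }
}
```

The caller would then log `warning` when it is present, which keeps the decision logic free of side effects.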
Comment on lines 4 to +5
      push:
    -   branches: [ "main" ]
    +   branches: [master]
Comment on lines +15 to +19
    import {
      translateToAnthropic,
      translateToOpenAI,
    } from "../src/routes/messages/non-stream-translation"
    import { translateChunkToAnthropicEvents } from "../src/routes/messages/stream-translation"
Comment on lines +108 to +110
    const isNonStreamingResponse = (
      response: Awaited<ReturnType<typeof createResponsesFromChatCompletions>>,
    ): response is ResponseApiResponse => !(Symbol.asyncIterator in response)
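To show how the type guard above discriminates the two return shapes, here is a self-contained sketch. The `ResponseApiResponse` fields, the `ResponsesResult` union, and `fakeStream` are stand-ins invented for illustration; the real types in the PR are not shown in this excerpt.

```typescript
// Hypothetical stand-in for the non-streaming response body.
interface ResponseApiResponse {
  id: string
  output_text: string
}

// Assumed return type: either a parsed body or an async iterable of SSE events.
type ResponsesResult = ResponseApiResponse | AsyncIterable<{ data: string }>

// A plain JSON object has no Symbol.asyncIterator; a stream does.
const isNonStreamingResponse = (
  response: ResponsesResult,
): response is ResponseApiResponse => !(Symbol.asyncIterator in response)

// Stand-in stream used only to exercise the guard.
async function* fakeStream(): AsyncGenerator<{ data: string }> {
  yield { data: "{}" }
}

function describeResult(result: ResponsesResult): string {
  // Inside the true branch TypeScript narrows `result` to ResponseApiResponse.
  return isNonStreamingResponse(result) ? `json:${result.id}` : "stream"
}
```

The `in` check walks the prototype chain, so it also recognizes async generator objects, whose `Symbol.asyncIterator` method lives on their prototype.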
Add models command, formalize /responses support, harden defaults
This branch is 9 commits ahead of master. It adds a non-interactive models command, formalizes and hardens the /responses adapter the chat handler already depended on, makes the server local-only by default, and fixes documentation, Docker, and CI drift.
Summary of changes
Features
- models CLI command (e273b3f): inspect the current Copilot models and their supported endpoints without starting the server. Useful for non-interactive deployments. Supports --json, --github-token, --account-type, --show-token, and --proxy-env.
- /responses endpoint routing (fb77680): models that only advertise /responses (and not /chat/completions) are now transparently routed through a Responses API adapter that converts both non-streaming and streaming responses back into Chat Completions shape.
- /responses (a695374): chat-completions tools and tool_choice are mapped into the Responses API shape, assistant tool_calls and tool results become function_call/function_call_output input items, and function_call outputs and function_call_arguments deltas are translated back into tool_calls (non-streaming and streaming). Replaces the earlier drop-with-warning behavior.
- (44c74c7): the server now binds to 127.0.0.1 unless --host is passed, so it is not reachable from other machines out of the box. A warning prints when binding to a non-local host. The Docker entrypoint passes --host 0.0.0.0 so published ports keep working.
Fixes
- (3d24eef): the VS Code version is now fetched only when called, not as an import-time network side effect.
- (dec3694): malformed tool arguments from Copilot no longer crash translateToAnthropic; the tool input falls back to an empty object.
- /responses stream adapter (e8f316c): malformed or non-JSON SSE events (e.g. keepalives) are skipped instead of killing the stream, and a warning is logged when tool definitions/calls are dropped for /responses-only models.
Tests
- (dec3694): images, tool_result ordering (including multiple results per message), mixed text/tool streaming block transitions, invalid tool JSON, and cache-token accounting.
- /responses adapter (fb77680, e8f316c): endpoint selection, non-streaming conversion, streaming event conversion, malformed-event skipping, and unknown-event handling.
- count_tokens handler (ddede24): drives the real Hono route and locks the claude (1.15) and grok (1.03) multipliers, the 346/480-token tool overhead, the mcp__/claude-code beta exemption, and the invalid-JSON fallback.
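The count_tokens rules listed above can be sketched as a pure function. This is a guess at the arithmetic, not the handler itself: `estimateTokens` and `CountInput` are hypothetical, and several details are assumptions, namely that 346 pairs with claude and 480 with grok (the order they are listed in), that the overhead is added before the multiplier, that the exemption applies when the beta header starts with claude-code and any tool name starts with mcp__, and that the result is rounded.

```typescript
interface CountInput {
  model: string
  baseTokens: number
  toolNames: Array<string>
  anthropicBeta?: string
}

function estimateTokens(input: CountInput): number {
  let tokens = input.baseTokens

  if (input.toolNames.length > 0) {
    // Assumed exemption: claude-code beta plus at least one mcp__ tool.
    const exempt =
      (input.anthropicBeta?.startsWith("claude-code") ?? false) &&
      input.toolNames.some((name) => name.startsWith("mcp__"))
    if (!exempt) {
      // Assumed pairing of the two overhead values with the two model families.
      if (input.model.startsWith("claude")) tokens += 346
      else if (input.model.startsWith("grok")) tokens += 480
    }
  }

  // Per-family multipliers; other models are returned unchanged.
  if (input.model.startsWith("claude")) return Math.round(tokens * 1.15)
  if (input.model.startsWith("grok")) return Math.round(tokens * 1.03)
  return tokens
}
```

For example, under these assumptions a claude model with 1000 base tokens and one ordinary tool yields (1000 + 346) × 1.15 ≈ 1548.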
Docs / chore
- (58249ba): Dockerfile runs as USER bun; README volume path corrected to /home/bun/.local/share/copilot-api.
- AGENTS.md updated (tsup → tsdown, corrected test command); README documents the models command, the --host flag, and single-account/local-use assumptions.
- Pages deploy trigger branch changed to master (the branch this fork and upstream actually expose).
Verification
- bun run lint:all (passing)
- bun run typecheck (passing)
- bun test (60 passing, was 29)
- bun run build (passing)
Notes for the maintainer
- The /responses adapter now forwards function tool calls, but image content is still stringified ([image: <url>]) rather than sent as a structured input-image part. The tool mapping was implemented against the documented OpenAI Responses API shapes; it should be validated against a live /responses-only Copilot model, since Copilot's variant may differ in field names (e.g. call_id vs id).
- With USER bun, the host directory mounted at /home/bun/.local/share/copilot-api must be writable by uid 1000. If you hit an EACCES on first run, run chown 1000:1000 ./copilot-data on the host.