feat: add requestTransform for deterministic matching and recording #63
iskhakovt wants to merge 169 commits into CopilotKit:main
chore: release 0.1.0
Add pre-commit hook and CLAUDE.md
Make husky prepare script graceful for fresh installs
Signed-off-by: Tyler Slaton <tyler@copilotkit.ai>
chore: release 1.0.0
docs: update CLAUDE.md for conventional commits
Add unit tests badge to README
docs: add CopilotKit kite favicon
Add handler modules for two new LLM provider APIs, both following the
established pattern from responses.ts: convert inbound request to
ChatCompletionRequest, match fixtures, convert response back to
provider-specific format.
Claude Messages API (/v1/messages):
- Streaming via event: type / data: json SSE format
- Non-streaming JSON responses
- Full message lifecycle: message_start through message_stop
- Tool use with input_json_delta streaming
- msg_ and toolu_ ID prefixes
Google Gemini GenerateContent API:
- /v1beta/models/{model}:generateContent (non-streaming)
- /v1beta/models/{model}:streamGenerateContent (streaming)
- data-only SSE format (no event prefix, no [DONE])
- functionCall/functionResponse round-trips with synthetic IDs
- FUNCTION_CALL finishReason for tool call responses
Also adds generateMessageId() and generateToolUseId() helpers,
server routes for both providers, and comprehensive tests.
Rename the project from @copilotkit/mock-openai to @copilotkit/llmock to reflect multi-provider scope (OpenAI, Anthropic, Google Gemini).
- Class: MockOpenAI → LLMock
- Files: mock-openai.ts → llmock.ts, mock-openai.test.ts → llmock.test.ts
- Package: @copilotkit/mock-openai → @copilotkit/llmock
- CLI: "Usage: mock-openai" → "Usage: llmock"
- Binary: mock-openai → llmock
- All imports, tests, and docs updated
- Clean break, no backward-compat alias
…lmock Add Claude + Gemini provider support, rename to LLMock
Update README.md and docs/index.html to reflect the rename from mock-openai/MockOpenAI to llmock/LLMock throughout. Add documentation for Claude Messages API and Gemini GenerateContent endpoints, update the MSW comparison table with multi-provider rows, and add ANTHROPIC_BASE_URL/Gemini base URL examples.
Add src/__tests__/api-conformance.test.ts with 52 tests validating that mock server output structurally matches each real API spec: OpenAI Chat Completions, OpenAI Responses API, Anthropic Claude Messages API, Google Gemini, and cross-provider invariants. Tests cover required fields, types, value enums, event sequences, headers, and ID prefix formats.
…llmock Rename docs to llmock + add multi-provider docs and API conformance tests
getTextContent now supports ContentPart[] content (e.g. [{type:"text", text:"..."}])
as sent by some SDKs like Strands. Empty-string text parts are filtered out,
returning null instead of "".
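The array-content handling described above can be sketched as follows. This is only an illustration: the `ContentPart` and `MessageContent` shapes are assumptions, and concatenating multiple text parts is a guess at the behavior, not something the commit message states.

```typescript
// Hypothetical shapes for illustration; llmock's real types may differ.
type ContentPart = { type: string; text?: string };
type MessageContent = string | ContentPart[] | null | undefined;

function getTextContent(content: MessageContent): string | null {
  // Plain string content is passed through unchanged.
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return null;

  const texts: string[] = [];
  for (const part of content) {
    // Keep only non-empty text parts; other part types are skipped.
    if (part.type === "text" && typeof part.text === "string" && part.text !== "") {
      texts.push(part.text);
    }
  }
  // Assumption: multiple text parts are concatenated in order.
  return texts.length > 0 ? texts.join("") : null;
}
```

With this sketch, an array holding only empty-string text parts yields `null` rather than `""`, matching the behavior the commit describes.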
…upport Add getTextContent for array-format message content
Tests hitting real LLM APIs cost money, time out, and are flaky. The old copy focused on multi-process architecture; the new copy leads with what users actually care about.
Rewrite 'Why llmock' to lead with the problem
ci: add workflow_dispatch trigger to release workflow
prependFixture() inserts a fixture at the front of the list (index 0), replacing the pattern of addFixture() + splice/unshift via `as any`. getFixtures() returns a readonly view of the fixture array, replacing direct access to the private `fixtures` field via `as any`. Both methods are needed by ag-ui's e2e test setup to prepend a tool-result catch-all fixture and log fixture statistics.
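A minimal sketch of the two accessors, assuming a simplified `Fixture` shape and a stand-in `FixtureStore` class (both hypothetical; the real methods live on the mock server class):

```typescript
// Hypothetical, simplified fixture shape for illustration only.
interface Fixture {
  match: { userMessage?: string };
  response: { content: string };
}

class FixtureStore {
  private fixtures: Fixture[] = [];

  // Appends to the end of the list (lowest priority).
  addFixture(fixture: Fixture): this {
    this.fixtures.push(fixture);
    return this;
  }

  // Inserts at index 0 so the fixture is considered first.
  prependFixture(fixture: Fixture): this {
    this.fixtures.unshift(fixture);
    return this;
  }

  // Read-only view: callers cannot push or splice through this type.
  getFixtures(): readonly Fixture[] {
    return this.fixtures;
  }
}
```

The `readonly Fixture[]` return type is a compile-time guard only; whether the real implementation also copies or freezes the array is not stated in the commit.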
…-fixtures Add prependFixture() and getFixtures() public API
…end-get-fixtures Add changeset for 1.1.0 release (prependFixture/getFixtures)
- metrics.test.ts: add test that injects a faulty registry via spy to
verify the try-catch in res.on("finish") prevents process crashes;
rename existing test for accuracy
- stream-collapse.test.ts: update CRC mismatch tests to assert
result.truncated === true (replaced console.warn spy pattern)
…ion test
Clamp x-llmock-chaos-* header values to [0,1] and warn on NaN or out-of-range input. Restore universal clamping in resolveChaosConfig to cover fixture-level and server-default rates (regression from prior change). Fix file-level docstring to accurately describe the three chaos actions. Add tests for header clamping/NaN behavior and disconnect chaos action end-to-end.
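The clamping rule could look roughly like this. The function name `parseChaosRate` and the choice to ignore a non-numeric header are assumptions made for the sketch; the commit only says that values are clamped to [0,1] and that NaN or out-of-range input produces a warning.

```typescript
type Warn = (message: string) => void;

// Hypothetical helper: parse one x-llmock-chaos-* header value.
function parseChaosRate(name: string, raw: string | undefined, warn: Warn): number | undefined {
  if (raw === undefined) return undefined;

  const value = Number(raw);
  if (Number.isNaN(value)) {
    // Assumption: a non-numeric header is ignored rather than treated as 0.
    warn(`${name}: ignoring non-numeric value "${raw}"`);
    return undefined;
  }
  if (value < 0 || value > 1) {
    warn(`${name}: ${value} is outside [0,1], clamping`);
  }
  return Math.min(1, Math.max(0, value));
}
```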
…nish callback
Wrap the res.on('finish') metrics block in try/catch to prevent instrumentation
errors (wrong label cardinality, registry misconfiguration) from propagating
silently or crashing the request handler. Log failures at warn level so operators
see them without enabling debug logging.
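As a rough illustration of the guard described above, the sketch below wraps a metrics callback so that a failure is logged instead of escaping the `finish` listener. `FinishEmitter`, `recordOnFinish`, and the `warn` callback are hypothetical names, not the project's API.

```typescript
// Minimal stand-in for the part of a response object used here.
interface FinishEmitter {
  on(event: "finish", listener: () => void): void;
}
type Warn = (message: string) => void;

function recordOnFinish(res: FinishEmitter, record: () => void, warn: Warn): void {
  res.on("finish", () => {
    try {
      record();
    } catch (err) {
      // Instrumentation must never take the request handler down with it.
      const reason = err instanceof Error ? err.message : String(err);
      warn(`metrics recording failed: ${reason}`);
    }
  });
}
```

Logging at warn level, as the commit notes, keeps the failure visible without requiring debug logging.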
Change providerKey parameter type from string to RecordProviderKey in collapseStreamingResponse, proxyAndRecord, handleGemini, and handleCompletions. Catches provider key typos at compile time. Add console.warn for unknown SSE provider fallback and document the OpenAI fallback behavior in the docstring. Add TODO comments for CollapseResult discriminated union and chunkSize helper centralization. Fix test comment and cast for unknown-provider fallback path.
…d time
Add error-severity validation checks in validateFixtures for streamingProfile (ttft >= 0, tps > 0, jitter in [0,1]) and chaos (all rates in [0,1]). Catches nonsensical streaming physics and out-of-range chaos rates early with clear error messages rather than silently producing broken behavior at request time.
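The range checks listed above might be expressed like this. The `FixtureLike` shape, the use of a plain record for chaos rates, and the error wording are all assumptions for the sketch; only the numeric bounds come from the commit message.

```typescript
// Hypothetical shapes; field names other than ttft/tps/jitter are assumed.
interface StreamingProfile { ttft?: number; tps?: number; jitter?: number }
interface FixtureLike {
  streamingProfile?: StreamingProfile;
  chaos?: Record<string, number>;
}

function inUnitInterval(n: number): boolean {
  return Number.isFinite(n) && n >= 0 && n <= 1;
}

function validateFixture(fixture: FixtureLike): string[] {
  const errors: string[] = [];
  const profile = fixture.streamingProfile;
  if (profile) {
    // The negated comparisons also reject NaN.
    if (profile.ttft !== undefined && !(profile.ttft >= 0)) {
      errors.push(`streamingProfile.ttft must be >= 0, got ${profile.ttft}`);
    }
    if (profile.tps !== undefined && !(profile.tps > 0)) {
      errors.push(`streamingProfile.tps must be > 0, got ${profile.tps}`);
    }
    if (profile.jitter !== undefined && !inUnitInterval(profile.jitter)) {
      errors.push(`streamingProfile.jitter must be in [0,1], got ${profile.jitter}`);
    }
  }
  for (const [name, rate] of Object.entries(fixture.chaos ?? {})) {
    if (!inUnitInterval(rate)) {
      errors.push(`chaos.${name} must be in [0,1], got ${rate}`);
    }
  }
  return errors;
}
```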
…G chaos flags
- docker.html: fix health probes (TCP socket → httpGet on /health and /ready)
- docker.html: remove "CLI Configuration (v1.7.0)" section (references non-existent --config flag and aimock binary name)
- docker.html: fix --chaos-error-rate → --chaos-drop/--chaos-malformed/--chaos-disconnect
- docker.html: fix mountPath /fixtures → /app/fixtures (matches actual values.yaml)
- docs.html: add POST /v2/chat (Cohere) and POST /api/generate (Ollama) to endpoint table
- CHANGELOG.md: fix "via --chaos CLI flag" → list all three chaos flags
- README.md: fix chaos-testing link (chaos.html → chaos-testing.html)
… bedrock SSE; body timeout
- chaos.ts: add optional logger param to resolveChaosConfig/evaluateChaos/applyChaos;
replace all console.warn calls with logger?.warn
- stream-collapse.ts: logger param on collapseStreamingResponse; replace console.warn;
add explicit case "bedrock" routing to collapseAnthropicSSE; add bounds check in
decodeEventStreamFrames — return {frames, truncated:true} when totalLength extends
past buffer, preventing out-of-bounds reads on malformed/truncated EventStream frames
- recorder.ts: pass defaults.logger to collapseStreamingResponse; add res.setTimeout
body accumulation timeout (30s) to prevent unbounded memory growth on slow responses
- bedrock.ts: update module docstring to describe all four endpoint families
- all handlers: pass defaults.logger as final arg to all applyChaos call sites
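The bounds check in the stream-collapse item can be sketched as below. This is not the project's `decodeEventStreamFrames`; it is a simplified, hypothetical `splitFrames` that only slices frames by their length prefix and skips header and CRC parsing. It relies on the AWS event stream framing, in which each message starts with a 4-byte big-endian total length and is at least 16 bytes long (12-byte prelude plus 4-byte message CRC).

```typescript
interface SplitResult {
  frames: Uint8Array[];
  truncated: boolean;
}

// Smallest possible frame: 12-byte prelude + 4-byte message CRC.
const MIN_FRAME_LENGTH = 16;

function splitFrames(buf: Uint8Array): SplitResult {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const frames: Uint8Array[] = [];
  let offset = 0;

  while (offset < buf.byteLength) {
    // Not enough bytes left to even read the length prefix.
    if (buf.byteLength - offset < 4) return { frames, truncated: true };

    const totalLength = view.getUint32(offset); // big-endian by default
    // Reject lengths that are impossibly small or that run past the buffer,
    // instead of reading out of bounds.
    if (totalLength < MIN_FRAME_LENGTH || offset + totalLength > buf.byteLength) {
      return { frames, truncated: true };
    }
    frames.push(buf.subarray(offset, offset + totalLength));
    offset += totalLength;
  }
  return { frames, truncated: false };
}
```

Returning the frames decoded so far together with `truncated: true` lets the caller keep partial data while still flagging the stream as incomplete.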
…edrock SSE, and body timeout
- chaos.test.ts: verify evaluateChaos without logger does not call console.warn;
verify invalid chaos header with logLevel:silent is silently ignored end-to-end
- stream-collapse.test.ts: verify bounds check returns {truncated:true} for
oversized totalLength; verify provider="bedrock" routes to collapseAnthropicSSE
- recorder.test.ts: verify proxyAndRecord calls res.setTimeout(30_000) on
upstream IncomingMessage
…ation, type unions
- recorder.ts: fix misleading 'saving raw response' log → 'saving as error fixture'
- recorder.ts: warn when stream collapse produces empty content
- recorder.ts: preserve both empty-match and truncation warnings in fixture JSON
- cli.ts: exit(1) on zero fixtures in strict/validate mode
- server.ts: warn on out-of-range chaos config values at startup
- bedrock.ts/messages.ts: narrow content block type from string to union
- aws-event-stream.ts: fix writeEventStream docstring return semantics
…Kit#53)
## Summary
Major feature release adding 8 capabilities to llmock, plus 29 bugs found and fixed in code review.
### Provider Endpoints
- **Bedrock Streaming**: invoke-with-response-stream (AWS Event Stream binary) + Converse API
- **Vertex AI**: Routes to existing Gemini handler
- **Ollama**: /api/chat, /api/generate, /api/tags (NDJSON streaming)
- **Cohere**: /v2/chat (typed SSE events)
### Infrastructure
- **Chaos Testing**: Probabilistic drop/malformed/disconnect, three precedence levels (header > fixture > server), rate clamping to [0,1]
- **Prometheus Metrics**: Opt-in /metrics, counters, cumulative histograms, gauges
### Record-and-Replay
- **Proxy-on-miss**: Real API responses saved as fixtures with 30s upstream timeout
- **Stream collapsing**: 6 functions (SSE, NDJSON, EventStream) supporting both Converse and Messages formats
- **Strict mode (503)**: Catch missing fixtures in CI
- **Auth safety**: Forwarded but redacted in journal, never in fixtures
### Quality
- **1250 tests** across 37 files
- 7 rounds of 7-agent code review, 29 bugs found and fixed
- Build/format/lint clean, zero external dependencies, zero as-any in source
## Review Fixes (29 total across 7 rounds)
### Round 1: Original review (20 findings)
- HandlerDefaults type extracted, fixing silent undefined access in 5 handlers
- Provider-specific error formats (Anthropic, Gemini, Bedrock)
- Recorder binary relay corruption (UTF-8 round-trip on EventStream)
- collapseOllamaNDJSON tool_calls + buildFixtureResponse priority
- ChaosAction dedup, RecordProviderKey union, OllamaMessage.role union
- collapseCohereSSE naming, chaos rate clamping, recorder auth comment
- SKILL.md 503 status, warn log level, README provider list, types.ts header
### Round 2 (2 findings)
- applyChaos registry argument missing in 5 handlers (chaos metrics incomplete)
- Bedrock Converse response format missing in buildFixtureResponse
### Round 5: fresh context (2 findings)
- Global recordCounter → crypto.randomUUID() (concurrent test determinism)
- rawBody pass-through in OpenAI completions proxy path
### Round 6: fresh context (2 findings)
- 30s upstream timeout in makeUpstreamRequest (prevents indefinite hangs)
- collapseBedrockEventStream: handle both Converse (camelCase) and Messages (flat type) formats
### Round 7: fresh context (3 findings)
- new URL() validation with specific 502 error for malformed provider URLs
- writtenToDisk flag to prevent misleading "Response recorded" log on write failure
- res.on("error") handler for upstream response stream mid-transfer drops

All fixes have corresponding regression tests.
Automated weekly update based on competitor README analysis.
Writes a markdown file with a change table and mermaid flowchart grouped by competitor. Mermaid node labels are quoted and subgraph IDs sanitized to handle special characters in competitor/capability names.
Covers markdown table generation, mermaid flowchart structure, special character escaping (parentheses, quotes, slashes), competitor grouping, node ID uniqueness, and file I/O.
Pass --summary to the script and use gh pr create --body-file to inject the markdown directly, avoiding shell interpolation of backticks from the mermaid code fences.
## Summary
- The `update-competitive-matrix.ts` script now accepts `--summary <path>` to write a markdown summary with a change table and mermaid flowchart grouped by competitor
- The workflow uses `--body-file` to inject the summary directly into the PR body, avoiding shell interpolation of mermaid backtick fences
- Mermaid node labels are quoted and subgraph IDs sanitized to handle special characters (parentheses, slashes, quotes)
- Unit tests cover formatting, mermaid structure, escaping, and edge cases
## Test plan
- [x] `pnpm test`: 1318 tests pass (13 new)
- [x] `pnpm run format:check`: clean
- [x] `pnpm run lint`: clean
- [x] `pnpm run build`: clean
- [ ] Trigger workflow via `workflow_dispatch` and confirm PR body renders correctly
bac282e to 0207739
…ording
Optional requestTransform on MockServerOptions normalizes requests before fixture matching. When set, string comparisons use exact equality (===) instead of includes() for deterministic recorded-fixture replay.
- matchFixture gets optional 4th parameter, threaded from all handlers
- Recorder applies transform before building fixture match keys
- 8 new tests cover transform behavior, backward compat, and predicate passthrough

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0207739 to 958add3
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9694329 to 75d0dfa
jpr5 left a comment
Code Review — 7-Agent Standard CR
335 lines across 17 files reviewed by 7 specialized agents.
The feature design is sound and well-motivated — requestTransform solves a real problem with dynamic data in recorded fixtures. The implementation is mechanically correct across all 17 files, with consistent threading of the new parameter through every handler. Three items need addressing before merge.
Bugs
1. Docs claim "RegExp and predicate matching are unaffected" — RegExp IS affected
docs/record-replay.html states:
RegExp and predicate matching are unaffected
But the code in router.ts applies effectiveReq (the transformed request) to all matching criteria including regex — only predicates receive the original req. Your own test at line 125 ("regexp does not match when transform changes the text") proves this by asserting that regex fails when the transform changes the content.
Fix: Change to something like:
Only predicate matching is unaffected — predicates always receive the original (untransformed) request. All other match criteria (including RegExp) operate on the transformed request.
Also update the JSDoc on requestTransform in types.ts — it mentions matching but omits the effect on recording and the includes() → === behavioral switch.
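To make the semantics in this item concrete, here is a simplified matcher sketch. It is not the code in router.ts: the `Req` and `Match` shapes and the `predicate` field name are assumptions, and only the last user message is compared. It shows string and RegExp criteria running against the transformed request, exact equality applying once a transform is present, and predicates receiving the original request.

```typescript
interface Message { role: string; content: string }
interface Req { model: string; messages: Message[] }
type RequestTransform = (req: Req) => Req;

// Hypothetical match shape; the field names are assumptions.
interface Match {
  userMessage?: string | RegExp;
  predicate?: (req: Req) => boolean;
}

function lastUserText(req: Req): string | undefined {
  return [...req.messages].reverse().find((m) => m.role === "user")?.content;
}

function matches(match: Match, req: Req, transform?: RequestTransform): boolean {
  const effectiveReq = transform ? transform(req) : req;
  const expected = match.userMessage;

  if (expected !== undefined) {
    const text = lastUserText(effectiveReq);
    if (text === undefined) return false;
    if (typeof expected === "string") {
      // Exact equality when a transform is set, substring match otherwise.
      const ok = transform ? text === expected : text.includes(expected);
      if (!ok) return false;
    } else if (!expected.test(text)) {
      // RegExp also sees the transformed text.
      return false;
    }
  }
  // Predicates always receive the original, untransformed request.
  if (match.predicate && !match.predicate(req)) return false;
  return true;
}
```

Under this sketch, a pattern such as `/2026/` stops matching as soon as a transform replaces the date, which is the behavior the docs wording should describe.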
2. Transform that throws crashes the handler with opaque "Internal error"
requestTransform is user-supplied code called unprotected in matchFixture():
    const effectiveReq = requestTransform ? requestTransform(req) : req;
If the transform throws (TypeError, user logic error, etc.), the exception propagates up through every handler into the HTTP response path. Users get a generic 500 "Internal error" with zero indication their transform was the problem.
In recorder.ts, it's worse — the upstream response has already been fetched but is lost when the transform throws during fixture-key building.
In WebSocket handlers, the matchFixture call is outside the JSON parse try/catch, so a throwing transform crashes the message handler.
Fix: Wrap the transform invocation in try/catch. Log the actual error with context identifying the transform as the source. Return null from matchFixture (no match) on failure, or create a small wrapper:
    let effectiveReq: ChatCompletionRequest;
    try {
      effectiveReq = requestTransform ? requestTransform(req) : req;
    } catch (err) {
      // need logger access: either pass it in or wrap at call sites
      return null;
    }
Since matchFixture doesn't have logger access, the wrapping may need to happen at call sites. Either approach works.
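One possible shape for the call-site wrapper, sketched under the assumption that each handler has a `warn`-style logger available. `applyTransform` is a hypothetical helper name, and returning `null` to mean "treat as no match" is one of the two options suggested above, not a settled contract.

```typescript
type Transform<T> = (req: T) => T;
type Warn = (message: string) => void;

// Hypothetical helper: run the user-supplied transform without letting it throw.
function applyTransform<T>(req: T, transform: Transform<T> | undefined, warn: Warn): T | null {
  if (!transform) return req;
  try {
    return transform(req);
  } catch (err) {
    // Name the transform as the source so the user is not left with a bare 500.
    const reason = err instanceof Error ? err.message : String(err);
    warn(`requestTransform threw, treating request as unmatched: ${reason}`);
    return null;
  }
}
```

A handler could call this before `matchFixture` and skip matching when it returns `null`; in the recorder, the same guard would let the already-fetched upstream response be relayed even if no fixture is written.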
3. Transform that drops messages causes TypeError
If a user's transform returns an object without messages (e.g. { model: req.model }), getLastMessageByRole(effectiveReq.messages, "user") throws TypeError: Cannot read properties of undefined (reading 'length'). TypeScript interfaces provide no runtime enforcement.
Fix: Either validate the transform output at the top of matchFixture, or add a guard before getLastMessageByRole:
    if (match.userMessage !== undefined) {
      if (!effectiveReq.messages?.length) continue;
      const msg = getLastMessageByRole(effectiveReq.messages, "user");
      // ...
    }
Missing Test
4. No test for throwing transform
User-supplied code in a hot path with no error handling deserves a test that documents the contract — whether matchFixture propagates the exception or handles it gracefully. Either behavior is fine, but it should be tested and intentional, not accidental.
Design Discussion (non-blocking)
5. Identity transform silently changes matching semantics
The mere presence of ANY transform (even (r) => r) switches string matching from includes() to ===. A user adding a no-op transform expecting no behavioral change will find previously-matching fixtures stop matching. Worth considering whether exact-match should be a separate exactMatch?: boolean option rather than coupled to transform presence.
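A sketch of the decoupled design, reduced to the string comparison only. `MatchOptions`, `stringMatches`, and the text-level `transformText` are hypothetical simplifications; the `exactMatch` flag is the option floated in this item, not an existing one.

```typescript
// Hypothetical options: exactness is chosen explicitly, not implied by the transform.
interface MatchOptions {
  transformText?: (text: string) => string;
  exactMatch?: boolean;
}

function stringMatches(expected: string, actual: string, opts: MatchOptions = {}): boolean {
  const text = opts.transformText ? opts.transformText(actual) : actual;
  return opts.exactMatch ? text === expected : text.includes(expected);
}
```

With this split, adding an identity transform leaves substring matching untouched, and recorded-fixture replay opts into `exactMatch: true` deliberately.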
6. Transform can mutate original request
If a user writes req.messages = req.messages.filter(...) (mutating in place), effectiveReq === req and the "predicates receive original" contract is silently violated. The docs example uses spread correctly, but nothing enforces immutability. Consider documenting this prominently or using structuredClone(req) before passing to the transform.
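The cloning option could be sketched as follows, assuming a runtime that provides the global `structuredClone` (Node 17 or later, or a modern browser). `transformIsolated` and the simplified `ChatRequest` shape are hypothetical names for illustration.

```typescript
interface Message { role: string; content: string }
interface ChatRequest { model: string; messages: Message[] }
type RequestTransform = (req: ChatRequest) => ChatRequest;

// Hand the transform a deep copy so in-place mutation cannot leak into the
// original request that predicates are supposed to receive.
function transformIsolated(req: ChatRequest, transform: RequestTransform): ChatRequest {
  return transform(structuredClone(req));
}

// A transform that mutates its argument instead of returning a fresh object.
const dropSystemInPlace: RequestTransform = (r) => {
  r.messages = r.messages.filter((m) => m.role !== "system");
  return r;
};

const originalReq: ChatRequest = {
  model: "m",
  messages: [
    { role: "system", content: "be brief" },
    { role: "user", content: "hi" },
  ],
};
const effectiveReq = transformIsolated(originalReq, dropSystemInPlace);
```

The trade-off is a deep copy on every request; documenting the immutability requirement instead avoids that cost but leaves the contract unenforced.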
7. Type duplication across 8+ locations
The 3 WebSocket handler files inline their own defaults type (6 copies) instead of referencing HandlerDefaults. recorder.ts also has its own inline type. Consider extracting:
    export type WebSocketHandlerDefaults = HandlerDefaults & { model: string };
    export type RequestTransform = (req: ChatCompletionRequest) => ChatCompletionRequest;
This is a pre-existing pattern the PR inherits, not something it introduced, but since you're touching all these signatures anyway, it's a good time to clean it up.
What's Good
- Clean, consistent threading of the new parameter through all 15 matchFixture call sites
- Predicate isolation (receiving original req) is a well-thought-out design choice
- Dual application in recorder (normalize both at match time and record time) ensures round-trip consistency
- 8 router tests cover the core behavioral contract well: exact match, regex, embedding, backward compat, predicate isolation
- Documentation section with realistic code example
f97049f to 2bf6bc3
Problem
LLM prompts often contain dynamic data (timestamps, UUIDs, session IDs)
injected by upstream services. When recording fixtures, the match key
includes this dynamic data. On replay, the same prompt with different
timestamps doesn't match the stored fixture.
Example: Hindsight memory server injects `Event Date: 2026-03-30T13:19:38` into extraction prompts. Each test run has a different timestamp, so recorded fixtures never match on replay.
Solution
Add `requestTransform` to `MockServerOptions`, a function that normalizes requests before both matching and recording.
Recording: saves the transformed match key — no timestamps in fixture.
Matching: transforms incoming request before comparison — same clean key.
When `requestTransform` is set, string matching switches from `includes` (substring) to `===` (exact equality). This prevents false-positive matches from shortened keys accidentally matching unrelated prompts. Without a transform, existing `includes` behavior is preserved (backward compatible).
Follows the Polly.js pattern of composable request normalizers for deterministic snapshot matching.
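A transform for the timestamp case in the Problem section might look like the sketch below. The request types are simplified stand-ins, the regular expression and the `<timestamp>` placeholder are illustrative choices, and how the function is passed to the server (shown only in a comment) is an assumption.

```typescript
// Simplified stand-ins for the library's request types.
interface Message { role: string; content: string }
interface ChatCompletionRequest { model: string; messages: Message[] }

// Matches ISO-8601 style timestamps such as 2026-03-30T13:19:38.
const ISO_TIMESTAMP = /\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;

// Returns a new request with timestamps replaced; the input is not mutated.
const stripTimestamps = (req: ChatCompletionRequest): ChatCompletionRequest => ({
  ...req,
  messages: req.messages.map((m) => ({
    ...m,
    content: m.content.replace(ISO_TIMESTAMP, "<timestamp>"),
  })),
});

// Assumed wiring, not confirmed by this description:
//   new LLMock({ requestTransform: stripTimestamps })
```

Two runs that differ only in the injected `Event Date` value then produce the same match key, so the fixture recorded on the first run is found on replay.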
Changes
- `types.ts`: Add `requestTransform` to `MockServerOptions` and `HandlerDefaults`
- `router.ts`: Optional 4th param on `matchFixture`, exact match when transform set
- `server.ts`: Thread transform into defaults
- `recorder.ts`: Apply transform before match key extraction
- Pass `defaults.requestTransform` as 4th arg to `matchFixture`
- `docs/record-replay.html`: Document `requestTransform` feature