feat: add requestTransform for deterministic matching and recording #63

Open
iskhakovt wants to merge 169 commits into CopilotKit:main from iskhakovt:feat/request-transform
Conversation

@iskhakovt
Contributor

@iskhakovt iskhakovt commented Mar 30, 2026

Problem

LLM prompts often contain dynamic data (timestamps, UUIDs, session IDs)
injected by upstream services. When recording fixtures, the match key
includes this dynamic data. On replay, the same prompt with different
timestamps doesn't match the stored fixture.

Example: Hindsight memory server injects Event Date: 2026-03-30T13:19:38
into extraction prompts. Each test run has a different timestamp, so
recorded fixtures never match on replay.

Solution

Add requestTransform to MockServerOptions — a function that normalizes
requests before both matching and recording:

const mock = new LLMock({
  requestTransform: (req) => ({
    ...req,
    messages: req.messages.map(m => ({
      ...m,
      content: typeof m.content === "string"
        ? m.content.replace(/\d{4}-\d{2}-\d{2}T[\d:.+Z]+/g, "")
        : m.content,
    })),
    embeddingInput: req.embeddingInput?.split(" | ")[0],
  }),
});

Recording: saves the transformed match key — no timestamps in fixture.
Matching: transforms incoming request before comparison — same clean key.

When requestTransform is set, string matching switches from includes
(substring) to === (exact equality). This prevents false positive matches
from shortened keys accidentally matching unrelated prompts. Without a
transform, existing includes behavior is preserved (backward compatible).
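The difference between the two string-matching modes can be sketched as follows. This is a minimal illustration, not the router's actual code: `matchesText` and `stripTimestamps` are hypothetical helpers, and the prompts are made up.

```typescript
// Hypothetical helper (not llmock's router code): compares a fixture's
// string key against request text, either exactly or by substring.
function matchesText(pattern: string, text: string, exact: boolean): boolean {
  return exact ? text === pattern : text.includes(pattern);
}

// Same timestamp-stripping regex as the requestTransform example above.
const stripTimestamps = (s: string): string =>
  s.replace(/\d{4}-\d{2}-\d{2}T[\d:.+Z]+/g, "");

// Key saved at record time, and two prompts seen at replay time.
const recordedKey = stripTimestamps("Event Date: 2026-03-30T13:19:38 Extract facts");
const replayed = stripTimestamps("Event Date: 2026-03-31T09:00:00 Extract facts");
const unrelated = stripTimestamps(
  "Event Date: 2026-03-31T09:00:00 Extract facts and summarize",
);

// Exact mode: the replayed prompt matches, the longer unrelated prompt does not.
const exactReplay = matchesText(recordedKey, replayed, true);
const exactUnrelated = matchesText(recordedKey, unrelated, true);

// Substring mode: the normalized key also matches the unrelated prompt,
// which is the false positive described above.
const substringUnrelated = matchesText(recordedKey, unrelated, false);
```

With substring matching, the normalized key is a prefix of the unrelated prompt and would match it; exact equality rules that out.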

Follows the Polly.js pattern of composable request normalizers for deterministic snapshot matching.

Changes

  • types.ts: Add requestTransform to MockServerOptions and HandlerDefaults
  • router.ts: Optional 4th param on matchFixture, exact match when transform set
  • server.ts: Thread transform into defaults
  • recorder.ts: Apply transform before match key extraction
  • All handlers: Pass defaults.requestTransform as 4th arg to matchFixture
  • docs/record-replay.html: Document requestTransform feature
  • 8 new router tests for transform behavior, exact matching, backward compat

jpr5 and others added 30 commits March 3, 2026 13:43
Make husky prepare script graceful for fresh installs
Signed-off-by: Tyler Slaton <tyler@copilotkit.ai>
docs: update CLAUDE.md for conventional commits
Add handler modules for two new LLM provider APIs, both following the
established pattern from responses.ts: convert inbound request to
ChatCompletionRequest, match fixtures, convert response back to
provider-specific format.

Claude Messages API (/v1/messages):
- Streaming via event: type / data: json SSE format
- Non-streaming JSON responses
- Full message lifecycle: message_start through message_stop
- Tool use with input_json_delta streaming
- msg_ and toolu_ ID prefixes

Google Gemini GenerateContent API:
- /v1beta/models/{model}:generateContent (non-streaming)
- /v1beta/models/{model}:streamGenerateContent (streaming)
- data-only SSE format (no event prefix, no [DONE])
- functionCall/functionResponse round-trips with synthetic IDs
- FUNCTION_CALL finishReason for tool call responses

Also adds generateMessageId() and generateToolUseId() helpers,
server routes for both providers, and comprehensive tests.
Rename the project from @copilotkit/mock-openai to @copilotkit/llmock
to reflect multi-provider scope (OpenAI, Anthropic, Google Gemini).

- Class: MockOpenAI → LLMock
- Files: mock-openai.ts → llmock.ts, mock-openai.test.ts → llmock.test.ts
- Package: @copilotkit/mock-openai → @copilotkit/llmock
- CLI: "Usage: mock-openai" → "Usage: llmock"
- Binary: mock-openai → llmock
- All imports, tests, and docs updated
- Clean break — no backward-compat alias
…lmock

Add Claude + Gemini provider support, rename to LLMock
Update README.md and docs/index.html to reflect the rename from
mock-openai/MockOpenAI to llmock/LLMock throughout. Add documentation
for Claude Messages API and Gemini GenerateContent endpoints, update
the MSW comparison table with multi-provider rows, and add
ANTHROPIC_BASE_URL/Gemini base URL examples.
Add src/__tests__/api-conformance.test.ts with 52 tests validating
that mock server output structurally matches each real API spec:
OpenAI Chat Completions, OpenAI Responses API, Anthropic Claude
Messages API, Google Gemini, and cross-provider invariants. Tests
cover required fields, types, value enums, event sequences, headers,
and ID prefix formats.
…llmock

Rename docs to llmock + add multi-provider docs and API conformance tests
getTextContent now supports ContentPart[] content (e.g. [{type:"text", text:"..."}])
as sent by some SDKs like Strands. Empty-string text parts are filtered out,
returning null instead of "".
…upport

Add getTextContent for array-format message content
Tests hitting real LLM APIs cost money, time out, and are flaky.
The old copy focused on multi-process architecture; the new copy
leads with what users actually care about.
Rewrite 'Why llmock' to lead with the problem
ci: add workflow_dispatch trigger to release workflow
prependFixture() inserts a fixture at the front of the list (index 0),
replacing the pattern of addFixture() + splice/unshift via `as any`.

getFixtures() returns a readonly view of the fixture array, replacing
direct access to the private `fixtures` field via `as any`.

Both methods are needed by ag-ui's e2e test setup to prepend a
tool-result catch-all fixture and log fixture statistics.
…-fixtures

Add prependFixture() and getFixtures() public API
…end-get-fixtures

Add changeset for 1.1.0 release (prependFixture/getFixtures)
jpr5 and others added 18 commits March 21, 2026 10:26
- metrics.test.ts: add test that injects a faulty registry via spy to
  verify the try-catch in res.on("finish") prevents process crashes;
  rename existing test for accuracy
- stream-collapse.test.ts: update CRC mismatch tests to assert
  result.truncated === true (replaced console.warn spy pattern)
…ion test

Clamp x-llmock-chaos-* header values to [0,1] and warn on NaN or out-of-range
input. Restore universal clamping in resolveChaosConfig to cover fixture-level
and server-default rates (regression from prior change). Fix file-level docstring
to accurately describe the three chaos actions. Add tests for header clamping/NaN
behavior and disconnect chaos action end-to-end.
…nish callback

Wrap the res.on('finish') metrics block in try/catch to prevent instrumentation
errors (wrong label cardinality, registry misconfiguration) from propagating
silently or crashing the request handler. Log failures at warn level so operators
see them without enabling debug logging.
Change providerKey parameter type from string to RecordProviderKey in
collapseStreamingResponse, proxyAndRecord, handleGemini, and handleCompletions.
Catches provider key typos at compile time. Add console.warn for unknown SSE
provider fallback and document the OpenAI fallback behavior in the docstring.
Add TODO comments for CollapseResult discriminated union and chunkSize helper
centralization. Fix test comment and cast for unknown-provider fallback path.
…d time

Add error-severity validation checks in validateFixtures for streamingProfile
(ttft >= 0, tps > 0, jitter in [0,1]) and chaos (all rates in [0,1]). Catches
nonsensical streaming physics and out-of-range chaos rates early with clear
error messages rather than silently producing broken behavior at request time.
…G chaos flags

- docker.html: fix health probes (TCP socket → httpGet on /health and /ready)
- docker.html: remove "CLI Configuration (v1.7.0)" section (references non-existent --config
  flag and aimock binary name)
- docker.html: fix --chaos-error-rate → --chaos-drop/--chaos-malformed/--chaos-disconnect
- docker.html: fix mountPath /fixtures → /app/fixtures (matches actual values.yaml)
- docs.html: add POST /v2/chat (Cohere) and POST /api/generate (Ollama) to endpoint table
- CHANGELOG.md: fix "via --chaos CLI flag" → list all three chaos flags
- README.md: fix chaos-testing link (chaos.html → chaos-testing.html)
… bedrock SSE; body timeout

- chaos.ts: add optional logger param to resolveChaosConfig/evaluateChaos/applyChaos;
  replace all console.warn calls with logger?.warn
- stream-collapse.ts: logger param on collapseStreamingResponse; replace console.warn;
  add explicit case "bedrock" routing to collapseAnthropicSSE; add bounds check in
  decodeEventStreamFrames — return {frames, truncated:true} when totalLength extends
  past buffer, preventing out-of-bounds reads on malformed/truncated EventStream frames
- recorder.ts: pass defaults.logger to collapseStreamingResponse; add res.setTimeout
  body accumulation timeout (30s) to prevent unbounded memory growth on slow responses
- bedrock.ts: update module docstring to describe all four endpoint families
- all handlers: pass defaults.logger as final arg to all applyChaos call sites
…edrock SSE, and body timeout

- chaos.test.ts: verify evaluateChaos without logger does not call console.warn;
  verify invalid chaos header with logLevel:silent is silently ignored end-to-end
- stream-collapse.test.ts: verify bounds check returns {truncated:true} for
  oversized totalLength; verify provider="bedrock" routes to collapseAnthropicSSE
- recorder.test.ts: verify proxyAndRecord calls res.setTimeout(30_000) on
  upstream IncomingMessage
…ation, type unions

- recorder.ts: fix misleading 'saving raw response' log → 'saving as error fixture'
- recorder.ts: warn when stream collapse produces empty content
- recorder.ts: preserve both empty-match and truncation warnings in fixture JSON
- cli.ts: exit(1) on zero fixtures in strict/validate mode
- server.ts: warn on out-of-range chaos config values at startup
- bedrock.ts/messages.ts: narrow content block type from string to union
- aws-event-stream.ts: fix writeEventStream docstring return semantics
…Kit#53)

## Summary

Major feature release adding 8 capabilities to llmock, plus 29 bugs
found and fixed in code review.

### Provider Endpoints
- **Bedrock Streaming** — invoke-with-response-stream (AWS Event Stream
binary) + Converse API
- **Vertex AI** — Routes to existing Gemini handler
- **Ollama** — /api/chat, /api/generate, /api/tags (NDJSON streaming)
- **Cohere** — /v2/chat (typed SSE events)

### Infrastructure
- **Chaos Testing** — Probabilistic drop/malformed/disconnect, three
precedence levels (header > fixture > server), rate clamping to [0,1]
- **Prometheus Metrics** — Opt-in /metrics, counters, cumulative
histograms, gauges

### Record-and-Replay
- **Proxy-on-miss** — Real API responses saved as fixtures with 30s
upstream timeout
- **Stream collapsing** — 6 functions (SSE, NDJSON, EventStream)
supporting both Converse and Messages formats
- **Strict mode (503)** — Catch missing fixtures in CI
- **Auth safety** — Forwarded but redacted in journal, never in fixtures

### Quality
- **1250 tests** across 37 files
- 7 rounds of 7-agent code review, 29 bugs found and fixed
- Build/format/lint clean, zero external dependencies, zero as-any in
source

## Review Fixes (29 total across 7 rounds)

### Round 1: Original review (20 findings)
- HandlerDefaults type extracted, fixing silent undefined access in 5
handlers
- Provider-specific error formats (Anthropic, Gemini, Bedrock)
- Recorder binary relay corruption (UTF-8 round-trip on EventStream)
- collapseOllamaNDJSON tool_calls + buildFixtureResponse priority
- ChaosAction dedup, RecordProviderKey union, OllamaMessage.role union
- collapseCohereSSE naming, chaos rate clamping, recorder auth comment
- SKILL.md 503 status, warn log level, README provider list, types.ts
header

### Round 2 (2 findings)
- applyChaos registry argument missing in 5 handlers (chaos metrics
incomplete)
- Bedrock Converse response format missing in buildFixtureResponse

### Round 5 — fresh context (2 findings)
- Global recordCounter → crypto.randomUUID() (concurrent test
determinism)
- rawBody pass-through in OpenAI completions proxy path

### Round 6 — fresh context (2 findings)
- 30s upstream timeout in makeUpstreamRequest (prevents indefinite
hangs)
- collapseBedrockEventStream: handle both Converse (camelCase) and
Messages (flat type) formats

### Round 7 — fresh context (3 findings)
- new URL() validation with specific 502 error for malformed provider
URLs
- writtenToDisk flag to prevent misleading "Response recorded" log on
write failure
- res.on("error") handler for upstream response stream mid-transfer
drops

All fixes have corresponding regression tests.
Automated weekly update based on competitor README analysis.
Writes a markdown file with a change table and mermaid flowchart
grouped by competitor. Mermaid node labels are quoted and subgraph
IDs sanitized to handle special characters in competitor/capability
names.
Covers markdown table generation, mermaid flowchart structure,
special character escaping (parentheses, quotes, slashes),
competitor grouping, node ID uniqueness, and file I/O.
Pass --summary to the script and use gh pr create --body-file to
inject the markdown directly, avoiding shell interpolation of
backticks from the mermaid code fences.
## Summary

- The `update-competitive-matrix.ts` script now accepts `--summary
<path>` to write a markdown summary with a change table and mermaid
flowchart grouped by competitor
- The workflow uses `--body-file` to inject the summary directly into
the PR body, avoiding shell interpolation of mermaid backtick fences
- Mermaid node labels are quoted and subgraph IDs sanitized to handle
special characters (parentheses, slashes, quotes)
- Unit tests cover formatting, mermaid structure, escaping, and edge
cases

## Test plan

- [x] `pnpm test` — 1318 tests pass (13 new)
- [x] `pnpm run format:check` — clean
- [x] `pnpm run lint` — clean
- [x] `pnpm run build` — clean
- [ ] Trigger workflow via `workflow_dispatch` and confirm PR body
renders correctly
@iskhakovt iskhakovt force-pushed the feat/request-transform branch from bac282e to 0207739 Compare March 30, 2026 18:52
…ording

Optional requestTransform on MockServerOptions normalizes requests before
fixture matching. When set, string comparisons use exact equality (===)
instead of includes() for deterministic recorded-fixture replay.

- matchFixture gets optional 4th parameter, threaded from all handlers
- Recorder applies transform before building fixture match keys
- 8 new tests cover transform behavior, backward compat, and predicate passthrough

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iskhakovt iskhakovt force-pushed the feat/request-transform branch from 0207739 to 958add3 Compare March 30, 2026 19:43
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iskhakovt iskhakovt marked this pull request as ready for review March 30, 2026 20:34

@claude claude bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@iskhakovt iskhakovt force-pushed the feat/request-transform branch 2 times, most recently from 9694329 to 75d0dfa Compare March 31, 2026 02:24
@pkg-pr-new

pkg-pr-new bot commented Mar 31, 2026

npm i https://pkg.pr.new/CopilotKit/llmock/@copilotkit/llmock@63

commit: 75d0dfa

Contributor

@jpr5 jpr5 left a comment

Code Review — 7-Agent Standard CR

335 lines across 17 files reviewed by 7 specialized agents.

The feature design is sound and well-motivated — requestTransform solves a real problem with dynamic data in recorded fixtures. The implementation is mechanically correct across all 17 files, with consistent threading of the new parameter through every handler. Three items need addressing before merge.


Bugs

1. Docs claim "RegExp and predicate matching are unaffected" — RegExp IS affected

docs/record-replay.html states:

RegExp and predicate matching are unaffected

But the code in router.ts applies effectiveReq (the transformed request) to all matching criteria including regex — only predicates receive the original req. Your own test at line 125 ("regexp does not match when transform changes the text") proves this by asserting that regex fails when the transform changes the content.

Fix: Change to something like:

Only predicate matching is unaffected — predicates always receive the original (untransformed) request. All other match criteria (including RegExp) operate on the transformed request.

Also update the JSDoc on requestTransform in types.ts: it mentions matching but omits the effect on recording and the behavioral switch from includes() to ===.

2. Transform that throws crashes the handler with opaque "Internal error"

requestTransform is user-supplied code called unprotected in matchFixture():

const effectiveReq = requestTransform ? requestTransform(req) : req;

If the transform throws (TypeError, user logic error, etc.), the exception propagates up through every handler into the HTTP response path. Users get a generic 500 "Internal error" with zero indication their transform was the problem.

In recorder.ts, it's worse — the upstream response has already been fetched but is lost when the transform throws during fixture-key building.

In WebSocket handlers, the matchFixture call is outside the JSON parse try/catch, so a throwing transform crashes the message handler.

Fix: Wrap the transform invocation in try/catch. Log the actual error with context identifying the transform as the source. Return null from matchFixture (no match) on failure, or create a small wrapper:

let effectiveReq: ChatCompletionRequest;
try {
  effectiveReq = requestTransform ? requestTransform(req) : req;
} catch (err) {
  // need logger access — either pass it in or wrap at call sites
  return null;
}

Since matchFixture doesn't have logger access, the wrapping may need to happen at call sites. Either approach works.
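One way the call-site variant could look is sketched below. The types are simplified stand-ins for the real ones, and `applyTransformSafely` and the `Logger` interface are hypothetical names used only for illustration.

```typescript
// Simplified stand-ins for the real request and logger types (assumptions).
interface Message { role: string; content: string }
interface ChatRequest { model: string; messages: Message[] }
type RequestTransform = (req: ChatRequest) => ChatRequest;
interface Logger { warn(msg: string): void }

// Hypothetical call-site wrapper: runs the user-supplied transform and
// returns null (treated as "no match") if it throws, logging the cause
// so the transform is identified as the source of the failure.
function applyTransformSafely(
  req: ChatRequest,
  transform: RequestTransform | undefined,
  logger?: Logger,
): ChatRequest | null {
  if (!transform) return req;
  try {
    return transform(req);
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    logger?.warn(`requestTransform threw: ${detail}`);
    return null;
  }
}

// Usage: a transform that throws is reported instead of crashing the handler.
const warnings: string[] = [];
const collectingLogger: Logger = { warn: (msg) => warnings.push(msg) };
const sampleReq: ChatRequest = {
  model: "test-model",
  messages: [{ role: "user", content: "hello" }],
};
const failed = applyTransformSafely(
  sampleReq,
  () => { throw new TypeError("boom"); },
  collectingLogger,
);
const untouched = applyTransformSafely(sampleReq, undefined, collectingLogger);
```

In the recorder path, a null result could fall back to the untransformed request so the already-fetched upstream response is not lost; that fallback is a design choice rather than something the PR specifies.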

3. Transform that drops messages causes TypeError

If a user's transform returns an object without messages (e.g. { model: req.model }), getLastMessageByRole(effectiveReq.messages, "user") throws TypeError: Cannot read properties of undefined (reading 'length'). TypeScript interfaces provide no runtime enforcement.

Fix: Either validate the transform output at the top of matchFixture, or add a guard before getLastMessageByRole:

if (match.userMessage !== undefined) {
  if (!effectiveReq.messages?.length) continue;
  const msg = getLastMessageByRole(effectiveReq.messages, "user");
  // ...
}
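The other option, validating the transform output once up front, might look roughly like this. `hasMessages` and `lastUserMessage` are hypothetical helpers with simplified types, not functions from the codebase.

```typescript
// Simplified stand-in for the real message type (assumption).
interface Message { role: string; content: string }

// Hypothetical runtime guard: only checks that `messages` is an array,
// which is enough to avoid the TypeError described above.
function hasMessages(value: unknown): value is { messages: Message[] } {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { messages?: unknown };
  return Array.isArray(candidate.messages);
}

// Returns the last user message, or null when the transform output is
// malformed or contains no user message.
function lastUserMessage(transformed: unknown): Message | null {
  if (!hasMessages(transformed)) return null;
  for (let i = transformed.messages.length - 1; i >= 0; i--) {
    const m = transformed.messages[i];
    if (m && m.role === "user") return m;
  }
  return null;
}

// A transform result that dropped `messages` no longer throws.
const fromBadTransform = lastUserMessage({ model: "test-model" });
const fromGoodTransform = lastUserMessage({
  model: "test-model",
  messages: [
    { role: "user", content: "first" },
    { role: "assistant", content: "reply" },
    { role: "user", content: "second" },
  ],
});
```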

Missing Test

4. No test for throwing transform

User-supplied code in a hot path with no error handling deserves a test that documents the contract — whether matchFixture propagates the exception or handles it gracefully. Either behavior is fine, but it should be tested and intentional, not accidental.


Design Discussion (non-blocking)

5. Identity transform silently changes matching semantics

The mere presence of ANY transform (even (r) => r) switches string matching from includes() to ===. A user adding a no-op transform expecting no behavioral change will find previously-matching fixtures stop matching. Worth considering whether exact-match should be a separate exactMatch?: boolean option rather than coupled to transform presence.
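A rough sketch of the decoupled option, assuming a hypothetical `exactMatch` flag that falls back to the PR's current behavior when unset. The option shape here is simplified and is not the actual MockServerOptions type.

```typescript
// Simplified, hypothetical option shape for illustration only.
interface MatchingOptions {
  requestTransform?: (text: string) => string;
  exactMatch?: boolean;
}

// An explicit exactMatch wins; otherwise exact matching is implied by the
// presence of a transform, which mirrors the behavior in this PR.
function useExactMatch(opts: MatchingOptions): boolean {
  return opts.exactMatch ?? (opts.requestTransform !== undefined);
}

const identity = (text: string): string => text;

const noOptions = useExactMatch({});
const transformOnly = useExactMatch({ requestTransform: identity });
const transformOptOut = useExactMatch({ requestTransform: identity, exactMatch: false });
const exactOnly = useExactMatch({ exactMatch: true });
```

With this shape, a user adding a no-op transform could set exactMatch to false and keep substring matching.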

6. Transform can mutate original request

If a user writes req.messages = req.messages.filter(...) (mutating in place), effectiveReq === req and the "predicates receive original" contract is silently violated. The docs example uses spread correctly, but nothing enforces immutability. Consider documenting this prominently or using structuredClone(req) before passing to the transform.
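The structuredClone option could be sketched like this. The types are simplified, `transformIsolated` is a hypothetical name, and the sketch assumes the request is plain JSON-like data, since structuredClone throws on values such as functions.

```typescript
// Simplified stand-ins for the real request types (assumptions).
interface Message { role: string; content: string }
interface ChatRequest { model: string; messages: Message[] }
type RequestTransform = (req: ChatRequest) => ChatRequest;

// Hypothetical wrapper: hands the transform a deep copy, so even an
// in-place mutation cannot change the request that predicates receive.
function transformIsolated(req: ChatRequest, transform: RequestTransform): ChatRequest {
  return transform(structuredClone(req));
}

// A transform that mutates its argument instead of returning a new object.
const mutatingTransform: RequestTransform = (req) => {
  req.messages = req.messages.filter((m) => m.role !== "system");
  return req;
};

const originalReq: ChatRequest = {
  model: "test-model",
  messages: [
    { role: "system", content: "Event Date: 2026-03-30T13:19:38" },
    { role: "user", content: "hello" },
  ],
};

const effectiveReq = transformIsolated(originalReq, mutatingTransform);
// originalReq still has both messages; effectiveReq has only the user message.
```

The deep copy has a per-request cost, so documenting the immutability expectation instead remains a reasonable alternative.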

7. Type duplication across 8+ locations

The 3 WebSocket handler files inline their own defaults type (6 copies) instead of referencing HandlerDefaults. recorder.ts also has its own inline type. Consider extracting:

export type WebSocketHandlerDefaults = HandlerDefaults & { model: string };
export type RequestTransform = (req: ChatCompletionRequest) => ChatCompletionRequest;

This is a pre-existing pattern the PR inherits, not something it introduced — but since you're touching all these signatures anyway, it's a good time to clean it up.


What's Good

  • Clean, consistent threading of the new parameter through all 15 matchFixture call sites
  • Predicate isolation (receiving original req) is a well-thought-out design choice
  • Dual application in recorder (normalize both at match time and record time) ensures round-trip consistency
  • 8 router tests cover the core behavioral contract well: exact match, regex, embedding, backward compat, predicate isolation
  • Documentation section with realistic code example

@jpr5 jpr5 force-pushed the main branch 2 times, most recently from f97049f to 2bf6bc3 Compare April 3, 2026 19:48