
JITSU-74 feat(ingest): capture request headers into event context #1343

Open
vklimontovich wants to merge 8 commits into newjitsu from feat/event-context-headers
@vklimontovich vklimontovich commented Jun 3, 2026

Contributor

JITSU-74

What

Adds context.headers to ingested events so destinations can see the raw HTTP request headers (accept, content-type, sec-fetch-*, sec-ch-ua*, …) and distinguish real browser traffic from bots/agents. Today only context.userAgent is available.

Behavior

  • Browser endpoint: context.headers is derived only from the actual request; the body can't redefine them (a browser can't read its own request headers anyway, and shouldn't be able to spoof them).
  • S2S endpoint: captures the forwarding request's headers, but lets the caller override allow-listed headers via the event body, so a server-side SDK can forward the original device's headers. Allow-list: accept, accept-language, accept-encoding, content-type, user-agent, referer, dnt, sec-fetch-*, sec-ch-ua*.
  • cookie / authorization are stripped and the write key is masked before headers reach context (which is forwarded to destinations). Keys are lower-cased. The internal IngestMessage.HttpHeaders (full set) is unchanged.
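
The endpoint rules above can be sketched roughly as follows. This is a TypeScript illustration only: the real logic lives in Go in bulker/ingest/router.go and is not shown in this thread. The names `HeaderMap`, `isAllowListed`, and the TypeScript signature of `buildContextHeaders` are assumptions made for the example. Write-key masking is left out because the description does not say where the key appears.

```typescript
type HeaderMap = Record<string, string>;

// Allow-list from the PR description: exact names plus two prefixes.
const ALLOWED_EXACT = new Set([
  "accept", "accept-language", "accept-encoding", "content-type",
  "user-agent", "referer", "dnt",
]);
const ALLOWED_PREFIXES = ["sec-fetch-", "sec-ch-ua"];
// Headers that must never reach context.headers.
const STRIPPED = new Set(["cookie", "authorization"]);

function isAllowListed(name: string): boolean {
  return ALLOWED_EXACT.has(name) || ALLOWED_PREFIXES.some(p => name.startsWith(p));
}

// Hypothetical helper. `bodyOverrides` is null for the browser endpoint,
// so the event body can never redefine request headers there.
function buildContextHeaders(request: HeaderMap, bodyOverrides: HeaderMap | null): HeaderMap {
  const out: HeaderMap = {};
  for (const [rawName, value] of Object.entries(request)) {
    const name = rawName.toLowerCase(); // keys are lower-cased
    if (!STRIPPED.has(name)) out[name] = value;
  }
  for (const [rawName, value] of Object.entries(bodyOverrides ?? {})) {
    const name = rawName.toLowerCase();
    if (isAllowListed(name)) out[name] = value; // S2S: allow-listed overrides only
  }
  return out;
}
```

With this shape, the browser path would call `buildContextHeaders(requestHeaders, null)` and the S2S path would pass the headers found in the event body as the second argument.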

Types

  • AnalyticsContext.headers?: Record<string, string> (Jitsu extension — Segment's spec has no raw-headers field).
  • Optional RuntimeFacade.headers() so a Node integration can supply the original device's headers; @jitsu/js wires it into the built context (no-op in the browser).
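
A rough sketch of how these two types might fit together. Only `AnalyticsContext.headers` and the optional `RuntimeFacade.headers()` come from this PR; the trimmed-down interfaces, `buildContext`, and the two sample runtimes are hypothetical stand-ins for the @jitsu/js internals.

```typescript
interface AnalyticsContext {
  userAgent?: string;
  // Jitsu extension: Segment's spec has no raw-headers field.
  headers?: Record<string, string>;
}

interface RuntimeFacade {
  userAgent(): string | undefined;
  // Optional: a Node integration returns the original device's headers.
  headers?(): Record<string, string> | undefined;
}

// Hypothetical context builder: copies headers only when the runtime supplies them.
function buildContext(runtime: RuntimeFacade): AnalyticsContext {
  const ctx: AnalyticsContext = {};
  const userAgent = runtime.userAgent();
  if (userAgent) ctx.userAgent = userAgent;
  const headers = runtime.headers?.();
  if (headers) ctx.headers = headers;
  return ctx;
}

// A browser-like runtime omits headers(), so nothing is added to the context.
const browserRuntime: RuntimeFacade = { userAgent: () => "Mozilla/5.0" };

// A server-side runtime forwards what it received from the device.
const nodeRuntime: RuntimeFacade = {
  userAgent: () => "Mozilla/5.0",
  headers: () => ({ "accept-language": "en-US", "sec-fetch-mode": "navigate" }),
};
```

Keeping `headers()` optional means existing `RuntimeFacade` implementations would compile unchanged under this sketch.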

Notes for bot detection

The sec-fetch-* / sec-ch-ua* set is the strongest tell — raw HTTP clients (curl, python-requests, most non-browser agents) don't send them; only real/headless browsers do.
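
As an illustration of how a destination might use that signal, here is a minimal TypeScript sketch. The function name `hasBrowserOnlyHeaders` is hypothetical, and the check assumes header names are already lower-cased, as the PR describes.

```typescript
type HeaderMap = Record<string, string>;

// Heuristic from the note above: browsers send fetch-metadata and
// client-hint headers, raw HTTP clients usually do not.
function hasBrowserOnlyHeaders(headers: HeaderMap | undefined): boolean {
  if (!headers) return false;
  return Object.keys(headers).some(
    name => name.startsWith("sec-fetch-") || name.startsWith("sec-ch-ua")
  );
}
```

Any client can add these headers by hand, so a check like this is better treated as one input to a score than as a verdict.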

🤖 Generated with Claude Code

Add context.headers to the event so destinations can see the raw HTTP
headers (accept, content-type, sec-fetch-*, sec-ch-ua*, ...) and tell
real browser traffic from bots/agents.

- Browser endpoint derives context.headers from the request only; the
  body can't redefine them (a browser can't read its own headers anyway).
- S2S endpoint captures the forwarding request's headers but lets the
  caller override allow-listed headers via the body to forward the
  original device's headers.
- cookie/authorization are stripped and the write key is masked, so
  secrets don't leak to destinations.

Types: add AnalyticsContext.headers and an optional RuntimeFacade.headers()
so a Node integration can supply the original headers; jitsu-js wires it
into the built context (no-op in the browser).
@vklimontovich vklimontovich requested a review from absorbb June 3, 2026 21:07

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the changes in bulker/ingest/router.go, libs/jitsu-js/src/analytics-plugin.ts, and types/protocols/analytics.d.ts.

The overall direction makes sense (capturing request headers into context.headers and masking sensitive values), but I found one correctness/security edge case in the Go implementation and left an inline comment with details.

Comment thread bulker/ingest/router.go Outdated
@vklimontovich vklimontovich changed the title feat(ingest): capture request headers into event context JITSU-74 feat(ingest): capture request headers into event context Jun 29, 2026
jitsu-code-review Bot previously approved these changes Jun 29, 2026

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the changes in bulker/ingest/router.go, libs/jitsu-js/src/analytics-plugin.ts, and types/protocols/analytics.d.ts, focusing on correctness and security implications of the new context.headers flow. I found two issues worth addressing: (1) context.headers sanitization can be bypassed for __sql_type* keys because headers are stored as a plain map, and (2) the current sensitive-header filter is too narrow and may leak credentials carried in custom auth headers.

Comment thread bulker/ingest/router.go Outdated
Comment thread bulker/ingest/router.go Outdated
Comment thread bulker/ingest/router.go Outdated
// browser clients cannot read their own request headers and must not be able to
// spoof them: always derive context.headers from the actual request, ignoring
// whatever the body provided.
ctx.Set("headers", buildContextHeaders(c, nil))

@absorbb absorbb Jun 30, 2026

Contributor

We must find a way to tell the destination to store this object as JSON or as a string containing marshaled JSON.
The __sql_type hint cannot be used: it would require knowing the specific data warehouse in advance.

Contributor

Implemented in 3b2a95a using bulker's existing warehouse-agnostic JSON data type (types.JSON, enum value 6 in bulkerlib/types/datatype.go): the builtin bulker destination now always declares context_headers in the schema stream option, which (a) puts the key into notFlatteningKeys so the nested object is kept as a single value, and (b) maps to the native JSON type per warehouse via GetSQLType - jsonb (Postgres), SUPER (Redshift), JSON (BigQuery/MySQL/ClickHouse-with-JSON), text (Snowflake). No __sql_type hint and no per-warehouse knowledge needed. Required two bulker fixes: schema option now parses from the already-unmarshalled streamOptions header object, and schema-derived notFlatteningKeys are name-transformed so the match works on Snowflake. Integration-tested with a nested context.headers object on Postgres+MySQL in both batch and stream modes (TestSchemaOptionNestedObject).

absorbb and others added 2 commits July 3, 2026 15:01
Flip header capture from a cookie/authorization blocklist to an
allowlist of standard content-negotiation, fetch-metadata and
client-hint headers. Non-allow-listed headers keep their name (presence
is still a bot-detection signal) but the value is masked, so
credentials in custom auth headers (x-api-key, proxy-authorization,
vendor JWTs) can't reach destinations. __sql_type* header keys are
dropped entirely: context.headers is a plain map that bypasses
types.FilterEvent, so such a key would otherwise become a raw SQL type
hint (DDL injection) downstream.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
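
The allowlist-plus-masking rule in this commit message can be sketched as below. It is a TypeScript illustration of Go code that is not shown in this thread. The allow-list contents are borrowed from the PR description and may differ from the final list, the `MASK` placeholder is an assumed value, and `sanitizeHeaders` is a hypothetical name.

```typescript
type HeaderMap = Record<string, string>;

// Assumed allow-list, copied from the PR description.
const ALLOWED_EXACT = new Set([
  "accept", "accept-language", "accept-encoding", "content-type",
  "user-agent", "referer", "dnt",
]);
const ALLOWED_PREFIXES = ["sec-fetch-", "sec-ch-ua"];
// Assumed placeholder: the actual mask string is not shown in this thread.
const MASK = "***";

function isAllowListed(name: string): boolean {
  return ALLOWED_EXACT.has(name) || ALLOWED_PREFIXES.some(p => name.startsWith(p));
}

function sanitizeHeaders(request: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  for (const [rawName, value] of Object.entries(request)) {
    const name = rawName.toLowerCase();
    // Dropped entirely: such a key would otherwise act as a raw SQL type hint downstream.
    if (name.startsWith("__sql_type")) continue;
    // Non-allow-listed headers keep their name (presence is a signal) but lose their value.
    out[name] = isAllowListed(name) ? value : MASK;
  }
  return out;
}
```

Under these assumptions, `sanitizeHeaders({ "X-Api-Key": "secret", Accept: "text/html" })` keeps `accept` as sent and stores `x-api-key` with the masked value.
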
Header names are client-controlled, so flattening context.headers would
create unbounded warehouse columns. The builtin bulker destination now
declares context_headers in the stream schema option with bulker's
warehouse-agnostic JSON data type (enum value 6), which both prevents
flattening and maps to the native JSON type per warehouse (jsonb /
SUPER / JSON / string). Applied to every data layout except
jitsu-legacy, which never carries context.headers.

Supporting bulker fixes:
- schema option ParseFunc accepts map[string]any - the streamOptions
  header is unmarshalled into a map before options are parsed, so a
  nested schema object previously failed with "invalid value type"
- notFlatteningKeys derived from the schema option are now
  name-transformed, matching the transformed key paths _mapForDwh
  compares (schema-driven non-flattening never matched on Snowflake)
- TypeFromValue recognizes ordered JSON objects (types.Object is
  *jsonorder.OrderedMap - reflect sees Ptr, not Map)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
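
To make the flattening and name-transform interaction concrete, here is a small TypeScript sketch. It does not reproduce bulker's Go code: `flatten`, the `_` path separator, and upper-casing as the name transform are assumptions chosen to show why raw notFlatteningKeys fail to match transformed key paths.

```typescript
type Row = Record<string, unknown>;

function isPlainObject(v: unknown): v is Row {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Flattens nested objects into `parent_child` columns, except for paths
// listed in notFlatteningKeys, which are kept as a single (JSON) value.
function flatten(
  obj: Row,
  notFlatteningKeys: Set<string>,
  transformName: (name: string) => string,
  prefix = "",
  out: Row = {}
): Row {
  for (const [key, value] of Object.entries(obj)) {
    const name = transformName(key);
    const path = prefix ? `${prefix}_${name}` : name;
    if (isPlainObject(value) && !notFlatteningKeys.has(path)) {
      flatten(value, notFlatteningKeys, transformName, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

const upper = (s: string): string => s.toUpperCase();
const sampleEvent: Row = { context: { headers: { accept: "*/*" }, ip: "1.2.3.4" } };

// Raw keys never match the transformed paths, so headers get flattened.
const flattenedByMistake = flatten(sampleEvent, new Set(["context_headers"]), upper);
// Transforming the declared keys as well keeps headers as one value.
const keptAsJson = flatten(sampleEvent, new Set(["context_headers"].map(upper)), upper);
```

In the first call the result contains a `CONTEXT_HEADERS_ACCEPT` column; in the second it contains a single `CONTEXT_HEADERS` value holding the nested object.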

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the changes across ingest header capture/masking (router.go + tests), bulker schema/type handling (options.go, abstract.go, datatype.go), and destination stream option shaping (bulker-destination.ts).

I focused on correctness/security regressions around header spoofing and secret leakage, SQL type-hint bypasses, and schema flattening behavior for context.headers.

No new actionable correctness or security issues found in this diff.

jitsu-code-review Bot previously approved these changes Jul 3, 2026

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the changes across ingest header capture/masking (router.go + tests), bulker schema/type handling (options.go, abstract.go, datatype.go), and destination stream option shaping (bulker-destination.ts).

I focused on correctness/security regressions around header spoofing and secret leakage, SQL type-hint bypasses, and schema flattening behavior for context.headers.

No new actionable correctness or security issues found in this diff.

… carries headers

Avoids eagerly creating the context_headers column (batch mode adds
schema columns to the table even without data) on connections whose
events never carry context.headers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
jitsu-code-review Bot previously approved these changes Jul 3, 2026

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the ingest header-capture changes, bulker schema/not-flattening updates, and destination-function streamOptions wiring. I found one potential runtime regression in withContextHeadersSchema and left it inline.

Comment thread libs/destination-functions/src/functions/bulker-destination.ts
The plain schema-option case was already covered per warehouse (see
json_test_snowflake in types_test.go) and worked: without toSameCase
raw field names match raw key paths. The broken combination was
WithSchema + WithToSameCase (what sync-sidecar sends when the sync's
sameCase option is on): notFlatteningKeys stayed raw while _mapForDwh
compares case-transformed paths, so declared nested JSON fields were
flattened. This test fails without the nameTransformer fix in
abstract.go and passes with it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
jitsu-code-review Bot previously approved these changes Jul 3, 2026

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the changes in bulker ingest/header handling, bulker schema option parsing, SQL schema mapping behavior, and destination function stream option generation.

Findings:

  • Potential runtime error in withContextHeadersSchema: streamOptions.schema.fields is treated as an array without a type guard; malformed or legacy config shapes can throw before event delivery.

Comment thread libs/destination-functions/src/functions/bulker-destination.ts Outdated
…text fields

user-agent, referer and host are removed from context.headers when the
event already carries the exact same values in context.userAgent,
context.page.referrer and context.page.host respectively. Differing
values are kept - the mismatch itself is a bot signal.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
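
The de-duplication rule in this commit message could look roughly like the following TypeScript sketch. The real change is on the Go ingest side and is not shown here; `ContextLike` and `dropRedundantHeaders` are hypothetical names.

```typescript
type HeaderMap = Record<string, string>;

interface ContextLike {
  userAgent?: string;
  page?: { referrer?: string; host?: string };
  headers?: HeaderMap;
}

// Drops a header only when it repeats the context field exactly;
// a differing value is kept because the mismatch is itself a signal.
function dropRedundantHeaders(ctx: ContextLike): HeaderMap {
  const headers: HeaderMap = { ...(ctx.headers ?? {}) };
  const duplicates: Array<[string, string | undefined]> = [
    ["user-agent", ctx.userAgent],
    ["referer", ctx.page?.referrer],
    ["host", ctx.page?.host],
  ];
  for (const [name, contextValue] of duplicates) {
    if (contextValue !== undefined && headers[name] === contextValue) {
      delete headers[name];
    }
  }
  return headers;
}
```

The function copies the header map before deleting from it, so the caller's context object is left unchanged.
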
jitsu-code-review Bot previously approved these changes Jul 3, 2026

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the Go ingest/bulker changes and destination-function schema wiring. I focused on the new context.headers handling, schema-option parsing, and streamOptions propagation. I found one runtime-safety issue in withContextHeadersSchema (inline comment).

Comment thread libs/destination-functions/src/functions/bulker-destination.ts Outdated
…chema.fields (review)

streamOptions is untyped user config: a non-array schema.fields would
make fields.some() throw, turning a config problem into endless
RetryError delivery retries. Leave a malformed schema untouched instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
jitsu-code-review Bot previously approved these changes Jul 3, 2026

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the diff for ingest header capture/sanitization, bulker schema parsing + SQL flattening behavior, and destination streamOptions mutation. I found one correctness risk around preserving existing schema configs.

Comment thread libs/destination-functions/src/functions/bulker-destination.ts Outdated
…ma (review)

bulker accepts the schema stream option as a JSON string; spreading a
string schema would explode it into character keys and silently drop
the user's schema config. Leave any non-plain-object schema untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
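
Putting the review fixes together, the guards around `withContextHeadersSchema` might look like the sketch below. The function name and `streamOptions.schema.fields` come from the thread; the signature, the `hasHeaders` flag, and the `{ name, type }` field shape are assumptions, and the body is not the code from bulker-destination.ts.

```typescript
type StreamOptions = Record<string, unknown>;

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Assumed field shape; the real declaration sent to bulker is not shown in this thread.
const CONTEXT_HEADERS_FIELD = { name: "context_headers", type: "json" };

function withContextHeadersSchema(streamOptions: StreamOptions, hasHeaders: boolean): StreamOptions {
  // Only declare the column for events that actually carry context.headers.
  if (!hasHeaders) return streamOptions;
  const schema = streamOptions["schema"];
  if (schema === undefined) {
    return { ...streamOptions, schema: { fields: [CONTEXT_HEADERS_FIELD] } };
  }
  // A JSON-string (or otherwise non-object) schema is left untouched:
  // spreading a string would explode it into character keys.
  if (!isPlainObject(schema)) return streamOptions;
  const fields: unknown = schema["fields"] ?? [];
  // A malformed (non-array) fields value is left untouched instead of throwing.
  if (!Array.isArray(fields)) return streamOptions;
  const declared = fields.some(f => isPlainObject(f) && f["name"] === CONTEXT_HEADERS_FIELD.name);
  if (declared) return streamOptions;
  return {
    ...streamOptions,
    schema: { ...schema, fields: [...fields, CONTEXT_HEADERS_FIELD] },
  };
}
```

Returning the original object whenever the input looks unexpected keeps a configuration problem from turning into a delivery failure, which is the concern the last two review commits describe.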

@jitsu-code-review jitsu-code-review Bot left a comment

Reviewed the changes in ingest header capture, bulker stream option/schema handling, SQL flattening behavior, and related TS/Go tests.

I did not find new actionable correctness or security regressions in this diff. The added tests around context headers, schema parsing, and nested-object handling cover the key risk areas and pass locally for the touched Go packages.


2 participants