JITSU-74 feat(ingest): capture request headers into event context by vklimontovich · Pull Request #1343 · jitsucom/jitsu

vklimontovich · 2026-06-03T21:04:22Z

JITSU-74

What

Adds context.headers to ingested events so destinations can see the raw HTTP request headers (accept, content-type, sec-fetch-*, sec-ch-ua*, …) and distinguish real browser traffic from bots/agents. Today only context.userAgent is available.

Behavior

Browser endpoint — context.headers is derived only from the actual request; the body can't redefine them (a browser can't read its own request headers anyway, and shouldn't be able to spoof them).
S2S endpoint — captures the forwarding request's headers, but lets the caller override allow-listed headers via the event body, so a server-side SDK can forward the original device's headers. Allow-list: accept, accept-language, accept-encoding, content-type, user-agent, referer, dnt, sec-fetch-*, sec-ch-ua*.
cookie / authorization are stripped and the write key is masked before headers reach context (which is forwarded to destinations). Keys are lower-cased. The internal IngestMessage.HttpHeaders (full set) is unchanged.
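
The rules above can be sketched roughly as follows. This is a TypeScript illustration only, not the Go code in bulker/ingest/router.go: buildContextHeaders borrows its name from the diff, but the signature shown here, the S2S_OVERRIDABLE and STRIPPED constants and the isOverridable helper are assumptions. Write-key masking is left out.

```typescript
type HeaderMap = Record<string, string>;

// Allow-list from the PR description; entries ending in "*" are treated as prefixes.
const S2S_OVERRIDABLE: string[] = [
  "accept", "accept-language", "accept-encoding", "content-type",
  "user-agent", "referer", "dnt", "sec-fetch-*", "sec-ch-ua*",
];

// Headers that must never reach context (hypothetical constant name).
const STRIPPED = new Set<string>(["cookie", "authorization"]);

function isOverridable(headerName: string): boolean {
  return S2S_OVERRIDABLE.some(pattern =>
    pattern.endsWith("*")
      ? headerName.startsWith(pattern.slice(0, -1))
      : headerName === pattern
  );
}

// requestHeaders: headers of the actual HTTP request.
// bodyHeaders: context.headers taken from the event body. The browser endpoint
// passes nothing here, so the body can never redefine a header.
function buildContextHeaders(requestHeaders: HeaderMap, bodyHeaders?: HeaderMap): HeaderMap {
  const result: HeaderMap = {};
  for (const [rawName, value] of Object.entries(requestHeaders)) {
    const key = rawName.toLowerCase();
    if (!STRIPPED.has(key)) result[key] = value;
  }
  // S2S only: the caller may override allow-listed headers, nothing else.
  for (const [rawName, value] of Object.entries(bodyHeaders ?? {})) {
    const key = rawName.toLowerCase();
    if (isOverridable(key)) result[key] = value;
  }
  return result;
}
```

In this sketch the browser endpoint would call buildContextHeaders(requestHeaders) and the S2S endpoint buildContextHeaders(requestHeaders, bodyHeaders), which mirrors the buildContextHeaders(c, nil) call quoted in the review thread.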

Types

AnalyticsContext.headers?: Record<string, string> (Jitsu extension — Segment's spec has no raw-headers field).
Optional RuntimeFacade.headers() so a Node integration can supply the original device's headers; @jitsu/js wires it into the built context (no-op in the browser).
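
A minimal sketch of how these two type additions could fit together. The interfaces below are simplified stand-ins for the real declarations; the return type of headers(), the buildContext helper and the nodeRuntime factory are hypothetical and only illustrate the wiring described above.

```typescript
// Simplified stand-ins: only the members mentioned in the PR description.
interface AnalyticsContext {
  userAgent?: string;
  headers?: Record<string, string>; // Jitsu extension, not part of Segment's spec
}

interface RuntimeFacade {
  userAgent(): string | undefined;
  // Optional: a browser runtime simply does not implement it.
  headers?(): Record<string, string> | undefined;
}

// Hypothetical helper showing how the client could wire the facade into context.
function buildContext(runtime: RuntimeFacade): AnalyticsContext {
  const ctx: AnalyticsContext = {};
  const userAgent = runtime.userAgent();
  if (userAgent !== undefined) ctx.userAgent = userAgent;
  const headers = runtime.headers?.();
  if (headers !== undefined) ctx.headers = headers; // no-op when headers() is absent
  return ctx;
}

// A Node integration forwarding the original device's headers. `incoming`
// stands in for whatever request-header object the web framework exposes.
function nodeRuntime(incoming: Record<string, string>): RuntimeFacade {
  return {
    userAgent: () => incoming["user-agent"],
    headers: () => incoming,
  };
}
```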

Notes for bot detection

The sec-fetch-* / sec-ch-ua* set is the strongest tell — raw HTTP clients (curl, python-requests, most non-browser agents) don't send them; only real/headless browsers do.
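
A destination function could use this signal along the following lines. The function name is hypothetical, and the check is deliberately coarse: it only reports whether any fetch-metadata or client-hint header was captured, so it is best treated as one input among several rather than a verdict.

```typescript
// Returns true when the captured headers include at least one sec-fetch-* or
// sec-ch-ua* header. Keys are assumed to be lower-cased, as the PR describes.
function hasBrowserOnlyHeaders(headers: Record<string, string> | undefined): boolean {
  if (!headers) return false;
  return Object.keys(headers).some(
    key => key.startsWith("sec-fetch-") || key.startsWith("sec-ch-ua")
  );
}
```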

🤖 Generated with Claude Code

Add context.headers to the event so destinations can see the raw HTTP headers (accept, content-type, sec-fetch-*, sec-ch-ua*, ...) and tell real browser traffic from bots/agents.

- Browser endpoint derives context.headers from the request only; the body can't redefine them (a browser can't read its own headers anyway).
- S2S endpoint captures the forwarding request's headers but lets the caller override allow-listed headers via the body to forward the original device's headers.
- cookie/authorization are stripped and the write key is masked, so secrets don't leak to destinations.

Types: add AnalyticsContext.headers and an optional RuntimeFacade.headers() so a Node integration can supply the original headers; jitsu-js wires it into the built context (no-op in the browser).

jitsu-code-review

Reviewed the changes in bulker/ingest/router.go, libs/jitsu-js/src/analytics-plugin.ts, and types/protocols/analytics.d.ts.

The overall direction makes sense (capturing request headers into context.headers and masking sensitive values), but I found one correctness/security edge case in the Go implementation and left an inline comment with details.

jitsu-code-review

Reviewed the changes in bulker/ingest/router.go, libs/jitsu-js/src/analytics-plugin.ts, and types/protocols/analytics.d.ts, focusing on correctness and security implications of the new context.headers flow. I found two issues worth addressing: (1) context.headers sanitization can be bypassed for __sql_type* keys because headers are stored as a plain map, and (2) the current sensitive-header filter is too narrow and may leak credentials carried in custom auth headers.

absorbb · 2026-06-30T09:13:59Z

+		// browser clients cannot read their own request headers and must not be able to
+		// spoof them: always derive context.headers from the actual request, ignoring
+		// whatever the body provided.
+		ctx.Set("headers", buildContextHeaders(c, nil))


We must find a way to tell the destination to store this object as JSON, or as a string containing marshaled JSON.
The __sql_type hint cannot be used: it would require knowing the specific data warehouse in advance.

Implemented in 3b2a95a using bulker's existing warehouse-agnostic JSON data type (types.JSON, enum value 6 in bulkerlib/types/datatype.go): the builtin bulker destination now always declares context_headers in the schema stream option, which (a) puts the key into notFlatteningKeys so the nested object is kept as a single value, and (b) maps to the native JSON type per warehouse via GetSQLType - jsonb (Postgres), SUPER (Redshift), JSON (BigQuery/MySQL/ClickHouse-with-JSON), text (Snowflake). No __sql_type hint and no per-warehouse knowledge needed. Required two bulker fixes: schema option now parses from the already-unmarshalled streamOptions header object, and schema-derived notFlatteningKeys are name-transformed so the match works on Snowflake. Integration-tested with a nested context.headers object on Postgres+MySQL in both batch and stream modes (TestSchemaOptionNestedObject).

Flip header capture from a cookie/authorization blocklist to an allowlist of standard content-negotiation, fetch-metadata and client-hint headers. Non-allow-listed headers keep their name (presence is still a bot-detection signal) but the value is masked, so credentials in custom auth headers (x-api-key, proxy-authorization, vendor JWTs) can't reach destinations. __sql_type* header keys are dropped entirely: context.headers is a plain map that bypasses types.FilterEvent, so such a key would otherwise become a raw SQL type hint (DDL injection) downstream.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
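
The allowlist behaviour described in this commit can be sketched as below. This is a TypeScript illustration, not the Go implementation: the ALLOWED list reuses the header names from the PR description as a stand-in for the real allowlist, and the MASK value, isAllowed and sanitizeHeaders are hypothetical.

```typescript
type HeaderMap = Record<string, string>;

// Illustrative allowlist; entries ending in "*" are treated as prefixes.
const ALLOWED: string[] = [
  "accept", "accept-language", "accept-encoding", "content-type",
  "user-agent", "referer", "dnt", "sec-fetch-*", "sec-ch-ua*",
];

const MASK = "***"; // placeholder; the actual mask string is an assumption

function isAllowed(headerName: string): boolean {
  return ALLOWED.some(pattern =>
    pattern.endsWith("*")
      ? headerName.startsWith(pattern.slice(0, -1))
      : headerName === pattern
  );
}

function sanitizeHeaders(raw: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  for (const [rawName, value] of Object.entries(raw)) {
    const key = rawName.toLowerCase();
    // A header name must never turn into a SQL type hint downstream.
    if (key.startsWith("__sql_type")) continue;
    // Unknown headers keep their name (presence is a signal) but lose their value.
    out[key] = isAllowed(key) ? value : MASK;
  }
  return out;
}
```

Compared with a blocklist, this fails closed: a credential carried in a header nobody anticipated is masked by default instead of being forwarded.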

Header names are client-controlled, so flattening context.headers would create unbounded warehouse columns. The builtin bulker destination now declares context_headers in the stream schema option with bulker's warehouse-agnostic JSON data type (enum value 6), which both prevents flattening and maps to the native JSON type per warehouse (jsonb / SUPER / JSON / string). Applied to every data layout except jitsu-legacy, which never carries context.headers.

Supporting bulker fixes:
- schema option ParseFunc accepts map[string]any - the streamOptions header is unmarshalled into a map before options are parsed, so a nested schema object previously failed with "invalid value type"
- notFlatteningKeys derived from the schema option are now name-transformed, matching the transformed key paths _mapForDwh compares (schema-driven non-flattening never matched on Snowflake)
- TypeFromValue recognizes ordered JSON objects (types.Object is *jsonorder.OrderedMap - reflect sees Ptr, not Map)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
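
To illustrate why the column count is unbounded without this, here is a small flattening sketch in TypeScript. It is not bulker's Go code: the flatten function, the underscore-joined path convention and the way notFlatteningKeys is passed in are assumptions made for the example.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Flattens nested objects into underscore-joined column names, except for
// paths listed in notFlatteningKeys, which are kept as a single JSON value.
function flatten(
  obj: JsonObject,
  notFlatteningKeys: Set<string>,
  prefix = "",
  out: JsonObject = {}
): JsonObject {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}_${key}` : key;
    if (
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      !notFlatteningKeys.has(path)
    ) {
      flatten(value, notFlatteningKeys, path, out); // recurse into nested object
    } else {
      out[path] = value; // scalar, array, or a declared JSON column
    }
  }
  return out;
}
```

With an empty key set, every distinct header name a client sends becomes its own context_headers_* column; with context_headers declared, the whole object lands in one column regardless of what the client sends.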

jitsu-code-review

Reviewed the changes across ingest header capture/masking (router.go + tests), bulker schema/type handling (options.go, abstract.go, datatype.go), and destination stream option shaping (bulker-destination.ts).

I focused on correctness/security regressions around header spoofing and secret leakage, SQL type-hint bypasses, and schema flattening behavior for context.headers.

No new actionable correctness or security issues found in this diff.

… carries headers

Avoids eagerly creating the context_headers column (batch mode adds schema columns to the table even without data) on connections whose events never carry context.headers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

jitsu-code-review

Reviewed the ingest header-capture changes, bulker schema/not-flattening updates, and destination-function streamOptions wiring. I found one potential runtime regression in withContextHeadersSchema and left it inline.

The plain schema-option case was already covered per warehouse (see json_test_snowflake in types_test.go) and worked: without toSameCase raw field names match raw key paths. The broken combination was WithSchema + WithToSameCase (what sync-sidecar sends when the sync's sameCase option is on): notFlatteningKeys stayed raw while _mapForDwh compares case-transformed paths, so declared nested JSON fields were flattened. This test fails without the nameTransformer fix in abstract.go and passes with it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
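
The mismatch described in this commit can be reproduced in miniature. Everything below is a hypothetical TypeScript model: toSameCase is assumed to lower-case names purely for the sake of the example, and neither function mirrors bulker's actual signatures.

```typescript
// Hypothetical stand-in for the name transformer applied when sameCase is on.
const toSameCase = (columnName: string): string => columnName.toLowerCase();

// Builds the set of paths that must not be flattened. `transformDeclared`
// switches between the pre-fix (raw names) and post-fix (transformed) behaviour.
function notFlatteningKeys(declared: string[], transformDeclared: boolean): Set<string> {
  return new Set(declared.map(n => (transformDeclared ? toSameCase(n) : n)));
}

// The flattener compares transformed key paths against that set.
function isKeptNested(path: string, keys: Set<string>): boolean {
  return keys.has(toSameCase(path));
}
```

With a declared field such as Context_Headers, the raw set never contains the transformed path, so the lookup misses and the object is flattened; transforming the declared names the same way makes the lookup succeed.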

jitsu-code-review

Reviewed the changes in bulker ingest/header handling, bulker schema option parsing, SQL schema mapping behavior, and destination function stream option generation.

Findings:

Potential runtime error in withContextHeadersSchema: streamOptions.schema.fields is treated as an array without a type guard; malformed or legacy config shapes can throw before event delivery.

…text fields user-agent, referer and host are removed from context.headers when the event already carries the exact same values in context.userAgent, context.page.referrer and context.page.host respectively. Differing values are kept - the mismatch itself is a bot signal.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
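
A sketch of that de-duplication rule, written in TypeScript for illustration. The EventContext shape is reduced to the fields named in the commit message, and dropDuplicatedHeaders is a hypothetical name.

```typescript
interface EventContext {
  userAgent?: string;
  page?: { referrer?: string; host?: string };
  headers?: Record<string, string>;
}

// Returns a copy of context.headers without the entries that merely repeat
// a value the event already carries elsewhere in context.
function dropDuplicatedHeaders(ctx: EventContext): Record<string, string> {
  const headers: Record<string, string> = { ...(ctx.headers ?? {}) };
  const mirrors: Array<[string, string | undefined]> = [
    ["user-agent", ctx.userAgent],
    ["referer", ctx.page?.referrer],
    ["host", ctx.page?.host],
  ];
  for (const [headerName, contextValue] of mirrors) {
    // Only an exact match is redundant; a differing value is kept as a signal.
    if (contextValue !== undefined && headers[headerName] === contextValue) {
      delete headers[headerName];
    }
  }
  return headers;
}
```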

jitsu-code-review

Reviewed the Go ingest/bulker changes and destination-function schema wiring. I focused on the new context.headers handling, schema-option parsing, and streamOptions propagation. I found one runtime-safety issue in withContextHeadersSchema (inline comment).

…chema.fields (review)

streamOptions is untyped user config: a non-array schema.fields would make fields.some() throw, turning a config problem into endless RetryError delivery retries. Leave a malformed schema untouched instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

jitsu-code-review

Reviewed the diff for ingest header capture/sanitization, bulker schema parsing + SQL flattening behavior, and destination streamOptions mutation. I found one correctness risk around preserving existing schema configs.

…ma (review)

bulker accepts the schema stream option as a JSON string; spreading a string schema would explode it into character keys and silently drop the user's schema config. Leave any non-plain-object schema untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
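
Taken together, the guards discussed in the last few commits might look like the sketch below. Only the function name withContextHeadersSchema and the JSON enum value 6 come from the thread; the signature, the hasHeaders parameter, the { name, type } field shape and the behaviour when no schema is configured are assumptions.

```typescript
type StreamOptions = Record<string, unknown>;

const JSON_TYPE = 6; // bulker's JSON data type enum value, per the thread
const HEADERS_FIELD = { name: "context_headers", type: JSON_TYPE }; // assumed field shape

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function withContextHeadersSchema(streamOptions: StreamOptions, hasHeaders: boolean): StreamOptions {
  // Do not declare the column for events that carry no context.headers.
  if (!hasHeaders) return streamOptions;
  const schema = streamOptions["schema"];
  if (schema === undefined || schema === null) {
    return { ...streamOptions, schema: { fields: [HEADERS_FIELD] } };
  }
  // A JSON-string schema (or any other non-object) is left untouched:
  // spreading a string would explode it into character keys.
  if (!isPlainObject(schema)) return streamOptions;
  const existing = schema["fields"];
  const fields: unknown = existing === undefined ? [] : existing;
  // A non-array fields value is a config problem; do not throw on it.
  if (!Array.isArray(fields)) return streamOptions;
  if (fields.some(f => isPlainObject(f) && f["name"] === HEADERS_FIELD.name)) {
    return streamOptions; // already declared by the user
  }
  return {
    ...streamOptions,
    schema: { ...schema, fields: [...fields, HEADERS_FIELD] },
  };
}
```

Returning the input unchanged on every unexpected shape keeps a misconfigured connection delivering events, at the cost of context.headers being flattened for that connection.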

jitsu-code-review

Reviewed the changes in ingest header capture, bulker stream option/schema handling, SQL flattening behavior, and related TS/Go tests.

I did not find new actionable correctness or security regressions in this diff. The added tests around context headers, schema parsing, and nested-object handling cover the key risk areas and pass locally for the touched Go packages.

vklimontovich requested a review from absorbb June 3, 2026 21:07

jitsu-code-review Bot approved these changes Jun 3, 2026

vklimontovich changed the title ~~feat(ingest): capture request headers into event context~~ JITSU-74 feat(ingest): capture request headers into event context Jun 29, 2026

jitsu-code-review Bot previously approved these changes Jun 29, 2026

absorbb reviewed Jun 30, 2026

absorbb and others added 2 commits July 3, 2026 15:01

absorbb dismissed jitsu-code-review[bot]’s stale review via 3b2a95a July 3, 2026 11:01

jitsu-code-review Bot approved these changes Jul 3, 2026
