Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347
Draft
szegedi wants to merge 15 commits into
Draft
Port OTEP-4947 thread-context writer from polarsignals/custom-labels#347szegedi wants to merge 15 commits into
szegedi wants to merge 15 commits into
Conversation
Ports the in-development OpenTelemetry thread-context writer that lives on the otel-thread-ctx-node branch of polarsignals/custom-labels (szegedi fork) into this project. The two codebases will likely diverge again later; for now this is a snapshot of the current state. Structurally: - bindings/otel-thread-ctx.cc/.hh: the native addon code, namespaced in `dd::` and exposed via OtelThreadCtx::Init(exports) called from binding.cc. The thread_local otel_thread_ctx_nodejs_v1 discovery symbol stays in extern "C" at file scope so it's exported by name through the dd_pprof.node dynsym table. - ts/src/otel-thread-ctx.ts: the runWithContext / enterWithContext / makeNamedContext API, loading the native addon via node-gyp-build like the rest of this project. - ts/test/test-otel-thread-ctx.ts: mocha port of the node:test suite. Skipped wholesale on non-Linux. - binding.gyp: adds bindings/otel-thread-ctx.cc to both target source lists and the -mtls-dialect=gnu2 cflag on x86_64 Linux (required by the OTEP-4947 spec; on arm64 TLSDESC is the only dynamic TLS model so no flag is needed). Verified by mocha against the built dd_pprof.node in a Linux container (Node 22 with --experimental-async-context-frame): 35 passing.
Mirrors the test:docker mechanism in custom-labels/js: a Dockerfile under scripts/docker/ extending node:24-bookworm with python3 and build-essential, plus a launcher script that builds the image (cached), mounts the repo read-only, copies it into /tmp/work inside the container, and runs `npm install && npm test`. The host tree is never modified (no stray node_modules/, build/, out/). Node 24 is used so the full test suite — including the new OTEP-4947 thread-context tests, which need AsyncContextFrame — runs without extra Node flags. Run via `npm run test:docker`.
This comment has been minimized.
This comment has been minimized.
Overall package sizeSelf size: 2.42 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------| | pprof-format | 2.2.2 | 500.53 kB | 500.53 kB | | source-map | 0.7.6 | 185.63 kB | 185.63 kB | | node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |🤖 This report was automatically generated by heaviest-objects-in-the-universe |
- Add a static_assert that offsetof(CtxWrap, record_) == sizeof(node::ObjectWrap), since that offset is part of the reader ABI. Restructure CtxWrap so record_, capacity_, and truncated_ live in a single public access section: C++ leaves cross-access-control field ordering implementation-defined, so splitting them would allow a conforming compiler to reorder the bookkeeping fields ahead of record_. - Add an acq_rel signal fence between the pointer swap and free() in the reallocate path. The pre-existing release fence only constrains prior writes; nothing was stopping the compiler from hoisting free() above the publication store, which would let a stopped reader follow self->record_ into freed memory. - Restore the [[unlikely]] annotation on the IsConstructCall() check. - Misc local cleanups (std::min/max in two spots, assert valid==1 after the memcpy instead of redundantly setting it).
The CtxWrap accessor that returns the raw record as a Uint8Array is only intended for tests and out-of-process-reader development. Naming it DebugBytes (and exposing it as wrap.debugBytes() on the JS prototype) makes that explicit at every call site.
v8::internal::Internals::kContinuationPreservedEmbedderDataOffset was introduced in Node 22. Older versions don't have ContinuationPreservedEmbedderData at all, and the TS layer already refuses to install the hook there via asyncContextFrameError, so StoreAls is never actually invoked on Node < 22 — we just need the addon to compile so the package installs.
Two more Node-version-sensitive spots blocking the prebuild against Node 18.0.0 headers: - ArrayBuffer::Data() wasn't exposed in V8 10.1 (Node 18.0). Switch to GetBackingStore()->Data(), which has been available since V8 7.4 / Node 12. The shared_ptr atomic is a per-call cost in CopyBytes (twice per CtxWrap::New) and DebugBytes — neither is a hot path. - kEmbedderDataSlotExternalPointerOffset is Node 22+. Same shape as the earlier kContinuationPreservedEmbedderDataOffset guard: publish a sentinel 0 on older Node so the addon's exported surface stays consistent across majors. A would-be reader can't reach a live record through it anyway (no CPED on Node < 22).
…removal) Node 26 ships V8 14.x, which removes the old String::Utf8Length / WriteUtf8 / NO_NULL_TERMINATION trio in favor of the V2 versions, and removes Object::GetIsolate() entirely. Switch the encode loop to the V2 forms on Node >= 24 (Node 22 ships V8 12.4 which never gets V2; Node 24's V8 13.6 has both, Node 26's V8 14.x has only V2). Replace exports->GetIsolate() with Isolate::GetCurrent() unconditionally — they're equivalent during module init and the latter is the only version that survives Node 26.
- Reformat ts/test/test-otel-thread-ctx.ts via gts (prettier). - Drop unused parameter declarations from the non-Linux stubs in ts/src/otel-thread-ctx.ts (they were carrying _-prefix names that gts's eslint still flags); TS allows fewer params on the assigned function than the declared variable's signature requires. - Use strict ==/!= equality instead of loose null-check. - Disable no-sparse-arrays in the test file: holes in attribute arrays are part of the wire format we're verifying. - Use `void` prefix on the runWithContext() call inside the sync test whose return type confuses no-floating-promises.
__attribute__((visibility("default"))) is a GCC/Clang extension that
MSVC doesn't recognize, breaking the Windows prebuild. Guard it
behind __GNUC__/__clang__. Visibility is irrelevant on Windows anyway
since the OTEP-4947 reader contract is ELF-TLSDESC and only meaningful
on Linux.
Two cases the prior unconditional describe didn't cover: - AsyncContextFrame (the writer's discovery substrate) is opt-in on Node 22/23 (requires --experimental-async-context-frame), on by default in Node 24+ (disable-able via --no-async-context-frame), and absent on Node < 22. The TS layer's asyncContextFrameError refuses to install the hook in each of those cases; the test now mirrors the same predicate so the suite is skipped instead of failing every test with "feature unavailable". - The discovery-contract test reads the addon binary from build/Release/dd_pprof.node, which exists only on the build-from-source path. The prebuild-install / node-gyp-build CI matrix uses a prebuilt binary from prebuilds/, so the path doesn't exist there. Skip that one test when the file isn't present.
Add an otelThreadCtx namespace alongside time/heap so consumers can
reach the writer (runWithContext, enterWithContext, clearContext,
appendAttributes, isContextTruncated, makeNamedContext) via
require('@datadog/pprof').otelThreadCtx without importing internal
paths. The debug-only _currentRecordBytes accessor stays unexposed.
Native CtxWrap.spanIdMatches(buf) memcmps the 8-byte argument against
record_->span_id and returns a boolean — no allocation on the JS side
when the caller passes a stable Uint8Array (typically cached on the
span object).
Exposed at the JS layer as:
- top-level currentSpanIdMatches(spanIdBytes), returns false outside
a context, on non-Linux platforms, and for arguments that aren't
8-byte Uint8Arrays.
- NamedContext.currentSpanIdMatches(spanIdBytes), a passthrough.
Motivation: dd-trace-js's storage:enter channel fires on every
async-context resume. Without a way to ask "is this span already
the active context?", the writer ends up allocating a fresh CtxWrap
on each enter, even when re-entering the same span — the same
allocation-churn pattern the wall profiler fixed in dd-trace-js#8638.
…elpers Reshape the otelThreadCtx namespace around an explicitly-allocated CtxWrap object so consumers can cache one record per span and re-install it without allocating churn: - The native CtxWrap class is now constructable from JS via `new pprof.otelThreadCtx.CtxWrap(traceId, spanId, attributes?)`. - New top-level functions get/set/runWithContext take a wrap reference (or undefined). `getContext() === wrap` becomes the cheap identity check that replaces the previous currentSpanIdMatches dance. - The opts-form helpers (`enterWithContext`, `appendAttributes`, `clearContext`, `isContextTruncated`, `currentSpanIdMatches`) are removed at the top level. Callers go through the wrap directly: `wrap.appendAttributes(...)`, `wrap.isTruncated()`, `setContext(undefined)`. - NamedContext stays as a name→index resolver: `buildContext(opts)` returns a CtxWrap with attributes resolved positionally; `runWithContext` / `enterWithContext` / `clearContext` are kept as one-liner sugar that compose with the new top-level functions. Native: the CtxWrap class is registered with its new name (was "OtelThreadCtxWrap") and the per-instance "append" method is now "appendAttributes" for parity with the JS-side phrasing. The SpanIdMatches binding is dropped. Reasoning: under AsyncContextFrame each fork inherits the wrap by reference, so once dd-trace-js caches one CtxWrap per span and re-installs it on every storage:enter, both the allocation churn we saw in the wall profiler (PR dd-trace-js#8638) and the "different wraps in different CPEDs for the same span" stale-record edge case go away — there's exactly one record per span across the whole lifetime, and mutations via wrap.appendAttributes propagate naturally because the native realloc-on-append path updates the wrap's record_ pointer in place, never the JS wrapper.
`CtxWrap` leaks the C++ ObjectWrap implementation detail — as far as
the JS API is concerned, the object IS the thread context. Rename:
- The JS-visible class name (SetClassName) and TS interface:
`CtxWrap` → `ThreadContext`.
- The TS constructor interface: `CtxWrapCtor` → `ThreadContextCtor`.
- The addon constructor export: `addon.ctxWrap` → `addon.threadContext`.
- The non-Linux fallback class: `NoopCtxWrap` → `NoopThreadContext`.
- Parameter/variable names previously named `wrap` (or `_wrap`) →
`context` (or `_context`).
The native C++ class itself stays `CtxWrap` — that's the conventional
ObjectWrap-ish naming on the C++ side and isn't user-visible.
Error messages updated to refer to "ThreadContext" too ("not a
ThreadContext", "ThreadContext must be called with `new`", etc.).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds a Node.js writer for the OpenTelemetry Thread Local Context Record (OTEP-4947), ported from the in-development upstream at polarsignals/custom-labels#16.
What's in this branch
Native addon (
bindings/)otel-thread-ctx.cc+otel-thread-ctx.hh: the writer, namespaced asdd::OtelThreadCtx::Init(exports)and called frombinding.cc.otel_thread_ctx_nodejs_v1stays inextern "C"at file scope so it's exported by name through thedd_pprof.nodedynsym table. It's a 32-byte struct holding the four fields a reader needs:cped_slot(V8 isolate'sContinuationPreservedEmbedderDataslot pointer),als_handle(Global<Object>to the writer'sAsyncLocalStorage),als_identity_hash(JS identity hash for hash-bucket narrowing), andundefined_addr(cached per-isolate undefined singleton for clean "no context" detection by the reader).record_pointer in place, so every async-context frame that holds the same wrapper reference observes the new buffer (no per-CPED record divergence).binding.gypadds-mtls-dialect=gnu2on x86_64 Linux (required by OTEP-4947 for TLSDESC; on arm64 TLSDESC is the only dynamic TLS model so no flag is needed).TypeScript layer (
ts/src/otel-thread-ctx.ts)API surface (Linux only; no-op stubs elsewhere). The shape mirrors the wall profiler's
setContext/getContextpattern: callers construct oneThreadContextper logical scope (typically per trace span) and re-install the same JS reference on every async-context fire.class ThreadContext— constructable from JS vianew pprof.otelThreadCtx.ThreadContext(traceId, spanId, attributes?). Instance methods:appendAttributes(attrs)(append-only, mutates the record in place),isTruncated(), anddebugBytes()(debug-only). Lifetime is GC-managed: the underlying native record is freed when no JS or async-context-frame reference survives.getContext(): ThreadContext | undefined— returns whatever's attached to the active async-context frame.setContext(ctx)— installs (or detaches, withundefined) a context in the active async-context frame. Re-installing the same reference across many frames is cheap (no allocation); per-span caching of the context object is the intended usage.runWithContext(ctx, fn)— scoped variant:als.run-style.makeNamedContext(keys)— returns aNamedContextwithbuildContext(opts)(resolvesnamedAttributesagainst the captured key list and constructs aThreadContext) plus sugar methods (enterWithContext/runWithContext/clearContext) that composebuildContextwith the top-level functions. It also exposesprocessContextAttributes, a frozen snapshot of the OTEP-4719 entries the caller should publish.traceId/spanIdare rawUint8Array(16 and 8 bytes;Bufferworks as a subclass).attributesis positional: index N is the value for uint8 key index N on the wire; null/undefined/holes are skipped. Per-value cap of 255 UTF-8 bytes (uint8 length prefix), total attrs_data cap of 612 bytes (OTEP-recommended 640-byte total record minus 28-byte header).Process-context attributes
makeNamedContext(keys).processContextAttributesexposes a frozen snapshot ready to spread into whatever OTEP-4719 process-context publisher the application uses:Tests
Mocha suite under
ts/test/test-otel-thread-ctx.tscovering:ThreadContextconstruction and input parsing, on-the-wire record encoding (including multibyte UTF-8 truncation at the 255-byte per-value cap), the 612-byte cap andisTruncated, in-place append vs geometric realloc, async propagation,setContext/getContext/runWithContextlifecycle and idempotency,NamedContext.buildContextname resolution,processContextAttributesshape and immutability, and areadelf --dyn-symscheck that the TLS symbol is exported with the right binding / visibility / type. The whole describe block is skipped on non-Linux.A second commit adds a
scripts/docker/Dockerfile + launcher and atest:dockernpm script that builds the addon and runs the full test suite in a Linux container — useful for running the new tests from macOS dev machines. End-to-end verified at 158 passing (96 existing pprof tests + the new suite).