
feat(aws): cross-invocation tracecontext propagation #8182

Merged
BridgeAR merged 36 commits into joey/apm-ai-toolkit/aws-durable-execution-sdk-js from joey/cross-invocation-tracecontext-propagation on Jun 15, 2026

joeyzhao2018 (Contributor) commented Apr 30, 2026

Summary

https://datadoghq.atlassian.net/browse/APMSVLS-493

Adds cross-invocation trace-context continuity for the @aws/durable-execution-sdk-js integration. Each invocation of a durable execution now writes a `_datadog_{N}` checkpoint on suspend whenever the trace context has changed, so subsequent invocations of the same execution can extract the trace context from those checkpoints and attach to a common anchor span. NOTE: the extraction side of these checkpoints is in DataDog/datadog-lambda-js#774.

Motivation

A durable execution is a logically single workflow that the SDK transparently runs across N Lambda invocations (suspending on ctx.wait, ctx.waitForCallback, ctx.invoke, retries, etc.).
Before this PR, dd-trace produced one isolated trace per physical invocation. Customers couldn't see the workflow end-to-end in APM.

This PR makes those invocations show up under a single anchor, while leaving the per-invocation aws.durable.execute spans intact.

Without this, every resume of a suspended durable execution starts a fresh, unconnected trace.

Changes

New module: packages/datadog-plugin-aws-durable-execution-sdk-js/src/trace-checkpoint.js

  • saveTraceContextCheckpointIfUpdated(tracer, span, durableContext, firstExecutionSpanId, event): writes a Datadog-format trace context (x-datadog-* headers) as a `_datadog_{N}` STEP operation via the SDK's checkpoint manager.
  • Always injects Datadog-style headers regardless of DD_TRACE_PROPAGATION_STYLE_INJECT, since the payload is read back by Datadog code only.
  • Idempotent within an execution: the stepId is a deterministic blake2b(name:executionArn) hash, so re-running the same handler won't write duplicate ops.
  • Skips the write when the current context is byte-identical to the most recent `_datadog_{N}` checkpoint (ignoring x-datadog-parent-id, which always changes).
  • All checkpoints in a given execution carry the same anchor parent_id: the first save anchors at the aws.durable.execute span id; every subsequent save reuses the prior checkpoint's anchor verbatim.
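The dedup and anchoring rules above can be sketched in a few lines. This is a minimal sketch, not the code in trace-checkpoint.js: `planCheckpoint` and `comparable` are hypothetical names, plain objects stand in for the injected x-datadog-* carrier, and sorting keys before comparing is an assumption of this sketch (the module is described as comparing JSON.stringify output after stripping x-datadog-parent-id).

```javascript
'use strict'

// Hypothetical sketch: plain objects stand in for the x-datadog-* carrier.
const PARENT_ID = 'x-datadog-parent-id'

// Serialize everything except the parent id, which differs on every invocation.
// Sorting keys (an assumption of this sketch) makes the comparison order-independent.
function comparable (headers) {
  const keys = Object.keys(headers).filter(key => key !== PARENT_ID).sort()
  return JSON.stringify(keys.map(key => [key, headers[key]]))
}

// Returns the headers to persist, or null when the write can be skipped.
// `previous` holds the headers of the most recent _datadog_{N} checkpoint, if any.
function planCheckpoint (current, previous, firstExecutionSpanId) {
  if (previous && comparable(previous) === comparable(current)) {
    return null // only the parent id changed: nothing new to save
  }
  // The first save anchors at the execute span; later saves reuse the prior anchor.
  const anchoredSpanId = previous ? previous[PARENT_ID] : firstExecutionSpanId
  return { ...current, [PARENT_ID]: anchoredSpanId }
}

const first = planCheckpoint(
  { 'x-datadog-trace-id': '111', [PARENT_ID]: '900', 'x-datadog-sampling-priority': '1' },
  undefined,
  '555'
)
console.log(first[PARENT_ID]) // 555

const unchanged = planCheckpoint({ ...first, [PARENT_ID]: '901' }, first, '555')
console.log(unchanged) // null

const updated = planCheckpoint(
  { ...first, [PARENT_ID]: '902', 'x-datadog-sampling-priority': '2' },
  first,
  '555'
)
console.log(updated[PARENT_ID]) // 555
```

Carrying the anchor forward from the previous checkpoint, rather than recomputing it, is what keeps every resumed invocation attached to the same span no matter how many suspends happen.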

`_installTerminationCheckpointHook` does two things:

  1. Wraps the user handler (args[5]) to capture DurableContextImpl.
  2. Wraps terminationManager.terminate so that, on every PENDING termination reason, it kicks off maybeSaveCheckpoint → saveCheckpoint (trace-checkpoint.js:126-155), which issues two calls: checkpointManager.checkpoint(stepId, START) and checkpoint(stepId, SUCCEED).
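A rough sketch of the hook's shape, assuming simplified stand-ins for the SDK objects: the reason strings in `PENDING_TERMINATION_REASONS`, the `{ action }` argument passed to `checkpoint`, and the helpers `makeSaveCheckpoint` and `installTerminationHook` are all hypothetical. It shows the START and SUCCEED writes being queued before the original `terminate` runs, with failures logged rather than thrown.

```javascript
'use strict'

// Hypothetical sketch: reason strings, manager shapes, and the checkpoint
// argument format are placeholders, not the SDK's real API.
const PENDING_TERMINATION_REASONS = new Set(['WAIT_SCHEDULED', 'CALLBACK_PENDING'])

// Issues START then SUCCEED for one synthetic STEP operation.
function makeSaveCheckpoint (checkpointManager, stepId, payload) {
  return () => Promise.all([
    checkpointManager.checkpoint(stepId, { action: 'START' }),
    checkpointManager.checkpoint(stepId, { action: 'SUCCEED', payload })
  ])
}

function installTerminationHook (terminationManager, saveCheckpoint, log = console) {
  const originalTerminate = terminationManager.terminate
  terminationManager.terminate = function (options = {}) {
    if (PENDING_TERMINATION_REASONS.has(options.reason)) {
      // Best effort: queue the write before terminating and never let it throw.
      try {
        Promise.resolve(saveCheckpoint()).catch(err => log.warn('trace checkpoint failed:', err))
      } catch (err) {
        log.warn('trace checkpoint failed:', err)
      }
    }
    return originalTerminate.apply(this, arguments)
  }
}

// Usage with fake managers that record call order.
const calls = []
const checkpointManager = {
  async checkpoint (stepId, op) { calls.push(`${stepId}:${op.action}`) }
}
const terminationManager = {
  terminate (options) { calls.push(`terminate:${options.reason}`); return 'terminated' }
}
installTerminationHook(terminationManager, makeSaveCheckpoint(checkpointManager, 'dd-step', {}))

const result = terminationManager.terminate({ reason: 'WAIT_SCHEDULED' })
terminationManager.terminate({ reason: 'EXECUTION_SUCCEEDED' }) // placeholder non-pending reason
console.log(calls.join(', '))
// dd-step:START, dd-step:SUCCEED, terminate:WAIT_SCHEDULED, terminate:EXECUTION_SUCCEEDED
```

The async `checkpoint` bodies run synchronously up to their first `await`, which is why both writes are recorded before `terminate` in this fake setup.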

Tests

[Screenshots: 2026-05-20 at 3:01:26 PM and 3:01:00 PM]

Why some tests are skipped

The three affected tests (the wait_for_callback happy path and the invoke happy and error paths) race against a TimerScheduler bug in @aws/durable-execution-sdk-js-testing that is fixed upstream in aws/aws-durable-execution-sdk-js#544. Once that fix ships in a release we pin against, the skips will be removed.

The race only manifests at the suspend → resume boundary when resume is driven externally (by sendCallbackSuccess() for wait_for_callback, or by the target function completing for chained invoke()). Timer-driven resumes (ctx.wait, ctx.waitForCondition, etc.) take a single, ordered code path through TimerScheduler and are unaffected.

This PR adds the cross-invocation trace-context checkpoint hook. On every PENDING termination, the hook calls back into the SDK via checkpointManager.checkpoint(stepId, START) + checkpoint(stepId, SUCCEED). That extra async work overlaps with the SDK's own pending-state bookkeeping at the same boundary where TimerScheduler coordinates its drain, which is precisely the state machine that #544 cleans up.

Why production is unaffected

The TimerScheduler code path involved is only used by @aws/durable-execution-sdk-js-testing's LocalDurableTestRunner (simulated clock + in-process callback resolution). Real Lambda invocations don't drive resume through TimerScheduler — the resume of a suspended execution is a fresh invocation initiated by the Durable Execution service, not a continuation inside the same process. The checkpoint writes themselves complete normally; what races is the test harness's observation of the resumed invocation.

Other Notes

REPLAY -> NEW transitions

In dd-trace-py PR-17773, a mark_trace_context_checkpoints_visited method was added to work around a glitch caused by the `_datadog_{N}` steps. The same issue doesn't exist in Node.js, for the reasons below.

Python SDK pattern:

  • Tracks visited ops as a set (state._visited_operations), keyed by op_id.
  • track_replay(op_id) adds to the set. The REPLAY → NEW transition happens only when every completed op in state.operations is in the visited set.
  • Our `_datadog_*` ops are completed but never visited, so they permanently block the transition.
  • Hence the fix: pre-visit them at handler start.
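To see why never-visited ops block the transition, here is a toy JavaScript model of the visited-set rule. `VisitedSetReplayState` is a hypothetical class, not the Python SDK's code; op ids are plain names rather than hashed ids, and the pre-visit fix is modeled as an early `trackReplay` call.

```javascript
'use strict'

// Hypothetical toy model of the visited-set rule; not the Python SDK's code.
class VisitedSetReplayState {
  constructor (completedOpIds) {
    this.completed = new Set(completedOpIds)
    this.visited = new Set()
    this.mode = 'REPLAY'
  }

  // Mirrors track_replay: record the op, then check whether replay can end.
  trackReplay (opId) {
    this.visited.add(opId)
    const allVisited = [...this.completed].every(id => this.visited.has(id))
    if (this.mode === 'REPLAY' && allVisited) this.mode = 'NEW'
  }
}

const ops = ['step-a', 'step-b', '_datadog_1']

// User code only replays its own steps, so the synthetic op is never visited.
const stuck = new VisitedSetReplayState(ops)
stuck.trackReplay('step-a')
stuck.trackReplay('step-b')
console.log(stuck.mode) // REPLAY

// The dd-trace-py fix: pre-visit the synthetic ops at handler start.
const fixed = new VisitedSetReplayState(ops)
fixed.trackReplay('_datadog_1')
fixed.trackReplay('step-a')
fixed.trackReplay('step-b')
console.log(fixed.mode) // NEW
```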

Node.js SDK pattern (durable-context.ts:209):

```ts
private checkAndUpdateReplayMode(): void {
  if (this.durableExecutionMode === DurableExecutionMode.ReplayMode) {
    const nextStepId = this.getNextStepId(); // "1", "2", ... from _stepCounter
    const nextStepData = this.executionContext.getStepData(nextStepId);
    if (!nextStepData) {
      this.durableExecutionMode = DurableExecutionMode.ExecutionMode;
    }
  }
}
```
  • Replay exits per call via a sequential step-id lookup (_stepCounter → "1", "2", ...), not via an "all ops visited" check.
  • getStepData("N") is keyed by md5("N").slice(0, 16) (step-id-utils.ts:11).
  • Our `_datadog_*` ops use blake2b stepIds (trace-checkpoint.js:184), so they live in stepData under a separate hash namespace and are never found by any getStepData("N") lookup. They don't appear in the sequential chain that user code walks, so these synthetic ops are invisible to the SDK's replay-mode bookkeeping and can't keep it stuck in REPLAY.
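A toy model of the Node.js lookup makes the namespace separation concrete. It assumes hex digests, and the blake2b input format and 16-character truncation in the hypothetical `datadogStepKey` are illustrative assumptions, not necessarily what trace-checkpoint.js does; `blake2b512` is the Node.js/OpenSSL name for the hash.

```javascript
'use strict'
const crypto = require('node:crypto')

// User step N is looked up under md5(String(N)).slice(0, 16) (hex digest assumed).
const userStepKey = n => crypto.createHash('md5').update(String(n)).digest('hex').slice(0, 16)

// Hypothetical key for a synthetic trace checkpoint; input and truncation are assumptions.
const datadogStepKey = (name, executionArn) =>
  crypto.createHash('blake2b512').update(`${name}:${executionArn}`).digest('hex').slice(0, 16)

// Model of the per-call check: replay continues while step N+1 has stored data.
function stepsReplayedBeforeExecutionMode (stepData) {
  let stepCounter = 0
  while (stepData.has(userStepKey(stepCounter + 1))) stepCounter++
  return stepCounter
}

const executionArn = 'arn:example:durable-execution' // hypothetical
const stepData = new Map([
  [userStepKey(1), 'user step 1'],
  [userStepKey(2), 'user step 2'],
  [datadogStepKey('_datadog_1', executionArn), 'trace context'],
  [datadogStepKey('_datadog_2', executionArn), 'trace context']
])

// The synthetic entries never extend the "1", "2", ... chain.
console.log(stepsReplayedBeforeExecutionMode(stepData)) // 2
```

Because the walk only ever asks for md5 keys of consecutive integers, the extra entries can neither extend nor stall the replay chain.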

pr-commenter (bot) commented May 1, 2026

Benchmarks

Benchmark execution time: 2026-06-12 17:29:22

Comparing candidate commit ff37f7a in PR branch joey/cross-invocation-tracecontext-propagation with baseline commit bf1d011 in branch joey/apm-ai-toolkit/aws-durable-execution-sdk-js.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1449 metrics, 18 unstable metrics.

pablomartinezbernardo force-pushed the joey/apm-ai-toolkit/aws-durable-execution-sdk-js branch 2 times, most recently from c1c351e to 7756bba on May 11, 2026 10:21
joeyzhao2018 force-pushed the joey/cross-invocation-tracecontext-propagation branch from 3eec288 to 2df8f55 on May 13, 2026 21:18
github-actions (bot) commented May 13, 2026

Overall package size

Self size: 6.22 MB
Deduped: 7.26 MB
No deduping: 7.26 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.2 | 85.93 kB | 825.11 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| dc-polyfill | 0.1.11 | 25.74 kB | 25.74 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

datadog-datadog-prod-us1 (bot) commented May 13, 2026

Tests

🎉 All green!

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ff37f7a

joeyzhao2018 force-pushed the joey/cross-invocation-tracecontext-propagation branch 2 times, most recently from da775cf to d2bb910 on May 13, 2026 22:01

codecov (bot) commented May 18, 2026

Codecov Report

❌ Patch coverage is 24.34783% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.42%. Comparing base (bf1d011) to head (2c75fe8).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| ...s-durable-execution-sdk-js/src/trace-checkpoint.js | 6.57% | 71 Missing ⚠️ |
| ...plugin-aws-durable-execution-sdk-js/src/handler.js | 38.09% | 13 Missing ⚠️ |
| ...strumentations/src/aws-durable-execution-sdk-js.js | 83.33% | 3 Missing ⚠️ |
Additional details and impacted files
@@                                 Coverage Diff                                  @@
##           joey/apm-ai-toolkit/aws-durable-execution-sdk-js    #8182      +/-   ##
====================================================================================
- Coverage                                             92.71%   90.42%   -2.30%     
====================================================================================
  Files                                                   871      867       -4     
  Lines                                                 49780    49797      +17     
  Branches                                               9737     9768      +31     
====================================================================================
- Hits                                                  46153    45027    -1126     
- Misses                                                 3627     4770    +1143     
Flag Coverage Δ
aiguard-integration-active 41.38% <ø> (ø)
aiguard-integration-latest 41.39% <ø> (ø)
aiguard-integration-maintenance 41.47% <ø> (?)
aiguard-macos 33.32% <ø> (ø)
aiguard-ubuntu 33.48% <ø> (ø)
aiguard-windows 33.14% <ø> (ø)
apm-capabilities-tracing-macos 48.21% <9.37%> (-0.11%) ⬇️
apm-capabilities-tracing-ubuntu-active 48.21% <9.37%> (-0.11%) ⬇️
apm-capabilities-tracing-ubuntu-latest 48.21% <9.37%> (-0.11%) ⬇️
apm-capabilities-tracing-ubuntu-maintenance ?
apm-capabilities-tracing-ubuntu-oldest 48.44% <9.37%> (?)
apm-capabilities-tracing-windows 48.17% <9.37%> (-0.15%) ⬇️
apm-integrations-aerospike-18-gte.5.2.0 33.04% <ø> (ø)
apm-integrations-aerospike-20-gte.5.5.0 33.06% <ø> (ø)
apm-integrations-aerospike-22-gte.5.12.1 33.07% <ø> (ø)
apm-integrations-aerospike-22-gte.6.0.0 33.07% <ø> (ø)
apm-integrations-aerospike-eol- 32.97% <ø> (ø)
apm-integrations-child-process 33.99% <ø> (ø)
apm-integrations-confluentinc-kafka-javascript-18 ?
apm-integrations-confluentinc-kafka-javascript-20 ?
apm-integrations-confluentinc-kafka-javascript-24 40.00% <ø> (ø)
apm-integrations-couchbase-18 33.06% <ø> (ø)
apm-integrations-couchbase-eol 32.98% <ø> (-0.13%) ⬇️
apm-integrations-dns 32.99% <ø> (ø)
apm-integrations-elasticsearch 34.02% <ø> (ø)
apm-integrations-http-latest ?
apm-integrations-http-maintenance 41.32% <ø> (ø)
apm-integrations-http-oldest 41.25% <ø> (ø)
apm-integrations-http2 ?
apm-integrations-kafkajs-latest 40.09% <ø> (ø)
apm-integrations-kafkajs-oldest 40.18% <ø> (ø)
apm-integrations-net 33.69% <ø> (ø)
apm-integrations-next-11.1.4 36.84% <ø> (ø)
apm-integrations-next-12.3.7 36.84% <ø> (ø)
apm-integrations-next-13.0.0 29.05% <ø> (+0.03%) ⬆️
apm-integrations-next-13.2.0 29.01% <ø> (ø)
apm-integrations-next-13.5.11 29.15% <ø> (ø)
apm-integrations-next-14.0.0 ?
apm-integrations-next-14.2.35 29.08% <ø> (ø)
apm-integrations-next-14.2.6 29.12% <ø> (+0.03%) ⬆️
apm-integrations-next-14.2.7 29.08% <ø> (-0.04%) ⬇️
apm-integrations-next-15.0.0 29.08% <ø> (ø)
apm-integrations-next-15.4.0 29.15% <ø> (ø)
apm-integrations-next-latest 29.11% <ø> (-0.02%) ⬇️
apm-integrations-oracledb 33.90% <ø> (ø)
apm-integrations-prisma-18-gte.6.16.0.and.lt.7.0.0 ?
apm-integrations-prisma-latest-all 34.19% <ø> (ø)
apm-integrations-restify 35.10% <ø> (ø)
apm-integrations-sharedb 32.40% <ø> (ø)
apm-integrations-tedious ?
appsec-express 51.15% <ø> (-0.02%) ⬇️
appsec-fastify 47.90% <ø> (ø)
appsec-graphql 47.88% <ø> (-0.03%) ⬇️
appsec-integration-active 35.94% <27.77%> (-0.01%) ⬇️
appsec-integration-latest 35.94% <27.77%> (-0.01%) ⬇️
appsec-integration-maintenance 36.00% <27.77%> (-0.01%) ⬇️
appsec-integration-oldest 35.99% <27.77%> (-0.01%) ⬇️
appsec-kafka 40.34% <ø> (ø)
appsec-ldapjs 39.75% <ø> (ø)
appsec-lodash 39.77% <ø> (ø)
appsec-macos ?
appsec-mongodb-core 43.98% <ø> (ø)
appsec-mongoose ?
appsec-mysql 46.94% <ø> (-0.13%) ⬇️
appsec-next-latest-11.1.4 27.29% <ø> (ø)
appsec-next-latest-12.3.7 27.82% <ø> (ø)
appsec-next-latest-13.0.0 29.09% <ø> (ø)
appsec-next-latest-13.2.0 29.12% <ø> (ø)
appsec-next-latest-13.5.11 29.21% <ø> (ø)
appsec-next-latest-14.0.0 29.14% <ø> (ø)
appsec-next-latest-14.2.35 29.14% <ø> (ø)
appsec-next-latest-14.2.6 29.14% <ø> (ø)
appsec-next-latest-14.2.7 ?
appsec-next-latest-15.0.0 29.14% <ø> (ø)
appsec-next-latest-latest ?
appsec-next-oldest-11.1.4 27.34% <ø> (ø)
appsec-next-oldest-12.3.7 29.15% <ø> (ø)
appsec-next-oldest-13.0.0 29.15% <ø> (ø)
appsec-next-oldest-13.2.0 29.41% <ø> (-0.01%) ⬇️
appsec-next-oldest-13.5.11 29.52% <ø> (ø)
appsec-next-oldest-14.0.0 29.45% <ø> (ø)
appsec-next-oldest-14.2.6 29.45% <ø> (?)
appsec-next-oldest-14.2.7 29.45% <ø> (ø)
appsec-next-oldest-15.0.0 ?
appsec-next-oldest-latest 28.03% <ø> (ø)
appsec-node-serialize 39.08% <ø> (ø)
appsec-passport 42.66% <ø> (ø)
appsec-postgres 46.83% <ø> (ø)
appsec-sourcing 38.49% <ø> (ø)
appsec-stripe 40.42% <ø> (ø)
appsec-template 39.32% <ø> (ø)
appsec-ubuntu 57.18% <ø> (ø)
appsec-windows 56.90% <ø> (-0.02%) ⬇️
debugger-ubuntu-active 43.57% <ø> (ø)
debugger-ubuntu-latest ?
debugger-ubuntu-maintenance ?
debugger-ubuntu-oldest 44.00% <ø> (?)
instrumentations-instrumentation-ai 42.72% <ø> (ø)
instrumentations-instrumentation-aws-sdk 44.89% <ø> (ø)
instrumentations-instrumentation-bluebird 27.44% <ø> (ø)
instrumentations-instrumentation-body-parser 35.61% <ø> (ø)
instrumentations-instrumentation-child_process 33.37% <ø> (ø)
instrumentations-instrumentation-cookie-parser 29.34% <ø> (ø)
instrumentations-instrumentation-couchbase-18 46.02% <ø> (ø)
instrumentations-instrumentation-couchbase-eol ?
instrumentations-instrumentation-crypto 27.49% <ø> (ø)
instrumentations-instrumentation-express 29.54% <ø> (ø)
instrumentations-instrumentation-express-mongo-sanitize 29.45% <ø> (ø)
instrumentations-instrumentation-express-multi-version 41.47% <ø> (ø)
instrumentations-instrumentation-express-session 35.38% <ø> (ø)
instrumentations-instrumentation-fastify 47.86% <ø> (ø)
instrumentations-instrumentation-fetch 44.76% <ø> (ø)
instrumentations-instrumentation-fs 27.14% <ø> (ø)
instrumentations-instrumentation-generic-pool 27.05% <ø> (ø)
instrumentations-instrumentation-hono 28.65% <ø> (ø)
instrumentations-instrumentation-http 35.04% <ø> (ø)
instrumentations-instrumentation-http-client-options 37.56% <ø> (ø)
instrumentations-instrumentation-kafkajs 48.94% <ø> (ø)
instrumentations-instrumentation-knex 27.43% <ø> (ø)
instrumentations-instrumentation-light-my-request 35.24% <ø> (ø)
instrumentations-instrumentation-mongoose 28.54% <ø> (ø)
instrumentations-instrumentation-multer 35.28% <ø> (ø)
instrumentations-instrumentation-mysql2 33.34% <ø> (ø)
instrumentations-instrumentation-openai-aiguard 47.86% <ø> (ø)
instrumentations-instrumentation-otel-sdk-trace 25.31% <ø> (ø)
instrumentations-instrumentation-passport 39.15% <ø> (ø)
instrumentations-instrumentation-passport-http 38.85% <ø> (ø)
instrumentations-instrumentation-passport-local 39.30% <ø> (ø)
instrumentations-instrumentation-pg 33.12% <ø> (ø)
instrumentations-instrumentation-promise 27.38% <ø> (ø)
instrumentations-instrumentation-promise-js 27.39% <ø> (ø)
instrumentations-instrumentation-q 27.42% <ø> (ø)
instrumentations-instrumentation-router 43.07% <ø> (ø)
instrumentations-instrumentation-stripe 27.92% <ø> (ø)
instrumentations-instrumentation-url 27.32% <ø> (ø)
instrumentations-instrumentation-when 27.40% <ø> (?)
instrumentations-instrumentation-zlib 27.37% <ø> (ø)
instrumentations-integration-esbuild-0.16.12-active ?
instrumentations-integration-esbuild-0.16.12-latest ?
instrumentations-integration-esbuild-0.16.12-maintenance 18.43% <27.77%> (?)
instrumentations-integration-esbuild-0.16.12-oldest 18.43% <27.77%> (+<0.01%) ⬆️
instrumentations-integration-esbuild-latest-active 24.45% <24.34%> (-0.01%) ⬇️
instrumentations-integration-esbuild-latest-latest 24.45% <24.34%> (-0.01%) ⬇️
instrumentations-integration-esbuild-latest-maintenance ?
instrumentations-integration-esbuild-latest-oldest 18.43% <27.77%> (+<0.01%) ⬆️
llmobs-ai 34.79% <ø> (ø)
llmobs-anthropic 36.51% <ø> (?)
llmobs-bedrock 36.05% <ø> (ø)
llmobs-google-genai 35.56% <ø> (ø)
llmobs-langchain 34.42% <ø> (ø)
llmobs-openai-latest 38.93% <ø> (ø)
llmobs-openai-oldest 39.03% <ø> (+0.01%) ⬆️
llmobs-sdk-active 43.51% <ø> (ø)
llmobs-sdk-latest 43.51% <ø> (ø)
llmobs-sdk-maintenance 43.61% <ø> (ø)
llmobs-sdk-oldest 43.60% <ø> (ø)
llmobs-vertex-ai ?
master-coverage ?
openfeature-macos 37.37% <ø> (ø)
openfeature-ubuntu ?
openfeature-unit-active 50.10% <ø> (ø)
openfeature-unit-latest 50.10% <ø> (ø)
openfeature-unit-maintenance 50.46% <ø> (ø)
openfeature-unit-oldest 50.46% <ø> (ø)
openfeature-windows ?
platform-core 45.98% <ø> (ø)
platform-esbuild 46.62% <ø> (ø)
platform-instrumentations-misc ?
platform-integration-active 46.61% <ø> (ø)
platform-integration-latest ?
platform-integration-maintenance 46.75% <ø> (+0.04%) ⬆️
platform-integration-oldest 46.90% <ø> (?)
platform-shimmer 47.05% <ø> (ø)
platform-unit-guardrails 44.04% <ø> (ø)
platform-webpack 17.97% <27.77%> (+<0.01%) ⬆️
plugins-aws-durable-execution-sdk-js ?
plugins-axios 35.36% <ø> (ø)
plugins-azure-event-hubs 34.69% <ø> (ø)
plugins-azure-service-bus 35.17% <ø> (ø)
plugins-body-parser 36.36% <ø> (ø)
plugins-bullmq ?
plugins-cassandra 33.52% <ø> (ø)
plugins-cookie 40.62% <ø> (ø)
plugins-cookie-parser 40.45% <ø> (ø)
plugins-crypto 42.36% <ø> (ø)
plugins-dd-trace-api 33.22% <ø> (ø)
plugins-express-mongo-sanitize 40.51% <ø> (ø)
plugins-express-session 40.37% <ø> (ø)
plugins-fastify 37.69% <ø> (ø)
plugins-fetch 33.95% <ø> (ø)
plugins-fs ?
plugins-generic-pool 39.94% <ø> (ø)
plugins-google-cloud-pubsub ?
plugins-grpc 36.44% <ø> (ø)
plugins-handlebars 40.51% <ø> (ø)
plugins-hapi 35.51% <ø> (ø)
plugins-hono 35.85% <ø> (ø)
plugins-ioredis 34.07% <ø> (ø)
plugins-jest ?
plugins-knex 39.98% <ø> (ø)
plugins-langgraph 32.25% <ø> (ø)
plugins-ldapjs 38.96% <ø> (ø)
plugins-light-my-request 40.08% <ø> (ø)
plugins-limitd-client 27.74% <ø> (-0.07%) ⬇️
plugins-lodash 40.12% <ø> (ø)
plugins-mariadb 35.04% <ø> (ø)
plugins-memcached ?
plugins-microgateway-core 34.61% <ø> (ø)
plugins-modelcontextprotocol-sdk 32.20% <ø> (ø)
plugins-moleculer ?
plugins-mongodb 35.65% <ø> (ø)
plugins-mongodb-core 35.33% <ø> (ø)
plugins-mongoose 34.23% <ø> (-0.17%) ⬇️
plugins-multer 40.42% <ø> (ø)
plugins-mysql 34.52% <ø> (+0.14%) ⬆️
plugins-mysql2 34.80% <ø> (ø)
plugins-nats 36.24% <ø> (ø)
plugins-node-serialize 40.65% <ø> (?)
plugins-opensearch 33.46% <ø> (ø)
plugins-passport-http 40.24% <ø> (ø)
plugins-pino 29.68% <ø> (ø)
plugins-postgres 34.57% <ø> (ø)
plugins-process 42.36% <ø> (ø)
plugins-pug 40.62% <ø> (ø)
plugins-redis 34.13% <ø> (ø)
plugins-router 37.93% <ø> (ø)
plugins-sequelize 39.90% <ø> (ø)
plugins-test-and-upstream-amqp10 33.74% <ø> (ø)
plugins-test-and-upstream-amqplib 39.11% <ø> (+0.16%) ⬆️
plugins-test-and-upstream-apollo 34.69% <ø> (ø)
plugins-test-and-upstream-avsc 33.62% <ø> (ø)
plugins-test-and-upstream-bunyan 28.90% <ø> (ø)
plugins-test-and-upstream-connect 36.19% <ø> (ø)
plugins-test-and-upstream-graphql ?
plugins-test-and-upstream-koa ?
plugins-test-and-upstream-protobufjs 33.85% <ø> (ø)
plugins-test-and-upstream-rhea 39.07% <ø> (ø)
plugins-undici 34.58% <ø> (+0.12%) ⬆️
plugins-url 42.36% <ø> (ø)
plugins-valkey 33.70% <ø> (+<0.01%) ⬆️
plugins-vm 42.36% <ø> (ø)
plugins-winston 29.55% <ø> (ø)
plugins-ws 36.99% <ø> (ø)
profiling-macos ?
profiling-ubuntu 43.49% <ø> (ø)
profiling-windows 40.82% <ø> (ø)
serverless-aws-sdk-latest-aws-sdk 32.98% <ø> (ø)
serverless-aws-sdk-latest-bedrockruntime ?
serverless-aws-sdk-latest-client 36.34% <ø> (ø)
serverless-aws-sdk-latest-dynamodb ?
serverless-aws-sdk-latest-eventbridge 26.82% <ø> (?)
serverless-aws-sdk-latest-kinesis 36.95% <ø> (ø)
serverless-aws-sdk-latest-lambda 34.32% <ø> (ø)
serverless-aws-sdk-latest-s3 ?
serverless-aws-sdk-latest-serverless-peer-service 39.22% <ø> (ø)
serverless-aws-sdk-latest-sns 38.13% <ø> (ø)
serverless-aws-sdk-latest-sqs 37.62% <ø> (-0.06%) ⬇️
serverless-aws-sdk-latest-stepfunctions 32.89% <ø> (ø)
serverless-aws-sdk-latest-util ?
serverless-aws-sdk-oldest-aws-sdk 33.10% <ø> (ø)
serverless-aws-sdk-oldest-bedrockruntime 31.97% <ø> (ø)
serverless-aws-sdk-oldest-client ?
serverless-aws-sdk-oldest-dynamodb 33.90% <ø> (-0.06%) ⬇️
serverless-aws-sdk-oldest-eventbridge 26.91% <ø> (ø)
serverless-aws-sdk-oldest-kinesis 37.12% <ø> (ø)
serverless-aws-sdk-oldest-lambda 34.42% <ø> (ø)
serverless-aws-sdk-oldest-s3 ?
serverless-aws-sdk-oldest-serverless-peer-service 39.31% <ø> (ø)
serverless-aws-sdk-oldest-sns 38.23% <ø> (ø)
serverless-aws-sdk-oldest-sqs ?
serverless-aws-sdk-oldest-stepfunctions ?
serverless-aws-sdk-oldest-util 47.13% <ø> (ø)
serverless-azure-durable-functions 36.80% <ø> (+0.15%) ⬆️
serverless-azure-functions-eventhubs 38.28% <ø> (ø)
serverless-azure-functions-servicebus ?
serverless-lambda ?
test-optimization-cucumber-latest-7.0.0 49.98% <ø> (ø)
test-optimization-cucumber-latest-latest 52.66% <ø> (-0.07%) ⬇️
test-optimization-cucumber-oldest-7.0.0 50.06% <ø> (ø)
test-optimization-cypress-latest-12.0.0-commonJS 49.00% <ø> (-0.34%) ⬇️
test-optimization-cypress-latest-12.0.0-esm 48.82% <ø> (-0.55%) ⬇️
test-optimization-cypress-latest-14.5.4-commonJS 49.18% <ø> (ø)
test-optimization-cypress-latest-14.5.4-esm 48.72% <ø> (-0.44%) ⬇️
test-optimization-cypress-latest-latest-commonJS 49.61% <ø> (-0.07%) ⬇️
test-optimization-cypress-latest-latest-esm 49.64% <ø> (ø)
test-optimization-cypress-oldest-12.0.0-commonJS 48.91% <ø> (+0.13%) ⬆️
test-optimization-cypress-oldest-12.0.0-esm 48.03% <ø> (-1.42%) ⬇️
test-optimization-cypress-oldest-14.5.4-commonJS 48.91% <ø> (-0.36%) ⬇️
test-optimization-cypress-oldest-14.5.4-esm 49.20% <ø> (-0.04%) ⬇️
test-optimization-jest-latest-latest 55.43% <ø> (+1.86%) ⬆️
test-optimization-jest-latest-oldest 49.99% <ø> (-4.36%) ⬇️
test-optimization-jest-oldest-latest 50.01% <ø> (-4.83%) ⬇️
test-optimization-jest-oldest-oldest 52.14% <ø> (+2.09%) ⬆️
test-optimization-mocha-latest-latest 53.67% <ø> (ø)
test-optimization-mocha-latest-oldest 51.22% <ø> (?)
test-optimization-mocha-oldest-latest 53.80% <ø> (?)
test-optimization-mocha-oldest-oldest ?
test-optimization-playwright-latest-latest-playwright-active-test-span 44.22% <ø> (-0.05%) ⬇️
test-optimization-playwright-latest-latest-playwright-atr 42.89% <ø> (ø)
test-optimization-playwright-latest-latest-playwright-efd 43.31% <ø> (?)
test-optimization-playwright-latest-latest-playwright-final-status ?
test-optimization-playwright-latest-latest-playwright-impacted-tests ?
test-optimization-playwright-latest-latest-playwright-reporting 42.93% <ø> (ø)
test-optimization-playwright-latest-latest-playwright-test-management 44.54% <ø> (ø)
test-optimization-playwright-latest-oldest-playwright-active-test-span ?
test-optimization-playwright-latest-oldest-playwright-atr ?
test-optimization-playwright-latest-oldest-playwright-efd 43.24% <ø> (-0.03%) ⬇️
test-optimization-playwright-latest-oldest-playwright-final-status 43.31% <ø> (ø)
test-optimization-playwright-latest-oldest-playwright-impacted-tests 42.79% <ø> (?)
test-optimization-playwright-latest-oldest-playwright-reporting ?
test-optimization-playwright-latest-oldest-playwright-test-management 44.49% <ø> (ø)
test-optimization-playwright-oldest-latest-playwright-active-test-span 44.28% <ø> (?)
test-optimization-playwright-oldest-latest-playwright-atr 42.97% <ø> (ø)
test-optimization-playwright-oldest-latest-playwright-efd 43.39% <ø> (ø)
test-optimization-playwright-oldest-latest-playwright-final-status 43.48% <ø> (+0.02%) ⬆️
test-optimization-playwright-oldest-latest-playwright-impacted-tests 42.94% <ø> (ø)
test-optimization-playwright-oldest-latest-playwright-reporting 42.98% <ø> (ø)
test-optimization-playwright-oldest-latest-playwright-test-management 44.60% <ø> (-0.03%) ⬇️
test-optimization-playwright-oldest-oldest-playwright-active-test-span ?
test-optimization-playwright-oldest-oldest-playwright-atr 43.05% <ø> (ø)
test-optimization-playwright-oldest-oldest-playwright-efd ?
test-optimization-playwright-oldest-oldest-playwright-final-status 43.39% <ø> (-0.03%) ⬇️
test-optimization-playwright-oldest-oldest-playwright-impacted-tests 42.87% <ø> (ø)
test-optimization-playwright-oldest-oldest-playwright-reporting 42.80% <ø> (ø)
test-optimization-playwright-oldest-oldest-playwright-test-management 44.57% <ø> (+0.02%) ⬆️
test-optimization-selenium-latest ?
test-optimization-selenium-oldest 44.80% <ø> (ø)
test-optimization-testopt-active 48.32% <ø> (ø)
test-optimization-testopt-latest 48.32% <ø> (ø)
test-optimization-testopt-maintenance 48.31% <ø> (ø)
test-optimization-testopt-oldest 49.42% <ø> (ø)
test-optimization-vitest-latest 50.78% <ø> (+1.02%) ⬆️
test-optimization-vitest-oldest 47.99% <ø> (+0.63%) ⬆️

Flags with carried forward coverage won't be shown.


joeyzhao2018 force-pushed the joey/cross-invocation-tracecontext-propagation branch from c4eca48 to 73fdb80 on May 19, 2026 19:19
pablomartinezbernardo force-pushed the joey/apm-ai-toolkit/aws-durable-execution-sdk-js branch from 411cd56 to 4180be5 on May 20, 2026 10:06
…s-invocation continuity

Persist the current trace context as a synthetic `_datadog_{N}` STEP operation
when the SDK suspends to PENDING, so subsequent invocations (read by the
upstream datadog-lambda-js wrapper) can resume the same trace.

Files:
- src/handler.js: install a hook on the SDK's terminationManager.terminate
  inside bindStart. Save fires only for resumable reasons (PENDING_TERMINATION_REASONS
  allow-list mirrors the SDK's TerminationReason enum entries that result in
  Status: PENDING). Gated by DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED
  (default on; opt out with 'false'/'0').
- src/trace-checkpoint.js: NEW. Datadog-only header inject (private
  TextMapPropagator with tracePropagationStyle.inject = ['datadog'], shadows
  the live tracer config), dedup against prior _datadog_N op via
  JSON.stringify-after-stripping-x-datadog-parent-id, deterministic blake2b
  stepId so the save is idempotent within an execution.
- test/handler.checkpoint.spec.js: unit tests for the termination hook
  (pending vs non-pending reasons, env-var gate, idempotency, default reason).
- test/trace-checkpoint.spec.js: unit tests for the save module
  (queue START+SUCCEED before terminating, dedup on parent-id-only changes).
- test/index.spec.js: integration coverage for SDK safe-paths
  (single cycle, child-context, step-suspend-step).
- packages/dd-trace/src/config/supported-configurations.json and
  generated-config-types.d.ts: register DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED.
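The env-var gate described in this commit (default on; opt out with 'false'/'0') can be read roughly as follows. `isCrossInvocationTracingEnabled` is a hypothetical helper: dd-trace resolves configuration through its own config layer, and the trimming plus case-insensitive matching here are assumptions of this sketch.

```javascript
'use strict'

// Hypothetical helper: enabled unless the variable is 'false' or '0'.
// Trimming and case-insensitive matching are assumptions of this sketch.
function isCrossInvocationTracingEnabled (env = process.env) {
  const raw = env.DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED
  if (raw === undefined) return true // default on
  const value = String(raw).trim().toLowerCase()
  return value !== 'false' && value !== '0'
}

console.log(isCrossInvocationTracingEnabled({})) // true
console.log(isCrossInvocationTracingEnabled({ DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED: 'false' })) // false
console.log(isCrossInvocationTracingEnabled({ DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED: '0' })) // false
```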
…merScheduler bug

Skip wait_for_callback (happy path) and the entire invoke describe block
(happy + error). All three fail deterministically in CI under
@aws/durable-execution-sdk-js-testing's current TimerScheduler, whose
hasScheduledFunction() undercounts in-flight scheduled functions and
trips the test orchestrator's "Cannot return PENDING status with no
pending operations." validation. Production (real AWS backend) is not
affected — the validation is mock-only.

Fix is open upstream as aws/aws-durable-execution-sdk-js#544; re-enable
these tests once a release containing it is pinned in
packages/dd-trace/test/plugins/versions/package.json.
…led guard

The guard was defensive against a "same terminationManager passed to
bindStart twice" scenario that cannot happen in the SDK as it stands —
each Lambda invocation calls initializeExecutionContext, which constructs
a fresh `new TerminationManager()`, so warm starts share the wrapper
closure but not the terminationManager instance. Remove the Symbol, the
guard, and the explicit "twice across invocations" unit test, which only
covered a contrived re-entry.

Drive-by: fix four pre-existing space-before-function-paren lint errors
in the same file.
…cute span, not its parent

Drop the `getParentSpanId` helper and inline the read directly during
`state` initialization. While inlining, switch the anchor from the
execute span's *parent* (typically `aws.lambda`'s id) to the execute
span's *own* id (`span.context().toSpanId()`).

Why anchor at the execute span:
- It's a span this integration owns and just created, so always defined
  and never depends on what upstream context happened to be active when
  `bindStart` fired.
- Topology becomes "resumed invocations are continuations of the first
  execute" — matching the user-facing model of a single durable
  execution. The old shape made resumes look like sibling Lambda
  invocations under whatever upstream span happened to be there.
- In the no-upstream case the old code already fell through to the
  propagator default (= execute span's own id) via `if (parentId)` —
  so this just makes the behavior consistent across environments.

Rename for clarity:
- `saveTraceContextCheckpointIfUpdated`'s `checkpointAnchorSpanId`
  parameter -> `firstExecutionSpanId`. JSDoc spells out it's only
  consulted on the very first save; once a prior `_datadog_{N}` exists,
  the function reuses that checkpoint's `x-datadog-parent-id` verbatim.
- The local `latestParentId` (the value carried forward across saves)
  -> `anchoredSpanId`, reflecting that it IS the anchor we've been
  using since the first save.
- handler.js's `state.parentSpanId` -> `state.firstExecutionSpanId`.

Note: dd-trace-py's `_resolve_override_parent_id` currently anchors at
the execute span's parent (matching the old JS behavior). A follow-up
should bring Python in line with this change so both languages produce
the same trace shape.
joeyzhao2018 force-pushed the joey/cross-invocation-tracecontext-propagation branch from 034a8f8 to 748a826 on May 20, 2026 16:22
…gainst TimerScheduler bug"

This reverts commit 748a826.
joeyzhao2018 changed the title from "cross-invocation tracecontext propagation" to "feat(aws): cross-invocation tracecontext propagation" on May 20, 2026
joeyzhao2018 marked this pull request as ready for review May 20, 2026 19:15
joeyzhao2018 requested review from a team as code owners May 20, 2026 19:15
joeyzhao2018 requested review from BridgeAR and crysmags and removed request for a team May 20, 2026 19:15

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f6e5f5fe4


Comment thread on packages/datadog-plugin-aws-durable-execution-sdk-js/src/trace-checkpoint.js (outdated)
DataDog deleted a comment from codecov (bot) Jun 5, 2026
joeyzhao2018 requested a review from BridgeAR June 5, 2026 18:15

```json
    "default": "true"
  }
],
"DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED": [
```

Reviewer (Member) commented:

Has this landed in other tracers yet? Don't we usually disable it by default in the first iteration so we can try to get customer feedback?

@joeyzhao2018 joeyzhao2018 Jun 8, 2026

Contributor Author

Great question, and it's totally fair to want a feedback runway for anything new. I'd push back on default-off here, though, for a few reasons:

  1. To be honest, the feature only really delivers its value when it's on by default. The whole point is connecting traces across suspended/resumed invocations so customers see one coherent trace instead of disconnected fragments. If it's opt-in, not only will the overwhelming majority of customers never discover the flag, but the durable-execution traces they do get will look broken (like orphaned spans) by comparison.

  2. This is trace-context propagation, not a new product surface. The "ship opt-in first, gather feedback" pattern fits net-new features with novel behavior or perf/risk profiles. Cross-invocation propagation is in the same family as our SQS, SNS, Kinesis, and EventBridge context propagation — all of which are on by default and have been for years without flag-related customer issues.

  3. The safety concern is already covered. The implementation is strictly best-effort: every checkpoint write is guarded so that a failure is logged and swallowed, which means it can never break or fail a customer's durable execution. And the flag still exists as a kill switch, so anyone who does hit an issue (or just wants it off) can disable it immediately. So default-on doesn't remove the escape hatch; it only changes who has to take action, and for an opt-out safety switch that should be the rare customer who hits a problem, not everyone.

  4. Cross-tracer parity. This already landed default-on in dd-trace-py (#17773). Shipping Node default-off would diverge our defaults across languages, which creates an inconsistent customer experience and avoidable support/debugging confusion ("why does my Python durable trace connect but my Node one doesn't?").

  5. There's no clean path to flip opt-in → opt-out later. Turning a default-off flag on in a later release is itself a behavior change we'd have to risk-assess and communicate, so "default-off now, default-on later" isn't actually lower-risk; it just defers the same change and adds a migration step. We don't have an established convention for that flip, whereas default-on guarded by an opt-out flag is a well-trodden path.

We will definitely keep the kill switch prominent in the docs so the off-ramp is obvious; we'll call it out in the public documentation, the release notes, and the README.md files. But given that it mirrors our existing propagation defaults, is best-effort by construction, and already matches Python, I think default-on is the right call here. Let me know if I'm missing context on the feedback process you had in mind, though.
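The kill switch under discussion is `DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED`. The sketch below illustrates the default-on, opt-out semantics argued for above. It uses a hypothetical standalone helper, `isCrossInvocationTracingEnabled`. dd-trace parses its `DD_*` settings through its own configuration system, and the accepted falsy values here (`false` and `0`) are an assumption.

```javascript
// Hypothetical helper, not dd-trace's real config code: it only illustrates
// "on unless explicitly turned off" for the kill switch discussed above.
function isCrossInvocationTracingEnabled (env = process.env) {
  const raw = env.DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED
  // Unset or empty: keep the feature on (default-on).
  if (raw === undefined || raw.trim() === '') return true
  // Only an explicit falsy value disables it (assumed set of values).
  return !['false', '0'].includes(raw.trim().toLowerCase())
}

console.log(isCrossInvocationTracingEnabled({})) // true
console.log(isCrossInvocationTracingEnabled({
  DD_DURABLE_CROSS_INVOCATION_TRACING_ENABLED: 'false'
})) // false
```

With this shape, customers who never touch the variable get connected traces, and a single environment change turns the feature off.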

@BridgeAR BridgeAR left a comment

Member

It would be nice to look at the comments before landing :)

Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js Outdated
Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/trace-checkpoint.js Outdated
Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/trace-checkpoint.js Outdated
Comment on lines +208 to +210
await saveCheckpoint(checkpointManager, executionArn, newNumber, currentHeaders)
} catch (e) {
log.debug('Failed to save trace context checkpoint', e)

Member

I believe if we write the code itself in a way that it is defensive, we do not need to wrap it in an additional try/catch :)

Wrapping methods in a try catch is something I would rather not do (and I believe we mostly do not do that in other spots).

Contributor Author

Good call. I dropped the method-level try/catch in saveTraceContextCheckpointIfUpdated and let the helpers propagate errors normally.
The one bit of handling I brought back is the .catch() at the call site in maybeSaveCheckpoint. I don't think defensive code can make that one go away: the failure mode isn't bad input we can guard against, it's the SDK's checkpointManager.checkpoint() rejecting at runtime. Since #onTerminate fires this as void maybeSaveCheckpoint(...), an unhandled rejection there would surface in the customer's Lambda. So the .catch() is just fire-and-forget hygiene at the boundary rather than a try/catch wrapping the logic, which I think matches what you were after.
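The sketch below shows the "no try/catch in the logic, one `.catch()` at the fire-and-forget boundary" pattern described above. `maybeSaveCheckpoint` and `onTerminate` borrow names from the comment, but their bodies, the `checkpointManager.checkpoint(headers)` signature, and the `log` stand-in are hypothetical simplifications. The real call also passes the execution ARN and checkpoint number.

```javascript
// Hypothetical logger stand-in that records debug messages.
const log = {
  messages: [],
  debug (...args) { this.messages.push(args.join(' ')) }
}

// No try/catch: a rejected checkpoint() simply rejects this promise.
async function maybeSaveCheckpoint (checkpointManager, headers) {
  await checkpointManager.checkpoint(headers)
}

// Termination hook: starts the save without awaiting it, so the SDK's own
// termination is never delayed or failed by it. The single .catch() at this
// boundary keeps a runtime rejection from becoming an unhandled rejection.
function onTerminate (checkpointManager, headers) {
  void maybeSaveCheckpoint(checkpointManager, headers)
    .catch(err => log.debug('Failed to save trace context checkpoint', err.message))
}
```

Calling `onTerminate` with a manager whose `checkpoint()` rejects produces one debug log entry instead of an unhandled rejection in the Lambda process.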

Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js Outdated
Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js Outdated
Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js Outdated
}

const originalTerminate = terminationManager.terminate
terminationManager.terminate = function (...terminateArgs) {

Member

I am unsure: who defined this terminate method / the terminationManager? Is the manager potentially defined by our users?

Contributor Author

This is a great question, and it's definitely worth nailing down precisely.

The terminationManager isn't user-defined; it's owned entirely by the SDK.

  • The channel we hook wraps the SDK's internal runHandler(event, context, executionContext, durableExecutionMode, checkpointToken, handler) (index.js:4452)
    • args[2] is the SDK's executionContext
    • args[5] is the only customer-supplied argument (their handler).
      So customers only ever call withDurableExecution(handler) — they can't inject or define this object.

And it's also constructed fresh on every invocation: withDurableExecution returns async (event, context) => …, which calls initializeExecutionContext(...), which builds a new executionContext with terminationManager: new TerminationManager() (index.js:4436) each time. terminate is then invoked by the SDK's own CheckpointManager.executeTermination.
Because it's a fresh instance per invocation, we wrap exactly once and nothing accumulates or leaks across invocations. We also capture-and-delegate (originalTerminate.apply(...)), so we compose with any pre-existing wrap rather than replacing it - the only caveat is the generic one where a different non-delegating patcher on the same method could conflict, which is very unlikely on an SDK-internal object only runHandler ever sees.

The wrap itself is also defensive: we bail if terminationManager.terminate isn't a function (line 63), and we always call originalTerminate.apply(this, terminateArgs), so the SDK's termination behavior is preserved unchanged — we just enqueue a best-effort checkpoint first.
And this isn't just reasoned from reading the source: the trace-checkpoint propagation integration tests in index.spec.js drive the real @aws/durable-execution-sdk-js (the version pinned in our test matrix) through actual suspend/resume cycles — real TerminationManager, real CheckpointManager.executeTermination — and assert the datadog checkpoint is written. So this path is covered against the actual SDK, not a mock, which also guards against the wrapping assumptions drifting if the SDK changes.
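Here is a minimal sketch of the capture-and-delegate wrap described above. It includes the `typeof` guard and shows that stacked wraps compose. `TerminationManager` here is a hypothetical stand-in, because the real class is internal to `@aws/durable-execution-sdk-js`. `wrapTerminate` is likewise an illustrative name, not the plugin's code.

```javascript
// Hypothetical stand-in for the SDK's per-invocation TerminationManager.
class TerminationManager {
  constructor () { this.calls = [] }
  terminate (reason) {
    this.calls.push(`sdk:${reason}`)
    return 'terminated'
  }
}

// Run `beforeTerminate` first, then always delegate to whatever terminate was
// installed before us (the SDK method or another delegating wrapper).
function wrapTerminate (terminationManager, beforeTerminate) {
  // Bail out if the object does not have the expected shape.
  if (typeof terminationManager?.terminate !== 'function') return false
  const originalTerminate = terminationManager.terminate
  terminationManager.terminate = function (...terminateArgs) {
    beforeTerminate(...terminateArgs)
    return originalTerminate.apply(this, terminateArgs)
  }
  return true
}

const manager = new TerminationManager()
wrapTerminate(manager, reason => manager.calls.push(`first:${reason}`))
wrapTerminate(manager, reason => manager.calls.push(`second:${reason}`))
console.log(manager.terminate('PENDING')) // terminated
console.log(manager.calls) // [ 'second:PENDING', 'first:PENDING', 'sdk:PENDING' ]
```

Each invocation gets a fresh manager, so in practice only one wrap is ever applied per instance. The double wrap here just shows that delegating wrappers do not clobber each other.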

Member

Thank you, that seems fine! Just as above: I noticed this is happening in the plugin and wrapping should always happen inside of our instrumentations. Please move this there :)

Contributor Author

That's a really important find. I basically overlooked it coming from the POC.
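The reviewer's request is to keep patching in the instrumentation and have the plugin only react to events. That split can be sketched with Node's built-in `diagnostics_channel`. The channel name and message shape below are hypothetical, and dd-trace's actual instrumentation helpers and channel names are not reproduced here. The squash commit's "Avoid patching on the plugin by creating a "settle" channel" entry points at the same idea.

```javascript
const dc = require('node:diagnostics_channel')

// Hypothetical channel name; dd-trace's real channel names differ.
const CHANNEL = 'apm:aws-durable-execution:terminate'
const terminateCh = dc.channel(CHANNEL)

// Instrumentation side: owns the monkey-patching and only publishes events.
function instrumentTerminationManager (terminationManager) {
  if (typeof terminationManager?.terminate !== 'function') return
  const originalTerminate = terminationManager.terminate
  terminationManager.terminate = function (...terminateArgs) {
    // Skip building the message when nothing is listening.
    if (terminateCh.hasSubscribers) {
      terminateCh.publish({ terminationManager: this, args: terminateArgs })
    }
    return originalTerminate.apply(this, terminateArgs)
  }
}

// Plugin side: reacts to events but never patches SDK objects itself.
const seen = []
dc.subscribe(CHANNEL, ({ args }) => {
  seen.push(args[0])
})

const manager = { terminate (reason) { return `done:${reason}` } }
instrumentTerminationManager(manager)
console.log(manager.terminate('PENDING')) // done:PENDING
console.log(seen) // [ 'PENDING' ]
```

`publish` runs subscribers synchronously, so the plugin sees the event before the original `terminate` runs. It can then start its own fire-and-forget work.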

Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js Outdated
}

const originalTerminate = terminationManager.terminate
terminationManager.terminate = function (...terminateArgs) {

Member

Thank you, that seems fine! Just as above: I noticed this is happening in the plugin and wrapping should always happen inside of our instrumentations. Please move this there :)

joeyzhao2018 and others added 5 commits June 11, 2026 14:51
…joey/cross-invocation-tracecontext-propagation
… in bindStart

Extract #shouldInstallTerminationHook so bindStart decides whether to install
the cross-invocation checkpoint hook, instead of the hook self-gating with
early returns. The hook now assumes its preconditions and recomputes the
handler args, execute span, and termination manager it wraps.

Stub operationName() in the handler checkpoint spec: it reaches into the
tracer's nomenclature, which the bare tracer stub doesn't provide.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…js so that it's only done once instead of every invocation
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>

@BridgeAR BridgeAR left a comment

Member

Mostly LGTM! Only a few small things are left that would be nice to clean up before landing.

Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js Outdated
Comment thread packages/datadog-plugin-aws-durable-execution-sdk-js/src/trace-checkpoint.js Outdated
joeyzhao2018 and others added 5 commits June 12, 2026 08:24
…er.js

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
…-checkpoint.js

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
…-checkpoint.js

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>

@BridgeAR BridgeAR left a comment

Member

LGTM, thanks for the quick follow-ups

joeyzhao2018 and others added 2 commits June 12, 2026 12:42
…-checkpoint.js

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
@BridgeAR BridgeAR merged commit 1780848 into joey/apm-ai-toolkit/aws-durable-execution-sdk-js Jun 15, 2026
797 checks passed
@BridgeAR BridgeAR deleted the joey/cross-invocation-tracecontext-propagation branch June 15, 2026 16:03
pablomartinezbernardo added a commit that referenced this pull request Jun 16, 2026
* workflow(aws-durable-execution-sdk-js): install_package

* workflow(aws-durable-execution-sdk-js): generate_app

* workflow(aws-durable-execution-sdk-js): compile

* workflow(aws-durable-execution-sdk-js): test:att1:iter1:fixer

* workflow(aws-durable-execution-sdk-js): test:att1:iter2:fixer

* workflow(aws-durable-execution-sdk-js): feature_implement

* workflow(aws-durable-execution-sdk-js): get_lint_failures

* workflow(aws-durable-execution-sdk-js): lint_and_fix:att1:iter1:fix_lint_errors

* workflow(aws-durable-execution-sdk-js): review_cycle:att1:iter1:batch_fix

* remove the unnecessary dd-api-key

* clean up

# Conflicts:
#	index.d.ts

* yarn.lock changed...

* fixing yarn.lock

* remove the unintended finish() guard

* update span names

* use a fixed service name instead

* update resource names

* naming consistency

* small fix

* Python  PR parity

* Undo unnecessary changes

* Finish error spans on asyncEnd

* Simplify orchestrion file

* Class/file name changes

* Several simplifications and improvements

* Do not explicitly set component

* Remove includeReplayedTag

* Smaller simplifications

* Tests simplification

* chore: update supported-integrations

* More test simplification

* Add aws.durable.operation_id and aws.durable.operation_name

* Fix checks

* Linter

* Test simplifications

* More test improvements

* Lazy thenables + only close this integration's spans

* Code simplifications

* Fix rebase

* Mirror changes in v5

* Test waitForCondition happy path

* Comment improvements based on guidelines

* suppress child context for WaitForCallback

* Increase tested version

* Address review comments

* Avoid patching on the plugin by creating a "settle" channel

* Do not skipTime to avoid interfering with tracer's timers

* Fix test flakiness

* Test durable-execution-sdk-js  only on node >=22

* Linter

* Rename context variables

* Lint

* Add esbuild bundling acceptance test

* ESM smoke tests

* Update packages/datadog-plugin-aws-durable-execution-sdk-js/src/handler.js

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>

* Update packages/datadog-plugin-aws-durable-execution-sdk-js/src/checkpoint.js

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>

* Address various review comments

* Operation name

* feat(aws): cross-invocation tracecontext propagation (#8182)

Persist the current trace context as a synthetic `_datadog_{N}` STEP operation
when the SDK suspends to PENDING, so subsequent invocations (read by the
upstream datadog-lambda-js wrapper) can resume the same trace.

---------

Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Pablo Martínez Bernardo <pablo.martinezbernardo@datadoghq.com>
Co-authored-by: dd-octo-sts[bot] <200755185+dd-octo-sts[bot]@users.noreply.github.com>
Co-authored-by: pablomartinezbernardo <134320516+pablomartinezbernardo@users.noreply.github.com>
Co-authored-by: Ruben Bridgewater <ruben@bridgewater.de>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
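The squashed commit above persists the trace context as a synthetic `_datadog_{N}` STEP operation. The PR summary's `saveTraceContextCheckpointIfUpdated` adds that this happens only when the context has changed. The sketch below shows the bookkeeping that implies. It assumes prior operations are visible as `{ name, payload }` records, that the payload is a JSON map of `x-datadog-*` headers, and that N starts at 0. None of these shapes come from the SDK or the actual implementation, and the real write through the SDK's checkpoint manager is not modeled.

```javascript
// Assumed naming scheme, taken from the commit message: _datadog_{N}.
const CHECKPOINT_NAME = /^_datadog_(\d+)$/

// Find the highest existing N and the headers it stored.
function latestCheckpoint (operations) {
  let latest = { number: -1, headers: undefined }
  for (const op of operations) {
    const match = CHECKPOINT_NAME.exec(op.name)
    if (!match) continue
    const number = Number(match[1])
    if (number > latest.number) latest = { number, headers: JSON.parse(op.payload) }
  }
  return latest
}

// Shallow comparison of two header maps.
function sameHeaders (a = {}, b = {}) {
  const keys = Object.keys(a)
  return keys.length === Object.keys(b).length && keys.every(k => a[k] === b[k])
}

// Returns the next operation to write, or undefined when nothing changed.
function planCheckpoint (operations, currentHeaders) {
  const { number, headers } = latestCheckpoint(operations)
  if (sameHeaders(headers, currentHeaders)) return undefined
  return { name: `_datadog_${number + 1}`, payload: JSON.stringify(currentHeaders) }
}

const ops = [{ name: 'step-1', payload: '{}' }]
const headers = { 'x-datadog-trace-id': '123', 'x-datadog-parent-id': '456' }
const first = planCheckpoint(ops, headers)
console.log(first.name) // _datadog_0
ops.push(first)
console.log(planCheckpoint(ops, headers)) // undefined
```

Skipping unchanged contexts keeps repeated suspends with the same active span from piling up redundant checkpoints. The monotonically increasing N lets a reader such as datadog-lambda-js pick the most recent one.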
dd-octo-sts Bot added a commit that referenced this pull request Jun 16, 2026
BridgeAR added a commit that referenced this pull request Jun 16, 2026
tlhunter pushed a commit that referenced this pull request Jun 18, 2026