Skip to content

Provide ability to capture continuations using the Context API#11663

Open
mcculls wants to merge 11 commits into
masterfrom
mcculls/context-continuations
Open

Provide ability to capture continuations using the Context API#11663
mcculls wants to merge 11 commits into
masterfrom
mcculls/context-continuations

Conversation

@mcculls

@mcculls mcculls commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What Does This Do

Introduces basic continuations and events to the Context API

Motivation

This is meant as a stepping stone from the old API towards a simpler foundation

Additional Notes

Benchmarking summary from Claude:

  • The two managers are throughput-equivalent but show different allocation pressure
  • ThreadLocalContextManager: nested attach() calls add zero bytes because the ContextScopeImpl objects are stack-allocated by escape analysis.
  • ContinuableScopeManager: each ContinuableScope object adds ≈32 B because they escape to the heap via the ScopeStack linked structure.

Contributor Checklist

  • Format the title according to the contribution guidelines
  • Assign the type: and (comp: or inst:) labels in addition to any other useful labels
  • Avoid using close, fix, or any linking keywords when referencing an issue
    Use solves instead, and assign the PR milestone to the issue
  • Update the CODEOWNERS file on source file addition, migration, or deletion
  • Update public documentation with any new configuration flags or behaviors
  • Add your completed PR to the merge queue by commenting /merge. You can also:
    • Customize the commit message associated with the merge with /merge --commit-message "..."
    • Remove your PR from the merge queue with /merge -c
    • Skip all merge queue checks with /merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-level (note: the PR still needs to be mergeable, this will only skip the pre-merge build)
    • Get more information in this doc

Jira ticket: [PROJ-IDENT]

@mcculls mcculls added comp: core Tracer core tag: no release notes Changes to exclude from release notes type: refactoring labels Jun 17, 2026
@mcculls mcculls requested a review from Copilot June 17, 2026 11:28
@datadog-prod-us1-3

This comment has been minimized.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the datadog.context API to support capturing/resuming execution context via a new ContextContinuation abstraction, and wires dd-trace’s scope continuation mechanism to also implement that API (so Context.current().capture() can hold traces similarly to the legacy tracer continuation).

Changes:

  • Introduces ContextContinuation + ContextListener and adds Context.capture() / ContextManager.capture(...) / listener registration APIs.
  • Implements capture(...) for both the default ThreadLocalContextManager and dd-trace’s ContinuableScopeManager, and makes AgentScope.Continuation also implement ContextContinuation.
  • Adds extensive tests for continuation semantics, noop scopes, and listener lifecycle events.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
internal-api/src/main/java/datadog/trace/bootstrap/instrumentation/api/AgentScope.java Makes AgentScope.Continuation also implement ContextContinuation and maps methods (resumeactivate, releasecancel).
dd-trace-core/src/main/java/datadog/trace/core/scopemanager/ContinuableScopeManager.java Adds ContextManager.capture(Context) implementation and an addListener(...) stub.
dd-trace-core/src/test/java/datadog/trace/core/scopemanager/ScopeManagerTest.java Adds tests verifying Context API capture/continuation behavior holds/unblocks traces and activates spans.
components/context/src/main/java/datadog/context/Context.java Adds Context.capture() default method.
components/context/src/main/java/datadog/context/ContextManager.java Adds capture + listener APIs and a static register(ContextListener) helper.
components/context/src/main/java/datadog/context/ContextContinuation.java New public API for capturing/resuming context across execution units.
components/context/src/main/java/datadog/context/ContextListener.java New public listener API for attach/detach/capture/release events.
components/context/src/main/java/datadog/context/ThreadLocalContextManager.java Implements capture/resume/release semantics and listener notification for the default manager.
components/context/src/main/java/datadog/context/NoopContextScope.java New cached noop ContextScope implementation for same-context attach/resume no-ops.
components/context/src/main/java/datadog/context/EmptyContextContinuation.java New noop continuation for root/empty context.
components/context/src/main/java/datadog/context/EmptyContext.java Makes EmptyContext constructor private (enforces singleton).
components/context/src/main/java/datadog/context/TestContextManager.java Delegates new capture and addListener APIs to the underlying manager.
components/context/src/test/java/datadog/context/ContextContinuationTest.java New comprehensive tests for continuation lifecycle, concurrency, holds, and event ordering.
components/context/src/test/java/datadog/context/ContextManagerTest.java Adds tests for noop-scope caching/collisions and listener behaviors.
components/context/src/test/java/datadog/context/ContextProvidersForkedTest.java Updates custom manager stub to implement new APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@dd-octo-sts

dd-octo-sts Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 13.93 s 13.95 s [-1.0%; +0.8%] (no difference)
startup:insecure-bank:tracing:Agent 12.96 s 13.00 s [-1.0%; +0.4%] (no difference)
startup:petclinic:appsec:Agent 16.90 s 16.49 s [+1.4%; +3.6%] (significantly worse)
startup:petclinic:iast:Agent 16.38 s 16.91 s [-7.6%; +1.3%] (no difference)
startup:petclinic:profiling:Agent 16.62 s 16.89 s [-2.5%; -0.6%] (maybe better)
startup:petclinic:sca:Agent 16.90 s 16.78 s [-0.4%; +1.8%] (no difference)
startup:petclinic:tracing:Agent 15.99 s 16.23 s [-2.6%; -0.4%] (maybe better)

Commit: f8d0935f · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@PerfectSlayer PerfectSlayer self-requested a review June 17, 2026 11:55

@PerfectSlayer PerfectSlayer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early review of the API with a lot of refinement comments - trying to nail the API right before merging it. The implementation feels quite close to the current one so far. It's also nice to see the AgentSpan.Continuation nicely retrofitted to the new ContextContinuation.
Glad to see this piece of context coming! 🤝

Comment thread components/context/src/main/java/datadog/context/ContextContinuation.java Outdated
Comment thread components/context/src/main/java/datadog/context/Context.java Outdated
Comment thread components/context/src/main/java/datadog/context/ThreadLocalContextManager.java Outdated
Comment thread components/context/src/test/java/datadog/context/ContextManagerTest.java Outdated
Comment thread components/context/src/test/java/datadog/context/ContextManagerTest.java Outdated
@mcculls mcculls force-pushed the mcculls/context-continuations branch 2 times, most recently from 396e8a0 to c8de592 Compare June 18, 2026 09:08
@PerfectSlayer

PerfectSlayer commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

By the way, my comments only are early feedback. I don't expect to have them to get answer in timely manner.
Mostly reacting to you sharing your progress during our sync 🤝

@mcculls mcculls force-pushed the mcculls/context-continuations branch 6 times, most recently from 2754977 to d40fa11 Compare June 18, 2026 14:44
@mcculls mcculls marked this pull request as ready for review June 18, 2026 16:39
@mcculls mcculls requested review from a team as code owners June 18, 2026 16:39
@mcculls mcculls requested review from amarziali, dougqh, mhlidd and sarahchen6 and removed request for a team June 18, 2026 16:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d40fa11a82

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@dougqh dougqh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the work to move away from the old OpenTracing model towards OpenTelemetry.
I do think OpenTelemetry is genuinely a better model. ContextManagers never quite solved the hybrid async environments that automatic instrumentation encounters.

As we do the move, I do want to take care that we preserve the safety measures that were added previously.
Those were...
1 - stack depth limit
2 - unfinished continuations are tolerated
3 - scopes were allowed to be closed across threads
4 - out-of-order scope close was tolerated

In the past, I know that our instrumentations contributed to those problems, but I think we still need to be careful about these things because of outside / manual instrumentation. There have been issues in those areas as recently as last year.

Based on an initial review, it looks like we’ve lost some of these guards. I’m not sure that we need all of them, but I do think we need a depth limit and tolerance for unfinished continuations.

I'll need some time to review this more comprehensively, but I wanted to share that concern up front.

@amarziali amarziali left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me

@mcculls

mcculls commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

1 - stack depth limit

Previously we needed our own limit because scopes always escaped onto the heap via our separate scope stack. Now scopes naturally reside on the thread's own call stack. Given this I honestly believe we don't need the overhead of enforcing a separate stack limit and can rely on the JVM's own limits.

2 - unfinished continuations are tolerated

The new code uses the same basic continuation algorithm, so should tolerate the same scenarios.

3 - scopes were allowed to be closed across threads

This came with caveats - the first thing the old ContinuableScope.close does is fetch the current thread's scope stack. Once the scope is closed, cleanup is triggered on that scope stack - not the original thread's scope stack. This is because fundamentally only the original thread can perform cleanup of its own scope stack without having to introduce lots of extra locking.

The old code did update the scope's internal book-keeping regardless which thread closed it - but in practice this doesn't solve the issue of stale scopes left in the scope stack until the original thread happens to trigger cleanup. This is not an issue when we store scopes on the actual call stack, we naturally get cleanup of them as the call stack unwinds.

The new scopes have a minimum amount of state / book-keeping - basically just a flag to record when a close attempt was successful. This means the scope tolerates accidental extra calls to close it, firing only when the expected state matches.

A feature we now get which wasn't previously available is when we get to a known safe point, we can potentially 'swap' in (i.e. reset to) the expected context. This is possible because we don't have a separate scope stack which may contain a mix of stale and valid scopes. We just have the natural call stack and a notion of the currently attached context.

4 - out-of-order scope close was tolerated

We have tests that specifically cover out-of-order close. We do stop an out-of-order close from changing the currently attached context if it doesn't match what we expected, but that's intentional. We'll likely introduce a special mode to track out-of-order close calls back to the originating method/instrumentation like in OTel (such features are usually off by default for performance reasons.)

There are still situations where a misbehaving instrumentation could leave a context attached - this is the equivalent of an instrumentation leaving an unclosed scope on the stack in the old code, which wasn't recoverable. But now we have the ability to safely reset to the right context (at known safe points) which we don't have with the old approach.

If there is a leak then the next time that thread handles a request, a new context will attach and overwrite any stale context - from that point on (assuming the instrumentations for that request behave) the context will properly attach and detach. Only once the request unwinds completely will the stale context "re-appear". This also means that unlike the old approach we won't continually accumulate stale contexts as we don't have the separate scope stack. We'd only ever have potentially one stale context lingering per-thread, and that stale context would not impact any request handling on top of it.


Reminder that this code will no doubt evolve as we migrate away from the old approach. But fundamentally the old scope stack will be going away, as it limits how we can model more interesting async primitives. This also gives us a cleaner/simpler foundation to build on - it's not the destination, but a stepping stone.

@PerfectSlayer PerfectSlayer left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! 👍
Not much important feedback, mostly about testing and mostly polish about coverage and being explicit.
I like having the API interop tested (hopefully we wont have to maintain it long).

Comment thread components/context/src/main/java/datadog/context/EmptyContext.java
Comment thread components/context/src/main/java/datadog/context/EmptyContextContinuation.java Outdated
Comment thread components/context/src/main/java/datadog/context/NoopContextScope.java Outdated
Comment thread components/context/src/test/java/datadog/context/ContextManagerTest.java Outdated
Comment thread dd-trace-core/src/test/java/datadog/trace/core/scopemanager/ScopeManagerTest.java Outdated
@mcculls

mcculls commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

@dougqh I'd like to merge this today unless there are obvious blockers

response from Doug (which somehow ended up edited into my comment?)

The unbounded data structure is a blocker for me, so I cannot approved.
I also think we should try to come with a better solution to continuation leaks as a team.

* });
* }</pre>
*
* <p>If a continuation is never resumed (e.g. a task is cancelled before it runs), you must release

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned by the requirement to explicitly release.
There are places where that's hard to do.

I also think it complicates handling of dispatch loops. For dispatch loops, I rather guard the loop with ignoring incoming context - rather than stopping it on the originating end.

I say that because dispatch loops are often set-up lazily with a varied set of construction points, so defending against all of them seems messier.

I think if we could come up with a system without this requirement; it would clean-up instrumentation considerably. Right now, I feel like we're taking a step backwards.

@mcculls mcculls Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH that decision is best left up to the consumer of the capture and release events.

The code here as it stands won't cause any resource leak per se (the continuation is not kept in any internal cache) - what it will do is delay calling the release event until the conditions have been met.

Now the tracer, as a consumer of that event, can decide to forgo waiting on the release event - if it determines that there is either no need to hold back the trace, or that continuing to wait will cause resource issues. It is better placed to make that decision because it knows how many traces are pending, etc.

This avoids overcomplicating the context layer while still allowing downstream consumers to decide how to handle long-running situations. The alternative would be to try and use some kind of weak reference, which would introduce other issues around liveliness.

@dougqh dougqh Jun 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to complicate the context layer rather than complicating the numerous instrumentations. But mostly, I just wish we explored this as a group before the decision was made.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is about the consumer of the events - i.e. the tracer core - not instrumentations.

Instrumentations are what trigger these events, which is why I want to keep the context layer simple - rather than second guessing when to bypass the "rules" and send additional events to the tracer core.

@dougqh dougqh Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not proposing that the context layer second guess when to bypass the rules.

But what I'm saying is that a rule that assumes that the user code is well-behaved is problematic, I don't regard that as a bypass. For me, it is just about maintaining our requirement that we "do no harm" even when the user code is misbehaving.

So I'm just asking for the system to be resilient to misuse, that doesn't necessarily mean doing some fix-up -- just not allowing for runaway memory consumption.

Comment thread components/context/src/main/java/datadog/context/NoopContextScope.java Outdated

ContextListener[] ls = listeners;
notifyDetach(previous, ls);
holder.current = context;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this can grow in and unbounded fashion.
We need to add a guard against that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is guarded by the call stack size - also I would prefer not to add book-keeping until we verify it is needed because it a) can end up consuming resources more than protecting them and b) can lead to objects escaping from the stack which is worse then if we'd kept things simple.

@dougqh dougqh Jun 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that presume that the end user uses the API correctly, I don't think that's a safe assumption.

We're really not suppose to have any unbounded data structures, so the Context chain is worrying from my perspective. The fact that is now Context and not Scope doesn't change the situation.

@mcculls mcculls Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context scopes are not collected in this layer because the scope stack has gone away - the only place they can accumulate is on the call stack. If a method erroneously calls activate multiple times and throws the scope away then there won't be a leak of scopes because there is no scope stack pinning them in the heap.

To leak context scopes you would need to call the API multiple times with different contexts, and keep the scopes alive in a way where they escape onto the heap. I'm just wary that by re-introducing state back into scopes (or any kind of "stack") that we'll go back to heap-allocating scopes which results in more allocations and more garbage to collect vs. the simple approach used here.

@dougqh dougqh Jun 23, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, what you are saying is true for Scopes; that's correct. However, the Context linked list is itself unbounded. While misuse may be rare or hard, I still think we should guard against it.

And I'm not proposing putting state back into Scope or Continuation. Continuation need only be an immutable reference to Context.

I'm actually trying to move to a model with less state in the user visible objects by moving the linked list out of Context.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no linked list in Context - Context is an immutable object with no links to other contexts

@Override
public void close() {
if (!closed) {
// check for out-of-order close to avoid corrupting the current state

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we don't have the same degree of forgiveness in the API, I'd at least like to see the strict mode restored so we can diagnose problems.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strict mode will be added in a separate PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the time frame? Before the next release?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strict mode will be added before ThreadLocalContextManager becomes the default implementation.

The default manager for the next release will still be ContinuableScopeManager. This PR is a migration step to get instrumentations away from AgentScope and related structures and towards a lighter API.

@mcculls

mcculls commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

@dougqh

The unbounded data structure is a blocker for me, so I cannot approved.
I also think we should try to come with a better solution to continuation leaks as a team.

This PR does not change the default implementation. It is extending the API to help move instrumentations away from the complex Agent API onto a slimmer more focused API. It does extend ThreadLocalContextManager to keep it in step with the API, but definitely does not replace ContinuableScopeManager - that will be done carefully and in a limited fashion once everything is in place and has been tested and benchmarked.

I also disagree that the data structure is unbounded, but that's best discussed off-line. Benchmarks so far show that the simpler approach actually reduces memory and overhead due to fewer objects escaping onto the heap.

Benchmarking shows these scopes aren't alive long enough to pay for the caching
@mcculls mcculls force-pushed the mcculls/context-continuations branch from db47043 to 1d73e24 Compare June 22, 2026 16:08
@dougqh

dougqh commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

For the most part, I think we should settle the design before discussing performance, but I did want to raise some concerns about the escape analysis claims.

Escape analysis in C2 is actually a very brittle optimization, so unfortunately, a microbenchmark doesn't prove much. I fully expect in real world usage the Scope will get allocated because the Scope creation and Scope closing will often sit around I/O work that the JIT will fail to inline fully through.

To really verify the optimization is happening, I'd want to see an allocation profile from a real application (for instance Spring PetClinic).

If we really want to avoid the allocation, I think a claim check system is likely a better solution. Rather than return an object, we return just a number. Then on close, we provide the same number again, so the stack can be verified.

And then we can wrap the claim check mechanism with a Scope object for public and legacy APIs.

@dougqh

dougqh commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

I think most the majority of the problems that I’ve raised can be solved with two key changes.
1 - Make Context immutable - move Context-stack to a Context[] rather than a linked list
2 - Add a NO_CONTEXT mechanism to ignore Context coming into dispatch loops

Make Context Immutable

By making Context immutable, we can solve the continuation leaking problems
The continuation simply becomes a wrapper around a Context that can migrate cleanly between threads
No need for explicit cancellation

But doing so requires that Context not contain a pointer to another Context
That can be resolved by storing the stack of Context-s another way

The simplest solution would be a Context[] in ContextHolder and an index into the Context[]
That also nicely prevents unbounded allocation

The primary downside is that it makes the ContextHolder more expensive to construct which is bad for virtual threads
That can be resolved by special casing ContextHolder when it only contains one (or maybe two) Contexts to avoid the Context[] creation

While that is a bit ugly, it is at least nicely contained in one place

Adding a NO_CONTEXT mechanism

From a survey of the “leak” PRs, the primary problem is context leaking into a dispatch loop
Made more complicated by the fact such dispatch loops are often set-up lazily

I think the cleanest way to that is introduce a NO_CONTEXT mechanism that ignores any surrounding Context already on the stack.

@mcculls mcculls force-pushed the mcculls/context-continuations branch from 8760c68 to df32a37 Compare June 22, 2026 23:48
@PerfectSlayer

Copy link
Copy Markdown
Contributor

Adding a NO_CONTEXT mechanism

From a survey of the “leak” PRs, the primary problem is context leaking into a dispatch loop
Made more complicated by the fact such dispatch loops are often set-up lazily

I think the cleanest way to that is introduce a NO_CONTEXT mechanism that ignores any surrounding Context already on the stack.

I would need to have an example of the dispatch loop code pattern to help me visualize.
There is always the option of applying root context that will void most of the context operations but I'm not sure it answer the concern.

But adding a mechanism might not fix wrong context API usage from customers because we won't be able to expose it.
If it's to prevent wrong usage from our instrumentations, @amarziali is building tool to detect and fix those cases.

This helps reduce allocations when attaching the same context multiple times.
@dougqh

dougqh commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@dougqh

The unbounded data structure is a blocker for me, so I cannot approved.
I also think we should try to come with a better solution to continuation leaks as a team.

This PR does not change the default implementation. It is extending the API to help move instrumentations away from the complex Agent API onto a slimmer more focused API. It does extend ThreadLocalContextManager to keep it in step with the API, but definitely does not replace ContinuableScopeManager - that will be done carefully and in a limited fashion once everything is in place and has been tested and benchmarked.

I also disagree that the data structure is unbounded, but that's best discussed off-line. Benchmarks so far show that the simpler approach actually reduces memory and overhead due to fewer objects escaping onto the heap.

As stated elsewhere, I agree that the Scopes are bounded, but the Contexts are not.

As for benchmarks, I would put API safety - avoiding the unhappy case ahead of maximizing performance in the happy case. I'd like to think we can find approach that can balance safety and performance.

@dougqh

dougqh commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Adding a NO_CONTEXT mechanism
From a survey of the “leak” PRs, the primary problem is context leaking into a dispatch loop
Made more complicated by the fact such dispatch loops are often set-up lazily
I think the cleanest way to that is introduce a NO_CONTEXT mechanism that ignores any surrounding Context already on the stack.

I would need to have an example of the dispatch loop code pattern to help me visualize. There is always the option of applying root context that will void most of the context operations but I'm not sure it answer the concern.

But adding a mechanism might not fix wrong context API usage from customers because we won't be able to expose it. If it's to prevent wrong usage from our instrumentations, @amarziali is building tool to detect and fix those cases.

@PerfectSlayer

By dispatch loop, I mean anything of the form...
while ( !interrupted ) {
var work = queue.take();
work.run();
}

This applies to servers, executors, message listener loops, etc.

The trouble with dispatch loops is that they are often set-up lazily and capture the surrounding context.
But really, they just care about the context coming from the incoming work item.

I had Claude do a sweep and this matches most (all) of the PRs where we're fixing context leaks currently.

As for API, you are quite right that it isn't something for customers per se.
I was just proposing it as an alternative to the "leak" fixes that we are currently undertaking.

@amarziali

Copy link
Copy Markdown
Contributor

As for API, you are quite right that it isn't something for customers per se. I was just proposing it as an alternative to the "leak" fixes that we are currently undertaking.

If I can add something, the leakage fix PR we're doing so far is to fill the technical debts we have with instrumentations that have not addressed that well in the past. But I think that also is not directly related to this PR in particular but a wider topic

@PerfectSlayer

Copy link
Copy Markdown
Contributor

The trouble with dispatch loops is that they are often set-up lazily and capture the surrounding context.
But really, they just care about the context coming from the incoming work item.

The way the new context api is supposed to work is by attaching context to those work item using Context.attachTo(work) (not relying on ThreadLocals). They are supposed to be retrieved before running using Context.from(work), and swapped using the new Context.swap(). Doing so, it won't inherit any current context from ThreadLocals.

the leakage fix PR we're doing so far is to fill the technical debts we have with instrumentations that have not addressed that well in the past.

That's the kind of issue we would have to deal with as part of rolling out the new API. Making sure the new API is correctly leverage across all the instrumentations. Hence the tool for leakage detection from test runs - note that requires having tests with decent coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: core Tracer core tag: no release notes Changes to exclude from release notes type: refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants