Skip to content

Revert "Ignore capturing connection continuation for armeria (#11657)"#11685

Draft
r1viollet wants to merge 1 commit into
masterfrom
r1viollet/revert-armeria-11657
Draft

Revert "Ignore capturing connection continuation for armeria (#11657)"#11685
r1viollet wants to merge 1 commit into
masterfrom
r1viollet/revert-armeria-11657

Conversation

@r1viollet

@r1viollet r1viollet commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

What Does This Do

Reverts #11657 (0e13e90dac).

Motivation

0e13e90dac was bisected as the first commit that introduces a SIGSEGV / JVM-shutdown hang in test suites running with the profiler enabled (DataDog/profiling-backend CI). The failure is reproduced at ~17% per run.

Reproducer (linux/glibc, JDK 26-zulu, CI_JOB_NAME set):

DD_TRACE_JAVA_PATH=<agent.jar> ./gradlew --parallel --rerun-tasks \
    :prof-analyzer:test :prof-tester:test :prof-viz-java:test

Bisect trail (each commit built from source, tested 8 iterations except where noted):

Commit Date Result
v1.63.0 Jun 1
ee9a8df466 Jun 5 ✅ 8/8
dd22dbf41c Jun 12 ✅ 8/8
82ced127ea Jun 16 ✅ 8/8
553bc3edc2 Jun 17 ✅ 8/8
477a0ce839 Jun 19 09:03 24/24 (8 + 16 retest)
3ac2bc4a97 Jun 19 ✅ 8/8 (codenarc-only, skip)
0e13e90dac Jun 19 09:31 ❌ 1/8 crash
139a1ba3c7 (HEAD) Jun 19 ❌ 1/6 crash

Note on mechanism: the profiling-backend test suite does not use Armeria, so the new class matchers (HttpClientFactory, HttpChannelPool) and method-level advice will never match at runtime. The crash is therefore not caused by the Armeria instrumentation logic itself. The likely explanation is that registering two additional transformer.applyAdvice() calls changes agent initialisation timing or thread startup order enough to trigger an existing shutdown race in libjavaProfiler.so's thread-exit cleanup path (the native profiler has had a series of shutdown-time race fixes in recent weeks). The revert removes the trigger; a proper fix for the underlying native race is being tracked separately.

Additional Notes

DataDog/profiling-backend CI is temporarily protected by pinning dd-java-agent to Maven release 1.61.1 (DataDog/profiling-backend#8634) while this is resolved.

Jira ticket: [PROF-XXXX]

@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 2 Pipeline jobs failed

Check pull requests | Check pull requests   View in Datadog   GitHub Actions

DataDog/apm-reliability/dd-trace-java | linux-java-spring-petclinic-sca-load-parallel   View in Datadog   GitLab

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 23d65d6 | Docs | Datadog PR Page | Give us feedback!

@dd-octo-sts

dd-octo-sts Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 13.98 s 13.96 s [-0.5%; +0.7%] (no difference)
startup:insecure-bank:tracing:Agent 12.87 s 13.01 s [-1.8%; -0.3%] (maybe better)
startup:petclinic:appsec:Agent 16.96 s 16.91 s [-0.6%; +1.2%] (no difference)
startup:petclinic:iast:Agent 16.92 s 16.99 s [-1.3%; +0.5%] (no difference)
startup:petclinic:profiling:Agent 16.84 s 16.98 s [-1.9%; +0.3%] (no difference)
startup:petclinic:sca:Agent 16.84 s 16.73 s [-0.3%; +1.5%] (no difference)
startup:petclinic:tracing:Agent 15.99 s 16.10 s [-1.7%; +0.2%] (no difference)

Commit: 23d65d61 · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@r1viollet r1viollet marked this pull request as ready for review June 22, 2026 07:51
@r1viollet r1viollet requested review from a team as code owners June 22, 2026 07:51
@r1viollet r1viollet requested review from bric3 and jordan-wong and removed request for a team June 22, 2026 07:51
@dd-octo-sts

dd-octo-sts Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

  • Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23d65d61b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

"io.grpc.netty.shaded.io.netty.util.concurrent.GlobalEventExecutor",
"com.linecorp.armeria.client.HttpClientFactory",
"com.linecorp.armeria.client.HttpChannelPool"
"io.grpc.netty.shaded.io.netty.util.concurrent.GlobalEventExecutor"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve the Armeria async-propagation guard

By ending knownMatchingTypes() before Armeria's HttpClientFactory/HttpChannelPool, this instrumentation no longer disables async propagation while Armeria creates its pool/connect tasks. When the first Armeria gRPC request is made under an active parent span, executor instrumentation can capture that parent into connection setup with no corresponding consumer, keeping the trace pending until the monitor flushes and allowing the strict gRPC tests to fail; the new test overrides only hide that regression. Please keep an Armeria-specific guard that avoids the shutdown crash without re-enabling these leaked continuations.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tag: do not merge Do not merge changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants