refactor(vtable_target): defer receiver resolution to dump time via SafeAccess#527
Conversation
…4618) Restore vtable-target frame capture in CPU-only recordings by populating _class_map from safe JVM-thread contexts: - Profiler::preregisterLoadedClasses(jvmtiEnv*) bulk-inserts via GetLoadedClasses + GetClassSignature + ObjectSampler::normalizeClassSignature. Called inside the existing exclusive _class_map_lock at recording start and dump so signal-handler readers see a populated map. - VM::ClassPrepare moved out-of-line into vmEntry.cpp; now also inserts newly prepared classes when StackWalkFeatures.vtable_target is set. - Hot signal-handler path in hotspotSupport.cpp is unchanged. New dictionary_concurrent_ut.cpp test pins the bounded_lookup(0) hit contract after bulk pre-registration. A new VtableTargetCpuTest is included @disabled with a TODO for the follow-up (vtable_target has no CLI toggle yet, and the synthetic BCI_ALLOC frame is recorded as a class id, not in the stack-trace string). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR restores vtable_target frame capture for CPU-only recordings by ensuring _class_map is populated from JVM-thread contexts (rather than relying on allocation/liveness paths that don’t run in CPU-only mode), keeping the signal-handler lookup path read-only and signal-safe.
Changes:
- Bulk pre-registers loaded classes into
_class_mapduringProfiler::start()andProfiler::dump()under the existing exclusive_class_map_lock. - Moves
VM::ClassPrepareout-of-line and extends it to insert newly prepared classes into_class_mapwhenvtable_targetis enabled. - Adds a C++ unit test covering the “bulk insert then bounded_lookup(…, 0) hits” contract and stages a (currently disabled) Java integration test.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| ddprof-test/src/test/java/com/datadoghq/profiler/cpu/VtableTargetCpuTest.java | Adds a staged (disabled) integration test outlining expected vtable-target behavior in CPU samples. |
| ddprof-lib/src/test/cpp/dictionary_concurrent_ut.cpp | Adds a unit test validating bulk insertion visibility to bounded_lookup(..., 0) and sentinel behavior on miss. |
| ddprof-lib/src/main/cpp/vmEntry.h | Moves VM::ClassPrepare definition out of the header into vmEntry.cpp. |
| ddprof-lib/src/main/cpp/vmEntry.cpp | Implements VM::ClassPrepare to insert normalized class signatures into the class map when vtable_target is enabled. |
| ddprof-lib/src/main/cpp/profiler.h | Adds Profiler::preregisterLoadedClasses(jvmtiEnv*) declaration and documents intended locking/threading constraints. |
| ddprof-lib/src/main/cpp/profiler.cpp | Calls preregisterLoadedClasses() during start/dump class-map resets and implements the bulk JVMTI enumeration/insertion logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Log::warn on JVMTI error paths in preregisterLoadedClasses and ClassPrepare so silent skips are observable. - Drop dead-code `if (sig != nullptr)` guards after the `|| sig == nullptr` short-circuit (JVMTI spec leaves output undefined on error). - Remove disabled VtableTargetCpuTest.java — integration test deferred to a follow-up once vtable_target gets a CLI toggle and the synthetic-frame assertion shape is settled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI Test ResultsRun: #26456720745 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-05-26 15:23:33 UTC |
3a34b65 to
3115f92
Compare
b90761e to
76d919d
Compare
3115f92 to
6e825c7
Compare
76d919d to
2105b61
Compare
6e825c7 to
2dbee3f
Compare
6645120 to
2a9dd08
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2dbee3f5a1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
ddprof-lib/src/test/cpp/dictionary_ut.cpp:292
- The test "WritersDuringRotateProduceNoCorruption" doesn’t actually run writers during rotate: all writer threads are joined before the rotate/clearStandby loop starts. This means the test can’t catch races between concurrent lookup() calls and rotate()/clearStandby(); either rename the test to match its behavior or keep writers running while rotating to exercise the intended concurrency contract.
std::vector<std::thread> writers;
for (int i = 0; i < 4; i++) writers.emplace_back(writer, i);
for (auto& t : writers) t.join();
for (int cycle = 0; cycle < 20; cycle++) {
dict.rotate();
dict.clearStandby();
}
- ClassPrepare: blocking SharedLockGuard + classMap()->lookup() (was best-effort lookupClass) - preregisterLoadedClasses: filter to 'L'-type signatures; DeleteLocalRef each class - Test/header comments rewritten to match new lock protocol and L-type filter Addresses PR #527 review comments: - 3233432627 (Copilot, vmEntry.cpp): ClassPrepare no longer silently skips during dump - 3233432702 (Copilot, profiler.h/cpp): only reference types inserted into _class_map - 3233432734 (Copilot, dictionary_concurrent_ut.cpp): no false 'exclusive lock' claim - 3260404113 (Codex, profiler.cpp): JNI local refs released at every loop exit Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2dbee3f to
5a06eb7
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a06eb7e54
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Another round soon, please! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…ck assert profiler.cpp: reword comment to state tryLockShared() proves no exclusive holder but cannot detect a shared-lock caller (self-deadlock in Phase 0/2); update assert message to only claim what the check can actually detect Co-Authored-By: muse <muse@noreply>
|
@github-copilot please review |
…tration Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I have addressed all your concerns. I am going to dismiss the review not to block - I really need this integrated ASAP. Please, feel free to look at the PR retroactively and if there are any concerns left, I will address them then.
This comment has been minimized.
This comment has been minimized.
- Drop testClassMapPopulatedOnCpuStart: dictionary_classes_keys is always 0 after Counters::reset() (line 1180) which runs after preregisterLoadedClasses, making the assertion permanently false. - Fix testVtableReceiverFrameInCpuSamples: cpu=10ms with 1M trivial iterations (r*r ~3ms) produced <1 sample. Switch to cpu=1ms and ThreadLocalRandom bodies to ensure ~300ms of CPU-bound work and force megamorphic vtable dispatch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kaahos
left a comment
There was a problem hiding this comment.
I have found some potential bugs. Particularly a counter-ordering bug due to preregisterLoadedClasses being called before Counters::reset(), which seems to have already been noticed in commit be0e50a6 but the test was apparently dropped.
- Use SharedLockGuard in Phase 2 of preregisterLoadedClasses - Clarify test comment: jvmtiError is JVMTI fallback for fake class_id method_id Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- flightRecorder.cpp: BCI_ALLOC in resolveMethod now uses class_id directly as the JFR class ref with method name <vtable_receiver> instead of passing a fake jmethodID to fillJavaMethodInfo - VtableTargetPreregistrationTest: assert Circle/Square/Triangle in stack trace next to vtable stub; add waitForProfilerReady and null guards - profiler.h: fix Phase 2 doccomment (shared lock, not exclusive) - profiler.cpp: Deallocate classes array when class_count==0 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
GeneratedConstructorAccessor, GeneratedMethodAccessor, and LambdaForm$ classes are normalised (digit suffix stripped) in fillJavaMethodInfo for regular frames; the BCI_ALLOC vtable-receiver path bypasses that normalisation, exposing un-normalised names that fail MetadataNormalisationTest. Skip these classes at the insertion point in hotspotSupport.cpp. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Instead of skipping generated accessor/lambda-form classes at the insertion point, apply the same prefix-based normalisation as fillJavaMethodInfo in the BCI_ALLOC case of resolveMethod (dump-time, cached per unique receiver class via method map): GeneratedConstructorAccessor1 -> GeneratedConstructorAccessor, LambdaForm$MH.xxx -> LambdaForm$MH, etc. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…shot - Lookup: add _class_cache (std::map<u32,const char*>) populated once via initClassCache() before writeStackTraces; writeClasses reuses it instead of a second collect - resolveMethod BCI_ALLOC: use _class_cache + same prefix normalisations as fillJavaMethodInfo (GeneratedXXXAccessor strips digit suffix, LambdaForm sub-types canonicalised) - hotspotSupport.cpp: revert skip filter (normalisation now in resolveMethod) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- arguments.cpp: document vtable_target overhead (JVMTI enum at start +
each dump; incremental ClassPrepare; signal path is lock-free)
- profiler.cpp: clarify debug assertion — detects exclusive hold, not
shared-holder re-entrancy; remove overclaiming comment
- profiler.h: fix Phase 2 comment causality (exclusive Phase 0 prevents
clear(), not CAS; CAS makes concurrent shared inserts safe)
- vmEntry.cpp: document _features/_state memory-ordering invariant in
ClassPrepare
- VtableTargetPreregistrationTest: add contains("<vtable_receiver>") to
assert the synthetic frame itself, not just the class name substring
- flightRecorder.cpp: two-phase collect design comments (linter-staged)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…afeAccess Signal handler stores raw VMSymbol* in a new BCI_VTABLE_RECEIVER frame; Lookup::resolveVTableReceiver reads the Symbol via SafeAccess at dump time, crash-safe under concurrent class unloading. Drops preregister + ClassPrepare hook; _class_map cleared unconditionally each dump. MethodMap gains a class_id key namespace so two VMSymbol* addresses for the same class collapse to one MethodInfo. SafeAccess::safeCopy added with aligned-load strategy so over-reads stay in-page. Buffer 1024 → 4096; failure sentinel "<unresolved_vtable_receiver>" instead of "". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JMC's STACK_TRACE_STRING HTML-escapes angle brackets in method names (same as it does for <init>/<clinit>), so the synthetic frame renders as "<vtable_receiver>". Match on the bare token so the assertion works on every JFR-consumer rendering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What does this PR do?:
Redesigns
vtable_targetreceiver-class resolution. Instead of eagerly pre-registering every loaded class into_class_mapat profiler start and onClassPrepare, the signal handler now captures the receiver's rawVMSymbol*in a newBCI_VTABLE_RECEIVERframe, and the Symbol is read at dump time viaSafeAccess— crash-safe against concurrent class unloading.Changes:
BCI_VTABLE_RECEIVER = -21(vmEntry.h):method_idis aVMSymbol*, not ajmethodID. Doc comment cross-referenced from the signal-handler write site (hotspotSupport.cpp) and the dump-time read site (flightRecorder.cpp).SafeAccess::safeCopyprimitive (safeAccess.{h,cpp}): bulk byte copy through the safefetch trampoline, with an alignment-aware strategy that keeps every 4-byte load inside a single page so over-reads pastsrc + lennever spuriously fault. Replaces a memcpy that would have crashed if the Symbol's page was unmapped between the readability check and the copy.Lookup::resolveVTableReceiver+resolveVTableReceiverCached(flightRecorder.{h,cpp}): SafeAccess-protected length + body read, printable-byte filter, synthetic-accessor/LambdaForm normalisation. Per-dumpVMSymbol* → class_idcache so distinct Symbol addresses for the same class collapse to one MethodInfo.MethodMap::makeVTableReceiverKey(u32)(flightRecorder.h): adds a fourth key namespace (11high-bit pattern,VTABLE_RECEIVER_MARK) so BCI_VTABLE_RECEIVER frames key by resolvedclass_idinstead of byVMSymbol*address.Profiler::preregisterLoadedClasses(~100 lines + 3 call sites + decl), theClassPrepareJVMTI hook block for vtable_target invmEntry.cpp,classMapTrySharedGuard(), the shared-lock try-dance inwalkVM. Conditional end-of-dump_class_map.clear()is restored to unconditional (wasif (!_features.vtable_target || ...)for the previous design's persistence requirement).VTABLE_RECEIVER_RESOLVE_FAILEDrecords resolution failures (Symbol page unmapped, length out of range, printable filter rejects).VtableTargetPreregistrationTest→VtableReceiverFrameTest; assertion message updated to describe the SafeAccess path.Motivation:
The prior preregistration design forced unbounded growth of
_class_map— every loaded class ever seen byGetLoadedClassesorClassPreparewas inserted permanently, with no removal path (clearing the map would require a JVMTI-safepoint re-enumeration on every dump). Long-running JVMs with synthetic-class generators (LambdaForm, reflection accessors, Groovy/Kotlin invokedynamic, CGLIB) accumulated millions of entries indefinitely.The new design bounds
_class_mapto "classes actually sampled this chunk" (cleared unconditionally at end-of-dump), removes the eager JVMTI enumeration and theClassPrepareinsert path, eliminates the shared/exclusive lock dance in the signal handler, and is robust to concurrent class unloading viaSafeAccess-based dump-time reads.Additional Notes:
safeFetch32returns sentinel → frame becomes<unresolved_vtable_receiver>,VTABLE_RECEIVER_RESOLVE_FAILEDticks. (2) memory reused for unrelated data → length / printable filter rejects → same as (1). (3) memory reused for another Symbol whose bytes pass the printable filter → wrong class name attached to the trace (silent miscorrelation). Cases (1)+(2) dominate; (3) is rare per-Symbol but non-zero in aggregate on class-unload-heavy workloads. Tracked as follow-up in PROF-14780 — class-unload breakpoint hook + Symbol-bytes shadow cache.CallTraceHashTable::calcHashmixes the raw bytes of every frame (includingmethod_id). Two samples of the same logical class whoseVMSymbol*address differs (class unload + reload within a chunk) produce distinct trace ids. Accepted: normalising at sample time would require an in-signal-handler Symbol read, which the redesign explicitly avoids. The dump-timeMethodMapkey isclass_id-based, so the synthetic<vtable_receiver>MethodInfodoes collapse across distinct Symbol addresses; onlyCallTracededup is affected.SafeAccess::safeCopycorrectness. Pages on supported platforms (x86_64 4 KiB; aarch64 Linux 4 KiB; aarch64 macOS 16 KiB) divide evenly by 4. A 4-byte-aligned 4-byte load therefore never crosses a page boundary, so over-reads beyondsrc + lenare guaranteed safe as long as the start address itself is in a mapped page. For misalignedsrc, the front fixup fetches at the previous aligned address (same 4-byte word, same page) and discards leadingk ∈ {1,2,3}bytes. Buffer is 4096 bytes (stack-allocated); names longer than that are recorded as resolve failures via the sentinel._class_mapis fine across chunks. JFR class IDs are per-chunk; clearing at end-of-dump is the correct steady state.BCI_ALLOCbranch inresolveMethoddeleted (it was the previous receiver path; no other producer remains), normalisation logic consolidated intoresolveVTableReceiver(no more duplicatedLambdaForm$MHtable). Help text inarguments.cppupdated to describe the new design.How to test the change?:
SafeFetchTest— 19/19 passes (9 pre-existing + 7 newsafeCopy_*cases covering happy path, zero-length, unmapped source,PROT_NONEsource, tail near unmapped boundary now succeeds (regression for the previous design), requested-range-crosses-unmapped still legitimately fails, unaligned source for allk ∈ {1,2,3}, real data equal to a single sentinel).VtableReceiverFrameTest(renamed) compiles cleanly.:ddprof-lib:assembleReleasesucceeds on darwin-aarch64.For Datadog employees:
If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance.This PR doesn't touch any of that.
JIRA: PROF-14618
Follow-up: PROF-14780 — class-unload breakpoint hook for mid-chunk unload preservation