Skip to content

Add support for oneCCL and Synchronization events.#1335

Open
tsocha wants to merge 21 commits intopytorch:mainfrom
intel-staging:dev/tsocha/oneccl
Open

Add support for oneCCL and Synchronization events.#1335
tsocha wants to merge 21 commits intopytorch:mainfrom
intel-staging:dev/tsocha/oneccl

Conversation

@tsocha
Copy link
Copy Markdown

@tsocha tsocha commented Mar 30, 2026

  • Add communication and synchronization profiling support — new activity handlers for collective communication events and device synchronization events
  • new XPU_SYNC ActivityType
  • Add integration tests — test suite for the new communication and synchronization handlers with proper MSVC test discovery via gtest_add_tests
  • Harden error handling — log warnings instead of throwing when PTI_VIEW_COMMUNICATION is unsupported
  • Fix Windows/MSVC build issues — fix XpuptiProfilerTestLib linking, ExternalProject Debug/Release mismatch, general MSVC compilation fixes
  • correct XPUPTI_BUILD_FLAG logging in CMake

Commits marked with 🤖 are made with usage of gen AI.

tsocha and others added 15 commits March 30, 2026 13:40
ptiViewGetApiIdName requires valid _api_group and _api_id values.
Use PTI_API_GROUP_LEVELZERO (1) with zeEventHostSynchronize_id (84)
in test records to avoid PTI_ERROR_BAD_ARGUMENT.
gtest_discover_tests runs the test executable at build time to
enumerate tests. With the MSVC generator, the oneAPI DLLs are
not in PATH, causing STATUS_DLL_NOT_FOUND (0xc0000135).

gtest_add_tests parses the source at configure time instead,
avoiding the runtime dependency.
…pported

PTI does not support communication events on Windows. Replace
XPUPTI_CALL with LOG(WARNING) for PTI_VIEW_COMMUNICATION enable
and disable to allow the session to be constructed gracefully.
Separate PTI_VIEW_DEVICE_SYNCHRONIZATION from COLLECTIVE_COMM into its
own XPU_SYNC activity type, as device synchronization is unrelated to
collective communication.
Add FORCE:UNRESOLVED and WINDOWS_EXPORT_ALL_SYMBOLS for Windows DLL
builds, and use platform-aware import library path instead of hardcoded
.so extension.

With multi-config generators (Visual Studio), CMAKE_BUILD_TYPE is empty
at configure time. The inner ExternalProject (Ninja, single-config) was
receiving an empty build type and defaulting to Release, causing
_ITERATOR_DEBUG_LEVEL and RuntimeLibrary linker mismatches when building
Debug.

- Use custom CONFIGURE_COMMAND with $<CONFIG> generator expression to
  forward the active configuration to the inner Ninja build
- Simplify find_library PATH_SUFFIXES in compute/CMakeLists.txt to
  prioritize CMAKE_BUILD_TYPE-matching libraries
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Fix invalid JSON in synchronization metadata: use addMetadataQuoted for
  string (Type) and pointer (handle) fields
- Increment gpuOpCount for GPU barrier synchronization types
- Guard communication tests with PTI_VERSION_AT_LEAST(0, 17)
- Fix ExternalProject_Add for single/multi-config generators on Linux/Windows
Co-authored-by: Slawomir Siwek <slawomir.siwek@intel.com>
@meta-cla meta-cla bot added the cla signed label Mar 30, 2026
@tsocha
Copy link
Copy Markdown
Author

tsocha commented Mar 30, 2026

@EikanWang, @gujinghui please review.

PRIVATEUSE1_RUNTIME = 24, // host side privateUse1 runtime events
PRIVATEUSE1_DRIVER = 25, // host side privateUse1 driver events
PRIVATEUSE1_RUNTIME = 25, // host side privateUse1 runtime events
PRIVATEUSE1_DRIVER = 26, // host side privateUse1 driver events
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not change the existing values in the list. Let's add the new item before the ENUM_COUNT only.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I will fix it.

XPUPTI_CALL(ptiViewEnable(PTI_VIEW_COLLECTION_OVERHEAD));
break;

#if PTI_VERSION_AT_LEAST(0, 17)
Copy link
Copy Markdown

@gujinghui gujinghui Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we follow the similar design in #1174 , instead of introducing many #if here? @moksiuc is able to give more details.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you be more precise here?
This change is consistent with this function: example in line 233.

This PR have 7 hits with #if PTI_VERSION_AT_LEAST(0, 17):

  • 2x ptiViewEnable and ptiViewDisable. It's consistent within the function handling PTI versions.
  • 3x in event handlers .cpp and .h for XpuptiActivityProfilerSession::handleCommunicationActivity.
    I could move .cpp part to another file but it's a single function.
  • 2x in tests.

I think that I can't use @moksiuc design here.
Most of these #ifs are here because it's the new feature in PTI.

@moksiuc could you confirm that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance I don't see how new code could be extracted to separate files or classes except separate Handlers file.

tsocha added 4 commits April 9, 2026 11:22
The --export-dynamic linker flag is Linux-specific and causes
build failures on Windows. Guard it with a platform check to
only apply on non-Windows systems.

On Windows, symbol exporting is handled differently via DLL
exports and doesn't require this linker flag.
@tsocha tsocha requested a review from gujinghui April 9, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants