Add CPU dependencies for HIP and RCCL #1527

Open
zerefwayne wants to merge 5 commits into EESSI:main from zerefwayne:add-hip-gcc
@zerefwayne

Contributor

Add CPU dependencies for HIP and RCCL #1526

- fmt-11.2.0-GCCcore-14.2.0.eb
- gflags-2.2.2-GCCcore-14.2.0.eb
- glog-0.7.1-GCCcore-14.2.0.eb
- rocprofiler-register-0.4.0-GCCcore-14.2.0-ROCm-6.4.1.eb

Collaborator

I'm not sure if this should be here or not. I think it is CPU code, right? If so, then yes. But then why does it have ROCm as a dependency again...?

I'm also wondering if this will actually pass. I don't remember how the build scripts determine if something is a GPU build or not - I think it's based on whether CUDA/ROCm are dependencies, and if yes, it assumes it's a GPU build. Not 100% sure though. Feel free to give it a try...

Contributor Author

rocprofiler-register is based on ROCm-6.4.1, but we decided to build it using GCCcore, hence the versionsuffix. It doesn't have any dependency on ROCm-LLVM.
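
To make the naming concrete: under EasyBuild's default naming convention, an easyconfig filename is composed from the name, version, toolchain, and versionsuffix. The sketch below is only an illustration. The helper `easyconfig_filename` is hypothetical, the parameter values are inferred from the filename in this PR rather than copied from the easyconfig, and the special case of the system toolchain is not handled.

```python
def easyconfig_filename(name, version, toolchain, versionsuffix=""):
    """Compose <name>-<version>-<toolchain>[<versionsuffix>].eb (hypothetical helper)."""
    tc_label = f"{toolchain['name']}-{toolchain['version']}"
    return f"{name}-{version}-{tc_label}{versionsuffix}.eb"


# Values inferred from the filename discussed above, not read from the easyconfig.
print(easyconfig_filename(
    name="rocprofiler-register",
    version="0.4.0",
    toolchain={"name": "GCCcore", "version": "14.2.0"},
    versionsuffix="-ROCm-6.4.1",
))
```

The versionsuffix here is only a label in the filename and module name; on its own it does not add a dependency on ROCm.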

Collaborator

I guess it's also a question of where we think this should be, and whether the current setup in the build scripts makes sense. I think for the NVIDIA stuff we decided to make a simple split: if it has CUDA as a dependency, then it's accelerator software. I guess we can (and should) do the same for ROCm. But let's first see if this fails today or not :)
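
The "simple split" described above could look roughly like the sketch below. This is not the actual logic in the EESSI build scripts (which nobody in this thread has quoted). The marker set `ACCELERATOR_DEPS` and the function `is_accelerator_build` are assumptions made for illustration, and dependencies are modelled as tuples starting with the dependency name, as in an easyconfig.

```python
# Dependency names treated as accelerator markers. This set is an assumption
# for illustration, not taken from the EESSI build scripts.
ACCELERATOR_DEPS = frozenset({"CUDA", "ROCm-LLVM"})


def is_accelerator_build(dependencies):
    """Return True if any dependency name is an accelerator marker.

    `dependencies` is a list of tuples whose first element is the
    dependency name, e.g. ("fmt", "11.2.0").
    """
    return any(dep[0] in ACCELERATOR_DEPS for dep in dependencies)


# A package with only CPU dependencies stays a CPU build, even if its
# versionsuffix mentions ROCm, because only dependencies are inspected.
print(is_accelerator_build([("fmt", "11.2.0"), ("glog", "0.7.1")]))
print(is_accelerator_build([("CUDA", "12.6.0")]))
```

Under a rule like this, whether rocprofiler-register counts as accelerator software depends only on its dependency list, not on the `-ROCm-6.4.1` suffix.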

@zerefwayne Jun 18, 2026

Contributor Author

The build succeeded for Zen2. Should we try building for all the architectures? However, the deps with a ROCm suffix don't need to be built for aarch64 😕

- fmt-11.2.0-GCCcore-14.2.0.eb
- gflags-2.2.2-GCCcore-14.2.0.eb
- glog-0.7.1-GCCcore-14.2.0.eb
- rocprofiler-register-0.4.0-GCCcore-14.2.0-ROCm-6.4.1.eb

Contributor

@zerefwayne This one is not included yet in EasyBuild v5.3.0

Two options:

I'm also not sure if this is really CPU-only...
It doesn't have any ROCm dependencies, but it is built from a rocm_* source tarball...

@casparvl Thoughts?

@zerefwayne

Contributor Author

Test

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2
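
For readers unfamiliar with the command format, the line above is an action followed by `key:value` arguments. The sketch below shows one way to split it. `parse_bot_command` is a hypothetical helper written for this note, not the bot's own parser, and it assumes arguments are whitespace-separated with the key ending at the first colon.

```python
def parse_bot_command(line):
    """Split a 'bot: <action> key:value ...' line into (action, params)."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command")
    tokens = line[len(prefix):].split()
    if not tokens:
        raise ValueError("missing action")
    action, args = tokens[0], tokens[1:]
    params = {}
    for arg in args:
        # Split on the first colon only, so values may contain ':' or '='.
        key, _, value = arg.partition(":")
        params[key] = value
    return action, params


action, params = parse_bot_command(
    "bot: build repo:eessi.io-2025.06-software "
    "instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2"
)
print(action, params)
```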

@eessi-bot-aws

eessi-bot-aws Bot commented Jun 18, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /project/def-users/SHARED/jobs/2026.06/pr_1527/168047

| date | job status | comment |
| --- | --- | --- |
| Jun 18 14:47:47 UTC 2026 | submitted | job id 168047 awaits release by job manager |
| Jun 18 14:48:27 UTC 2026 | released | job awaits launch by Slurm scheduler |
| Jun 18 14:49:35 UTC 2026 | running | job 168047 is running |
| Jun 18 14:53:43 UTC 2026 | finished | 😢 FAILURE (click triangle for details) |

Details
✅ job output file slurm-168047.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen2-17817941560.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen2/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen2
no other files in tarball
Jun 18 14:53:43 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/5) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/22Jul2025-foss-2024a-kokkos %scale=1_node /ade8cad7 @BotBuildTests:x86-64-zen2+default
P: perf: 435.757 timesteps/s (r:0, l:None, u:None)
[ OK ] (2/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen2+default
P: latency: 1.29 us (r:0, l:None, u:None)
[ OK ] (3/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen2+default
P: latency: 3.24 us (r:0, l:None, u:None)
[ OK ] (4/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen2+default
P: latency: 0.19 us (r:0, l:None, u:None)
[ OK ] (5/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen2+default
P: bandwidth: 7926.92 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-168047.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
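
The checklist in this report amounts to a set of patterns that must be absent from the job output and a set that must be present. A minimal sketch of that idea follows, assuming the patterns are applied as regular expressions to the whole log. The function `check_job_output` and the sample log text are invented for illustration and are not the bot's implementation.

```python
import re

# Patterns as listed in the report above; treating them as regular
# expressions is an assumption of this sketch.
FORBIDDEN = ["FATAL:", "ERROR:", "FAILED:", "required modules missing:"]
REQUIRED = ["No missing installations", r".tar.* created!"]


def check_job_output(text):
    """Return (ok, report); report maps each pattern to whether its check passed."""
    report = {}
    for pattern in FORBIDDEN:
        report[pattern] = re.search(pattern, text) is None
    for pattern in REQUIRED:
        report[pattern] = re.search(pattern, text) is not None
    return all(report.values()), report


# Invented log text that reproduces the failure pattern of this job:
# an ERROR line, no "No missing installations", but a tarball was created.
ok, report = check_job_output("ERROR: something went wrong\n/tmp/x.tar.zst created!\n")
print(ok)
for pattern, passed in report.items():
    print("pass" if passed else "fail", pattern)
```

This also shows why a job can fail while still producing an (empty) tarball: the tarball check passes independently of the other two.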

@zerefwayne

Contributor Author

Test 2

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2

@eessi-bot-aws

eessi-bot-aws Bot commented Jun 18, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /project/def-users/SHARED/jobs/2026.06/pr_1527/168048

| date | job status | comment |
| --- | --- | --- |
| Jun 18 14:49:19 UTC 2026 | submitted | job id 168048 awaits release by job manager |
| Jun 18 14:49:32 UTC 2026 | released | job awaits launch by Slurm scheduler |
| Jun 18 14:54:48 UTC 2026 | running | job 168048 is running |
| Jun 18 15:00:57 UTC 2026 | finished | 😢 FAILURE (click triangle for details) |

Details
✅ job output file slurm-168048.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen2-17817947200.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen2/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen2
no other files in tarball
Jun 18 15:00:57 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/5) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/22Jul2025-foss-2024a-kokkos %scale=1_node /ade8cad7 @BotBuildTests:x86-64-zen2+default
P: perf: 439.251 timesteps/s (r:0, l:None, u:None)
[ OK ] (2/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen2+default
P: latency: 1.37 us (r:0, l:None, u:None)
[ OK ] (3/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen2+default
P: latency: 2.07 us (r:0, l:None, u:None)
[ OK ] (4/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen2+default
P: latency: 0.84 us (r:0, l:None, u:None)
[ OK ] (5/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen2+default
P: bandwidth: 7880.44 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-168048.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
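
To compare the performance numbers across the three jobs, the ReFrame summary lines can be pulled into a small structure. The sketch below assumes the line layout exactly as it appears in these reports (an `[ OK ]` line followed by a `P:` line). `parse_reframe_summary` is a hypothetical helper, not part of ReFrame or the bot.

```python
import re

# "[ OK ] (1/5) EESSI_LAMMPS_lj ..." -> test name is the first token after the counter.
TEST_RE = re.compile(r"^\[\s*OK\s*\]\s+\((\d+)/(\d+)\)\s+(\S+)")
# "P: perf: 439.251 timesteps/s (...)" -> metric name, value, unit.
PERF_RE = re.compile(r"^P:\s+(\w+):\s+([\d.]+)\s+(\S+)")


def parse_reframe_summary(lines):
    """Return a list of {'test': name, 'perf': (metric, value, unit) or None}."""
    results = []
    for line in lines:
        line = line.strip()
        match = TEST_RE.match(line)
        if match:
            results.append({"test": match.group(3), "perf": None})
            continue
        match = PERF_RE.match(line)
        if match and results:
            # Attach the performance line to the most recent test.
            results[-1]["perf"] = (match.group(1), float(match.group(2)), match.group(3))
    return results


sample = [
    "[ OK ] (1/5) EESSI_LAMMPS_lj %device_type=cpu %scale=1_node /ade8cad7 @BotBuildTests:x86-64-zen2+default",
    "P: perf: 439.251 timesteps/s (r:0, l:None, u:None)",
]
print(parse_reframe_summary(sample))
```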

@zerefwayne

Contributor Author

Test 3

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2

@eessi-bot-aws

eessi-bot-aws Bot commented Jun 18, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /project/def-users/SHARED/jobs/2026.06/pr_1527/168049

| date | job status | comment |
| --- | --- | --- |
| Jun 18 15:07:04 UTC 2026 | submitted | job id 168049 awaits release by job manager |
| Jun 18 15:08:05 UTC 2026 | released | job awaits launch by Slurm scheduler |
| Jun 18 15:09:08 UTC 2026 | running | job 168049 is running |
| Jun 18 15:28:55 UTC 2026 | finished | 😁 SUCCESS (click triangle for details) |

Details
✅ job output file slurm-168049.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen2-17817963990.tar.zst
size: 2 MiB (2714844 bytes)
entries: 510
modules under 2025.06/software/linux/x86_64/amd/zen2/modules/all
CppHeaderParser/2.7.4-GCCcore-14.2.0.lua
fmt/11.2.0-GCCcore-14.2.0.lua
gflags/2.2.2-GCCcore-14.2.0.lua
glog/0.7.1-GCCcore-14.2.0.lua
rocm-core/6.4.0-GCCcore-14.2.0-ROCm-6.4.1.lua
rocm-smi/7.6.0-GCCcore-14.2.0-ROCm-6.4.1.lua
rocprofiler-register/0.4.0-GCCcore-14.2.0-ROCm-6.4.1.lua
zlib/1.3.1-GCCcore-14.2.0.lua
software under 2025.06/software/linux/x86_64/amd/zen2/software
CppHeaderParser/2.7.4-GCCcore-14.2.0
fmt/11.2.0-GCCcore-14.2.0
gflags/2.2.2-GCCcore-14.2.0
glog/0.7.1-GCCcore-14.2.0
rocm-core/6.4.0-GCCcore-14.2.0-ROCm-6.4.1
rocm-smi/7.6.0-GCCcore-14.2.0-ROCm-6.4.1
rocprofiler-register/0.4.0-GCCcore-14.2.0-ROCm-6.4.1
zlib/1.3.1-GCCcore-14.2.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen2/reprod
CppHeaderParser/2.7.4-GCCcore-14.2.0/20260618_152312UTC
fmt/11.2.0-GCCcore-14.2.0/20260618_151628UTC
gflags/2.2.2-GCCcore-14.2.0/20260618_151715UTC
glog/0.7.1-GCCcore-14.2.0/20260618_151935UTC
rocm-core/6.4.0-GCCcore-14.2.0-ROCm-6.4.1/20260618_152500UTC
rocm-smi/7.6.0-GCCcore-14.2.0-ROCm-6.4.1/20260618_152145UTC
rocprofiler-register/0.4.0-GCCcore-14.2.0-ROCm-6.4.1/20260618_152439UTC
zlib/1.3.1-GCCcore-14.2.0/20260618_152020UTC
other under 2025.06/software/linux/x86_64/amd/zen2
no other files in tarball
Jun 18 15:28:55 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/5) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/22Jul2025-foss-2024a-kokkos %scale=1_node /ade8cad7 @BotBuildTests:x86-64-zen2+default
P: perf: 441.695 timesteps/s (r:0, l:None, u:None)
[ OK ] (2/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86-64-zen2+default
P: latency: 1.33 us (r:0, l:None, u:None)
[ OK ] (3/5) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86-64-zen2+default
P: latency: 2.03 us (r:0, l:None, u:None)
[ OK ] (4/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86-64-zen2+default
P: latency: 0.18 us (r:0, l:None, u:None)
[ OK ] (5/5) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86-64-zen2+default
P: bandwidth: 8003.65 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 5/5 test case(s) from 5 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-168049.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
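
The artefact listing in this report groups the 510 tarball entries into modules, software, and reprod directories. A rough sketch of such a grouping is shown below, assuming member paths begin with the `2025.06/software/linux/x86_64/amd/zen2` prefix seen above. `summarise_entries` and the individual file paths in the example are illustrative and not taken from the tarball or the bot.

```python
from collections import defaultdict

# Number of path components that identify one item in each tree (assumed layout):
# software/<name>/<version>, reprod/<name>/<version>/<timestamp>.
ITEM_DEPTH = {"software": 2, "reprod": 3}


def summarise_entries(paths, prefix):
    """Group tarball member paths under `prefix` into modules/software/reprod/other."""
    groups = defaultdict(set)
    for path in paths:
        if not path.startswith(prefix + "/"):
            groups["other"].add(path)
            continue
        parts = path[len(prefix) + 1:].split("/")
        tree = parts[0]
        if tree == "modules":
            # modules/all/<name>/<version>.lua -> <name>/<version>.lua
            if len(parts) >= 4 and path.endswith(".lua"):
                groups["modules"].add("/".join(parts[2:]))
        elif tree in ITEM_DEPTH:
            depth = ITEM_DEPTH[tree]
            if len(parts) > depth:
                groups[tree].add("/".join(parts[1:1 + depth]))
        else:
            groups["other"].add("/".join(parts))
    return {kind: sorted(items) for kind, items in groups.items()}


prefix = "2025.06/software/linux/x86_64/amd/zen2"
example = [
    prefix + "/modules/all/fmt/11.2.0-GCCcore-14.2.0.lua",
    prefix + "/software/fmt/11.2.0-GCCcore-14.2.0/include/fmt/core.h",
    prefix + "/reprod/fmt/11.2.0-GCCcore-14.2.0/20260618_151628UTC/easybuild.log",
]
print(summarise_entries(example, prefix))
```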


3 participants