Skip to content

Initial OpenMP/GPU support#228

Open
pbartholomew08 wants to merge 32 commits into
xcompact3d:mainfrom
pbartholomew08:omp_gpu
Open

Initial OpenMP/GPU support#228
pbartholomew08 wants to merge 32 commits into
xcompact3d:mainfrom
pbartholomew08:omp_gpu

Conversation

@pbartholomew08
Copy link
Copy Markdown
Member

This branch implements the basic offloading of data using openMP with the ability to perform the vecadd operation.

It now behaves polymorphically and will create device fields when
that is the `next` type and host fields otherwise
As libx3d2_backends links libx3d2 and xcompact/tests link both the
linking order must be libx3d2_backends then libx3d2 to prevent
duplicate symbols during linking.
This is to allow initialisation of the base class whether called
by OMP/CPU or the OMP/TGT object
The code runs through the test successfully - need to confirm
offload and check for data movement
This means data resides on the device only
Note this is only working on AMD GPUs (although it should be supported
on NVIDIA GPUs w/Cray compiler)
@pbartholomew08 pbartholomew08 marked this pull request as ready for review April 2, 2026 10:54
@pbartholomew08 pbartholomew08 requested a review from ia267 April 2, 2026 10:54
Copy link
Copy Markdown
Collaborator

@ia267 ia267 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things to address before merging.

use m_common, only: dp, pi, DIR_X
use m_mesh, only: mesh_t

use m_omptg_allocator, only: omptgt_allocator_t
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: should be m_omptgt_allocator

I guess this file hasn't been added to CMakeListst.txt so it wasn't pickedup.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected - and added to the tests/CMakeLists.txt under unit/

Comment thread src/CMakeLists.txt Outdated
set(CMAKE_Fortran_FLAGS_DEBUG "-g -Og -Wall -Wpedantic -Werror -Wimplicit-interface -Wimplicit-procedure -Wno-unused-dummy-argument")
set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -ffast-math")
if (OMP_TGT)
# A bit of a hack - hardcoded for MI300A
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use CMake cache varibales (e.g. OMP_TGT_ARCH) so we don't need to edit CMake files?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - this was added for development, and not intended as the actual implementation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment below for resolution

Comment thread tests/CMakeLists.txt
target_link_libraries(${test_name} PRIVATE OpenMP::OpenMP_Fortran)

if(${backend} STREQUAL omp_tgt)
# Note this is somewhat of a hack - hardcoded to build against MI300A
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use CMake cache variables instead?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have a list of options as follow:

set(gpu_archs gfx942 CACHE STRING "")
set_property(CACHE gpu_archs PROPERTY STRINGS gfx942 gfx900 gfx902 gfx906 gfx908 gfx909 gfx90a gfx90c)

One other alternative is to add an additional option of the form cmake .. -Dgpu_arch=gfx942

This would avoid updating the list of toggled option every time there is a new architecture coming out but put the responsibility of setting the correct input on the user.

In both case, we should be able to set the parameter as follow
target_compile_options(${test_name} PRIVATE "--offload-arch=${gpu_arch}")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone with the user must specify option, now when configuring the user should do

cmake -B build . <options> -DOMP_TGT=ON -DOMP_TGT_ARCH=<arch>

end subroutine

! Deallocates device-resident memory before deallocating the base type
subroutine destroy(self)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This destroy subroutine is not bound to omptgt_allocator_t. Need to add procedure :: destroy => destroy to omptgt_allocator_t

Comment thread tests/performance/perf_cuda_reorder.f90 Outdated
Comment on lines +296 to +306
!$omp target teams distribute parallel do private(out_i, out_j, out_k) collapse(3) map(to:u) has_device_addr(u_)
do k = 1, dims(3)
do j = 1, dims(2)
do i = 1, dims(1)
call get_index_reordering(out_i, out_j, out_k, i, j, k, &
dir_from, dir_to, SZ, cart_padded)
u_(out_i, out_j, out_k) = u(i, j, k)
end do
end do
end do
!$omp end target teams distribute parallel do
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: Re-ordering without using shared (scratchpad) memory is likely to be inefficient on GPU due to non-coalesced memory access

The Cray environment should handle this automatically
The user must now set OMP_TGT_ARCH when building code for OpenMP
target offloading using -DOMP_TGT_ARCH
Comment thread CMakeLists.txt
Comment on lines +28 to +31
set(OMP_TGT OFF CACHE BOOL
"Enable OpenMP target offloading.")
set(OMP_TGT_ARCH "" CACHE STRING "Target architecture for OpenMP offloading, e.g. gfx942")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that I need OpenMP 5.1, so I would add a CMake check on the OpenMP version

Comment thread src/backend/omp/target/allocator.f90
Comment thread tests/CMakeLists.txt
Comment on lines 149 to 151
# CUDA backend tests
if(${CMAKE_Fortran_COMPILER_ID} STREQUAL "PGI" OR
${CMAKE_Fortran_COMPILER_ID} STREQUAL "NVHPC")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: with the current logic, those cuda unit tests will be run if one use OpenMP on a Nvidia machine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants