Initial OpenMP/GPU support by pbartholomew08 · Pull Request #228 · xcompact3d/x3d2

pbartholomew08 · 2025-09-30T10:48:25Z

This branch implements basic data offloading using OpenMP, with the ability to perform the vecadd operation.

The OpenMP target block allocator now behaves polymorphically: it creates device fields when that is the `next` type and host fields otherwise.

Because libx3d2_backends links libx3d2, and xcompact and the tests link both, the link order must be libx3d2_backends then libx3d2 to prevent duplicate symbols during linking.

The omp backend assigns its allocator based on class, which allows the base class to be initialised whether it is called by the OMP/CPU or the OMP/TGT object.

The code runs through the vecadd test successfully, but offloading still needs to be confirmed and data movement checked.

Memory is allocated using the OpenMP API, which means data resides on the device only.

Note that this currently works only on AMD GPUs (although it should be supported on NVIDIA GPUs with the Cray compiler).

ia267

Couple of things to address before merging.

ia267 · 2026-04-06T03:41:04Z

+  use m_common, only: dp, pi, DIR_X
+  use m_mesh, only: mesh_t
+
+  use m_omptg_allocator, only: omptgt_allocator_t


typo: should be m_omptgt_allocator

I guess this file hasn't been added to CMakeLists.txt, so the typo wasn't picked up.

Corrected - and added to the tests/CMakeLists.txt under unit/

ia267 · 2026-04-06T04:00:50Z

  set(CMAKE_Fortran_FLAGS_DEBUG "-g -Og -Wall -Wpedantic -Werror -Wimplicit-interface -Wimplicit-procedure -Wno-unused-dummy-argument")
  set(CMAKE_Fortran_FLAGS_RELEASE "-O3 -ffast-math")
+  if (OMP_TGT)
+    # A bit of a hack - hardcoded for MI300A


Could we use CMake cache variables (e.g. OMP_TGT_ARCH) so we don't need to edit CMake files?

Good point - this was added for development, and not intended as the actual implementation

see comment below for resolution

ia267 · 2026-04-06T04:02:17Z

    target_link_libraries(${test_name} PRIVATE OpenMP::OpenMP_Fortran)
+
+    if(${backend} STREQUAL omp_tgt)
+      # Note this is somewhat of a hack - hardcoded to build against MI300A


Could we use CMake cache variables instead?

We can have a list of options as follows:

set(gpu_arch gfx942 CACHE STRING "")
set_property(CACHE gpu_arch PROPERTY STRINGS gfx942 gfx900 gfx902 gfx906 gfx908 gfx909 gfx90a gfx90c)

One other alternative is to add an additional option of the form cmake .. -Dgpu_arch=gfx942

This would avoid updating the list of allowed options every time a new architecture comes out, but it puts the responsibility for setting the correct value on the user.

In both cases, we should be able to set the parameter as follows:
target_compile_options(${test_name} PRIVATE "--offload-arch=${gpu_arch}")

I have gone with the option where the user must specify the architecture. When configuring, the user should now run:

cmake -B build . <options> -DOMP_TGT=ON -DOMP_TGT_ARCH=<arch>
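For reference, the resolution described above could be wired up roughly as follows. This is only a sketch under assumptions: `OMP_TGT` and `OMP_TGT_ARCH` are the cache variables named in this thread, while the helper name `x3d2_set_offload_arch` is hypothetical and the branch may organise this differently. It fails early when the architecture is missing and skips the flag for the Cray compiler.

```cmake
# Sketch only: x3d2_set_offload_arch is a hypothetical helper name.
# OMP_TGT and OMP_TGT_ARCH are assumed to be cache variables set by the user.
function(x3d2_set_offload_arch target)
  if(NOT OMP_TGT)
    return()
  endif()

  if(CMAKE_Fortran_COMPILER_ID STREQUAL "Cray")
    # The Cray environment is expected to select the architecture itself,
    # so no --offload-arch flag is passed.
    message(STATUS "Cray compiler: not passing --offload-arch to ${target}")
  elseif(NOT OMP_TGT_ARCH)
    # An empty string evaluates to false, so this catches an unset value.
    message(FATAL_ERROR
      "OMP_TGT=ON requires -DOMP_TGT_ARCH=<arch>, e.g. gfx942")
  else()
    target_compile_options(${target} PRIVATE
      "--offload-arch=${OMP_TGT_ARCH}")
  endif()
endfunction()
```

A test definition would then call `x3d2_set_offload_arch(${test_name})` after creating the target, in place of the hardcoded MI300A flag.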

ia267 · 2026-04-06T04:28:00Z

+  end subroutine
+
+  ! Deallocates device-resident memory before deallocating the base type
+  subroutine destroy(self)


This destroy subroutine is not bound to omptgt_allocator_t. We need to add `procedure :: destroy => destroy` to omptgt_allocator_t.

CFD-Xing · 2026-04-07T10:01:19Z

+    !$omp target teams distribute parallel do private(out_i, out_j, out_k) collapse(3) map(to:u) has_device_addr(u_)
+    do k = 1, dims(3)
+      do j = 1, dims(2)
+        do i = 1, dims(1)
+          call get_index_reordering(out_i, out_j, out_k, i, j, k, &
+                                    dir_from, dir_to, SZ, cart_padded)
+          u_(out_i, out_j, out_k) = u(i, j, k)
+        end do
+      end do
+    end do
+    !$omp end target teams distribute parallel do


Observation: Re-ordering without using shared (scratchpad) memory is likely to be inefficient on GPU due to non-coalesced memory access

The offload architecture is not passed to the Cray compiler; the Cray environment should handle this automatically.

The user must now set OMP_TGT_ARCH (via -DOMP_TGT_ARCH=<arch>) when building code for OpenMP target offloading.

CFD-Xing · 2026-05-18T09:31:57Z

+set(OMP_TGT OFF CACHE BOOL
+  "Enable OpenMP target offloading.")
+set(OMP_TGT_ARCH "" CACHE STRING "Target architecture for OpenMP offloading, e.g. gfx942")
+


I believe that I need OpenMP 5.1, so I would add a CMake check on the OpenMP version
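One possible shape for such a check, as a sketch rather than a tested change: CMake's FindOpenMP module exposes `OpenMP_Fortran_SPEC_DATE`, and the OpenMP 5.1 specification date is 202011. As far as I know the `has_device_addr` clause used in the reorder kernel is a 5.1 addition, which would be one reason for the requirement. Some compilers implement individual 5.1 features while still reporting an older date, so a warning may be preferable to a hard error.

```cmake
# Sketch only: assumes OMP_TGT is the cache option defined above.
if(OMP_TGT)
  find_package(OpenMP REQUIRED COMPONENTS Fortran)

  # OpenMP_Fortran_SPEC_DATE holds the value of the _OPENMP macro (yyyymm).
  # 202011 corresponds to OpenMP 5.1.
  if(OpenMP_Fortran_SPEC_DATE LESS 202011)
    message(FATAL_ERROR
      "OpenMP target offloading needs OpenMP 5.1 (spec date 202011), "
      "but the Fortran compiler reports ${OpenMP_Fortran_SPEC_DATE}")
  endif()
endif()
```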

CFD-Xing · 2026-05-18T10:50:02Z

 # CUDA backend tests
 if(${CMAKE_Fortran_COMPILER_ID} STREQUAL "PGI" OR
   ${CMAKE_Fortran_COMPILER_ID} STREQUAL "NVHPC")


Observation: with the current logic, those CUDA unit tests will also be run if one uses OpenMP target offloading on an NVIDIA machine.
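One way to tighten the condition, sketched under the assumption that the CUDA backend and `OMP_TGT` are never enabled in the same build (the thread does not confirm this), is to add the offloading option to the guard:

```cmake
# Sketch only: assumes CUDA tests should be skipped whenever OMP_TGT is ON.
if((CMAKE_Fortran_COMPILER_ID STREQUAL "PGI" OR
    CMAKE_Fortran_COMPILER_ID STREQUAL "NVHPC") AND NOT OMP_TGT)
  # CUDA backend tests would be defined here.
  message(STATUS "Enabling CUDA backend tests")
endif()
```

If both backends are meant to coexist on NVIDIA hardware, a dedicated option for the CUDA tests would be the safer design.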

pbartholomew08 force-pushed the omp_gpu branch from 44afac1 to 86a878a on January 9, 2026 10:17

pbartholomew08 added 26 commits April 2, 2026 11:16

WIP: Implement a OpenMP target field type and allocator

265a028

Move OpenMP target offloads to omp/target directory

c8ae25e

Optionally build OpenMP Target backend

9dad110

Fix types in OpenMP target block allocator

1be2573

It now behaves polymorphically and will create device fields when that is the `next` type and host fields otherwise

WIP on OMP target vecadd

d1da9cf

Correcting link order

68b0fef

As libx3d2_backends links libx3d2 and xcompact/tests link both the linking order must be libx3d2_backends then libx3d2 to prevent duplicate symbols during linking.

Cleaning up test_vecadd

63dc2ac

The omp backend must assign its allocator based on class

e170514

This is to allow initialisation of the base class whether called by OMP/CPU or the OMP/TGT object

Don't declare the method as a module function

1ad568c

Need to allocate the new field pointer

833e6f5

Specify the target mapping operations when creating a field

85005a7

Initially 'working' OMP target vec add

842fc8d

The code runs through the test successfully - need to confirm offload and check for data movement

Remove debugging print statement

59b40ed

We only need the 3-D view of data on the device

a48a61c

Remove duplicate entry from CMakeLists sources

19acbc9

Mark index calculations as offloadable

c713318

Add support for get/set fields with OMP target

13f435e

WIP - attempting simplified OMP calls

06c1877

WIP allocating memory using OpenMP API

105d7d4

This means data resides on the device only

Continuing ...

6ef42de

Trying to map pointers to target...

362e992

Initial working version of OMPTARGET vec add

dbf7ee0

Note this is only working on AMD GPUs (although it should be supported on NVIDIA GPUs w/Cray compiler)

Restore IBM module

8fbf90c

Minor formatting change

46ca117

Adding support for OMP offload of timestepping

559642d

Update OMPTGT test definitions

9755b76

pbartholomew08 force-pushed the omp_gpu branch from 92b6ffe to 9755b76 on April 2, 2026 10:32

Run fprettify

794a8f3

pbartholomew08 marked this pull request as ready for review April 2, 2026 10:54

pbartholomew08 requested a review from ia267 April 2, 2026 10:54

ia267 requested changes Apr 6, 2026

CFD-Xing reviewed Apr 7, 2026

Comment thread tests/performance/perf_cuda_reorder.f90 Outdated

CFD-Xing reviewed Apr 7, 2026

pbartholomew08 added 5 commits April 7, 2026 14:42

Fixing CUDA syntax

083c5f0

Only build 2decomp when requested

e06c7b9

Add support for 'Flang' Fortran compiler

1694df3

Don't pass offload arch to the Cray compiler

1dcff65

The Cray environment should handle this automatically

Add OMP_TGT_ARCH to CMake build process

e83b81a

The user must now set OMP_TGT_ARCH when building code for OpenMP target offloading using -DOMP_TGT_ARCH

CFD-Xing reviewed May 18, 2026

Comment thread src/backend/omp/target/allocator.f90

CFD-Xing reviewed May 18, 2026


3 participants