CI: Update ci-cuda tests to maintained workflow #3282

Open
dschwoerer wants to merge 3 commits into next from ci-cuda

@dschwoerer
Contributor

No description provided.

@ZedThree
Member

Error is in CMake:

  CUDA language enabled prior to setting CMAKE_CUDA_HOST_COMPILER.  Please
  set CMAKE_CUDA_HOST_COMPILER prior to ENABLE_LANGUAGE(CUDA) or PROJECT(..
  LANGUAGES CUDA)
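
For reference, a minimal sketch of one way to satisfy this ordering requirement: pass the host compiler as a cache variable on the `cmake` command line, so it is already defined when CMake first enables the CUDA language. The helper name `cuda_configure_cmd`, the `build` directory and the `g++` default are assumptions for illustration, not taken from the workflow in this PR. The script only prints the command (a dry run) rather than configuring anything.

```shell
#!/usr/bin/env bash
# Sketch only: build a configure command that sets CMAKE_CUDA_HOST_COMPILER
# up front. Helper name, build directory and default compiler are assumptions.

# Print the cmake configure command for a given host compiler.
cuda_configure_cmd() {
  local host_cxx="${1:-g++}"    # host C++ compiler that nvcc should use
  local build_dir="${2:-build}" # hypothetical build directory
  printf 'cmake -S . -B %s -DCMAKE_CUDA_HOST_COMPILER=%s\n' "$build_dir" "$host_cxx"
}

# Dry run: show the command instead of executing it.
cuda_configure_cmd "$(command -v g++ || echo g++)"
```

Because `-D` entries are loaded into the cache before `CMakeLists.txt` is processed, the variable should be visible to `project(... LANGUAGES CUDA)` or `enable_language(CUDA)` without editing the project files.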

   key: zenodo-data-${{ hashFiles('tests/integrated/test-fci-mpi/CMakeLists.txt') }}

-  - name: Build minimal CUDA 12.2 @ GCC9.4.0 @ Ubuntu 20.04
+  - name: Build minimal CUDA 12.6 @ GCC11 @ Ubuntu 22.04
Member

Is there a reason the base image is using ubuntu 22.04 rather than the 24.04 LTS version?

Contributor Author

Let's see whether we can update to 24.04 and cuda 13.1.1:
https://github.com/boutproject/bout-container-base/actions/runs/22481022925

Member

If that works, please could you also bump fmt to 12.x?

@ZedThree
Member

I set that variable in the CMake call, now failing with:

In file included from /usr/local/cuda/include/thrust/system/cuda/detail/execution_policy.h:40,
                 from /usr/local/cuda/include/thrust/iterator/detail/device_system_tag.h:31,
                 from /usr/local/cuda/include/thrust/iterator/iterator_traits.h:75,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:44,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/core/alignment.h:31,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/core/triple_chevron_launch.h:38,
                 from /spack-env/.spack-env/view/include/cub/device/dispatch/dispatch_scan.cuh:49,
                 from /spack-env/.spack-env/view/include/cub/device/device_scan.cuh:38,
                 from /spack-env/.spack-env/view/include/RAJA/policy/cuda/scan.hpp:28,
                 from /spack-env/.spack-env/view/include/RAJA/policy/cuda.hpp:38,
                 from /spack-env/.spack-env/view/include/RAJA/RAJA.hpp:71,
                 from /__w/BOUT-dev/BOUT-dev/include/bout/field2d.hxx:42,
                 from /__w/BOUT-dev/BOUT-dev/include/bout/boundary_op.hxx:9,
                 from /__w/BOUT-dev/BOUT-dev/src/field/field2d.cxx:32:
/usr/local/cuda/include/thrust/system/cuda/config.h:122:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
  122 | #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
      |  ^~~~~

It looks like an issue with the spack environment. @yassineAlouini any ideas? What's different about how we're building here, vs in bout-container-base?
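
The include chain above shows RAJA pulling `cub/` from the Spack view while Thrust comes from `/usr/local/cuda/include`. A rough diagnostic sketch for comparing the two copies is below. It assumes each CUB install ships a `cub/version.cuh` header containing a `#define CUB_VERSION` line; the helper name `cub_versions` is hypothetical, and the two include roots are copied from the log.

```shell
#!/usr/bin/env bash
# Sketch only: report the CUB_VERSION define found under each include root.
# Assumes CUB provides cub/version.cuh; the helper name is hypothetical.

cub_versions() {
  local root header line
  for root in "$@"; do
    header="$root/cub/version.cuh"
    if [ -f "$header" ]; then
      # Take the first line that defines CUB_VERSION itself (not derived macros).
      line="$(grep -m1 -E '^[[:space:]]*#[[:space:]]*define[[:space:]]+CUB_VERSION[[:space:]]' "$header")" \
        || line="CUB_VERSION not found"
      printf '%s: %s\n' "$root" "$line"
    else
      printf '%s: no cub/version.cuh\n' "$root"
    fi
  done
}

# Include roots taken from the compiler output above.
cub_versions /usr/local/cuda/include /spack-env/.spack-env/view/include
```

If the two roots report different values, the Spack-provided CUB is shadowing the one bundled with the CUDA Toolkit, which would be consistent with the `#error` from `thrust/system/cuda/config.h`.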

@yassineAlouini

> It looks like an issue with the spack environment. @yassineAlouini any ideas? What's different about how we're building here, vs in bout-container-base?

I will give this a look this weekend if it is still not resolved. 👌

@dschwoerer
Contributor Author

We now get a different error:


2026-02-27T10:27:43.0122165Z [ 61%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx.o
2026-02-27T10:27:43.0175546Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:47.0313978Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h: In function 'std::system_error fmt::v10::vsystem_error(int, fmt::v10::string_view, fmt::v10::format_args)':
2026-02-27T10:27:47.0316843Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:8: error: expected primary-expression before 'class'
2026-02-27T10:27:47.0319058Z   150 |   return std::system_error(ec, vformat(fmt, args));
2026-02-27T10:27:47.0319601Z       |        ^~~~~
2026-02-27T10:27:47.0320798Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:7: error: expected ';' before 'class'
2026-02-27T10:27:47.0322165Z   150 |   return std::system_error(ec, vformat(fmt, args));
2026-02-27T10:27:47.0322692Z       |       ^~~~~~
2026-02-27T10:27:47.0323029Z       |       ;
2026-02-27T10:27:47.0324263Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:8: error: expected primary-expression before 'class'
2026-02-27T10:27:47.0325700Z   150 |   return std::system_error(ec, vformat(fmt, args));
2026-02-27T10:27:47.0326207Z       |        ^~~~~
2026-02-27T10:27:47.0421325Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h: In function 'void fmt::v10::format_system_error(fmt::v10::detail::buffer<char>&, int, const char*)':
2026-02-27T10:27:47.0423855Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:1414:32: error: expected primary-expression before 'class'
2026-02-27T10:27:47.0425484Z  1414 |     write(std::back_inserter(out), std::system_error(ec, message).what());
2026-02-27T10:27:47.0426176Z       |                                ^~~~~
2026-02-27T10:27:47.1229664Z [ 61%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/petsc/petsc_laplace.cxx.o
2026-02-27T10:27:47.1291053Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:49.1813197Z [ 62%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/petsc3damg/petsc3damg.cxx.o
2026-02-27T10:27:49.1876845Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:50.1917462Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(107): warning #611-D: overloaded virtual function "ParallelTransform::toFieldAligned" is only partially overridden in class "ParallelTransformIdentity"
2026-02-27T10:27:50.1919340Z   class ParallelTransformIdentity : public ParallelTransform {
2026-02-27T10:27:50.1919971Z         ^
2026-02-27T10:27:50.1920153Z 
2026-02-27T10:27:50.1920884Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
2026-02-27T10:27:50.1921642Z 
2026-02-27T10:27:50.1933708Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(107): warning #611-D: overloaded virtual function "ParallelTransform::fromFieldAligned" is only partially overridden in class "ParallelTransformIdentity"
2026-02-27T10:27:50.1935562Z   class ParallelTransformIdentity : public ParallelTransform {
2026-02-27T10:27:50.1936238Z         ^
2026-02-27T10:27:50.1936797Z 
2026-02-27T10:27:50.2021861Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(182): warning #611-D: overloaded virtual function "ParallelTransform::toFieldAligned" is only partially overridden in class "ShiftedMetric"
2026-02-27T10:27:50.2024868Z   class ShiftedMetric : public ParallelTransform {
2026-02-27T10:27:50.2025401Z         ^
2026-02-27T10:27:50.2025575Z 
2026-02-27T10:27:50.2042796Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(182): warning #611-D: overloaded virtual function "ParallelTransform::fromFieldAligned" is only partially overridden in class "ShiftedMetric"
2026-02-27T10:27:50.2045796Z   class ShiftedMetric : public ParallelTransform {
2026-02-27T10:27:50.2046328Z         ^
2026-02-27T10:27:50.2046506Z 
2026-02-27T10:27:51.8038656Z [ 62%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/serial_band/serial_band.cxx.o
2026-02-27T10:27:51.8098247Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:51.8502656Z make[2]: *** [CMakeFiles/bout++.dir/build.make:715: CMakeFiles/bout++.dir/src/invert/laplace/impls/pcr/pcr.cxx.o] Error 1

It seems to use the provided fmt. Did we recently change something here that requires a newer fmt?

@ZedThree
Member

> /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:8: error: expected primary-expression before 'class'
>   150 |   return std::system_error(ec, vformat(fmt, args));
>       |        ^~~~~

The error doesn't match up with the highlighted bit, so something funky is going on.

> Did we recently change something here that requires a newer fmt?

I don't think so, but the C++20 stuff might need a newer version. At any rate, it would be good to match the bundled version.

@yassineAlouini

After investigating the build failures, here's a summary of the root cause and two fix options:

Root Cause

There is a fmt version mismatch between what the Spack environment in bout-container-base provides and what BOUT-dev bundles:

| Source | fmt version |
| --- | --- |
| Spack in ci-cuda Dockerfile | 10.2.1 |
| BOUT-dev bundled submodule | 12.1.0 (FMT_VERSION 120100) |

The cmake call in the BOUT-dev CI passes -DBOUT_USE_SYSTEM_FMT=on, which forces BOUT-dev to use the Spack-provided fmt 10.2.1 instead of its bundled 12.1.0. That older version has a known issue with nvcc: the parameter named fmt in format-inl.h:150 conflicts with nvcc's C++20 parsing, producing the misleading expected primary-expression before 'class' error.

Additionally, Spack v0.23.0 (used in the Dockerfile) only goes up to fmt 11.0.2 — it doesn't have fmt@12.x at all.
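
To confirm which fmt a given include directory actually provides, the version can be read from the header's `FMT_VERSION` macro, which encodes the version as major * 10000 + minor * 100 + patch (so 120100 is 12.1.0, as in the table above). A minimal sketch follows. It assumes the macro is defined in `fmt/base.h` for fmt 11 and later and in `fmt/core.h` for older releases; the helper name `fmt_version` and the `FMT_INCLUDE` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: decode FMT_VERSION from the fmt headers under an include root.
# Helper name and the FMT_INCLUDE variable are assumptions for illustration.

fmt_version() {
  local inc="$1" header raw
  # Check base.h first (assumed location for fmt >= 11), then core.h.
  for header in "$inc/fmt/base.h" "$inc/fmt/core.h"; do
    [ -f "$header" ] || continue
    raw="$(grep -m1 -E '^#define[[:space:]]+FMT_VERSION[[:space:]]+[0-9]+' "$header" | awk '{print $3}' || true)"
    if [ -n "$raw" ]; then
      # Split the integer into major.minor.patch.
      printf '%d.%d.%d\n' "$((raw / 10000))" "$((raw / 100 % 100))" "$((raw % 100))"
      return 0
    fi
  done
  return 1
}

# Point FMT_INCLUDE at the include directory to inspect.
fmt_version "${FMT_INCLUDE:-/usr/include}" || echo "FMT_VERSION not found"
```

Running this against the Spack fmt prefix from the log and against the bundled submodule's include directory should show the 10.2.1 versus 12.1.0 mismatch directly.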


Fix Options

Option A (recommended — cleanest): Remove fmt from the Spack environment in bout-container-base, and remove -DBOUT_USE_SYSTEM_FMT=on from the BOUT-dev CI cmake call — letting BOUT-dev use its own bundled fmt 12.1.0.

Changes needed:

  • bout-container-base (ci-cuda/Dockerfile): remove the spack add fmt@10.2.1 line, and drop fmt from the spack load line in bout-env.bash
  • BOUT-dev (tests.yml): remove -DBOUT_USE_SYSTEM_FMT=on from the cmake invocation
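
A small check along these lines could confirm that the flag is really gone after the change. This is only a sketch: the helper name `still_uses_system_fmt` is hypothetical, and the `.github/workflows/tests.yml` path is an assumption about where `tests.yml` lives.

```shell
#!/usr/bin/env bash
# Sketch only: detect whether a file still passes -DBOUT_USE_SYSTEM_FMT=on.
# Helper name and default workflow path are assumptions for illustration.

# Return 0 if the file still references the flag (any case), 1 otherwise.
still_uses_system_fmt() {
  grep -qi -- '-DBOUT_USE_SYSTEM_FMT=on' "$1"
}

workflow="${1:-.github/workflows/tests.yml}"
if [ -f "$workflow" ] && still_uses_system_fmt "$workflow"; then
  echo "$workflow still forces the system fmt"
else
  echo "$workflow does not force the system fmt (or was not found)"
fi
```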

Option B: Upgrade Spack fmt to 11.0.2 (the latest available in Spack v0.23.0) and keep -DBOUT_USE_SYSTEM_FMT=on — but this still wouldn't match BOUT-dev's bundled 12.1.0 and is unverified to fix the nvcc parsing issue.

Option A is the cleaner path since it eliminates the dependency conflict and leverages the already-working bundled version.

@ZedThree
Member

Option A sounds reasonable. Is there a reason to not consider upgrading spack to 1.x?

@yassineAlouini

> Option A sounds reasonable. Is there a reason to not consider upgrading spack to 1.x?

I will explore option A further and let you know.

3 participants