Skip to content

[Experiment] ROCm backend#2300

Open
NripeshN wants to merge 217 commits intoml-explore:mainfrom
NripeshN:rocm-support
Open

[Experiment] ROCm backend#2300
NripeshN wants to merge 217 commits intoml-explore:mainfrom
NripeshN:rocm-support

Conversation

@NripeshN
Copy link
Copy Markdown
Contributor

@NripeshN NripeshN commented Jun 16, 2025

Experiment with ROCm backend.

install MLX with ROCm backend using:

mkdir build && cd build
cmake -DMLX_BUILD_ROCM=ON \
      -DCMAKE_PREFIX_PATH=/opt/rocm \
      -DCMAKE_HIP_ARCHITECTURES="gfx90a;gfx1100" \
      ..
make -j$(nproc)

closes #2556

Inspired by @zcbenz

@NripeshN NripeshN changed the title [Experiment] ROCm backend initial push [Experiment] ROCm backend Jun 16, 2025
@lin72h
Copy link
Copy Markdown

lin72h commented Jun 17, 2025

What an unexpected and amazing surprise! I'm absolutely thrilled.

@NripeshN
Copy link
Copy Markdown
Contributor Author

@awni
What do you think of this PR? Does this have the potential to be merged into main? I can turn this PR from experimental to WIP if so.

@angeloskath
Copy link
Copy Markdown
Member

I think this is good to stay as an experiment branch for some time while we work on core and CUDA. I don't think we have the bandwidth to merge this for a few months at least. Sorry if this is disappointing @NripeshN I don't mean to discourage you working on it.

@akshat2602
Copy link
Copy Markdown

I would love to see the ROCm backend get more traction. The new AI series of processors by AMD have a similar advantage to Apple Silicon with unified memory and getting MLX to run on those processors would be neat.

@countradooku
Copy link
Copy Markdown

Stole my idea :(

@goniz
Copy link
Copy Markdown

goniz commented Jan 22, 2026

How is this even possible for such an awesome PR to be left like this?

Copilot AI review requested due to automatic review settings January 24, 2026 17:08
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds experimental ROCm backend support to MLX, enabling execution on AMD GPUs. The implementation mirrors the CUDA backend structure, providing HIP-based implementations of core operations, memory management, and device handling.

Changes:

  • Added ROCm backend infrastructure with device management, memory allocation, and stream handling
  • Implemented HIP kernels for unary, binary, ternary operations, reductions, normalization (softmax, layer_norm, rms_norm), RoPE, and sorting
  • Updated build system (CMake) to support ROCm compilation with configurable GPU architectures

Reviewed changes

Copilot reviewed 59 out of 59 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
CMakeLists.txt Added MLX_BUILD_ROCM option and ROCm library detection
mlx/CMakeLists.txt Integrated ROCm backend build configuration
mlx/device.cpp Added ROCm device availability checks
mlx/backend/rocm/*.hip HIP kernel implementations for various operations
mlx/backend/rocm/device.* ROCm device and stream management
mlx/backend/rocm/allocator.* ROCm-specific memory allocator using HIP unified memory
mlx/backend/rocm/worker.* Async task execution worker for stream synchronization
mlx/backend/rocm/utils.* HIP utility functions and error handling
mlx/backend/rocm/jit_module.* JIT compilation support using HIPRTC
mlx/backend/rocm/device/*.hpp Device-side utility functions and type definitions
mlx/backend/rocm/CMakeLists.txt ROCm backend build configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ather, scatter, logsumexp, random bits generation, and sorting. Introduce new kernels for efficient computation and integrate with existing ROCm utilities. Update CMake configuration to include new source files and dependencies. Enhance error handling and ensure compatibility with different data types. This commit significantly expands the functionality of the ROCm backend.
@goniz
Copy link
Copy Markdown

goniz commented Jan 24, 2026

👑👑👑

@NripeshN
Copy link
Copy Markdown
Contributor Author

Can anyone run

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON" pip install -e .
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES={based on your GPU}" pip install -e .

Replace {based on your GPU} with your GPU architecture

You can run

rocm-smi

to get your GPU information

@goniz
Copy link
Copy Markdown

goniz commented Jan 24, 2026

I'm getting this CMake error:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151" pip install -e .

      -- Configuring done (4.8s)
      CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
      Please set them or make sure they are set and tested correctly in the CMake files:
      /home/goniz/Work/mlx/LAPACK_INCLUDE_DIRS
         used as include directory in directory /home/goniz/Work/mlx
      
      CMake Error in CMakeLists.txt:
        HIP_ARCHITECTURES is empty for target "mlx".
      
      
      CMake Error in CMakeLists.txt:
        HIP_ARCHITECTURES is empty for target "mlx".
      
      
      -- Generating done (0.0s)
      CMake Generate step failed.  Build files cannot be regene
rated correctly.

Running on Strix Halo (gfx1151)

@NripeshN
Copy link
Copy Markdown
Contributor Author

I'm getting this CMake error:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151" pip install -e .
     -- Configuring done (4.8s)
     CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
     Please set them or make sure they are set and tested correctly in the CMake files:
     /home/goniz/Work/mlx/LAPACK_INCLUDE_DIRS
        used as include directory in directory /home/goniz/Work/mlx
     
     CMake Error in CMakeLists.txt:
       HIP_ARCHITECTURES is empty for target "mlx".
     
     
     CMake Error in CMakeLists.txt:
       HIP_ARCHITECTURES is empty for target "mlx".
     
     
     -- Generating done (0.0s)
     CMake Generate step failed.  Build files cannot be regene
rated correctly.

Running on Strix Halo (gfx1151)

Could you retry with the latest push please (p.s. keep your fingers crossed while it compiles, worked for me 138th time)😅

… string formatting, replacing fmt library usage. Remove unused event.cpp file. Update kernel name generation and parameter formatting for consistency.
@goniz
Copy link
Copy Markdown

goniz commented Jan 25, 2026

  Created wheel for mlx: filename=mlx-0.30.4.dev20260125+cadf18c1-0.editable-cp314-cp314-linux_x86_64.whl size=4722 sha256=72c664adbfc4fb9ec317522a8d83b84f85d599d08bd691d7fec3abfdb6f3a5e9
  Stored in directory: /tmp/pip-ephem-wheel-cache-nt7w6bq0/wheels/8a/63/d1/d7d629a5ff73457822bb71aa527c083674bb19ca314735cd05
Successfully built mlx
Installing collected packages: mlx
Successfully installed mlx-0.30.4.dev20260125+cadf18c1

Now what can I test? 😍

@goniz
Copy link
Copy Markdown

goniz commented Jan 25, 2026

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

@NripeshN
Copy link
Copy Markdown
Contributor Author

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

I forgot to test the Python build my bad, can you try it now?

Unfortunately I might not be able to help after it compiles, I don't have an AMD GPU to run tests😔 I've tried replicating most things from cuda, so hopefully it works

@Geramy
Copy link
Copy Markdown

Geramy commented Mar 30, 2026

@NripeshN if you are interested this is a very interesting read.
https://seb-v.github.io/optimization/update/2025/01/20/Fast-GPU-Matrix-multiplication.html
It really exposes amazing ways to optimize kernels for RDNA 3 and up.

Geramy added 4 commits March 30, 2026 17:21
Replace SmallSizePool with a generalized SlabAllocator containing 18
power-of-2 size class pools (8B through 1MB). Each pool pre-allocates
slab pages and sub-allocates via O(1) free-list operations, eliminating
hipExtMallocWithFlags calls for small/medium allocations during decode.

- SizeClassPool: configurable block size, grow-on-demand slab pages
- SlabAllocator: O(1) size-class dispatch via bit ops
- Pre-allocates tiers 0-11 (8B-16KB) at startup (~5.8MB)
- Applies hipMemAdvise/hipMemPrefetchAsync on slab pages
- BufferCache still handles >1MB allocations
GEMM tuning: Request 8 algorithms from hipBLASLt heuristic and
benchmark each on first call per (M,N,K) shape. Cache the winner
for subsequent calls. Finds lower-VGPR kernels for better CU occupancy.

Copy reduction: Replace hipMemcpyAsync-based shape/stride passing in
copy_general and copy_general_input with by-value hip_array kernel
arguments. Eliminates 3 HIP API calls per general copy dispatch.

Results (Qwen3.5-35B-A3B-4bit):
- hipMemcpyAsync: 964 -> 77 (-92%)
- Gen tok/s: 25.1 -> 26.6 (+6%)
- Short gen: 21 -> 46 tok/s (+120%)
On integrated GPUs with fine-grained coherent unified memory,
hipStreamSynchronize is unnecessary when the stream has no pending
work. Use hipStreamQuery (non-blocking) to check first, only sync
when the stream is actually busy.

Results: hipStreamSynchronize calls reduced from 5683 to 54 (-99%).
Auto-tuning benchmarks 8 GEMM algorithms per shape on first call,
adding ~200ms startup overhead. For quantized models the regular
GEMM path is rarely used, so the overhead is wasted. Disable by
default; enable with MLX_ROCM_HIPBLASLT_TUNE=1 for non-quantized.

Warm prompt restored: Qwen3-8B 1092 tok/s, Qwen3.5-35B 795 tok/s.
Geramy added 2 commits March 30, 2026 20:32
hipEventSynchronize with hipEventBlockingSync causes CPU-GPU contention
on integrated GPUs where they share compute resources. Replace with a
hipEventQuery spin-loop that yields the thread between polls.

Also remove hipEventBlockingSync flag from CopyableHipEvent creation
to prevent kernel-level blocking waits.

Results (100 tokens, Qwen3.5-35B):
- hipEventSynchronize: 100 -> 0 (eliminated)
- Gen tok/s: 22.7 -> 25.5 (+12%)
Process 16 output columns per block instead of 8, so adjacent weight
rows share the same K-range in L2 cache. All 16 warps in the block
iterate through the same K-tiles simultaneously, keeping weight data
hot in L2 across columns.

Previous kernel: 8 columns/block, each warp streams full K independently.
L2 hit rate ~10% because weights evicted before reuse.

New kernel: 16 columns/block, weight tiles stay in L2 for 16x reuse.
Expected L2 hit rate improvement: 10% -> 40-70%.

Results:
- Qwen3-8B gen: 14.1 -> 23.6 tok/s (+67%)
- Qwen3.5-35B gen: 26.5 -> 30.8 tok/s (+16%)
@goniz
Copy link
Copy Markdown

goniz commented Mar 31, 2026

@goniz do you want to test this again? We should be at a point where its working.

I had to do this for it to compile on my machine (Strix halo, Omarchy, ROCm 7.2):

diff --git i/mlx/backend/rocm/sort.hip w/mlx/backend/rocm/sort.hip
index 2f00ea9a0..c0d228feb 100644
--- i/mlx/backend/rocm/sort.hip
+++ w/mlx/backend/rocm/sort.hip
@@ -7,16 +7,6 @@
 #include "mlx/primitives.h"

 #include <hip/hip_runtime.h>
-
-// Workaround: rocprim headers use placement new in __device__ code,
-// which requires __device__ overloads of operator new/delete.
-#ifdef __HIP_DEVICE_COMPILE__
-__device__ inline void* operator new(size_t, void* p) noexcept { return p; }
-__device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
-__device__ inline void operator delete(void*, void*) noexcept {}
-__device__ inline void operator delete[](void*, void*) noexcept {}
-#endif
-
 #include <rocprim/rocprim.hpp>
 #include <cassert>
 #include <limits>

But it works

@Geramy
Copy link
Copy Markdown

Geramy commented Mar 31, 2026

@angeloskath how much longer before you think this could get merged, of course we are going to need a review and I assume that will come with some required changes. But it would be nice to see if we could start getting some feedback to plan to land this soon. Things are operating nicely actually.

@Geramy
Copy link
Copy Markdown

Geramy commented Mar 31, 2026

@goniz do you want to test this again? We should be at a point where its working.

I had to do this for it to compile on my machine (Strix halo, Omarchy, ROCm 7.2):

diff --git i/mlx/backend/rocm/sort.hip w/mlx/backend/rocm/sort.hip
index 2f00ea9a0..c0d228feb 100644
--- i/mlx/backend/rocm/sort.hip
+++ w/mlx/backend/rocm/sort.hip
@@ -7,16 +7,6 @@
 #include "mlx/primitives.h"

 #include <hip/hip_runtime.h>
-
-// Workaround: rocprim headers use placement new in __device__ code,
-// which requires __device__ overloads of operator new/delete.
-#ifdef __HIP_DEVICE_COMPILE__
-__device__ inline void* operator new(size_t, void* p) noexcept { return p; }
-__device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
-__device__ inline void operator delete(void*, void*) noexcept {}
-__device__ inline void operator delete[](void*, void*) noexcept {}
-#endif
-
 #include <rocprim/rocprim.hpp>
 #include <cassert>
 #include <limits>

But it works

I'll have this fixed I primarily run 7.12

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Mar 31, 2026

Here is my attempt to build it on Pop!_OS 22.04 for use with my gfx1103 card

After installing CMake 4.3.1 from source (because a later version than 3.22.1 installed via apt was needed for this build)

$ which cmake; /usr/local/bin/cmake --version
/usr/local/bin/cmake
cmake version 4.3.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Then, when I tried to build from with:

$ mkdir build && cd build
$ /usr/local/bin/cmake -DGGML_BLAS=ON \
                     -DGGML_BLAS_VENDOR=OpenBLAS \
                     -DMLX_BUILD_ROCM=ON \
                     -DCMAKE_PREFIX_PATH=/opt/rocm \
                     -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..

I got this

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Error at /usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:290 (message):
  Could NOT find BLAS (missing: BLAS_LIBRARIES)
Call Stack (most recent call first):
  /usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:654 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/share/cmake-4.3/Modules/FindBLAS.cmake:1509 (find_package_handle_standard_args)
  /usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:275 (find_package)
  /usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:321 (_lapack_find_dependency)
  CMakeLists.txt:326 (find_package)

After installing OpenBLAS via:

sudo apt install libopenblas-dev

I tried again:

$ /usr/local/bin/cmake -DGGML_BLAS=ON \
                                         -DGGML_BLAS_VENDOR=OpenBLAS \
                                         -DMLX_BUILD_ROCM=ON \
                                         -DCMAKE_PREFIX_PATH=/opt/rocm \
                                         -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;-lm;-ldl
-- Lapack lib /usr/lib/x86_64-linux-gnu/libopenblas.so-lm-ldl
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Warning (dev) at /opt/rocm/lib/cmake/hip/hip-config-amd.cmake:86 (message):
  amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
  /opt/rocm/lib/cmake/hip/hip-config.cmake:149 (include)
  mlx/backend/rocm/CMakeLists.txt:8 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is 
CMake Warning (dev) at /opt/rocm/lib/cmake/hip/hip-config-amd.cmake:86 (message):
  amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
  /opt/rocm/lib/cmake/hip/hip-config.cmake:149 (include)
  /usr/local/share/cmake-4.3/Modules/CMakeFindDependencyMacro.cmake:93 (find_package)
  /usr/local/share/cmake-4.3/Modules/CMakeFindDependencyMacro.cmake:125 (__find_dependency_common)
  /opt/rocm/lib/cmake/rocblas/rocblas-config.cmake:90 (find_dependency)
  mlx/backend/rocm/CMakeLists.txt:9 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is 
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

@Geramy
Copy link
Copy Markdown

Geramy commented Mar 31, 2026

Here is my attempt to build it on Pop!_OS 22.04 for use with my gfx1103 card

After installing CMake 4.3.1 from source (because a later version than 3.22.1 installed via apt was needed for this build)

$ which cmake; /usr/local/bin/cmake --version
/usr/local/bin/cmake
cmake version 4.3.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Then, when I tried to build from with:

$ mkdir build && cd build
$ /usr/local/bin/cmake -DGGML_BLAS=ON \
                     -DGGML_BLAS_VENDOR=OpenBLAS \
                     -DMLX_BUILD_ROCM=ON \
                     -DCMAKE_PREFIX_PATH=/opt/rocm \
                     -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..

I got this

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Error at /usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:290 (message):
  Could NOT find BLAS (missing: BLAS_LIBRARIES)
Call Stack (most recent call first):
  /usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:654 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/share/cmake-4.3/Modules/FindBLAS.cmake:1509 (find_package_handle_standard_args)
  /usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:275 (find_package)
  /usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:321 (_lapack_find_dependency)
  CMakeLists.txt:326 (find_package)

After installing OpenBLAS via:

sudo apt install libopenblas-dev

I tried again:

$ /usr/local/bin/cmake -DGGML_BLAS=ON \
                                         -DGGML_BLAS_VENDOR=OpenBLAS \
                                         -DMLX_BUILD_ROCM=ON \
                                         -DCMAKE_PREFIX_PATH=/opt/rocm \
                                         -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;-lm;-ldl
-- Lapack lib /usr/lib/x86_64-linux-gnu/libopenblas.so-lm-ldl
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Warning (dev) at /opt/rocm/lib/cmake/hip/hip-config-amd.cmake:86 (message):
  amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
  /opt/rocm/lib/cmake/hip/hip-config.cmake:149 (include)
  mlx/backend/rocm/CMakeLists.txt:8 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is 
CMake Warning (dev) at /opt/rocm/lib/cmake/hip/hip-config-amd.cmake:86 (message):
  amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
  /opt/rocm/lib/cmake/hip/hip-config.cmake:149 (include)
  /usr/local/share/cmake-4.3/Modules/CMakeFindDependencyMacro.cmake:93 (find_package)
  /usr/local/share/cmake-4.3/Modules/CMakeFindDependencyMacro.cmake:125 (__find_dependency_common)
  /opt/rocm/lib/cmake/rocblas/rocblas-config.cmake:90 (find_dependency)
  mlx/backend/rocm/CMakeLists.txt:9 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is 
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

You need to install rocm 7.2 at least I believe.Thats probably your primary problem and secondary is to specify the HIP_PLATFORM=amd I believe.

Geramy added 2 commits March 31, 2026 16:23
…, SliceUpdate donation

Arch detection and tuning:
- RocmArchTier enum: RDNA 2/3/3.5/4/CDNA with fine-grained gfx detection
- HWInfo struct: CU count, SIMDs, L2 size, WMMA capability from hipDeviceProp
- ArchTuning: per-arch kernel parameters (QMV tile_n, crossover thresholds)
- Runtime TILE_N for qmv_tiled_kernel via kernel argument (no template bloat)
- MLX_ROCM_QMV_TILE_N env var for manual tuning

WMMA flash attention:
- flash_attention_wmma.hip: rocwmma 16x16x16 tiled kernel for bf16/fp16
- Dispatches for prefill (qL > 4) on supported head dims (64/128/256)
- Integrated into ScaledDotProductAttention dispatch

Arena allocator (DecodeArena):
- Deterministic bump allocator for HIP Graph capture
- Hooked into RocmAllocator malloc/free path
- Proven: 18 KB per decode step with stable addresses

SliceUpdate donation:
- Skip base array copy when input has unique ownership (refcount==1)
- Helps prefill path (200 donated during prompt processing)

GPU memcpy:
- mlx_gpu_memcpy_async (extern C) for direct KV cache writes
- gpu_arena/gpu_graph wrapper functions for engine integration
…V weight loads

custom_kernel.cpp: The hip_kernel lambda's 8th parameter was named
`verbose` but the CustomKernelFunction typedef passes
ensure_row_contiguous in that slot. When gated_delta.cpp called with
ensure_row_contiguous=true, it triggered a cout dump of the full
compiled kernel source — polluting stdout and appearing as model
output on every MoE inference (Qwen3.5-35B, Coder-Next).

qdequant.hpp: Add load_weight_vec<BITS>() helper that loads PPT
uint32 words via a single wide vector load (uint2 for 4-bit,
uint4 for 8-bit) instead of PPT scalar loads.

qmv_tiled_kernel.hip: Use load_weight_vec in both qmv_tiled_kernel
and gather_qmv_tiled_kernel with a warp-uniform branch to separate
the vectorized fast path from the bounds-checked tail path.
@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 1, 2026

After ensuring I have ROCm, I'm still unable to build (within a rocm/dev-ubuntu-22.04:7.1.1-complete container) after following these steps to ensure it is installed properly:

# cmake -DGGML_BLAS=ON \
                -DGGML_BLAS_VENDOR=OpenBLAS \
                -DMLX_BUILD_ROCM=ON \
                -DCMAKE_PREFIX_PATH=/opt/rocm \
                -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/libopenblas.so-lm-ldl
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from /home/chimezie/Projects/mlx-lm-rocm-support/build/_deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 1, 2026

rocwmma

try this
sudo apt-get install rocwmma-dev

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 2, 2026

Still no joy:

# sudo apt install libblas-dev liblapack-dev git rocwmma-dev        
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libblas-dev is already the newest version (3.10.0-2ubuntu1).
liblapack-dev is already the newest version (3.10.0-2ubuntu1).
git is already the newest version (1:2.34.1-1ubuntu1.17).
rocwmma-dev is already the newest version (2.1.0.70101-38~22.04).
0 upgraded, 0 newly installed, 0 to remove and 57 not upgraded.
# cmake -DGGML_BLAS=ON \
                -DGGML_BLAS_VENDOR=OpenBLAS \
                -DMLX_BUILD_ROCM=ON \
                -DCMAKE_PREFIX_PATH=/opt/rocm \
               -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/liblapack.so/usr/lib/x86_64-linux-gnu/libblas.so
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 2, 2026

Still no joy:

# sudo apt install libblas-dev liblapack-dev git rocwmma-dev        
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libblas-dev is already the newest version (3.10.0-2ubuntu1).
liblapack-dev is already the newest version (3.10.0-2ubuntu1).
git is already the newest version (1:2.34.1-1ubuntu1.17).
rocwmma-dev is already the newest version (2.1.0.70101-38~22.04).
0 upgraded, 0 newly installed, 0 to remove and 57 not upgraded.
# cmake -DGGML_BLAS=ON \
                -DGGML_BLAS_VENDOR=OpenBLAS \
                -DMLX_BUILD_ROCM=ON \
                -DCMAKE_PREFIX_PATH=/opt/rocm \
               -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/liblapack.so/usr/lib/x86_64-linux-gnu/libblas.so
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

Can you try this out?

  1. Add the AMD ROCm GPG Key
    First, securely add AMD's package signing key:

Bash
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/rocm.gpg
2. Add the ROCm 7.2 Repository & Set Pinning
Pop!_OS 24.04 uses the Ubuntu "Noble" base, so we use the noble repository. We also need to set the apt pin priority (which System76 explicitly recommends) so your system pulls ROCm packages properly without overriding critical Pop!_OS packages:

Bash
sudo tee /etc/apt/sources.list.d/rocm.list << EOF
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.2 noble main
EOF

sudo tee /etc/apt/preferences.d/rocm-pin-600 << EOF
Package: *
Pin: release o=repo.radeon.com
Pin-Priority: 600
EOF

sudo apt update
3. Install ROCm (User-Space Only)
Now, install the core ROCm compute libraries and the specific development headers MLX needs. By just installing rocm, apt pulls the HIP runtime and math libraries without touching your kernel:

Bash
sudo apt install rocm
sudo apt install rocwmma-dev hipblaslt-dev rocblas-dev rocm-core rocm-llvm
4. Assign GPU Permissions
For the ROCm libraries to talk to your GPU without sudo, your user needs to be in the render and video groups:

Bash
sudo usermod -a -G render,video $LOGNAME

If you dont like this method you could install the rocm manually with my bash script which brings in the newest. 7.12 bleeding edge.

FILTER_GFX=gfx1151 is set for me I have set it to gfx1103 for you already.

#!/bin/bash

# --- Configuration ---
BUCKET_URL="https://therock-nightly-tarball.s3.amazonaws.com"
INSTALL_DIR="/opt/rocm"
TEMP_DIR="/tmp/rocm_install_$(date +%s)"
FILTER_GFX="gfx110X"
FILTER_VER="7.12"
SERVICE_NAME="lemonade-server"

# Detect the real user (who ran sudo) to ensure build artifacts aren't owned by root
REAL_USER=${SUDO_USER:-$USER}
USER_HOME=$(getent passwd "$REAL_USER" | cut -d: -f6)
PROJECT_DIR="$USER_HOME/Documents/Programming/llama.rocm/llama.cpp"

# --- Functions ---
echo_step() { echo -e "\n\033[1;34m[INFO]\033[0m $1"; }
echo_err()  { echo -e "\n\033[1;31m[ERROR]\033[0m $1"; }

# Check Root
if [ "$EUID" -ne 0 ]; then 
    echo_err "Please run as root (sudo ./script.sh)"
    exit 1
fi

# ==========================================
# PART 1: Install ROCm (Requires Root)
# ==========================================

echo_step "Fetching release list from $BUCKET_URL..."
FILE_LIST=$(curl -s "$BUCKET_URL")

LATEST_FILE=$(echo "$FILE_LIST" | grep -oP '(?<=<Key>).*?(?=</Key>)' | grep "$FILTER_GFX" | grep "$FILTER_VER" | sort -r | head -n 1)

if [ -z "$LATEST_FILE" ]; then
    echo_err "No files found matching filters."
    exit 1
fi

echo_step "Found latest build: $LATEST_FILE"
mkdir -p "$TEMP_DIR"
cd "$TEMP_DIR"

echo_step "Downloading..."
curl -O "$BUCKET_URL/$LATEST_FILE"

if [ -d "$INSTALL_DIR" ]; then
    echo_step "Removing old ROCm install..."
    rm -rf "$INSTALL_DIR"
fi
mkdir -p "$INSTALL_DIR"

echo_step "Extracting to $INSTALL_DIR..."
tar -xvf "$LATEST_FILE" -C "$INSTALL_DIR" --strip-components=1 > /dev/null
rm -rf "$TEMP_DIR"

echo_step "Done! ROCm updated."

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 2, 2026

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

I forgot to test the Python build my bad, can you try it now?

Unfortunately I might not be able to help after it compiles, I don't have an AMD GPU to run tests😔 I've tried replicating most things from cuda, so hopefully it works

Do you know what we have to do in order to get this merged into MLX?

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 2, 2026

@zcbenz Thank you for helping me on the other issue regarding float8 I was hoping you could lend me a hand here on this PR and let me know if there is anyone at MLX that can help give us guidance on getting this PR to land in main. Whatever is needed let me know and i'll do what I can. If need be I'll maintain the ROCm backend.

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 3, 2026

I installed Cmake 4.3.1 and ran rocm-7.12.sh with a filter of FILTER_GFX="gfx110X" in a clean container of the rocm/dev-ubuntu-22.04 image

Then make -j$(nproc) fails (after a successful run of cmake) with the following error:

# make -j$(nproc)
[  1%] Built target cpu_compiled_preamble
[  1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[  1%] Built target mlx_version
[  3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   28 |   bool operator==(const Device&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   18 |   bool operator==(const Stream&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
  371 | #          warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
      |            ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
   14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
   95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
   15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
   98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
   16 | __device__ inline void operator delete(void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
  102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
   17 | __device__ inline void operator delete[](void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
  103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
   49 |     MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   21 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t);     \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   22 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   23 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   24 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   25 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   26 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   27 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   28 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   31 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   32 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   33 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float);       \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   34 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx1103 -O3  -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Geramy added 2 commits April 3, 2026 17:55
ROCm 7.12 (clang 22) provides __device__ placement new/delete
via cuda_wrappers/new, causing redefinition errors. Guard with
__CLANG_CUDA_WRAPPERS_NEW so the manual definitions are only
compiled on older ROCm versions that lack them.
@Geramy
Copy link
Copy Markdown

Geramy commented Apr 4, 2026

I installed Cmake 4.3.1 and ran rocm-7.12.sh with a filter of FILTER_GFX="gfx110X" in a clean container of the rocm/dev-ubuntu-22.04 image

Then make -j$(nproc) fails (after a successful run of cmake) with the following error:

# make -j$(nproc)
[  1%] Built target cpu_compiled_preamble
[  1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[  1%] Built target mlx_version
[  3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   28 |   bool operator==(const Device&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   18 |   bool operator==(const Stream&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
  371 | #          warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
      |            ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
   14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
   95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
   15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
   98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
   16 | __device__ inline void operator delete(void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
  102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
   17 | __device__ inline void operator delete[](void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
  103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
   49 |     MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   21 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t);     \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   22 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   23 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   24 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   25 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   26 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   27 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   28 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   31 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   32 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   33 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float);       \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   34 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx1103 -O3  -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

I installed Cmake 4.3.1 and ran rocm-7.12.sh with a filter of FILTER_GFX="gfx110X" in a clean container of the rocm/dev-ubuntu-22.04 image

Then make -j$(nproc) fails (after a successful run of cmake) with the following error:

# make -j$(nproc)
[  1%] Built target cpu_compiled_preamble
[  1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[  1%] Built target mlx_version
[  3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   28 |   bool operator==(const Device&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   18 |   bool operator==(const Stream&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
  371 | #          warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
      |            ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
   14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
   95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
   15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
   98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
   16 | __device__ inline void operator delete(void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
  102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
   17 | __device__ inline void operator delete[](void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
  103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
   49 |     MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   21 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t);     \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   22 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   23 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   24 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   25 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   26 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   27 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   28 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   31 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   32 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   33 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float);       \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   34 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx1103 -O3  -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Please try the newest version, I have added a guard which should help.

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 4, 2026

I completed the build:

[..snip..]
[ 98%] Built target linear_regression
[ 98%] Built target metal_capture
[ 98%] Built target test_teardown
[ 98%] Built target logistic_regression
[ 98%] Built target tutorial
[ 98%] Built target distributed
[100%] Linking CXX executable tests
[100%] Built target tests
root@pop-os:/path/to/mlx-lm-rocm-support/build# python
bash: python: command not found
root@pop-os:/path/to/mlx-lm-rocm-support/build# curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env   # or: export PATH="$HOME/.local/bin:$PATH"
downloading uv 0.11.3 x86_64-unknown-linux-gnu
installing to /root/.local/bin
  uv
  uvx
everything's installed!

To add $HOME/.local/bin to your PATH, either restart your shell or run:

    source $HOME/.local/bin/env (sh, bash, zsh)
    source $HOME/.local/bin/env.fish (fish)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv --version
uv 0.11.3 (x86_64-unknown-linux-gnu)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv venv /opt/mlx-env --python 3.12
source /opt/mlx-env/bin/activate
Using CPython 3.12.13
Creating virtual environment at: /opt/mlx-env
Activate with: source /opt/mlx-env/bin/activate

But failed to import mlx_lm or run mlx_lm.chat :

# uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.12`.
Python 3.12.13 (main, Mar 24 2026, 22:49:22) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `F2` or %edit with no arguments to open an empty editor with a temporary file.

In [1]: import mlx_lm
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 import mlx_lm

File /path/to/mlx_lm/__init__.py:9
      5 from ._version import __version__
      7 os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
----> 9 from .convert import convert
     10 from .generate import batch_generate, generate, stream_generate
     11 from .utils import load

File /path/to/mlx_lm/convert.py:7
      4 from pathlib import Path
      5 from typing import Callable, Optional, Union
----> 7 import mlx.core as mx
      8 import mlx.nn as nn
      9 from mlx.utils import tree_map_with_path

ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi

In [2]: exit
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)   uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 42 packages in 299ms
      Built mlx @ /path/to//mlx-lm-rocm-support
Prepared 1 package in 1m 43s
Uninstalled 1 package in 0.44ms
Installed 1 package in 2ms
 ~ mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# pushd
/path/to/mlx-lm /path/to/mlx-lm-rocm-support
(mlx-env) root@pop-os:/path/to/mlx-lm# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .  
Already up to date.
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 32 packages in 791ms
      Built mlx-lm @ file:///path/to/mlx-lm
Prepared 1 package in 392ms
Uninstalled 1 package in 1ms
Installed 1 package in 10ms
 ~ mlx-lm==0.31.2 (from file:///path/to/mlx-lm)
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.
mlx_lm.awq            mlx_lm.chat           mlx_lm.dynamic_quant  mlx_lm.generate       mlx_lm.manage         mlx_lm.share          
mlx_lm.benchmark      mlx_lm.convert        mlx_lm.evaluate       mlx_lm.gptq           mlx_lm.perplexity     mlx_lm.upload         
mlx_lm.cache_prompt   mlx_lm.dwq            mlx_lm.fuse           mlx_lm.lora           mlx_lm.server         
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.chat --help
Traceback (most recent call last):
  File "/opt/mlx-env/bin/mlx_lm.chat", line 4, in <module>
    from mlx_lm.chat import main
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/__init__.py", line 9, in <module>
    from .convert import convert
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/convert.py", line 7, in <module>
    import mlx.core as mx
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
(mlx-env) root@pop-os:/path/to/mlx-lm# 

When MLX_BUILD_ROCM=OFF, the Python binding unconditionally references
mlx::core::fast::hip_kernel but no_rocm.cpp only stubbed
rocm::is_available(). Add a throwing stub matching the pattern used by
no_metal.cpp and no_cuda.cpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Geramy
Copy link
Copy Markdown

Geramy commented Apr 4, 2026

# uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.12`.
Python 3.12.13 (main, Mar 24 2026, 22:49:22) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `F2` or %edit with no arguments to open an empty editor with a temporary file.

In [1]: import mlx_lm
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 import mlx_lm

File /path/to/mlx_lm/__init__.py:9
      5 from ._version import __version__
      7 os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
----> 9 from .convert import convert
     10 from .generate import batch_generate, generate, stream_generate
     11 from .utils import load

File /path/to/mlx_lm/convert.py:7
      4 from pathlib import Path
      5 from typing import Callable, Optional, Union
----> 7 import mlx.core as mx
      8 import mlx.nn as nn
      9 from mlx.utils import tree_map_with_path

ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi

In [2]: exit
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)   uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 42 packages in 299ms
      Built mlx @ /path/to//mlx-lm-rocm-support
Prepared 1 package in 1m 43s
Uninstalled 1 package in 0.44ms
Installed 1 package in 2ms
 ~ mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# pushd
/path/to/mlx-lm /path/to/mlx-lm-rocm-support
(mlx-env) root@pop-os:/path/to/mlx-lm# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .  
Already up to date.
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 32 packages in 791ms
      Built mlx-lm @ file:///path/to/mlx-lm
Prepared 1 package in 392ms
Uninstalled 1 package in 1ms
Installed 1 package in 10ms
 ~ mlx-lm==0.31.2 (from file:///path/to/mlx-lm)
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.
mlx_lm.awq            mlx_lm.chat           mlx_lm.dynamic_quant  mlx_lm.generate       mlx_lm.manage         mlx_lm.share          
mlx_lm.benchmark      mlx_lm.convert        mlx_lm.evaluate       mlx_lm.gptq           mlx_lm.perplexity     mlx_lm.upload         
mlx_lm.cache_prompt   mlx_lm.dwq            mlx_lm.fuse           mlx_lm.lora           mlx_lm.server         
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.chat --help
Traceback (most recent call last):
  File "/opt/mlx-env/bin/mlx_lm.chat", line 4, in <module>
    from mlx_lm.chat import main
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/__init__.py", line 9, in <module>
    from .convert import convert
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/convert.py", line 7, in <module>
    import mlx.core as mx
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
(mlx-env) root@pop-os:/path/to/mlx-lm# 

please try again we where missing no-rocm stubs.

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 4, 2026

@chimezie Please rebuild with these instructions.

Building mlx for ROCm

Prerequisites

ROCm packages installed (via apt or your distro's package manager):

  • hip-dev, rocblas-dev, rocthrust-dev, rocprim-dev, hiprand-dev, rocwmma-dev
  • hiprtc-dev, hipblaslt-dev (for JIT and optimized GEMM)
  • OpenBLAS: libopenblas-dev

Build command

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo"

CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)
uv pip install -e ".[dev]" --no-build-isolation

Architecture targeting

Replace gfx1150 with your GPU's architecture. Common values:

  │ Family  │            Architecture            │                    GPU                     │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ CDNA    │ gfx908                             │ MI100                                      │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ CDNA2   │ gfx90a                             │ MI200                                      │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ CDNA3   │ gfx942                             │ MI300                                      │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA1   │ gfx1010, gfx1011, gfx1012          │ RX 5000 series                             │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA2   │ gfx1030, gfx1031, gfx1032          │ RX 6000 series                             │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA3   │ gfx1100, gfx1101, gfx1102, gfx1103 │ RX 7900 / 7600 / iGPU (Phoenix/Hawk Point) │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA3.5 │ gfx1150, gfx1151, gfx1152          │ Ryzen AI / Radeon 8060S                    │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA4   │ gfx1200, gfx1201                   │ RX 9000 series                             │
  └─────────┴────────────────────────────────────┴────────────────────────────────────────────┘

Omit -DCMAKE_HIP_ARCHITECTURES=... to build for all supported architectures (slow — compiles 17 targets).

The CI does it this way

From .github/workflows/build_rocm.yml:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo"

The key flag you were missing was -DMLX_BUILD_ROCM=ON inside CMAKE_ARGS. Without it, setup.py passes the default
MLX_BUILD_ROCM=OFF to cmake and you get the no_rocm.cpp stub instead of the real backend.

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 4, 2026

# git pull
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 4), reused 6 (delta 4), pack-reused 0 (from 0)
Unpacking objects: 100% (6/6), 933 bytes | 466.00 KiB/s, done.
From https://github.com/NripeshN/mlx
   a866ff4f..71d03e59  rocm-support -> origin/rocm-support
Updating a866ff4f..71d03e59
Fast-forward
 mlx/backend/rocm/no_rocm.cpp | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)
# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .
Already up to date.
Using Python 3.11.15 environment at: /opt/mlx-env
Resolved 1 package in 891ms
      Built mlx @ file:///path/to/mlx-lm-rocm-support
Prepared 1 package in 35.10s
Uninstalled 1 package in 7ms
Installed 1 package in 7ms
 - mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
 + mlx==0.31.2.dev20260404+71d03e59 (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.11`.
Python 3.11.15 (main, Mar 24 2026, 22:50:29) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.10.1 -- An enhanced Interactive Python. Type '?' for help.
Tip: IPython supports combining unicode identifiers, eg F\vec<tab> will become F⃗, useful for physics equations. Play with \dot \ddot and others.

In [1]: import mlx_lm

In [2]: exit
# mlx_lm.chat --model mlx-community/Qwen3-1.7B-4bit 
Downloading (incomplete total...): 0.00B [00:00, ?B/s]                                                                                                                                       Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.                                | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:09<00:00,  1.04s/it]
Download complete: : 984MB [00:09, 105MB/s]              ██████████████████████████▉                                                                            | 4/9 [00:09<00:13,  2.75s/it]
[INFO] Starting chat session with mlx-community/Qwen3-1.7B-4bit.
The command list:
- 'q' to exit
- 'r' to reset the chat
- 'h' to display these commands
>> What is JSON?
<think>
Okay, so the user is asking, "What is JSON?" Let me start by breaking this down. JSON stands for JavaScript Object Notation. I remember that JSON is a data format that's used to represent data in a structured

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ROCm Support for AMD GPUs

9 participants