Conversation
|
What an unexpected and amazing surprise! I'm absolutely thrilled. |
|
@awni |
|
I think this is good to stay as an experiment branch for some time while we work on core and CUDA. I don't think we have the bandwidth to merge this for a few months at least. Sorry if this is disappointing @NripeshN I don't mean to discourage you working on it. |
|
I would love to see the ROCm backend get more traction. The new AI series of processors by AMD have a similar advantage to Apple Silicon with unified memory and getting MLX to run on those processors would be neat. |
|
Stole my idea :( |
|
How is this even possible for such an awesome PR to be left like this? |
There was a problem hiding this comment.
Pull request overview
This PR adds experimental ROCm backend support to MLX, enabling execution on AMD GPUs. The implementation mirrors the CUDA backend structure, providing HIP-based implementations of core operations, memory management, and device handling.
Changes:
- Added ROCm backend infrastructure with device management, memory allocation, and stream handling
- Implemented HIP kernels for unary, binary, ternary operations, reductions, normalization (softmax, layer_norm, rms_norm), RoPE, and sorting
- Updated build system (CMake) to support ROCm compilation with configurable GPU architectures
Reviewed changes
Copilot reviewed 59 out of 59 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| CMakeLists.txt | Added MLX_BUILD_ROCM option and ROCm library detection |
| mlx/CMakeLists.txt | Integrated ROCm backend build configuration |
| mlx/device.cpp | Added ROCm device availability checks |
| mlx/backend/rocm/*.hip | HIP kernel implementations for various operations |
| mlx/backend/rocm/device.* | ROCm device and stream management |
| mlx/backend/rocm/allocator.* | ROCm-specific memory allocator using HIP unified memory |
| mlx/backend/rocm/worker.* | Async task execution worker for stream synchronization |
| mlx/backend/rocm/utils.* | HIP utility functions and error handling |
| mlx/backend/rocm/jit_module.* | JIT compilation support using HIPRTC |
| mlx/backend/rocm/device/*.hpp | Device-side utility functions and type definitions |
| mlx/backend/rocm/CMakeLists.txt | ROCm backend build configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ather, scatter, logsumexp, random bits generation, and sorting. Introduce new kernels for efficient computation and integrate with existing ROCm utilities. Update CMake configuration to include new source files and dependencies. Enhance error handling and ensure compatibility with different data types. This commit significantly expands the functionality of the ROCm backend.
|
👑👑👑 |
|
Can anyone run CMAKE_ARGS="-DMLX_BUILD_ROCM=ON" pip install -e .
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES={based on your GPU}" pip install -e .Replace {based on your GPU} with your GPU architecture You can run rocm-smito get your GPU information |
|
I'm getting this CMake error: Running on Strix Halo (gfx1151) |
Could you retry with the latest push please (p.s. keep your fingers crossed while it compiles, worked for me 138th time)😅 |
… string formatting, replacing fmt library usage. Remove unused event.cpp file. Update kernel name generation and parameter formatting for consistency.
Now what can I test? 😍 |
|
I'm getting this: |
I forgot to test the Python build my bad, can you try it now? Unfortunately I might not be able to help after it compiles, I don't have an AMD GPU to run tests😔 I've tried replicating most things from cuda, so hopefully it works |
|
@NripeshN if you are interested this is a very interesting read. |
Replace SmallSizePool with a generalized SlabAllocator containing 18 power-of-2 size class pools (8B through 1MB). Each pool pre-allocates slab pages and sub-allocates via O(1) free-list operations, eliminating hipExtMallocWithFlags calls for small/medium allocations during decode. - SizeClassPool: configurable block size, grow-on-demand slab pages - SlabAllocator: O(1) size-class dispatch via bit ops - Pre-allocates tiers 0-11 (8B-16KB) at startup (~5.8MB) - Applies hipMemAdvise/hipMemPrefetchAsync on slab pages - BufferCache still handles >1MB allocations
GEMM tuning: Request 8 algorithms from hipBLASLt heuristic and benchmark each on first call per (M,N,K) shape. Cache the winner for subsequent calls. Finds lower-VGPR kernels for better CU occupancy. Copy reduction: Replace hipMemcpyAsync-based shape/stride passing in copy_general and copy_general_input with by-value hip_array kernel arguments. Eliminates 3 HIP API calls per general copy dispatch. Results (Qwen3.5-35B-A3B-4bit): - hipMemcpyAsync: 964 -> 77 (-92%) - Gen tok/s: 25.1 -> 26.6 (+6%) - Short gen: 21 -> 46 tok/s (+120%)
On integrated GPUs with fine-grained coherent unified memory, hipStreamSynchronize is unnecessary when the stream has no pending work. Use hipStreamQuery (non-blocking) to check first, only sync when the stream is actually busy. Results: hipStreamSynchronize calls reduced from 5683 to 54 (-99%).
Auto-tuning benchmarks 8 GEMM algorithms per shape on first call, adding ~200ms startup overhead. For quantized models the regular GEMM path is rarely used, so the overhead is wasted. Disable by default; enable with MLX_ROCM_HIPBLASLT_TUNE=1 for non-quantized. Warm prompt restored: Qwen3-8B 1092 tok/s, Qwen3.5-35B 795 tok/s.
hipEventSynchronize with hipEventBlockingSync causes CPU-GPU contention on integrated GPUs where they share compute resources. Replace with a hipEventQuery spin-loop that yields the thread between polls. Also remove hipEventBlockingSync flag from CopyableHipEvent creation to prevent kernel-level blocking waits. Results (100 tokens, Qwen3.5-35B): - hipEventSynchronize: 100 -> 0 (eliminated) - Gen tok/s: 22.7 -> 25.5 (+12%)
Process 16 output columns per block instead of 8, so adjacent weight rows share the same K-range in L2 cache. All 16 warps in the block iterate through the same K-tiles simultaneously, keeping weight data hot in L2 across columns. Previous kernel: 8 columns/block, each warp streams full K independently. L2 hit rate ~10% because weights evicted before reuse. New kernel: 16 columns/block, weight tiles stay in L2 for 16x reuse. Expected L2 hit rate improvement: 10% -> 40-70%. Results: - Qwen3-8B gen: 14.1 -> 23.6 tok/s (+67%) - Qwen3.5-35B gen: 26.5 -> 30.8 tok/s (+16%)
I had to do this for it to compile on my machine (Strix halo, Omarchy, ROCm 7.2): diff --git i/mlx/backend/rocm/sort.hip w/mlx/backend/rocm/sort.hip
index 2f00ea9a0..c0d228feb 100644
--- i/mlx/backend/rocm/sort.hip
+++ w/mlx/backend/rocm/sort.hip
@@ -7,16 +7,6 @@
#include "mlx/primitives.h"
#include <hip/hip_runtime.h>
-
-// Workaround: rocprim headers use placement new in __device__ code,
-// which requires __device__ overloads of operator new/delete.
-#ifdef __HIP_DEVICE_COMPILE__
-__device__ inline void* operator new(size_t, void* p) noexcept { return p; }
-__device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
-__device__ inline void operator delete(void*, void*) noexcept {}
-__device__ inline void operator delete[](void*, void*) noexcept {}
-#endif
-
#include <rocprim/rocprim.hpp>
#include <cassert>
#include <limits>But it works |
|
@angeloskath how much longer before you think this could get merged, of course we are going to need a review and I assume that will come with some required changes. But it would be nice to see if we could start getting some feedback to plan to land this soon. Things are operating nicely actually. |
I'll have this fixed I primarily run 7.12 |
|
Here is my attempt to build it on Pop!_OS 22.04 for use with my gfx1103 card After installing CMake 4.3.1 from source (because a later version than 3.22.1 installed via apt was needed for this build)$ which cmake; /usr/local/bin/cmake --version
/usr/local/bin/cmake
cmake version 4.3.1
CMake suite maintained and supported by Kitware (kitware.com/cmake).Then, when I tried to build from with: $ mkdir build && cd build
$ /usr/local/bin/cmake -DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS \
-DMLX_BUILD_ROCM=ON \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_HIP_ARCHITECTURES="gfx1103" ..I got this -- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Error at /usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:290 (message):
Could NOT find BLAS (missing: BLAS_LIBRARIES)
Call Stack (most recent call first):
/usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:654 (_FPHSA_FAILURE_MESSAGE)
/usr/local/share/cmake-4.3/Modules/FindBLAS.cmake:1509 (find_package_handle_standard_args)
/usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:275 (find_package)
/usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:321 (_lapack_find_dependency)
CMakeLists.txt:326 (find_package)After installing OpenBLAS via: sudo apt install libopenblas-devI tried again: |
You need to install rocm 7.2 at least I believe.Thats probably your primary problem and secondary is to specify the HIP_PLATFORM=amd I believe. |
…, SliceUpdate donation Arch detection and tuning: - RocmArchTier enum: RDNA 2/3/3.5/4/CDNA with fine-grained gfx detection - HWInfo struct: CU count, SIMDs, L2 size, WMMA capability from hipDeviceProp - ArchTuning: per-arch kernel parameters (QMV tile_n, crossover thresholds) - Runtime TILE_N for qmv_tiled_kernel via kernel argument (no template bloat) - MLX_ROCM_QMV_TILE_N env var for manual tuning WMMA flash attention: - flash_attention_wmma.hip: rocwmma 16x16x16 tiled kernel for bf16/fp16 - Dispatches for prefill (qL > 4) on supported head dims (64/128/256) - Integrated into ScaledDotProductAttention dispatch Arena allocator (DecodeArena): - Deterministic bump allocator for HIP Graph capture - Hooked into RocmAllocator malloc/free path - Proven: 18 KB per decode step with stable addresses SliceUpdate donation: - Skip base array copy when input has unique ownership (refcount==1) - Helps prefill path (200 donated during prompt processing) GPU memcpy: - mlx_gpu_memcpy_async (extern C) for direct KV cache writes - gpu_arena/gpu_graph wrapper functions for engine integration
…V weight loads custom_kernel.cpp: The hip_kernel lambda's 8th parameter was named `verbose` but the CustomKernelFunction typedef passes ensure_row_contiguous in that slot. When gated_delta.cpp called with ensure_row_contiguous=true, it triggered a cout dump of the full compiled kernel source — polluting stdout and appearing as model output on every MoE inference (Qwen3.5-35B, Coder-Next). qdequant.hpp: Add load_weight_vec<BITS>() helper that loads PPT uint32 words via a single wide vector load (uint2 for 4-bit, uint4 for 8-bit) instead of PPT scalar loads. qmv_tiled_kernel.hip: Use load_weight_vec in both qmv_tiled_kernel and gather_qmv_tiled_kernel with a warp-uniform branch to separate the vectorized fast path from the bounds-checked tail path.
|
After ensuring I have ROCm, I'm still unable to build (within a rocm/dev-ubuntu-22.04:7.1.1-complete container) after following these steps to ensure it is installed properly: # cmake -DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS \
-DMLX_BUILD_ROCM=ON \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/libopenblas.so-lm-ldl
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from /home/chimezie/Projects/mlx-lm-rocm-support/build/_deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
Could not find a package configuration file provided by "rocwmma" with any
of the following names:
rocwmma.cps
rocwmmaConfig.cmake
rocwmma-config.cmake
Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
"rocwmma_DIR" to a directory containing one of the above files. If
"rocwmma" provides a separate development package or SDK, be sure it has
been installed.
-- Configuring incomplete, errors occurred! |
try this |
|
Still no joy: # sudo apt install libblas-dev liblapack-dev git rocwmma-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libblas-dev is already the newest version (3.10.0-2ubuntu1).
liblapack-dev is already the newest version (3.10.0-2ubuntu1).
git is already the newest version (1:2.34.1-1ubuntu1.17).
rocwmma-dev is already the newest version (2.1.0.70101-38~22.04).
0 upgraded, 0 newly installed, 0 to remove and 57 not upgraded.# cmake -DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS \
-DMLX_BUILD_ROCM=ON \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/liblapack.so/usr/lib/x86_64-linux-gnu/libblas.so
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
Could not find a package configuration file provided by "rocwmma" with any
of the following names:
rocwmma.cps
rocwmmaConfig.cmake
rocwmma-config.cmake
Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
"rocwmma_DIR" to a directory containing one of the above files. If
"rocwmma" provides a separate development package or SDK, be sure it has
been installed.
-- Configuring incomplete, errors occurred! |
Can you try this out?
Bash Bash sudo tee /etc/apt/preferences.d/rocm-pin-600 << EOF sudo apt update Bash Bash If you dont like this method you could install the rocm manually with my bash script which brings in the newest. 7.12 bleeding edge. FILTER_GFX=gfx1151 is set for me I have set it to gfx1103 for you already. |
Do you know what we have to do in order to get this merged into MLX? |
|
@zcbenz Thank you for helping me on the other issue regarding float8 I was hoping you could lend me a hand here on this PR and let me know if there is anyone at MLX that can help give us guidance on getting this PR to land in main. Whatever is needed let me know and i'll do what I can. If need be I'll maintain the ROCm backend. |
|
I installed Cmake 4.3.1 and ran Then # make -j$(nproc)
[ 1%] Built target cpu_compiled_preamble
[ 1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[ 1%] Built target mlx_version
[ 3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
28 | bool operator==(const Device&) const = default;
| ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
18 | bool operator==(const Stream&) const = default;
| ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
371 | # warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
16 | __device__ inline void operator delete(void*, void*) noexcept {}
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
17 | __device__ inline void operator delete[](void*, void*) noexcept {}
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
49 | MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
21 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
22 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
23 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
24 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
25 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
26 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
27 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
28 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
31 | MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
32 | MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
33 | MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
34 | MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++ --offload-arch=gfx1103 -O3 -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2 |
ROCm 7.12 (clang 22) provides __device__ placement new/delete via cuda_wrappers/new, causing redefinition errors. Guard with __CLANG_CUDA_WRAPPERS_NEW so the manual definitions are only compiled on older ROCm versions that lack them.
Please try the newest version, I have added a guard which should help. |
|
I completed the build: [..snip..]
[ 98%] Built target linear_regression
[ 98%] Built target metal_capture
[ 98%] Built target test_teardown
[ 98%] Built target logistic_regression
[ 98%] Built target tutorial
[ 98%] Built target distributed
[100%] Linking CXX executable tests
[100%] Built target tests
root@pop-os:/path/to/mlx-lm-rocm-support/build# python
bash: python: command not found
root@pop-os:/path/to/mlx-lm-rocm-support/build# curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env # or: export PATH="$HOME/.local/bin:$PATH"
downloading uv 0.11.3 x86_64-unknown-linux-gnu
installing to /root/.local/bin
uv
uvx
everything's installed!
To add $HOME/.local/bin to your PATH, either restart your shell or run:
source $HOME/.local/bin/env (sh, bash, zsh)
source $HOME/.local/bin/env.fish (fish)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv --version
uv 0.11.3 (x86_64-unknown-linux-gnu)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv venv /opt/mlx-env --python 3.12
source /opt/mlx-env/bin/activate
Using CPython 3.12.13
Creating virtual environment at: /opt/mlx-env
Activate with: source /opt/mlx-env/bin/activateBut failed to import # uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.12`.
Python 3.12.13 (main, Mar 24 2026, 22:49:22) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `F2` or %edit with no arguments to open an empty editor with a temporary file.
In [1]: import mlx_lm
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import mlx_lm
File /path/to/mlx_lm/__init__.py:9
5 from ._version import __version__
7 os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
----> 9 from .convert import convert
10 from .generate import batch_generate, generate, stream_generate
11 from .utils import load
File /path/to/mlx_lm/convert.py:7
4 from pathlib import Path
5 from typing import Callable, Optional, Union
----> 7 import mlx.core as mx
8 import mlx.nn as nn
9 from mlx.utils import tree_map_with_path
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
In [2]: exit
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 42 packages in 299ms
Built mlx @ /path/to//mlx-lm-rocm-support
Prepared 1 package in 1m 43s
Uninstalled 1 package in 0.44ms
Installed 1 package in 2ms
~ mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# pushd
/path/to/mlx-lm /path/to/mlx-lm-rocm-support
(mlx-env) root@pop-os:/path/to/mlx-lm# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .
Already up to date.
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 32 packages in 791ms
Built mlx-lm @ file:///path/to/mlx-lm
Prepared 1 package in 392ms
Uninstalled 1 package in 1ms
Installed 1 package in 10ms
~ mlx-lm==0.31.2 (from file:///path/to/mlx-lm)
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.
mlx_lm.awq mlx_lm.chat mlx_lm.dynamic_quant mlx_lm.generate mlx_lm.manage mlx_lm.share
mlx_lm.benchmark mlx_lm.convert mlx_lm.evaluate mlx_lm.gptq mlx_lm.perplexity mlx_lm.upload
mlx_lm.cache_prompt mlx_lm.dwq mlx_lm.fuse mlx_lm.lora mlx_lm.server
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.chat --help
Traceback (most recent call last):
File "/opt/mlx-env/bin/mlx_lm.chat", line 4, in <module>
from mlx_lm.chat import main
File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/__init__.py", line 9, in <module>
from .convert import convert
File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/convert.py", line 7, in <module>
import mlx.core as mx
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
(mlx-env) root@pop-os:/path/to/mlx-lm# |
When MLX_BUILD_ROCM=OFF, the Python binding unconditionally references mlx::core::fast::hip_kernel but no_rocm.cpp only stubbed rocm::is_available(). Add a throwing stub matching the pattern used by no_metal.cpp and no_cuda.cpp. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
please try again we where missing no-rocm stubs. |
|
@chimezie Please rebuild with these instructions. Building mlx for ROCm Prerequisites ROCm packages installed (via apt or your distro's package manager):
Build command CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo" Architecture targeting Replace gfx1150 with your GPU's architecture. Common values: Omit -DCMAKE_HIP_ARCHITECTURES=... to build for all supported architectures (slow — compiles 17 targets). The CI does it this way From .github/workflows/build_rocm.yml: CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo" The key flag you were missing was -DMLX_BUILD_ROCM=ON inside CMAKE_ARGS. Without it, setup.py passes the default |
|
Experiment with ROCm backend.
install MLX with ROCm backend using:
closes #2556
Inspired by @zcbenz