KI Sub-groups #668
Your PR no longer requires formatting changes. Thank you for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #668      +/-   ##
==========================================
+ Coverage   62.51%   62.73%   +0.22%
==========================================
  Files          23       23
  Lines        1926     1967      +41
==========================================
+ Hits         1204     1234      +30
- Misses        722      733      +11
FWIW, I fixed that, and pocl_jll should include the patch merged upstream to make
I never got around to removing the default sub-group size setting, so no changes are necessary. Also, I bumped the required pocl_standalone_jll to 7.1.3. Could/should we yank 7.1.3+0 to guarantee no segfaults from libpocl mismatches?
…overrides

Adds KI.sub_group_ballot(pred::Bool) → UInt32: returns a bitmask with bit (lane-1) set for every lane where pred is true. CPU fallback returns UInt32(pred). SPIR-V/OpenCL has no ballot intrinsic so no POCL override is added.

Also adds ext/CUDAExt.jl, the KA extension for CUDA, providing @device_override implementations of all sub-group intrinsics from JuliaGPU#668:
- sub_group_size(::CUDABackend), shfl_down_types(::CUDABackend) [host-side]
- get_sub_group_{size,max_size,local_id,id,num}() [device-side]
- sub_group_barrier(), shfl_down(), sub_group_ballot() [device-side]

Builds on top of JuliaGPU#668 (KI Sub-groups).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Anton Smirnov <tonysmn97@gmail.com>
Includes #682
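
For readers checking the ballot semantics described in the commit message above (bit (lane-1) set for every lane whose predicate is true), here is a minimal host-side sketch in plain Julia. It does not call KernelAbstractions or CUDA. The name `simulate_ballot` and the idea of passing all lane predicates as a vector are assumptions made purely for illustration; the real `KI.sub_group_ballot` takes a single `pred::Bool` per lane on the device.

```julia
# Hypothetical host-side model of the ballot semantics; `simulate_ballot`
# is an illustrative name and is NOT part of KernelAbstractions.
function simulate_ballot(preds::AbstractVector{Bool})
    length(preds) <= 32 ||
        throw(ArgumentError("a UInt32 mask can represent at most 32 lanes"))
    mask = UInt32(0)
    for (lane, pred) in enumerate(preds)      # lanes are counted from 1
        if pred
            mask |= UInt32(1) << (lane - 1)   # lane `lane` owns bit (lane-1)
        end
    end
    return mask
end

# Lanes 1 and 3 vote true out of four lanes: bits 0 and 2 are set.
votes = [true, false, true, false]
mask = simulate_ballot(votes)

# Number of lanes that voted true.
nvotes = count_ones(mask)

# A single-lane "sub-group" reduces to UInt32(pred), which matches the
# CPU fallback described in the commit message.
single = simulate_ballot([true])
```

With this model, a sub-group of one lane yields `UInt32(pred)`, which is consistent with the stated CPU fallback, and `count_ones` on the mask gives the number of lanes that voted true.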