Adapt to KernelIntrinsics/KA 0.10 #868

Draft
christiangnrd wants to merge 7 commits into JuliaGPU:main from christiangnrd:intrinsics

@christiangnrd

Member

No description provided.

@christiangnrd

christiangnrd commented Dec 18, 2025

Member Author

id for AMDGPU is used to identify whether we need to launch hostcalls, based on the specific id of the global variable. It is not needed for KA, so it can be dropped, as in https://github.com/JuliaGPU/AMDGPU.jl/pull/868/files#diff-082b94339c8f038178ee472ca9b6feec6f27f434c469138f168031f248f223f9R197.

(From JuliaGPU/KernelAbstractions.jl#666)

Is my changing static local code generation from LLVMExternalLinkage to LLVMInternalLinkage going to cause any unintended effects? I had to do so to prevent multiple local memory allocations in the same kernel from being treated as a single allocation. Otherwise, I think we’ll have to re-add the id parameter.
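
To make the concern above concrete, here is a minimal sketch of a kernel with two static local-memory allocations, written against the public KernelAbstractions macros. The kernel name `two_buffers!`, the buffer size of 64, and the host-array launch are assumptions chosen for illustration; they are not taken from this PR. If the two buffers were ever merged into one allocation, the second write would clobber the first and the result would no longer equal 3a.

```julia
using KernelAbstractions

# Hypothetical kernel: two statically sized local-memory buffers in one kernel.
# Each buffer must be backed by its own allocation for the result to be correct.
@kernel function two_buffers!(out, @Const(a))
    g = @index(Global, Linear)
    l = @index(Local, Linear)
    buf_a = @localmem Float32 (64,)
    buf_b = @localmem Float32 (64,)
    buf_a[l] = a[g]
    buf_b[l] = 2 * a[g]
    @synchronize
    # With distinct buffers this is a + 2a = 3a.
    out[g] = buf_a[l] + buf_b[l]
end

# Launch on whatever backend owns the array (a host Array here).
a = rand(Float32, 128)
out = similar(a)
backend = KernelAbstractions.get_backend(a)
two_buffers!(backend, 64)(out, a; ndrange = length(a))
KernelAbstractions.synchronize(backend)
```

Running the same kernel on device arrays and comparing `out` against `3 .* a` would be one way to check that the linkage change keeps the two allocations separate.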

@pxl-th

pxl-th commented Dec 19, 2025

Member

Is my changing static local code generation from LLVMExternalLinkage to LLVMInternalLinkage going to cause any unintended effects?

I don't think so. I think it is a remnant from pre-HIP times.

@github-actions github-actions Bot left a comment

Contributor


AMDGPU.jl Benchmarks

| Benchmark suite | Current: 2c7d7f3 | Previous: 9bc35f1 | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 610 ns | 670 ns | 0.91 |
| amdgpu/synchronization/stream/blocking | 250 ns | 280 ns | 0.89 |
| amdgpu/synchronization/stream/nonblocking | 330 ns | 400 ns | 0.82 |
| array/accumulate/Float32/1d | 85441 ns | 87651 ns | 0.97 |
| array/accumulate/Float32/dims=1 | 246523 ns | 290364 ns | 0.85 |
| array/accumulate/Float32/dims=1L | 136122 ns | 136182 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 129102 ns | 127982 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 2696728 ns | 2812289 ns | 0.96 |
| array/accumulate/Int64/1d | 97991 ns | 98192 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 327244 ns | 243584 ns | 1.34 |
| array/accumulate/Int64/dims=1L | 170392 ns | 168293 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 123442 ns | 122592 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 2930642 ns | 2987552 ns | 0.98 |
| array/broadcast | 147992 ns | 147162 ns | 1.01 |
| array/construct | 1780 ns | 1720 ns | 1.03 |
| array/copy | 39681 ns | 39821 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 115212 ns | 115351 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 111522 ns | 97151 ns | 1.15 |
| array/copyto!/gpu_to_gpu | 131072 ns | 131032 ns | 1.00 |
| array/iteration/findall/bool | 186652 ns | 183372 ns | 1.02 |
| array/iteration/findall/int | 194493 ns | 191073 ns | 1.02 |
| array/iteration/findfirst/bool | 124372 ns | 122112 ns | 1.02 |
| array/iteration/findfirst/int | 116681 ns | 116541 ns | 1.00 |
| array/iteration/findmin/1d | 173792 ns | 171772 ns | 1.01 |
| array/iteration/findmin/2d | 159452 ns | 156672 ns | 1.02 |
| array/iteration/logical | 356025 ns | 355875 ns | 1.00 |
| array/iteration/scalar | 296214 ns | 303814 ns | 0.97 |
| array/permutedims/2d | 75961 ns | 74731 ns | 1.02 |
| array/permutedims/3d | 74371 ns | 74861 ns | 0.99 |
| array/permutedims/4d | 88041 ns | 77161 ns | 1.14 |
| array/random/rand/Float32 | 54770 ns | 55211 ns | 0.99 |
| array/random/rand/Int64 | 60271 ns | 56591 ns | 1.07 |
| array/random/rand!/Float32 | 72341 ns | 145892 ns | 0.50 |
| array/random/rand!/Int64 | 95311 ns | 61691 ns | 1.54 |
| array/random/randn/Float32 | 94492 ns | 87631 ns | 1.08 |
| array/random/randn!/Float32 | 109962 ns | 108792 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 134172 ns | 134592 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 95261 ns | 95752 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 776661 ns | 776561 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 97412 ns | 97761 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 298424 ns | 296454 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 134762 ns | 134982 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 95281 ns | 95992 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 783101 ns | 781911 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 97261 ns | 97291 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 299704 ns | 295774 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 133562 ns | 134512 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 95531 ns | 95801 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 776211 ns | 775531 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 97581 ns | 97771 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 295884 ns | 302224 ns | 0.98 |
| array/reductions/reduce/Int64/1d | 134322 ns | 135202 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 95111 ns | 95691 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 783552 ns | 778971 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2 | 95641 ns | 96941 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 299214 ns | 296404 ns | 1.01 |
| array/reverse/1d | 43831 ns | 43920 ns | 1.00 |
| array/reverse/1dL | 75441 ns | 75771 ns | 1.00 |
| array/reverse/1dL_inplace | 158712 ns | 110181 ns | 1.44 |
| array/reverse/1d_inplace | 136472 ns | 77711 ns | 1.76 |
| array/reverse/2d | 51991 ns | 51191 ns | 1.02 |
| array/reverse/2dL | 102241 ns | 102061 ns | 1.00 |
| array/reverse/2dL_inplace | 179252 ns | 178402 ns | 1.00 |
| array/reverse/2d_inplace | 138762 ns | 78972 ns | 1.76 |
| array/sorting/1d | 343375 ns | 342554 ns | 1.00 |
| integration/byval/reference | 39830 ns | 39581 ns | 1.01 |
| integration/byval/slices=1 | 40440 ns | 40771 ns | 0.99 |
| integration/byval/slices=2 | 130942 ns | 157862 ns | 0.83 |
| integration/byval/slices=3 | 238634 ns | 239313 ns | 1.00 |
| integration/volumerhs | 5043852 ns | 5025691 ns | 1.00 |
| kernel/indexing | 57961 ns | 131032 ns | 0.44 |
| kernel/indexing_checked | 131002 ns | 132972 ns | 0.99 |
| kernel/launch | 1300 ns | 1310 ns | 0.99 |
| kernel/rand | 165183 ns | 123892 ns | 1.33 |
| latency/import | 1612663271 ns | 1566079357 ns | 1.03 |
| latency/precompile | 40134605517 ns | 36502576967 ns | 1.10 |
| latency/ttfp | 2194480087 ns | 2134396602 ns | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.
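
For reading the benchmark table: the Ratio column appears to be the current time divided by the previous time, so values above 1 mean the current commit is slower. The sketch below recomputes the ratio for a few rows copied from the table. The helper name `ratio` and the 1.2 cutoff are assumptions made for illustration; neither comes from the benchmark workflow.

```julia
# Ratio as it appears in the table: current time divided by previous time,
# rounded to two decimal places.
ratio(current_ns, previous_ns) = round(current_ns / previous_ns; digits = 2)

# A few rows copied from the table: name => (current ns, previous ns).
rows = [
    "array/accumulate/Int64/dims=1" => (327244, 243584),
    "array/random/rand!/Float32"    => (72341, 145892),
    "array/reverse/1d_inplace"      => (136472, 77711),
    "kernel/indexing"               => (57961, 131032),
]

# Hypothetical cutoff for calling a change notable; not part of the workflow.
threshold = 1.2

for (name, (cur, prev)) in rows
    r = ratio(cur, prev)
    label = r > threshold ? "slower" : r < 1 / threshold ? "faster" : "unchanged"
    println(rpad(name, 32), r, "  ", label)
end
```

A single run cannot separate real changes from measurement noise, so large ratios in either direction are best treated as prompts for a rerun.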

@christiangnrd christiangnrd changed the title from "Adapt to KernelIntrinsics" to "Adapt to KernelIntrinsics/KA 0.10" on Jun 19, 2026

@luraess

luraess commented Jun 22, 2026

Member

@christiangnrd I just merged #932 so feel free to transfer #933 to this one and close it.

@christiangnrd

christiangnrd commented Jun 22, 2026

Member Author

I don't have permission to close issues or PRs on this repo, but you can close it and I'll remember to add it to this PR once the KA interface is merged.


3 participants