Adapt to KernelIntrinsics/KA 0.10 #868
30edeac to e0c661b
(From JuliaGPU/KernelAbstractions.jl#666) Is my changing static local code generation from
I don't think so. I think it is a remnant from pre-HIP times.
bc6c228 to 7810464
7810464 to 7195d90
c2a607e to ac192c6
AMDGPU.jl Benchmarks
| Benchmark suite | Current: 2c7d7f3 | Previous: 9bc35f1 | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 610 ns | 670 ns | 0.91 |
| amdgpu/synchronization/stream/blocking | 250 ns | 280 ns | 0.89 |
| amdgpu/synchronization/stream/nonblocking | 330 ns | 400 ns | 0.82 |
| array/accumulate/Float32/1d | 85441 ns | 87651 ns | 0.97 |
| array/accumulate/Float32/dims=1 | 246523 ns | 290364 ns | 0.85 |
| array/accumulate/Float32/dims=1L | 136122 ns | 136182 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 129102 ns | 127982 ns | 1.01 |
| array/accumulate/Float32/dims=2L | 2696728 ns | 2812289 ns | 0.96 |
| array/accumulate/Int64/1d | 97991 ns | 98192 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 327244 ns | 243584 ns | 1.34 |
| array/accumulate/Int64/dims=1L | 170392 ns | 168293 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 123442 ns | 122592 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 2930642 ns | 2987552 ns | 0.98 |
| array/broadcast | 147992 ns | 147162 ns | 1.01 |
| array/construct | 1780 ns | 1720 ns | 1.03 |
| array/copy | 39681 ns | 39821 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 115212 ns | 115351 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 111522 ns | 97151 ns | 1.15 |
| array/copyto!/gpu_to_gpu | 131072 ns | 131032 ns | 1.00 |
| array/iteration/findall/bool | 186652 ns | 183372 ns | 1.02 |
| array/iteration/findall/int | 194493 ns | 191073 ns | 1.02 |
| array/iteration/findfirst/bool | 124372 ns | 122112 ns | 1.02 |
| array/iteration/findfirst/int | 116681 ns | 116541 ns | 1.00 |
| array/iteration/findmin/1d | 173792 ns | 171772 ns | 1.01 |
| array/iteration/findmin/2d | 159452 ns | 156672 ns | 1.02 |
| array/iteration/logical | 356025 ns | 355875 ns | 1.00 |
| array/iteration/scalar | 296214 ns | 303814 ns | 0.97 |
| array/permutedims/2d | 75961 ns | 74731 ns | 1.02 |
| array/permutedims/3d | 74371 ns | 74861 ns | 0.99 |
| array/permutedims/4d | 88041 ns | 77161 ns | 1.14 |
| array/random/rand/Float32 | 54770 ns | 55211 ns | 0.99 |
| array/random/rand/Int64 | 60271 ns | 56591 ns | 1.07 |
| array/random/rand!/Float32 | 72341 ns | 145892 ns | 0.50 |
| array/random/rand!/Int64 | 95311 ns | 61691 ns | 1.54 |
| array/random/randn/Float32 | 94492 ns | 87631 ns | 1.08 |
| array/random/randn!/Float32 | 109962 ns | 108792 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 134172 ns | 134592 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 95261 ns | 95752 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1L | 776661 ns | 776561 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 97412 ns | 97761 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 298424 ns | 296454 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 134762 ns | 134982 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 95281 ns | 95992 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 783101 ns | 781911 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 97261 ns | 97291 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 299704 ns | 295774 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 133562 ns | 134512 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 95531 ns | 95801 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 776211 ns | 775531 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 97581 ns | 97771 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 295884 ns | 302224 ns | 0.98 |
| array/reductions/reduce/Int64/1d | 134322 ns | 135202 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 95111 ns | 95691 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 783552 ns | 778971 ns | 1.01 |
| array/reductions/reduce/Int64/dims=2 | 95641 ns | 96941 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 299214 ns | 296404 ns | 1.01 |
| array/reverse/1d | 43831 ns | 43920 ns | 1.00 |
| array/reverse/1dL | 75441 ns | 75771 ns | 1.00 |
| array/reverse/1dL_inplace | 158712 ns | 110181 ns | 1.44 |
| array/reverse/1d_inplace | 136472 ns | 77711 ns | 1.76 |
| array/reverse/2d | 51991 ns | 51191 ns | 1.02 |
| array/reverse/2dL | 102241 ns | 102061 ns | 1.00 |
| array/reverse/2dL_inplace | 179252 ns | 178402 ns | 1.00 |
| array/reverse/2d_inplace | 138762 ns | 78972 ns | 1.76 |
| array/sorting/1d | 343375 ns | 342554 ns | 1.00 |
| integration/byval/reference | 39830 ns | 39581 ns | 1.01 |
| integration/byval/slices=1 | 40440 ns | 40771 ns | 0.99 |
| integration/byval/slices=2 | 130942 ns | 157862 ns | 0.83 |
| integration/byval/slices=3 | 238634 ns | 239313 ns | 1.00 |
| integration/volumerhs | 5043852 ns | 5025691 ns | 1.00 |
| kernel/indexing | 57961 ns | 131032 ns | 0.44 |
| kernel/indexing_checked | 131002 ns | 132972 ns | 0.99 |
| kernel/launch | 1300 ns | 1310 ns | 0.99 |
| kernel/rand | 165183 ns | 123892 ns | 1.33 |
| latency/import | 1612663271 ns | 1566079357 ns | 1.03 |
| latency/precompile | 40134605517 ns | 36502576967 ns | 1.10 |
| latency/ttfp | 2194480087 ns | 2134396602 ns | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
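For readers skimming the table, the Ratio column appears to be the current time divided by the previous time (for example, 610 / 670 ≈ 0.91), so values above 1 mean the current commit is slower. The sketch below recomputes the ratio for a few rows copied from the table and labels them. The `THRESHOLD` value and the function names are hypothetical, chosen only for this illustration; they are not taken from the benchmark workflow's configuration.

```python
# Rows copied from the benchmark table: name -> (current_ns, previous_ns).
ROWS = {
    "array/accumulate/Int64/dims=1": (327244, 243584),
    "array/random/rand!/Float32": (72341, 145892),
    "array/reverse/1d_inplace": (136472, 77711),
    "kernel/indexing": (57961, 131032),
    "kernel/launch": (1300, 1310),
}

# Hypothetical cutoff for this illustration only; not the workflow's setting.
THRESHOLD = 1.3


def ratio(current_ns, previous_ns):
    """Current time divided by previous time; above 1 means slower."""
    return current_ns / previous_ns


def classify(current_ns, previous_ns, threshold=THRESHOLD):
    """Label a row as slower, faster, or unchanged relative to the cutoff."""
    r = ratio(current_ns, previous_ns)
    if r >= threshold:
        return "slower"
    if r <= 1 / threshold:
        return "faster"
    return "unchanged"


def summarize(rows, threshold=THRESHOLD):
    """Map each benchmark name to (ratio rounded to 2 places, label)."""
    return {
        name: (round(ratio(cur, prev), 2), classify(cur, prev, threshold))
        for name, (cur, prev) in rows.items()
    }


if __name__ == "__main__":
    for name, (r, label) in summarize(ROWS).items():
        print(f"{name:32s} {r:5.2f}  {label}")
```

Single-run timings like these are noisy, so a cutoff of this kind is only a rough filter for which rows deserve a rerun.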
@christiangnrd I just merged #932, so feel free to transfer #933 to this one and close it.
I don't have permission to close issues or PRs on this repo, but you can close it and I'll remember to add it to this PR once the KA interface is merged.
These always run on the CPU backend which is currently broken in 1.12, creating false negatives for the GPU tests