Now that CCCL v3 can be used for efficient parallel reductions in Python, it would be great to add another benchmark file, reduce_bench.py, with Pythonic JIT-compiled kernels for parallel reductions, showcasing how different hyperparameters affect the results.