memory-coalescing

Here are 3 public repositories matching this topic...

An academic project (NTUA) on accelerating neural network training by optimizing the GEMM kernel on multi-core CPUs and GPUs.

Profile-driven FP32 CUDA GEMM optimization: naive --> tiled --> coalesced --> register-blocked --> bank-padded, benchmarked against cuBLAS.

Topics: cuda, cublas, shared-memory, gemm, gpu-performance, nsight-compute, memory-coalescing

High-performance CUDA matrix multiplication kernels using shared-memory tiling and register blocking, with Roofline model analysis. Benchmarked against cuBLAS.
