This issue is about design / aesthetics rather than functionality.
So the CUDA buffer and OpenCL buffer are currently implemented in opposite ways. The CUDA buffer has a built-in host buffer unless you specify not to allocate one, while the OpenCL buffer does not have a built-in host buffer, so you must provide one externally.
I made a branch in KINC where you can see the difference between Similarity::CUDA::Worker and Similarity::OpenCL::Worker: https://github.com/SystemsGenetics/KINC/compare/opencl-buffer
I'm not saying this is a bad thing; after all, there's no rule saying that the CUDA and OpenCL interfaces have to look exactly the same. But I'm wondering if there's an elegant way to provide a similar interface for these two classes, especially since CUDA and OpenCL are so similar.
CUDA does have an equivalent to the map/unmap methods: Unified Memory (see cuMemAllocManaged and cuMemPrefetchAsync). However, you have to use a different alloc function for "managed" memory vs. normal CUDA device memory, whereas in OpenCL you do not. That might not be a big deal, though, since unified memory was introduced with Kepler devices, so even our old K20s on Palmetto should be able to use it.
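For reference, here's a minimal sketch of the driver-API difference (just an illustration with a single device/context and no error checking, not KINC code): managed memory comes from cuMemAllocManaged instead of cuMemAlloc, and the resulting pointer is directly usable on the host.

```cpp
#include <cuda.h>

int main()
{
    cuInit(0);

    CUdevice device;
    CUcontext context;
    cuDeviceGet(&device, 0);
    cuCtxCreate(&context, 0, device);

    const size_t size = 1024 * sizeof(float);

    // Normal device memory: host access requires explicit cuMemcpyHtoD/DtoH.
    CUdeviceptr deviceOnly;
    cuMemAlloc(&deviceOnly, size);

    // Managed (unified) memory: one allocation visible to both host and device.
    CUdeviceptr managed;
    cuMemAllocManaged(&managed, size, CU_MEM_ATTACH_GLOBAL);

    // The managed pointer can be dereferenced directly on the host...
    float* hostView = reinterpret_cast<float*>(managed);
    for ( int i = 0; i < 1024; ++i )
    {
        hostView[i] = 0.0f;
    }

    // ...and optionally prefetched to the device before a kernel launch.
    cuMemPrefetchAsync(managed, size, device, 0);

    cuMemFree(deviceOnly);
    cuMemFree(managed);
    cuCtxDestroy(context);
    return 0;
}
```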
So I think one solution that could make the two buffer interfaces more similar is to refactor CUDABuffer to use unified memory, which would essentially make it look the same as OpenCLBuffer: there would be no explicit device buffer, and CUDABuffer would then have map/unmap methods too.
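To make that concrete, here's a rough sketch of what a unified-memory buffer with an OpenCL-style map/unmap interface could look like. The class and method names (other than map/unmap) are just placeholders, not the actual ACE/KINC API, and error handling is omitted:

```cpp
#include <cuda.h>

// Hypothetical managed-memory buffer wrapper; one allocation serves as
// both the "host" and "device" buffer, so there is no explicit copy.
template<typename T>
class ManagedBuffer
{
public:
    explicit ManagedBuffer(size_t size) : _size(size)
    {
        cuMemAllocManaged(&_data, size * sizeof(T), CU_MEM_ATTACH_GLOBAL);
    }

    ~ManagedBuffer()
    {
        cuMemFree(_data);
    }

    // "Map" migrates the data toward the host so it can be read/written there.
    T* map(CUstream stream = 0)
    {
        cuMemPrefetchAsync(_data, _size * sizeof(T), CU_DEVICE_CPU, stream);
        cuStreamSynchronize(stream);
        return reinterpret_cast<T*>(_data);
    }

    // "Unmap" migrates the data back toward the device before kernel launches.
    void unmap(CUdevice device, CUstream stream = 0)
    {
        cuMemPrefetchAsync(_data, _size * sizeof(T), device, stream);
    }

    // Raw pointer passed to kernel launches.
    CUdeviceptr deviceData() const { return _data; }

private:
    CUdeviceptr _data {0};
    size_t _size {0};
};
```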
@4ctrl-alt-del do you have any thoughts on this? Here's some good reading material on unified memory: