What kind of issue is this?
The question is whether OpenACC can give us the performance we need. It would be interesting to experiment by taking the CPU-only code, reimplementing one or more of the operations now done with CUDA kernels in OpenACC, and comparing performance. See https://groups.google.com/d/msg/uwb-braingrid/pWeIUr-CgyE/tXbSG3wdAAAJ for a link to an OpenACC tutorial.
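As a rough idea of what such an experiment could look like, here is a minimal sketch of one loop in a CPU-only form and an OpenACC-annotated form. The update rule and the names `decay_cpu` and `decay_acc` are hypothetical placeholders, not code from the simulator; the real candidates would be whichever operations are currently CUDA kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element update (placeholder, not the simulator's code).
// CPU-only reference version.
void decay_cpu(std::vector<float>& v, const std::vector<float>& input, float decay) {
    assert(v.size() == input.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = decay * v[i] + input[i];
    }
}

// Same loop annotated for OpenACC. A compiler without OpenACC enabled
// ignores the pragma, so this also runs as ordinary serial C++.
void decay_acc(std::vector<float>& v, const std::vector<float>& input, float decay) {
    assert(v.size() == input.size());
    const int n = static_cast<int>(v.size());
    float* vp = v.data();          // raw pointers so the data clauses can name array sections
    const float* ip = input.data();
    // copy: vp is sent to the device and copied back; copyin: ip is only sent.
    #pragma acc parallel loop copy(vp[0:n]) copyin(ip[0:n])
    for (int i = 0; i < n; ++i) {
        vp[i] = decay * vp[i] + ip[i];
    }
}
```

Built with an OpenACC-capable compiler (for example `nvc++ -acc` or `g++ -fopenacc`), the second function is offloaded; timing it against the CPU version and the existing CUDA kernel for the same operation would give the comparison. Note that the `copy`/`copyin` clauses move data on every call, so a fair test would probably keep data resident on the device across time steps, as the CUDA code presumably does.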
What is affected by this?
How do we replicate the issue/how would it work?
Expected behavior (i.e. solution or outline of what it would look like)
Other Comments