PyOpenCLActx: fixes to np.where, actx.to_numpy #357
143f99b to 6af81f0
The linked regression exercises the code path. pyopencl.array.Array.transpose is a no-op on the cl.Device: it simply returns a new array that references the same buffer with different strides.
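As a rough illustration of why a transposed array ends up non-contiguous, here is a minimal sketch that uses NumPy as a stand-in for the device array type (an assumption made so the snippet runs without an OpenCL device). The variable names are illustrative only.

```python
import numpy as np

# NumPy stand-in for a device array: transposing only swaps the
# shape/strides metadata, the underlying buffer is left untouched.
a = np.arange(6, dtype=np.float64).reshape(2, 3)
a_t = a.T

shares_buffer = np.shares_memory(a, a_t)   # both arrays use one buffer
strides_before = a.strides                 # row-major strides of `a`
strides_after = a_t.strides                # the same strides, reversed
is_c_contiguous = a_t.flags.c_contiguous   # no longer C-contiguous
```

The transposed result is exactly the kind of strided array that a contiguous-only host transfer cannot handle directly.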
pyopencl.array does not allow array branches with unequal dtypes.
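One way to work around such a restriction is to promote both branches to a common dtype before selecting. The sketch below uses NumPy only, and the helper name `where_with_common_dtype` is hypothetical; it is not taken from this PR and may differ from the actual fix.

```python
import numpy as np


def where_with_common_dtype(criterion, then, else_):
    """Hypothetical helper: cast both branches to one dtype, then select."""
    common = np.result_type(then, else_)
    then = np.asarray(then, dtype=common)
    else_ = np.asarray(else_, dtype=common)
    return np.where(criterion, then, else_)


cond = np.array([True, False, True])
ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([0.5, 1.5, 2.5], dtype=np.float64)

# Both branches are float64 by the time the selection happens.
out = where_with_common_dtype(cond, ints, floats)
```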
b9ad7d8 to 48dd357
    # pyopencl supports host transfers only for contiguous arrays.
    return ary.get(queue=self.queue)

    result = self.call_loopy(
This doesn't make a great deal of sense to me.
- If host-to-GPU transfers fail, then this will fail, too, because the first thing that this call_loopy will do is attempt to transfer the array.
- Why not simply copy-to-contiguous on the host and then go from there? (And issue a warning for the hidden cost?)
- IMO the nicest solution would be to make use of OpenCL's strided copy primitive. There was some code for that at a point, but it wasn't tested and (unnecessarily) only worked in the get direction.
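For reference, the second suggestion (copy to contiguous storage and warn about the hidden cost) could look roughly like the sketch below. It uses NumPy as a stand-in for the array type, and the helper name `to_contiguous_with_warning` is hypothetical rather than part of any existing API.

```python
import warnings

import numpy as np


def to_contiguous_with_warning(ary):
    """Hypothetical helper: return a C-contiguous array, warning on copy."""
    if ary.flags.c_contiguous:
        return ary
    warnings.warn(
        "copying non-contiguous array to contiguous storage",
        stacklevel=2,
    )
    return np.ascontiguousarray(ary)


# A transposed array is strided, so this call triggers the copy.
strided = np.arange(12.0).reshape(3, 4).T
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    packed = to_contiguous_with_warning(strided)
```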
> the first thing that this call_loopy will do is attempt to transfer the array.
> Why not simply copy-to-contiguous on the host and then go from there?

I don't think I follow these. Here, we are doing the equivalent of input_ary.copy(order="C") through this loopy kernel on the device before calling .get.

> IMO the nicest solution would be to make use of OpenCL's strided copy primitive. There was some code for that at a point, but it wasn't tested and (unnecessarily) only worked in the get direction.

Agreed. I had seen that, but even if it were merged, it would only work for n-d arrays with n < 4, so we would still need this code for the general case.
48dd357 to 2a87adf
Please refer to the commit-by-commit description for the two fixes.