Skip to content

PyOpenCLActx: fixes to np.where, actx.to_numpy#357

Open
kaushikcfd wants to merge 3 commits into
mainfrom
some_fixes_to_pyopencl_actx
Open

PyOpenCLActx: fixes to np.where, actx.to_numpy#357
kaushikcfd wants to merge 3 commits into
mainfrom
some_fixes_to_pyopencl_actx

Conversation

@kaushikcfd
Copy link
Copy Markdown
Collaborator

Please refer to commit-by-commit description for the two fixes.

@kaushikcfd kaushikcfd requested review from alexfikl and inducer May 10, 2026 18:35
Copy link
Copy Markdown
Collaborator

@alexfikl alexfikl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Left a couple of nitpicks, but besides that:

  • Where did these show up? For my own curiosity 😁
  • Add some small tests? Is the transpose test supposed to test the non-continuous copy?

Comment thread arraycontext/impl/pyopencl/__init__.py Outdated
Comment thread arraycontext/impl/pyopencl/__init__.py Outdated
Comment thread arraycontext/impl/pyopencl/fake_numpy.py
@kaushikcfd kaushikcfd force-pushed the some_fixes_to_pyopencl_actx branch from 143f99b to 6af81f0 Compare May 10, 2026 19:22
@kaushikcfd
Copy link
Copy Markdown
Collaborator Author

Add some small tests? Is the transpose test supposed to test the non-continuous copy?

The linked regression exercises the code path. Pyopencl.Array.transpose is a no-op on the cl.Device simply creates a copy of the buffer with different strides.

pyopencl.array does not allow array branches with unequal dtypes.
@kaushikcfd kaushikcfd force-pushed the some_fixes_to_pyopencl_actx branch 2 times, most recently from b9ad7d8 to 48dd357 Compare May 11, 2026 03:57
Comment thread arraycontext/impl/pyopencl/__init__.py
# pyopencl supports host transfers only for contiguous arrays.
return ary.get(queue=self.queue)

result = self.call_loopy(
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make a great deal of sense to me.

  • If host-to-GPU transfers fail, then this will fail, too, because the first thing that this call_loopy will do is attempt to transfer the array.
  • Why not simply copy-to-contiguous on the host and then go from there? (And issue a warning for the hidden cost?)
  • IMO the nicest solution would be to make use of OpenCL's strided copy primitive. There was some code for that at a point, but it wasn't tested and (unnecessarily) only worked in the get direction.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first thing that this call_loopy will do is attempt to transfer the array.
Why not simply copy-to-contiguous on the host and then go from there?

I don't think I get these. Over here, we are doing input_ary.copy(order="C") through this loopy kernel on the device before calling ".get".

IMO the nicest solution would be to make use of OpenCL's strided copy primitive. There was some code for that at a point, but it wasn't tested and (unnecessarily) only worked in the get direction.

Agreed. I had seen that, but even if it were merged, it would only work for n-d arrays for n < 4 -- so we will still need this code for the general case.

@kaushikcfd kaushikcfd force-pushed the some_fixes_to_pyopencl_actx branch from 48dd357 to 2a87adf Compare May 11, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants