Make apply more memory-friendly for CUDA #407

Draft
VidithM wants to merge 2 commits into DrTimothyAldenDavis:dev2 from VidithM:dev2a
Conversation

@VidithM VidithM commented Mar 12, 2025

  • If apply is done in place, C is iso on input but not on output, and a non-positional operator is used, then C->x must be reallocated and every numerical entry set to the iso value. However, doing so pins C->x on the host, which is bad for CUDA. This change defers the iso expansion to the appropriate point.

(Would it be better to change the API of GB_apply_op to take a do_iso_expansion flag instead? The drawback of the current solution is that the expansion may be performed when it is not needed, namely when C is not iso on input.)
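
To make the question concrete, here is a minimal sketch of what a flag-guarded iso expansion could look like. It is not the code in this PR and does not use the real GraphBLAS internals: `toy_matrix`, `toy_expand_iso`, and `toy_maybe_expand_iso` are hypothetical names, and the struct is a simplified host-only stand-in for a matrix with an iso flag. Only the `do_iso_expansion` parameter name comes from the comment above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical, simplified stand-in for a matrix (NOT the GraphBLAS layout).
typedef struct
{
    double *x ;     // values: 1 entry if iso, nvals entries otherwise
    size_t nvals ;  // number of entries in the matrix
    bool iso ;      // true if all entries share the single value x [0]
} toy_matrix ;

// Expand an iso matrix: reallocate x to hold nvals entries and set every
// entry to the iso value.  Returns false if the reallocation fails.
bool toy_expand_iso (toy_matrix *C)
{
    assert (C != NULL && C->x != NULL) ;
    if (!C->iso) return (true) ;            // already expanded
    double iso_value = C->x [0] ;
    size_t n = (C->nvals > 0) ? C->nvals : 1 ;
    double *x = realloc (C->x, n * sizeof (double)) ;
    if (x == NULL) return (false) ;         // C->x is left untouched
    for (size_t k = 0 ; k < C->nvals ; k++)
    {
        x [k] = iso_value ;
    }
    C->x = x ;
    C->iso = false ;
    return (true) ;
}

// Assumed shape of the flag-based alternative: the caller decides whether
// the expansion is wanted, and nothing is reallocated otherwise.
bool toy_maybe_expand_iso (toy_matrix *C, bool do_iso_expansion)
{
    if (!do_iso_expansion || !C->iso) return (true) ;
    return (toy_expand_iso (C)) ;
}
```

In this sketch, a caller that knows C is iso on input but not on output would pass `do_iso_expansion = true` at the point where the expanded values are actually needed, and every other caller would pass `false` and never touch C->x.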

@VidithM VidithM marked this pull request as draft March 12, 2025 05:33