The latest cuda-tile (1.2.0) requires tileiras >=13.2. Update the CI Docker base image to CUDA 13.2.0 so the system-installed tileiras matches what cuda-tile expects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
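The version bump amounts to a one-line change; a minimal sketch, assuming the base image is exposed as a build arg (the PR description names `modeling/transformers/Dockerfile` and its `BASE_IMAGE` default):

```dockerfile
# modeling/transformers/Dockerfile (sketch): bump the default base image so the
# system tileiras matches the >=13.2 requirement of cuda-tile 1.2.0.
ARG BASE_IMAGE=nvcr.io/nvidia/cuda:13.2.0-devel-ubuntu22.04
FROM ${BASE_IMAGE}
```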
Collaborator
Author
/ok to test 7edbbea
With CUDA 13.2, the tileiras compiler uses more memory per compilation. Running 16 xdist workers with -n auto and different kernel modules compiling simultaneously causes OOM (exit code 137) on the CI runner. Adding --dist=loadscope groups tests by module/class so each worker compiles one kernel type at a time, reducing peak memory. This matches the Ocean CI configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
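The grouping behaviour comes from the pytest-xdist flags; a sketch of the invocation, where the flags come from the commit above but the test target is an illustrative assumption:

```shell
# -n auto:          spawn one xdist worker per CPU core (16 on this runner)
# --dist=loadscope: all tests in the same module/class go to the same worker,
#                   so each worker compiles one kernel type at a time
# (the tests/ path is an illustrative assumption, not from the PR)
pytest -n auto --dist=loadscope tests/
```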
Collaborator
Author
/ok to test 24bfa7b
With --dist=loadscope, tests run more sequentially per worker to avoid OOM. This trades speed for memory safety, so the previous 15-minute step timeout (17m job timeout) is too tight. Bump to 25m step / 30m job to match the test-benchmark job timeout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
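In a GitHub Actions workflow the two timeouts would look roughly like this; the job and step names are assumptions, only the minute values come from the commit above:

```yaml
jobs:
  test-ops:                      # job name is an assumption
    timeout-minutes: 30          # job timeout, up from 17
    steps:
      - name: Run op tests       # step name is an assumption
        timeout-minutes: 25      # step timeout, up from 15
```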
Collaborator
Author
/ok to test 4d051ce
The previous --dist=loadscope fix prevented OOM but caused timeouts because it serializes all tests within a module to one worker (e.g. 156 test_bmm tests on a single worker). Switch to -n 4 (4 parallel workers instead of 16 from -n auto):
- 4 workers use ~1/4 the peak memory, avoiding the CUDA 13.2 OOM
- Tests distribute freely across workers, no serialization bottleneck
- Expected runtime ~12-15 min, well within the 25 min step timeout
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
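The memory reasoning behind the worker count can be made explicit with a back-of-the-envelope model; the per-worker and runner figures below are made-up illustrations, not measurements from this CI runner:

```python
# Illustrative model: peak compile memory grows roughly linearly with the
# number of xdist workers compiling kernels simultaneously. Both constants
# are assumed example values, not measured on the actual runner.
PER_WORKER_GB = 3.5   # assumed peak tileiras memory per concurrent compilation
RUNNER_GB = 32.0      # assumed runner RAM

def peak_memory_gb(workers: int) -> float:
    """Worst case: every worker hits peak compile memory at the same time."""
    return workers * PER_WORKER_GB

for n in (16, 8, 4):
    fits = peak_memory_gb(n) <= RUNNER_GB
    print(f"-n {n}: ~{peak_memory_gb(n):.0f} GB peak, fits in RAM: {fits}")
```

Under these assumed numbers, 16 workers overshoot the runner's RAM while 4 (and 8) fit, which matches the OOM-at-16 behaviour observed in CI.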
Collaborator
Author
/ok to test aa59188
- -n 16 (auto): OOM with CUDA 13.2 tileiras (~2.5 min)
- -n 4: no OOM but too slow, times out at 25 min
- -n 8: half the memory of 16 workers, twice the speed of 4
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator
Author
/ok to test cc6a34b
-n 8 was stable (no OOM) but just barely timed out at 25 min. Bump to -n 10 for more speed, and increase the step timeout to 35 min (job timeout 40 min) as a safety margin for CUDA 13.2 compilation.
Summary of what we know:
- -n 16 (auto): OOM at ~2.5 min
- -n 4: no OOM, timeout at 25 min
- -n 8: no OOM, timeout at 25 min (nearly complete)
- -n 10: should complete in ~20 min
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator
Author
/ok to test 3f2ffb8
With -n 12 passing in 19 min, 25m step timeout gives ~6 min headroom. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator
Author
/ok to test b433a75
Collaborator
Author
/ok to test 0a2582e
Summary
• Bump the CI Docker base image from `nvcr.io/nvidia/cuda:13.1.0-devel-ubuntu22.04` to `nvcr.io/nvidia/cuda:13.2.0-devel-ubuntu22.04`.
• The latest `cuda-tile` (1.2.0) declares `nvidia-cuda-tileiras >=13.2, <13.3` as its optional `[tileiras]` dependency. The CI image should ship a matching system `tileiras` so the compiler version is aligned.
• PyTorch `cu130` wheels remain compatible with CUDA 13.2 (backward compatible).

What changed
• `modeling/transformers/Dockerfile`: updated the `BASE_IMAGE` default from `cuda:13.1.0` to `cuda:13.2.0`.

How to verify
In the CI docker build / test-ops job logs, check:
• `nvcc --version` → should show 13.2
• `dpkg -l | grep tileiras` → should show 13.2.x

CI Configuration
🤖 Generated with Claude Code