run to run warpspeed impl sm100+#9263
Conversation
|
Ready to act? Review this PR in Change Stack to turn feedback into patch suggestions you can inspect and refine. No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
OverviewThis PR implements run-to-run support for the warpspeed scan optimization on SM100+ targets, enabling deterministic DeviceScan execution. The changes introduce a stable reduction order variant of the warpspeed lookahead logic and thread this stability setting through the scan dispatch pipeline. ChangesWarpspeed Lookahead Stable VariantAdded
Warpspeed Scan Pipeline IntegrationExtended the warpspeed scan implementation in
Kernel Dispatch UpdateUpdated Policy Selection for Stable ReductionModified Related IssueCloses WalkthroughAdds a deterministic ChangesStable Warpspeed Scan Implementation
Assessment against linked issues
Possibly related PRs
Suggested reviewers
important: Confirm that warpspeed stable lookahead updates important: Verify the suggestion: Consider adding a static_assert or comment that documents Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cub/cub/device/dispatch/tuning/tuning_scan.cuh (1)
1038-1047:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winsuggestion: Update the inline rationale for the
require_stable_reduction_order→cc >= {10, 0}gate:warpIncrementalLookaheadStableis available for__cccl_ptx_isa >= 860(sm_90+), but the scan policy selector only produces ascan_warpspeed_policywhencc >= {10, 0}(otherwiseget_warpspeed_policyreturns{}), so stable warpspeed on sm_90+ is blocked by warpspeed policy/tuning availability—not by stable lookahead codegen availability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: dfcdb20c-106f-4ae5-a688-9e19e5475411
📒 Files selected for processing (4)
cub/cub/detail/warpspeed/look_ahead.cuhcub/cub/device/dispatch/kernels/kernel_scan.cuhcub/cub/device/dispatch/kernels/kernel_scan_warpspeed.cuhcub/cub/device/dispatch/tuning/tuning_scan.cuh
|
/ok to test cbd13bb |
🥳 CI Workflow Results🟩 Finished in 2h 26m: Pass: 100%/284 | Total: 11d 15h | Max: 2h 26m | Hits: 18%/1000913See results here. |
|
pre-commit.ci autofix |
Description
closes #7556
Checklist