[Common] Fix long compile time in padding.cu on arch 75 #2562
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
/te-ci L0
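Greptile Summary
Addresses a severe compilation-time regression on arch 75 introduced by PR #2548. The previous fix added a static_cast<size_t> to the row offset to prevent integer overflow, declaring row_offset outside the inner loop.
Key Changes:
Technical Assessment:
Confidence Score: 5/5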
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Dev as Developer
participant PR2548 as PR #2548
participant Compiler as NVCC Compiler (arch 75)
participant PR2562 as PR #2562 (Current)
Note over Dev,PR2548: Original Issue: Integer Overflow
Dev->>PR2548: Add static_cast<size_t> for row offset
PR2548->>PR2548: Declare row_offset outside inner loop
Note over PR2548: size_t row_offset = static_cast<size_t>(row) * row_length
Note over Compiler: Compilation Time Issue Discovered
PR2548->>Compiler: Compile padding.cu for arch 75
Compiler->>Compiler: Process row_offset variable scope
Note over Compiler: Takes 20+ minutes (was 30 seconds)
Note over Dev,PR2562: Workaround: Inline Computation
Dev->>PR2562: Move static_cast inline to usage sites
PR2562->>PR2562: Remove row_offset variable declaration
PR2562->>PR2562: Use static_cast<size_t>(row) * row_length inline
Note over PR2562: Applied to 3 locations per kernel<br/>(input read, output write, padding write)
PR2562->>Compiler: Compile with inline computation
Compiler->>Compiler: Optimize without intermediate variable
Note over Compiler: Compile time restored to ~30 seconds
Note over PR2562: Overflow Protection Maintained<br/>Performance Functionally Equivalent
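The overflow that PR #2548 guards against can be illustrated with a small host-side C++ sketch. It assumes row and row_length are 32-bit int and that size_t is 64-bit; the helper names row_offset_wide and int_product_overflows are hypothetical and do not appear in padding.cu. The point is that the cast must happen before the multiply, so the product is evaluated in 64-bit arithmetic.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: start offset of `row` in a row-major buffer.
// `row` is widened to size_t *before* the multiply, so the product is
// computed in 64-bit arithmetic (assumes 64-bit size_t, non-negative inputs).
std::size_t row_offset_wide(int row, int row_length) {
  return static_cast<std::size_t>(row) * row_length;
}

// Hypothetical check: would plain `row * row_length` overflow a 32-bit int?
// The exact product is computed in int64_t so the check itself cannot overflow.
bool int_product_overflows(int row, int row_length) {
  const std::int64_t exact = static_cast<std::int64_t>(row) * row_length;
  return exact > INT_MAX;
}
```

For example, with 70000 rows of length 40000 the exact offset is 2,800,000,000, which exceeds INT_MAX (2,147,483,647), so only the widened computation yields a valid index.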
Description
After PR #2548, we observe very long compile times for padding.cu on arch 75: over 20 minutes, where it previously took only ~30 seconds to compile.
To temporarily work around this issue, we are moving the scope of row_offset inward but keeping the cast to size_t.
Type of change
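A minimal host-side C++ sketch of the indexing pattern described above, where the 64-bit row offset is computed inline at each use site (input read, output write, padding write) rather than held in a row_offset variable hoisted outside the inner loop. This is not the actual kernel: the function name pad_rows_sketch, the row-padding semantics (copy num_rows rows, zero-fill up to padded_num_rows) and the int index types are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of a row-padding loop (not the real kernel).
// Rows [0, num_rows) are copied from input; rows [num_rows, padded_num_rows)
// are zero-filled. Buffers are row-major with row_length elements per row.
void pad_rows_sketch(const std::vector<float>& input, std::vector<float>& output,
                     int num_rows, int padded_num_rows, int row_length) {
  for (int row = 0; row < padded_num_rows; ++row) {
    for (int col = 0; col < row_length; ++col) {
      // The cast to size_t is kept, but the offset is computed inline at
      // each use site instead of via a separately declared row_offset.
      if (row < num_rows) {
        output[static_cast<std::size_t>(row) * row_length + col] =      // output write
            input[static_cast<std::size_t>(row) * row_length + col];    // input read
      } else {
        output[static_cast<std::size_t>(row) * row_length + col] = 0.0f;  // padding write
      }
    }
  }
}
```

On the host this rewrite is semantically identical to the hoisted version; the PR reports that the change only matters for how long NVCC takes to compile padding.cu for arch 75.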
Changes
Checklist: