[DRAFT] Optimize offset based DeviceSegmentedReduce for small and medium segment sizes by srinivasyadav18 · Pull Request #6942 · NVIDIA/cccl

srinivasyadav18 · 2025-12-11T01:45:48Z

Description

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.
Merge the guarantees API commit separately in "Add cuda::execution::guarantee and cuda::execution::segment_size::max_segment_size" (#6682)
Check whether the static segment size is really useful; otherwise, reduce the kernel size for DeviceSegmentedReduceKernel

Status

The current version of this PR shows good speedups for I32/F32 with Sum (reaching up to 70%), but only modest improvements (up to 10% SOL, from < 1% SOL) with more complex operators such as ArgMax or with larger input types (> 4 B).

Some initial benchmarks:

Sum T{ct}=F32

ArgMax T{ct}=F64
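For readers unfamiliar with the operation being benchmarked, the following is a minimal host-side sketch of what an offset-based segmented Sum and ArgMax compute. It is not the kernel from this PR and not the CUB API: the function names `segmented_sum_reference` and `segmented_argmax_reference` are hypothetical, and the sketch assumes that segment `i` covers `[offsets[i], offsets[i + 1])`, that ArgMax segments are non-empty, and that ties resolve to the first maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference: one sum per segment.
// Assumes segment i covers the half-open range [offsets[i], offsets[i + 1]).
template <typename T>
std::vector<T> segmented_sum_reference(const std::vector<T>& values,
                                       const std::vector<std::size_t>& offsets)
{
  std::vector<T> out;
  for (std::size_t seg = 0; seg + 1 < offsets.size(); ++seg)
  {
    const std::size_t begin = offsets[seg];
    const std::size_t end   = offsets[seg + 1];
    assert(begin <= end && end <= values.size());
    T sum{}; // an empty segment yields the value-initialized T
    for (std::size_t i = begin; i < end; ++i)
    {
      sum += values[i];
    }
    out.push_back(sum);
  }
  return out;
}

// Hypothetical CPU reference: (index relative to segment start, max value).
// Assumes every segment is non-empty; the first maximum wins on ties.
template <typename T>
std::vector<std::pair<std::size_t, T>>
segmented_argmax_reference(const std::vector<T>& values,
                           const std::vector<std::size_t>& offsets)
{
  std::vector<std::pair<std::size_t, T>> out;
  for (std::size_t seg = 0; seg + 1 < offsets.size(); ++seg)
  {
    const std::size_t begin = offsets[seg];
    const std::size_t end   = offsets[seg + 1];
    assert(begin < end && end <= values.size());
    std::size_t best = begin;
    for (std::size_t i = begin + 1; i < end; ++i)
    {
      if (values[i] > values[best])
      {
        best = i;
      }
    }
    out.emplace_back(best - begin, values[best]);
  }
  return out;
}
```

The ArgMax variant carries an index alongside each value, so each reduction step handles a wider element than the plain Sum over the same input type. That may be related to the smaller gains reported above for ArgMax and for larger input types, though this sketch does not measure anything.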

github-actions · 2026-02-03T01:19:23Z

😬 CI Workflow Results

🟥 Finished in 6h 00m: Pass: 75%/95 | Total: 4d 16h | Max: 6h 00m | Hits: 73%/95813

srinivasyadav18 added 6 commits December 10, 2025 17:04

Add cuda::execution::guarantee's API and `cuda::execution::segment_size::max_segment_size`

5bf0477

Replace template parameter _N with _Size

4da3347

remove usage of sub-namespace segment_size

321f183

Extend Guarantee's API and max_segment_size to support stateful/dynamic guarantees

890eb10

add support for max seg size and optimize small,med seg size

61d0ce0

add benchmarks

7bea7fa

srinivasyadav18 requested review from a team as code owners December 11, 2025 01:45

srinivasyadav18 requested review from alliepiper and miscco December 11, 2025 01:45

github-project-automation Bot added this to CCCL Dec 11, 2025

github-project-automation Bot moved this to Todo in CCCL Dec 11, 2025

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Dec 11, 2025

clean up and fix benchmarks

96f098c

jrhemstad requested a review from elstehle January 13, 2026 16:33

alliepiper removed their request for review January 14, 2026 00:19

srinivasyadav18 added 2 commits January 28, 2026 17:12

Revert "Add cuda::execution::guarantee's API and `cuda::execution::segment_size::max_segment_size`"

This reverts commit 5bf0477.

702b05d

remove guarantees API usage

afaefae

srinivasyadav18 added 3 commits February 2, 2026 09:59

disable fp tests

7716eca

update license

4f4a826

minor fixes

01cbba7

srinivasyadav18 closed this Feb 19, 2026

srinivasyadav18 wants to merge 12 commits into NVIDIA:main from srinivasyadav18:opt_offset_seg_reduce
