[DRAFT] Optimize offset based DeviceSegmentedReduce for small and medium segment sizes#6942

Closed
srinivasyadav18 wants to merge 12 commits into NVIDIA:main from srinivasyadav18:opt_offset_seg_reduce

Conversation

srinivasyadav18 (Contributor) commented Dec 11, 2025

Description

closes #6898

Checklist

Status

The current version of the PR shows good speedups for I32/F32 with Sum (reaching up to 70%), but only decent improvements (up to 10% SOL, from < 1% SOL) with more complex operators like ArgMax or with larger input types (> 4B).
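For readers unfamiliar with the operation being benchmarked, here is a minimal host-side reference sketch of what an offset-based segmented Sum and ArgMax compute. It is not the kernel from this PR and not the CUB implementation. The function names, the `num_segments + 1` offsets layout, and the `-1` index returned for empty segments are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Host-side reference only (hypothetical helper, not part of CUB or this PR).
// Assumed layout: segment s covers input[offsets[s]] .. input[offsets[s + 1] - 1],
// so `offsets` holds num_segments + 1 entries.
std::vector<float> segmented_sum_reference(const std::vector<float>& input,
                                           const std::vector<std::int32_t>& offsets)
{
  std::vector<float> out;
  for (std::size_t s = 0; s + 1 < offsets.size(); ++s)
  {
    float sum = 0.0f; // an empty segment yields the identity of Sum
    for (std::int32_t i = offsets[s]; i < offsets[s + 1]; ++i)
    {
      sum += input[static_cast<std::size_t>(i)];
    }
    out.push_back(sum);
  }
  return out;
}

// ArgMax carries an (index, value) pair per segment, so its accumulator is
// wider than the one a plain Sum needs.
// Assumptions of this sketch: the index is relative to the segment start,
// ties keep the first occurrence, and an empty segment reports index -1.
std::vector<std::pair<std::int32_t, double>>
segmented_argmax_reference(const std::vector<double>& input,
                           const std::vector<std::int32_t>& offsets)
{
  std::vector<std::pair<std::int32_t, double>> out;
  for (std::size_t s = 0; s + 1 < offsets.size(); ++s)
  {
    std::pair<std::int32_t, double> best{-1, std::numeric_limits<double>::lowest()};
    for (std::int32_t i = offsets[s]; i < offsets[s + 1]; ++i)
    {
      const double v = input[static_cast<std::size_t>(i)];
      if (best.first < 0 || v > best.second)
      {
        best = {i - offsets[s], v};
      }
    }
    out.push_back(best);
  }
  return out;
}
```

With offsets `{0, 2, 2, 6}`, the middle segment is empty, which makes this a convenient correctness oracle for the variable-length small-segment case a GPU implementation has to handle.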

Some initial benchmarks:

Sum, T{ct}=F32: [image: opt_sum_F32_I32_speedup_heatmap]
ArgMax, T{ct}=F64: [image: opt_argmax_F64_I32_speedup_heatmap]

@srinivasyadav18 srinivasyadav18 requested review from a team as code owners December 11, 2025 01:45
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Dec 11, 2025
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Dec 11, 2025

@jrhemstad jrhemstad requested a review from elstehle January 13, 2026 16:33
@alliepiper alliepiper removed their request for review January 14, 2026 00:19

github-actions Bot (Contributor) commented Feb 3, 2026

😬 CI Workflow Results

🟥 Finished in 6h 00m: Pass: 75%/95 | Total: 4d 16h | Max: 6h 00m | Hits: 73%/95813

See results here.

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Optimize device_segment_reduce for small and medium varaible segment size's

1 participant