
[libcu++] Adds a cuda::execution::tie_break requirement folded into determinism #9269

Open
elstehle wants to merge 3 commits into NVIDIA:main from elstehle:fea/cuda-execution-tie-break-determinism

@elstehle elstehle commented Jun 5, 2026

Closes #9255

Note, this is an alternative proposal to #9238.

Adds an optional tie-break preference to cuda::execution::determinism.

A deterministic guarantee (determinism::run_to_run / determinism::gpu_to_gpu) can now be called with a tie-break tag to specify which of the elements that compare equal at an algorithm's selection boundary are kept — e.g. require(determinism::run_to_run(determinism::tie_break::prefer_smaller_index)). The tags live in cuda::execution::determinism::tie_break:: with values unspecified (default), prefer_smaller_index, and prefer_larger_index.

A word on the motivation in top-k: ties at the K-th element are the source of non-determinism; pairing a determinism requirement with a tie-break lets users specify which amongst the competing items should make it into the result set.

The determinism options I envision for top-k:

| require(...) | which items are selected |
| --- | --- |
| determinism::not_guaranteed | non-deterministic among tied elements (fast path) |
| determinism::run_to_run | deterministic; implementation-defined tie-break |
| determinism::run_to_run(determinism::tie_break::prefer_smaller_index) | deterministic; ties resolved toward smaller source index |
| determinism::run_to_run(determinism::tie_break::prefer_larger_index) | deterministic; ties resolved toward larger source index |

My mental model: tie_break is purely a criterion about which items are selected, i.e., it operates on the result set. Ordering, i.e., {stable,unstable} / sorted / unsorted, is the orthogonal concern that operates on the result sequence (output_ordering). A tie-break says nothing about order.
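To make the set-versus-order distinction concrete, here is a minimal host-side sketch of what the two explicit tie-breaks would mean for a top-k result set. It is only an illustration: `tie_break` and `top_k_indices` are hypothetical names invented for this example and are not part of CUB or libcu++.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side stand-in for the proposed tie-break tags.
enum class tie_break { prefer_smaller_index, prefer_larger_index };

// Returns the source indices of the k largest values.
// The tie-break decides WHICH tied elements are selected (the result set).
// The final sort by index is an arbitrary presentation choice (the result
// sequence) and is unrelated to the tie-break.
std::vector<std::size_t> top_k_indices(const std::vector<int>& values, std::size_t k, tie_break tb)
{
  std::vector<std::size_t> idx(values.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  k = std::min(k, idx.size());

  // Larger values come first; equal values are ordered by the tie-break.
  auto before = [&](std::size_t a, std::size_t b) {
    if (values[a] != values[b])
    {
      return values[a] > values[b];
    }
    return tb == tie_break::prefer_smaller_index ? a < b : a > b;
  };

  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(), before);
  idx.resize(k);
  std::sort(idx.begin(), idx.end());
  return idx;
}
```

For the input `{5, 3, 5, 3, 1}` with `k = 3`, the two values of 3 tie at the boundary: `prefer_smaller_index` selects indices `{0, 1, 2}`, while `prefer_larger_index` selects `{0, 2, 3}`.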

Why part of determinism rather than a standalone requirement? A tie-break is only meaningful alongside a deterministic guarantee. Instead of a separate tie_break requirement that has to be paired with determinism (and static_assert-ed), the tie-break is now produced by the guarantee itself, so it is structurally impossible to request one without determinism, while determinism stays unchanged for users who don't care (it defaults to unspecified). Attaching a tie-break to not_guaranteed fails a static_assert. (This started as a standalone cuda::execution::tie_break requirement and was folded into determinism per review.)
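The folding described above can be sketched roughly as follows. This is a simplified, standalone model under stated assumptions, not the actual header: the namespace `sketch` and every name in it (`level`, `tie_break_kind`, `determinism_holder`, `tie_break_holder`) are hypothetical stand-ins for the internal types the PR touches.

```cpp
namespace sketch // hypothetical stand-in, not the libcu++ header
{
enum class level { not_guaranteed, run_to_run, gpu_to_gpu };
enum class tie_break_kind { unspecified, prefer_smaller_index, prefer_larger_index };

// Tag that carries a tie-break preference. It is deliberately not a
// requirement on its own; it only has meaning once attached to a guarantee.
template <tie_break_kind TieBreak>
struct tie_break_holder
{};

// A determinism guarantee that also records a tie-break (unspecified by default).
template <level Level, tie_break_kind TieBreak = tie_break_kind::unspecified>
struct determinism_holder
{
  static constexpr level value              = Level;
  static constexpr tie_break_kind tie_break = TieBreak;

  // Calling the guarantee with a tag yields a new guarantee type that
  // carries the requested tie-break.
  template <tie_break_kind Requested>
  constexpr auto operator()(tie_break_holder<Requested>) const
  {
    static_assert(Level != level::not_guaranteed, "a tie-break requires a deterministic guarantee");
    return determinism_holder<Level, Requested>{};
  }
};

inline constexpr determinism_holder<level::not_guaranteed> not_guaranteed{};
inline constexpr determinism_holder<level::run_to_run> run_to_run{};
inline constexpr determinism_holder<level::gpu_to_gpu> gpu_to_gpu{};
inline constexpr tie_break_holder<tie_break_kind::prefer_smaller_index> prefer_smaller_index{};
inline constexpr tie_break_holder<tie_break_kind::prefer_larger_index> prefer_larger_index{};
} // namespace sketch
```

In this model, `sketch::run_to_run(sketch::prefer_smaller_index)` has a type whose `tie_break` member is `prefer_smaller_index`, plain `sketch::run_to_run` keeps `unspecified`, and `sketch::not_guaranteed(sketch::prefer_smaller_index)` does not compile.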

A note on levels: an explicit prefer_smaller_index/prefer_larger_index fully fixes the result set, so for top-k it produces the same result on run_to_run and gpu_to_gpu. The tag is accepted on both, since in general the determinism level (reproducibility, incl. numerics) and the tie-break (which set) are orthogonal. See #9238 (comment)

Algorithms that do not support a tie-break (reduce / scan / segmented reduce) route their requested determinism through a shared guard and reject a tie-break at compile time with a single, friendly diagnostic instead of a deep template error.
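One way such a shared guard could look is sketched below. It is an illustrative model only: `guard_sketch`, `run_to_run_t`, `no_tie_break_guard`, and `validate_no_tie_break_t` are hypothetical simplifications, and the real guard in the PR may differ in shape and wording.

```cpp
#include <type_traits>

namespace guard_sketch // hypothetical stand-in, not the libcu++ header
{
enum class tie_break_kind { unspecified, prefer_smaller_index, prefer_larger_index };

// Simplified determinism type that records its tie-break.
template <tie_break_kind TieBreak = tie_break_kind::unspecified>
struct run_to_run_t
{
  static constexpr tie_break_kind tie_break = TieBreak;
};

// Identity wrapper: passes the determinism type through unchanged, but fails
// with one readable message if a tie-break was attached.
template <class Determinism>
struct no_tie_break_guard
{
  static_assert(Determinism::tie_break == tie_break_kind::unspecified,
                "this algorithm does not support a tie-break; remove it from the determinism requirement");
  using type = Determinism;
};

template <class Determinism>
using validate_no_tie_break_t = typename no_tie_break_guard<Determinism>::type;
} // namespace guard_sketch
```

An algorithm without tie-break support would wrap its queried determinism in `validate_no_tie_break_t<...>`: a plain `run_to_run_t<>` passes through as the same type, whereas `run_to_run_t<tie_break_kind::prefer_smaller_index>` stops compilation at the single `static_assert`.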

Exposing this within cuda::execution (rather than burying it in a CUB/top-k namespace) keeps the requirements interface homogeneous for users.

@elstehle elstehle requested review from a team as code owners June 5, 2026 04:09
@elstehle elstehle requested review from Jacobfaib and NaderAlAwar June 5, 2026 04:09
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 5, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 5, 2026
coderabbitai Bot commented Jun 5, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 21dbe6f3-2c83-4ee2-9769-d7cd5081c163

📥 Commits

Reviewing files that changed from the base of the PR and between cb6e157 and 25cea02.

📒 Files selected for processing (2)
  • libcudacxx/include/cuda/__execution/determinism.h
  • libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp
  • libcudacxx/include/cuda/__execution/determinism.h

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Overview

This PR embeds an optional tie-break preference into cuda::execution::determinism so callers can specify which equal elements are selected when algorithms encounter ties at a selection boundary. Tie-break values are subordinate to a determinism guarantee and cannot be requested on their own. Algorithms that do not support tie-break (e.g., reduce, scan, segmented reduce) now trigger a single, clear compile-time diagnostic if a tie-break is supplied.

API Design and Usage

Tie-break constants (in cuda::execution::determinism::tie_break):

  • unspecified (default): implementation-defined tie-breaking
  • prefer_smaller_index: prefer elements with smaller source indices
  • prefer_larger_index: prefer elements with larger source indices

Example:
require(determinism::run_to_run(determinism::tie_break::prefer_smaller_index))

Tie-break holders are not standalone requirements; attaching a tie-break to determinism::not_guaranteed is disallowed via static_assert.

Implementation Details

libcudacxx/include/cuda/__execution/determinism.h:

  • Added enum __tie_break_t and __tie_break_holder_t to represent tie-break preferences.
  • Extended __determinism_holder_t to carry a tie-break parameter (default __unspecified) and expose static constexpr tie_break.
  • Added operator() to attach tie-break holders to deterministic guarantees, with a compile-time check preventing attachment to not_guaranteed.
  • Added __no_tie_break_guard / __validate_no_tie_break_t to enforce at compile time that algorithms not supporting tie-breaks reject requests with a single diagnostic.

CUB integration (cub/*):

  • Device reduce, scan, and segmented-reduce environment/dispatch paths now wrap determinism selections with ::cuda::execution::determinism::__validate_no_tie_break_t to reject unsupported tie-breaks at compile time. No public API signatures or runtime behavior changes.

Tests:

  • Updated libcudacxx test determinism.pass.cpp with static_asserts verifying tie-break attachment produces the expected derived types and that defaults remain unspecified. Also asserts tie-break holders are not requirements.

Scope & Impact

  • Files touched include libcudacxx determinism header and test, and several cub device headers (device_reduce.cuh, device_scan.cuh, device_segmented_reduce.cuh).
  • Public API additions: tie_break namespace with types/constants and the extended determinism holder exported under cuda::execution::determinism.
  • Backward compatible: existing code that doesn't specify tie-break remains unchanged; requesting tie-break without determinism is a compile-time error.

Motivation

Resolves the request to let callers of top-k-style algorithms (e.g., DeviceBatchedTopK) choose deterministic tie-breaking behavior so that the composition of the result set is predictable when equal elements straddle selection boundaries.

Walkthrough

important: PR adds a tie-break preference to determinism requirements, implements a compile-time guard rejecting tie-breaks where unsupported, updates CUB reduce/scan/segmented_reduce to apply the guard, and adds compile-time tests.

Changes

Determinism tie-break support

| Layer | File(s) | Summary |
| --- | --- | --- |
| Tie-break type system foundation | libcudacxx/include/cuda/__execution/determinism.h | Adds __tie_break_t, __tie_break_holder_t, determinism::tie_break constants; extends __determinism_holder_t with _TieBreak parameter and tie_break member; adds operator()(__tie_break_holder_t) for deterministic guarantees. |
| Compile-time tie-break validation constraint | libcudacxx/include/cuda/__execution/determinism.h | Adds __no_tie_break_guard and __validate_no_tie_break_t that static_assert when a determinism type includes a non-__unspecified tie-break. |
| Validate reduce algorithms do not support tie-breaks | cub/cub/device/device_reduce.cuh | Wraps determinism selection in __transform_reduce, __minmax_reduce, and __arg_min_env with __validate_no_tie_break_t so tie-break-bearing requirements fail at compile time. |
| Validate scan algorithm does not support tie-breaks | cub/cub/device/device_scan.cuh | Wraps determinism selection in scan_impl_env with __validate_no_tie_break_t. |
| Validate segmented reduce algorithms do not support tie-breaks | cub/cub/device/device_segmented_reduce.cuh | Applies __validate_no_tie_break_t and switches from ::cuda::std::execution::determinism to ::cuda::execution::determinism across multiple segmented-reduce entry points and public overloads. |
| Test coverage for tie-break behavior | libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp | Adds static_assert checks that tie-break preferences are recorded in derived requirement types and that tie-break holder tags are not valid requirements. |

Assessment against linked issues

Objective Addressed Explanation
Introduce option to specify tie-breaking behavior in requirements API [#9255]
Support tie_break::prefer_smaller_index in determinism requirements [#9255]
Support tie_break::prefer_larger_index in determinism requirements [#9255]
Compile-time validation that algorithms rejecting tie-breaks fail if tie-break is specified [#9255]

Out-of-scope changes

| Code Change | Explanation |
| --- | --- |
| Namespace migration to ::cuda::execution::determinism in segmented_reduce (cub/cub/device/device_segmented_reduce.cuh) | This namespace change is not requested by issue #9255; confirm it's intentional and compatible with callers expecting ::cuda::std::execution::determinism. |

Possibly related PRs

  • NVIDIA/cccl#9098: Overlaps in scan_impl_env determinism-wiring logic; changes may interact with stable-reduction-order flags.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp

libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp:11:10: fatal error: 'cuda/execution.determinism.h' file not found
11 | #include <cuda/execution.determinism.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/25cea02b6c370f4801470057724d98e23ae06dae-8223f19a817559c3/tmp/clang_command_.tmp.b37f80.txt
++Contents of '/tmp/coderabbit-infer/25cea02b6c370f4801470057724d98e23ae06dae-8223f19a817559c3/tmp/clang_command_.tmp.b37f80.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1"

... [truncated 1195 characters] ...

ernal-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/8223f19a817559c3/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp" "-O0"
"-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
libcudacxx/include/cuda/__execution/determinism.h (1)

87-90: ⚡ Quick win

suggestion: Complete the Doxygen contract for operator() documentation.
Line 87 documents a non-void function with Doxygen but omits @param[in] and @return; add both to keep generated docs and header comments consistent.

As per coding guidelines: "When a function is documented with Doxygen, it must include: `//! @brief`, `//! @param[in/out/in,out]` for every parameter, and `//! @return` for non-void functions."

libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp (1)

35-38: ⚡ Quick win

suggestion: Extract the gpu_to_gpu tie-break type and verify its value and tie_break members individually (like the r2r_smaller test) for symmetric coverage.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d3c370d9-12ae-4364-aff8-5d7b43dc2a26

📥 Commits

Reviewing files that changed from the base of the PR and between 2b21bec and cb6e157.

📒 Files selected for processing (5)
  • cub/cub/device/device_reduce.cuh
  • cub/cub/device/device_scan.cuh
  • cub/cub/device/device_segmented_reduce.cuh
  • libcudacxx/include/cuda/__execution/determinism.h
  • libcudacxx/test/libcudacxx/cuda/execution/determinism.pass.cpp

Comment on lines +186 to +189
using default_determinism_t = ::cuda::execution::determinism::__validate_no_tie_break_t<
::cuda::std::execution::__query_result_or_t<requirements_t,
::cuda::execution::determinism::__get_determinism_t,
- ::cuda::execution::determinism::run_to_run_t>;
+ ::cuda::execution::determinism::run_to_run_t>>;

@coderabbitai coderabbitai Bot Jun 5, 2026


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== DeviceReduce determinism guard sites =="
rg -n -C2 '__validate_no_tie_break_t|__get_determinism_t|ReduceByKey\s*\(' cub/cub/device/device_reduce.cuh

echo
echo "== env dispatch helper implementation =="
fd -a 'env_dispatch\.cuh$' cub | while read -r f; do
  echo "-- $f --"
  rg -n -C4 'dispatch_with_env(_and_tuning)?|__get_determinism_t|__validate_no_tie_break_t' "$f"
done

Repository: NVIDIA/cccl

Length of output: 4738


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== device_reduce.cuh: find ReduceByKey env-based overload and its dispatch path =="
rg -n "ReduceByKey\\s*\\(" cub/cub/device/device_reduce.cuh | head -n 50
echo
rg -n "dispatch_with_env(_and_tuning)?" cub/cub/device/device_reduce.cuh
echo
echo "== device_reduce.cuh: show around env-based ReduceByKey (near match at line ~2215) =="
sed -n '2190,2285p' cub/cub/device/device_reduce.cuh
echo
echo "== device_reduce.cuh: show further around implementation starting near 2390 =="
sed -n '2360,2465p' cub/cub/device/device_reduce.cuh
echo
echo "== env_dispatch.cuh: search determinism-related tokens =="
rg -n "determinism|tie_break|__validate_no_tie_break_t|__get_determinism_t" cub/cub/detail/env_dispatch.cuh
echo
echo "== env_dispatch.cuh: show whole file with line numbers (bounded) =="
wc -l cub/cub/detail/env_dispatch.cuh
sed -n '1,160p' cub/cub/detail/env_dispatch.cuh

Repository: NVIDIA/cccl

Length of output: 11123


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== device_reduce.cuh: find ReduceByKey env-based overload and its dispatch path =="
rg -n "ReduceByKey\\s*\\(" cub/cub/device/device_reduce.cuh | head -n 80
echo
echo "== device_reduce.cuh: dispatch_with_env(_and_tuning) usage =="
rg -n "dispatch_with_env(_and_tuning)?" cub/cub/device/device_reduce.cuh
echo
echo "== device_reduce.cuh: show around env-based ReduceByKey (near line ~2215 from prior output) =="
sed -n '2185,2285p' cub/cub/device/device_reduce.cuh
echo
echo "== device_reduce.cuh: show env-parameter overload signature area (around ~2390 from prior output) =="
sed -n '2365,2455p' cub/cub/device/device_reduce.cuh
echo
echo "== env_dispatch.cuh: search determinism-related tokens =="
rg -n "determinism|tie_break|__validate_no_tie_break_t|__get_determinism_t" cub/cub/detail/env_dispatch.cuh
echo
echo "== env_dispatch.cuh: show full file (wc + sed) =="
wc -l cub/cub/detail/env_dispatch.cuh
sed -n '1,220p' cub/cub/detail/env_dispatch.cuh

Repository: NVIDIA/cccl

Length of output: 11220


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== device_reduce.cuh: ReduceByKey overloads =="
rg -n "ReduceByKey\\s*\\(" cub/cub/device/device_reduce.cuh | head -n 120

echo
echo "== device_reduce.cuh: dispatch_with_env(_and_tuning) usage =="
rg -n "dispatch_with_env(_and_tuning)?" cub/cub/device/device_reduce.cuh

echo
echo "== device_reduce.cuh: show around env-based ReduceByKey declaration/usage (near ~2215) =="
sed -n '2190,2265p' cub/cub/device/device_reduce.cuh

echo
echo "== device_reduce.cuh: show around ReduceByKey definitions starting near ~2390 =="
sed -n '2365,2465p' cub/cub/device/device_reduce.cuh

echo
echo "== env_dispatch.cuh: determinism/tie-break relevance =="
rg -n "determinism|tie_break|__validate_no_tie_break_t|__get_determinism_t" cub/cub/detail/env_dispatch.cuh

echo
echo "== env_dispatch.cuh: file size + contents (bounded) =="
wc -l cub/cub/detail/env_dispatch.cuh
sed -n '1,240p' cub/cub/detail/env_dispatch.cuh

Repository: NVIDIA/cccl

Length of output: 9976


important: cub::DeviceReduce::ReduceByKey(..., EnvT) still accepts tie-break determinism via EnvT without enforcing the no-tie-break rule
The EnvT overload in cub/cub/device/device_reduce.cuh directly calls detail::dispatch_with_env_and_tuning(...) and contains no __validate_no_tie_break_t / determinism static_assert, while cub/cub/detail/env_dispatch.cuh only queries stream/memory/tuning (no determinism/tie_break handling). This allows require(determinism::run_to_run(determinism::tie_break::prefer_smaller_index)) in EnvT to be silently ignored instead of hard-failing like the other guarded helpers.

Contributor Author

IIUC, this is pre-existing behaviour: we invoke the ByKey algorithm, which may be non-deterministic for pseudo-associative operations like floating-point addition, without validating the requested determinism, so a run_to_run/gpu_to_gpu request is silently accepted rather than rejected.

@srinivasyadav18, as our determinism expert, is this a pre-existing issue, and should we start validating requirements in the interfaces?

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +140 to +143
using requested_determinism_t = ::cuda::execution::determinism::__validate_no_tie_break_t<
::cuda::std::execution::__query_result_or_t<requirements_t,
::cuda::execution::determinism::__get_determinism_t,
- ::cuda::execution::determinism::not_guaranteed_t>;
+ ::cuda::execution::determinism::not_guaranteed_t>>;

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== DeviceScan determinism guard sites =="
rg -n -C2 'scan_impl_env|__validate_no_tie_break_t|__get_determinism_t' cub/cub/device/device_scan.cuh

echo
echo "== DeviceScan by-key env overloads =="
rg -n -C3 'Exclusive(Sum|Scan)ByKey|Inclusive(Sum|Scan)ByKey|dispatch_with_env\(' cub/cub/device/device_scan.cuh

echo
echo "== env dispatch helper implementation =="
fd -a 'env_dispatch\.cuh$' cub | while read -r f; do
  echo "-- $f --"
  rg -n -C4 'dispatch_with_env(_and_tuning)?|__get_determinism_t|__validate_no_tie_break_t' "$f"
done

Repository: NVIDIA/cccl

Length of output: 12595


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search determinism validation symbols in env_dispatch.cuh =="
rg -n '__validate_no_tie_break_t|__get_determinism_t' cub/cub/detail/env_dispatch.cuh

echo
echo "== Locate scan_by_key_impl in device_scan.cuh =="
rg -n 'scan_by_key_impl' cub/cub/device/device_scan.cuh

echo
echo "== Context around scan_by_key_impl determinism-related code =="
rg -n -C6 'scan_by_key_impl' cub/cub/device/device_scan.cuh | head -n 200

Repository: NVIDIA/cccl

Length of output: 119


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Locate scan_by_key_impl definition =="
rg -n 'scan_by_key_impl' cub/cub/device/device_scan.cuh

echo
echo "== Search determinism validation symbols near scan_by_key_impl =="
rg -n -C8 '__validate_no_tie_break_t|__get_determinism_t' cub/cub/device/device_scan.cuh | head -n 200

echo
echo "== Show scan_by_key_impl body (first match region) =="
# Print from the first occurrence of scan_by_key_impl through a reasonable window.
line=$(rg -n 'scan_by_key_impl' cub/cub/device/device_scan.cuh | head -n 1 | cut -d: -f1)
start=$((line-20))
end=$((line+220))
sed -n "${start},${end}p" cub/cub/device/device_scan.cuh | cat -n

Repository: NVIDIA/cccl

Length of output: 13246


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Count occurrences of tie-break validation in device_scan.cuh =="
rg -n '__validate_no_tie_break_t' cub/cub/device/device_scan.cuh

echo
echo "== Find requested_determinism_t definitions/uses in device_scan.cuh =="
rg -n 'requested_determinism_t' cub/cub/device/device_scan.cuh

echo
echo "== Find scan_impl_determinism usage around by-key =="
rg -n 'scan_impl_determinism' cub/cub/device/device_scan.cuh | head -n 50

echo
echo "== Context around requested_determinism_t within scan_by_key_impl wrapper =="
# Narrow to the first requested_determinism_t occurrence
line=$(rg -n 'requested_determinism_t' cub/cub/device/device_scan.cuh | head -n 1 | cut -d: -f1)
start=$((line-60)); end=$((line+60))
sed -n "${start},${end}p" cub/cub/device/device_scan.cuh | cat -n

Repository: NVIDIA/cccl

Length of output: 6763


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find __validate_no_tie_break_t across cub/cub =="
rg -n '__validate_no_tie_break_t' cub/cub | head -n 50

echo
echo "== Find __get_determinism_t across cub/cub =="
rg -n '__get_determinism_t' cub/cub | head -n 50

echo
echo "== Find detail::scan_by_key::dispatch call sites =="
rg -n 'detail::scan_by_key::dispatch' cub/cub | head -n 50

echo
echo "== Find scan_by_key::dispatch definition files =="
rg -n 'namespace scan_by_key|struct dispatch|dispatch\<' cub/cub/detail/scan_by_key* cub/cub/detail 2>/dev/null | head -n 200

Repository: NVIDIA/cccl

Length of output: 4415


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Locate scan_by_key-related detail files =="
fd -a '*scan_by_key*' cub/cub/detail | head -n 200

echo
echo "== Search for 'scan_by_key::dispatch' definitions =="
rg -n 'scan_by_key::dispatch' cub/cub/detail | head -n 50

echo
echo "== Search for tie-break validation in scan_by_key details =="
rg -n '__validate_no_tie_break_t' cub/cub/detail | head -n 50

echo
echo "== Search for determinism query symbols in scan_by_key details =="
rg -n '__get_determinism_t|__query_result_or_t<.*determinism|not_guaranteed_t' cub/cub/detail | head -n 80

echo
echo "== If dispatch is found, show surrounding code =="
file=$(rg -l 'scan_by_key::dispatch' cub/cub/detail | head -n 1 || true)
if [ -n "${file:-}" ]; then
  echo "-- First dispatch file: $file --"
  rg -n -C4 'scan_by_key::dispatch' "$file" | head -n 120
else
  echo "-- No dispatch definition found in cub/cub/detail/ with current search --"
fi

Repository: NVIDIA/cccl

Length of output: 412


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Locate files with scan_by_key in cub/cub/detail =="
fd -a -g '*scan_by_key*' cub/cub/detail | head -n 200

echo
echo "== Find scan_by_key::dispatch occurrences across cub/cub/detail =="
rg -n 'scan_by_key::dispatch' cub/cub/detail | head -n 50

echo
echo "== Find scan_by_key dispatch / determinism helpers across cub/cub/detail =="
rg -n '__validate_no_tie_break_t|__get_determinism_t|not_guaranteed_t|scan_impl_determinism' cub/cub/detail | head -n 200

Repository: NVIDIA/cccl

Length of output: 177


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find any cub/cub files with scan_by_key in the filename =="
fd -a -g '*scan_by_key*' cub/cub | head -n 200

echo
echo "== Search full cub/cub for 'scan_by_key::dispatch' =="
rg -n 'scan_by_key::dispatch' cub/cub | head -n 50

echo
echo "== Search full cub/cub for 'namespace scan_by_key' =="
rg -n 'namespace[[:space:]]+scan_by_key|struct[[:space:]]+scan_by_key' cub/cub | head -n 50

echo
echo "== Search full cub/cub for tie-break validation helpers in scan_by_key_impl/dispatch =="
rg -n '__validate_no_tie_break_t|scan_impl_determinism|not_guaranteed_t' cub/cub/device/device_scan.cuh cub/cub/detail 2>/dev/null | head -n 200

Repository: NVIDIA/cccl

Length of output: 1316


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search determinism symbols in dispatch_scan_by_key.cuh =="
rg -n '__validate_no_tie_break_t|__get_determinism_t|not_guaranteed_t|scan_impl_determinism|determinism::' cub/cub/device/dispatch/dispatch_scan_by_key.cuh

echo
echo "== Search determinism symbols in tuning_scan_by_key.cuh =="
rg -n '__validate_no_tie_break_t|__get_determinism_t|not_guaranteed_t|scan_impl_determinism|determinism::' cub/cub/device/dispatch/tuning/tuning_scan_by_key.cuh

echo
echo "== Search for scan_by_key dispatch-ish entry points in dispatch_scan_by_key.cuh =="
rg -n 'scan_by_key|dispatch|dispatch\<' cub/cub/device/dispatch/dispatch_scan_by_key.cuh | head -n 200

echo
echo "== Show top portion of dispatch_scan_by_key.cuh (first 220 lines) =="
sed -n '1,220p' cub/cub/device/dispatch/dispatch_scan_by_key.cuh | cat -n

echo
echo "== Show portion around any determinism-related code in dispatch_scan_by_key.cuh =="
for pat in '__validate_no_tie_break_t' '__get_determinism_t' 'not_guaranteed_t' 'determinism'; do
  line=$(rg -n "$pat" cub/cub/device/dispatch/dispatch_scan_by_key.cuh | head -n 1 | cut -d: -f1 || true)
  if [ -n "$line" ]; then
    start=$((line-60)); end=$((line+120));
    echo "-- Context around $pat at line $line --"
    sed -n "${start},${end}p" cub/cub/device/dispatch/dispatch_scan_by_key.cuh | cat -n
  fi
done

Repository: NVIDIA/cccl

Length of output: 116


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for 'determinism' substring in dispatch_scan_by_key.cuh =="
rg -n 'determin' cub/cub/device/dispatch/dispatch_scan_by_key.cuh || true

echo
echo "== Search for 'determin' substring in tuning_scan_by_key.cuh =="
rg -n 'determin' cub/cub/device/dispatch/tuning/tuning_scan_by_key.cuh || true

echo
echo "== Show first 260 lines of dispatch_scan_by_key.cuh =="
sed -n '1,260p' cub/cub/device/dispatch/dispatch_scan_by_key.cuh | cat -n

echo
echo "== Show first 260 lines of tuning_scan_by_key.cuh =="
sed -n '1,260p' cub/cub/device/dispatch/tuning/tuning_scan_by_key.cuh | cat -n

Repository: NVIDIA/cccl

Length of output: 21756


critical: EnvT tie-break determinism is rejected for scan_impl_env, but not for Exclusive*ByKey / Inclusive*ByKey env overloads.

  • scan_impl_env uses __validate_no_tie_break_t<...__get_determinism_t...> + static_assert (around cub/cub/device/device_scan.cuh:131-143).
  • The *ByKey env overloads only call detail::dispatch_with_env(env, ...) and then scan_by_key_impl<tuning_t>(...); scan_by_key_impl/detail::scan_by_key::dispatch have no __get_determinism_t / __validate_no_tie_break_t checks.
  • Tie-break determinism requirements can therefore likely compile and be ignored on the *ByKey env paths.

Fix: add the same tie-break rejection (requested_determinism_t via __validate_no_tie_break_t + the existing determinism static_assert) to the *ByKey EnvT path (e.g., a scan_by_key_env wrapper analogous to scan_impl_env, or the check in scan_by_key_impl/shared ByKey dispatch).

github-actions Bot commented Jun 5, 2026

🥳 CI Workflow Results

🟩 Finished in 2h 17m: Pass: 100%/340 | Total: 12d 16h | Max: 2h 13m | Hits: 36%/1291429

See results here.

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Introduce an option to specify a tie_breaking behavior in the requirements API

1 participant