[libcu++] Adds a cuda::execution::tie_break requirement #9238

Open
elstehle wants to merge 1 commit into NVIDIA:main from elstehle:cuda-execution-tie-break

Conversation

@elstehle
Contributor

@elstehle elstehle commented Jun 3, 2026

Closes #9255

Note, this is an alternative proposal to #9269

Adds cuda::execution::tie_break requirement.

cuda::execution::tie_break is a requirement (sibling to determinism/output_ordering) with values unspecified, prefer_smaller_index, and prefer_larger_index. It describes how the top-k algorithm breaks ties among elements that compare equal at its selection boundary.

A word on the motivation in top-k: ties at the K-th element are the source of non-determinism; pairing a determinism requirement with a tie-break lets users specify which amongst the competing items should make it into the result set.

The four determinism options I envision for top-k:

| require(...) | which items are selected |
|---|---|
| determinism::not_guaranteed | non-deterministic among tied elements (fast path) |
| determinism::run_to_run | deterministic; implementation-defined tie-break |
| determinism::run_to_run, tie_break::prefer_smaller_index | deterministic; ties resolved toward smaller source index |
| determinism::run_to_run, tie_break::prefer_larger_index | deterministic; ties resolved toward larger source index |
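
To make the last two rows concrete, here is a minimal host-side sketch in standard C++ of what the two tie-break values would mean for a top-3 selection. The enum `tie_break`, the function `top_k_indices`, and the `sample` input are hypothetical stand-ins written for this illustration. They are not the libcu++ or CUB API, and the quadratic selection loop is chosen for clarity, not performance.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the proposed tie-break values (NOT the libcu++ type).
enum class tie_break { prefer_smaller_index, prefer_larger_index };

// Returns the source indices of the K largest items of `in`.
// Items that compare equal at the selection boundary are chosen according to `tb`.
// The indices are reported in ascending order purely for convenience.
template <std::size_t K, std::size_t N>
constexpr std::array<std::size_t, K> top_k_indices(const std::array<int, N>& in, tie_break tb)
{
  static_assert(K <= N, "cannot select more items than the input holds");
  std::array<bool, N> taken{};
  for (std::size_t picked = 0; picked < K; ++picked)
  {
    std::size_t best = N; // N means "no candidate yet"
    for (std::size_t i = 0; i < N; ++i)
    {
      if (taken[i])
      {
        continue;
      }
      // Scanning left to right: a strictly larger value always wins; an equal
      // value only replaces the candidate when larger indices are preferred.
      if (best == N || in[i] > in[best] || (in[i] == in[best] && tb == tie_break::prefer_larger_index))
      {
        best = i;
      }
    }
    taken[best] = true;
  }
  std::array<std::size_t, K> out{};
  std::size_t n = 0;
  for (std::size_t i = 0; i < N; ++i)
  {
    if (taken[i])
    {
      out[n++] = i;
    }
  }
  return out;
}

// Three items with value 5 compete for the third slot.
inline constexpr std::array<int, 5> sample{5, 7, 5, 9, 5};

constexpr auto smaller = top_k_indices<3>(sample, tie_break::prefer_smaller_index);
constexpr auto larger  = top_k_indices<3>(sample, tie_break::prefer_larger_index);

static_assert(smaller[0] == 0 && smaller[1] == 1 && smaller[2] == 3, "tie goes to index 0");
static_assert(larger[0] == 1 && larger[1] == 3 && larger[2] == 4, "tie goes to index 4");
```

In both cases the values 9 and 7 are selected; only the choice among the tied 5s differs. The sketch deliberately says nothing about output order, which stays the job of output_ordering.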

My mental model: tie_break is purely a criterion for which items are selected, i.e., it operates on the result set. Ordering, i.e., {stable,unstable} / sorted / unsorted, is the orthogonal concern that operates on the result sequence (output_ordering). A tie-break says nothing about order and is only meaningful together with a deterministic requirement. We will static_assert on this in the relevant implementation, so that a tie_break can only be requested alongside a determinism requirement.
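
A rough sketch of how such a static_assert could look, assuming the algorithm has already extracted both requirements as compile-time values. All names here (`determinism_t`, `tie_break_t`, `requirements`, `tie_break_is_well_formed`, `checked_top_k`) are hypothetical stand-ins for illustration and do not reflect the actual libcu++ or CUB types.

```cpp
// Hypothetical stand-ins for the two requirement vocabularies (NOT the libcu++ types).
enum class determinism_t { not_guaranteed, run_to_run, gpu_to_gpu };
enum class tie_break_t { unspecified, prefer_smaller_index, prefer_larger_index };

// A bundle of already-resolved requirements, as an algorithm might see them.
template <determinism_t Determinism, tie_break_t TieBreak = tie_break_t::unspecified>
struct requirements
{
  static constexpr determinism_t determinism = Determinism;
  static constexpr tie_break_t tie_break     = TieBreak;
};

// A tie-break is acceptable only when it is left unspecified or when it is
// paired with a deterministic guarantee.
template <class Req>
constexpr bool tie_break_is_well_formed()
{
  constexpr bool wants_tie_break = Req::tie_break != tie_break_t::unspecified;
  constexpr bool deterministic =
    Req::determinism == determinism_t::run_to_run || Req::determinism == determinism_t::gpu_to_gpu;
  return !wants_tie_break || deterministic;
}

// Entry point of an imagined top-k: rejects ill-formed combinations at compile time.
template <class Req>
constexpr bool checked_top_k(Req)
{
  static_assert(tie_break_is_well_formed<Req>(),
                "a tie_break requirement must be paired with a deterministic determinism requirement");
  return true; // a real implementation would dispatch to the selected algorithm here
}

// Accepted: deterministic, with and without an explicit tie-break.
static_assert(checked_top_k(requirements<determinism_t::run_to_run>{}), "");
static_assert(checked_top_k(requirements<determinism_t::run_to_run, tie_break_t::prefer_smaller_index>{}), "");
// Accepted: the fast path with no tie-break.
static_assert(checked_top_k(requirements<determinism_t::not_guaranteed>{}), "");
```

Passing `requirements<determinism_t::not_guaranteed, tie_break_t::prefer_larger_index>{}` to `checked_top_k` would fail to compile with the message above, which is the misuse the static_assert is meant to catch.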

I think exposing it as a cuda::execution requirement (rather than burying it in a CUB/top-k namespace) keeps the requirements interface homogeneous for users.

@elstehle elstehle requested a review from a team as a code owner June 3, 2026 14:22
@elstehle elstehle requested a review from davebayer June 3, 2026 14:22
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 3, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 3, 2026
@elstehle elstehle changed the title adds cuda::execution::tie_break requirement Adds a cuda::execution::tie_break requirement Jun 3, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 3, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 5a9ea63 and 74622cb.

📒 Files selected for processing (4)
  • libcudacxx/include/cuda/__execution/tie_break.h
  • libcudacxx/include/cuda/execution
  • libcudacxx/include/cuda/execution.tie_break.h
  • libcudacxx/test/libcudacxx/cuda/execution/tie_break.pass.cpp

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Overview

This PR introduces a new cuda::execution::tie_break requirement to the CUDA Execution namespace, enabling explicit specification of how top-k algorithms resolve ties among elements that compare equal at the selection boundary. The new requirement has three preference values: unspecified, prefer_smaller_index, and prefer_larger_index.

Changes

New Files

libcudacxx/include/cuda/__execution/tie_break.h (97 lines added)

  • Defines the core cuda::execution::tie_break requirement functionality
  • Introduces internal __tie_break_t enum with three preference values: __unspecified, __prefer_smaller_index, __prefer_larger_index
  • Implements __tie_break_holder_t<_Preference> template requirement type that stores the preference as an integral_constant
  • Provides __get_tie_break_t query object supporting environment queries
  • Exports public type aliases: unspecified_t, prefer_smaller_index_t, prefer_larger_index_t
  • Defines global constant instances: unspecified, prefer_smaller_index, prefer_larger_index, and __get_tie_break
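
The shape described in this list can be approximated in plain C++17 as follows. This is only a sketch under stated assumptions: the names (`tie_break_kind`, `requirement`, `tie_break_holder`, `get_tie_break_t`) are simplified, hypothetical stand-ins for the double-underscore internals listed above, and the real header may differ in detail.

```cpp
#include <type_traits>

// Hypothetical stand-in for the internal preference enum.
enum class tie_break_kind { unspecified, prefer_smaller_index, prefer_larger_index };

// Hypothetical stand-in for the requirement base tag.
struct requirement {};

struct get_tie_break_t; // query tag, defined below

// Holder: carries the preference as an integral_constant and answers the query with itself.
template <tie_break_kind Preference>
struct tie_break_holder : requirement, std::integral_constant<tie_break_kind, Preference>
{
  constexpr tie_break_holder query(const get_tie_break_t&) const noexcept
  {
    return *this;
  }
};

// Query object: forwards to the environment's query() member.
struct get_tie_break_t
{
  template <class Env>
  constexpr auto operator()(const Env& env) const noexcept
  {
    return env.query(*this);
  }
};

// Public aliases and global instances, mirroring the names in the list above.
using unspecified_t          = tie_break_holder<tie_break_kind::unspecified>;
using prefer_smaller_index_t = tie_break_holder<tie_break_kind::prefer_smaller_index>;
using prefer_larger_index_t  = tie_break_holder<tie_break_kind::prefer_larger_index>;

inline constexpr unspecified_t unspecified{};
inline constexpr prefer_smaller_index_t prefer_smaller_index{};
inline constexpr prefer_larger_index_t prefer_larger_index{};
inline constexpr get_tie_break_t get_tie_break{};

// Compile-time checks in the spirit of the PR's test file.
static_assert(std::is_base_of<requirement, prefer_smaller_index_t>::value, "");
static_assert(std::is_same<decltype(get_tie_break(prefer_larger_index)), prefer_larger_index_t>::value, "");
static_assert(decltype(get_tie_break(unspecified))::value == tie_break_kind::unspecified, "");
```

Because each holder answers the query with its own type, an algorithm can branch on the preference with ordinary compile-time dispatch and pays nothing at run time.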

libcudacxx/include/cuda/execution.tie_break.h (26 lines added)

  • Public convenience header that includes the core tie_break.h implementation
  • Provides standard include guards and compiler-specific system header pragmas

libcudacxx/test/libcudacxx/cuda/execution/tie_break.pass.cpp (35 lines added)

  • Compile-time test verifying that tie-break policy types derive from cuda::execution::__requirement
  • Validates that __get_tie_break() returns the expected tie-break type for each policy input

Modified Files

libcudacxx/include/cuda/execution (1 line added)

  • Includes the new cuda/__execution/tie_break.h header to expose tie-break functionality through the umbrella CUDA execution namespace

Semantics

The tie-break requirement specifies selection semantics for choosing which items are included in the result set when multiple items compete for the K-th slot. It operates orthogonally to output_ordering, which controls the ordering of the result sequence. Integration with determinism requirements:

  • determinism::not_guaranteed: Selection among tied elements is non-deterministic (fast path)
  • determinism::run_to_run: Deterministic selection with implementation-defined tie-break
  • determinism::run_to_run + tie_break::prefer_smaller_index or tie_break::prefer_larger_index: Deterministic selection with ties resolved toward the specified source index direction

Static assertions will enforce that a tie-break requirement is only accepted when paired with an appropriate determinism requirement.

Walkthrough

This PR adds a tie-break preference requirement to cuda::execution for deterministic selection among tied elements in top-k. It defines an internal preference enum with three values, requirement holders for type-safe wrapping, a queryable preference interface, and public type aliases and global instances for use in execution environments and algorithms.

Changes

Tie-break requirement framework

| Layer | File(s) | Summary |
|---|---|---|
| Core tie-break types and preference holders | libcudacxx/include/cuda/__execution/tie_break.h | Introduces __tie_break_t enum with three preferences, __tie_break_holder_t template wrapping each preference as integral_constant with query() support, and __get_tie_break_t query object integrating with __queryable_with to extract preferences from execution environments. |
| Public header exposure and umbrella include | libcudacxx/include/cuda/execution.tie_break.h, libcudacxx/include/cuda/execution | Wraps core implementation in a public header with include guard, license header, and compiler pragmas; adds tie-break header to the cuda/execution umbrella include. |
| Compile-time type validation | libcudacxx/test/libcudacxx/cuda/execution/tie_break.pass.cpp | Validates that unspecified_t, prefer_smaller_index_t, and prefer_larger_index_t types conform to __requirement, and verifies __get_tie_break returns correct types for each preference input. |

Suggested reviewers

  • ericniebler
  • gevtushenko

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
libcudacxx/test/libcudacxx/cuda/execution/tie_break.pass.cpp

libcudacxx/test/libcudacxx/cuda/execution/tie_break.pass.cpp:11:10: fatal error: 'cuda/execution.tie_break.h' file not found
11 | #include <cuda/execution.tie_break.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/74622cb2341f0d167891103ca4345b8b65bb3756-a8d15ca7e2b2abfc/tmp/clang_command_.tmp.6810d7.txt
++Contents of '/tmp/coderabbit-infer/74622cb2341f0d167891103ca4345b8b65bb3756-a8d15ca7e2b2abfc/tmp/clang_command_.tmp.6810d7.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-tripl

... [truncated 1183 characters] ...

nternal-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-fdeprecated-macro" "-ferror-limit" "19" "-fgnuc-version=4.2.1"
"-fskip-odr-check-in-gmf" "-fcxx-exceptions" "-fexceptions"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/a8d15ca7e2b2abfc/file.o" "-x" "c++"
"libcudacxx/test/libcudacxx/cuda/execution/tie_break.pass.cpp" "-O0"
"-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"


@github-actions
Contributor

github-actions Bot commented Jun 3, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 35m: Pass: 100%/115 | Total: 1d 18h | Max: 53m 16s | Hits: 99%/339232

See results here.

@elstehle elstehle changed the title Adds a cuda::execution::tie_break requirement [libcu++] Adds a cuda::execution::tie_break requirement Jun 4, 2026
@pciolkosz
Contributor

If tie_break is only relevant with the determinism argument, shouldn't it be somehow integrated into it? It might not be feasible given it's released already, but just wanted to double check if that design was considered. I think it would be easier to understand the tie_break if it was just inside the determinism rather than a separate guarantee that only works when the other is specified

@elstehle
Contributor Author

elstehle commented Jun 4, 2026

> If tie_break is only relevant with the determinism argument, shouldn't it be somehow integrated into it? It might not be feasible given it's released already, but just wanted to double check if that design was considered. I think it would be easier to understand the tie_break if it was just inside the determinism rather than a separate guarantee that only works when the other is specified

Yeah, but no strong opinion. I just think determinism is a requirement that is meaningful for most algorithms (reduce, scan, select), whereas the tie-break is meaningful (maybe?) only for top-k. My main rationale was that folding it into determinism would burden the determinism vocabulary with a concept only a few users need.

Also, I think they answer different questions: determinism is "how reproducible" a result is, whereas tie_break influences which items to prioritize. So, I think they compose naturally as separate require(...) properties, where top-k would static_assert requiring determinism::run_to_run/gpu_to_gpu whenever a tie-break is set.

@pciolkosz
Contributor

I think we could make the determinism guarantee to be constructible without a tie_break and then you don't really complicate the determinism usage outside of cases where someone actually cares about it. I think if we don't see any use cases of tie_break outside of determinism and we will always static_assert that determinism is used when tie_break is used, we can make tie_break easier to understand by folding it into determinism. This way it's impossible to misuse the tie_break by forgetting to use determinism. If we keep tie_break optional for determinism, I still agree it adds some extra complexity to it because the documentation will need to mention it, but I think in practice it should be acceptable.

@elstehle
Contributor Author

elstehle commented Jun 5, 2026

> I think we could make the determinism guarantee to be constructible without a tie_break and then you don't really complicate the determinism usage outside of cases where someone actually cares about it. I think if we don't see any use cases of tie_break outside of determinism and we will always static_assert that determinism is used when tie_break is used, we can make tie_break easier to understand by folding it into determinism. This way it's impossible to misuse the tie_break by forgetting to use determinism. If we keep tie_break optional for determinism, I still agree it adds some extra complexity to it because the documentation will need to mention it, but I think in practice it should be acceptable.

Would you envision something like this:

require(determinism::run_to_run)                                     // deterministic, no tie-break (default)
require(determinism::run_to_run(determinism::tie_break::prefer_smaller_index))  // + tie-break
require(determinism::gpu_to_gpu(determinism::tie_break::prefer_larger_index))

It's worth noting that with determinism::tie_break::prefer_{smaller,larger}_index, we get the same guarantee regardless of whether run_to_run or gpu_to_gpu is chosen: there is exactly one result set for a given input. In that case, the question is: should we support prefer_X with both run_to_run and gpu_to_gpu, so users can still express their requirement more clearly, or only with gpu_to_gpu? I am leaning toward supporting it with both determinism levels, so users can express their intent more granularly.
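
For readers trying to picture the call syntax above, one possible way to get it is to make the determinism objects callable, so that invoking one with a tie-break tag yields a refined guarantee type. The following is only a sketch of that idea in plain C++17. Everything inside namespace `sketch` is hypothetical and is not taken from #9269 or from libcu++.

```cpp
namespace sketch
{
// Hypothetical stand-ins for the two vocabularies.
enum class level { run_to_run, gpu_to_gpu };
enum class tie_kind { unspecified, prefer_smaller_index, prefer_larger_index };

// Tag type passed to a determinism object to request a tie-break.
template <tie_kind T>
struct tie_tag {};

// A determinism guarantee that optionally carries a tie-break.
template <level L, tie_kind T = tie_kind::unspecified>
struct determinism_holder
{
  static constexpr level guarantee          = L;
  static constexpr tie_kind tie_break_value = T;

  // Calling the guarantee with a tie-break tag returns the refined guarantee.
  template <tie_kind U>
  constexpr determinism_holder<L, U> operator()(tie_tag<U>) const noexcept
  {
    return {};
  }
};

namespace determinism
{
inline constexpr determinism_holder<level::run_to_run> run_to_run{};
inline constexpr determinism_holder<level::gpu_to_gpu> gpu_to_gpu{};

namespace tie_break
{
inline constexpr tie_tag<tie_kind::prefer_smaller_index> prefer_smaller_index{};
inline constexpr tie_tag<tie_kind::prefer_larger_index> prefer_larger_index{};
} // namespace tie_break
} // namespace determinism
} // namespace sketch

// Plain guarantee: no tie-break attached.
static_assert(decltype(sketch::determinism::run_to_run)::tie_break_value == sketch::tie_kind::unspecified, "");

// Refined guarantee: same level, now with a tie-break.
using refined_t = decltype(sketch::determinism::run_to_run(sketch::determinism::tie_break::prefer_smaller_index));
static_assert(refined_t::guarantee == sketch::level::run_to_run, "");
static_assert(refined_t::tie_break_value == sketch::tie_kind::prefer_smaller_index, "");
```

With this shape a tie-break cannot be stated without a determinism level, since the only way to produce one is to call a determinism object, which is the misuse-resistance argued for in the quoted comment.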

@elstehle
Contributor Author

elstehle commented Jun 5, 2026

> I think we could make the determinism guarantee to be constructible without a tie_break and then you don't really complicate the determinism usage outside of cases where someone actually cares about it. I think if we don't see any use cases of tie_break outside of determinism and we will always static_assert that determinism is used when tie_break is used, we can make tie_break easier to understand by folding it into determinism. This way it's impossible to misuse the tie_break by forgetting to use determinism. If we keep tie_break optional for determinism, I still agree it adds some extra complexity to it because the documentation will need to mention it, but I think in practice it should be acceptable.

@pciolkosz, I've opened a separate PR that implements this:
#9269

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Introduce an option to specify a tie_breaking behavior in the requirements API

2 participants