[libcu++] Adds exec::guarantee and the max_total_num_items guarantee#9278
elstehle wants to merge 3 commits
Overview
This PR introduces a guarantees facility to libcudacxx, enabling callers to communicate properties about problem characteristics that algorithms can exploit for optimization. The initial guarantee provides an upper bound on total items processed, useful for sizing intermediate structures in segmented or batched algorithms.
Key Features
Guarantees Mechanism
Design Notes
Files Changed
Implementation Headers
Public Headers
Tests
Summary
This change adds a composable, queryable guarantees API and the first guarantee, important:
Walkthrough
Adds a guarantee facility (base type, query key, variadic
Changes
Execution Guarantees Facility
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 Infer (1.2.0)
libcudacxx/test/libcudacxx/cuda/execution/guarantee.fail.cpp
libcudacxx/test/libcudacxx/cuda/execution/guarantee.fail.cpp:11:10: fatal error: 'cuda/execution.guarantee.h' file not found ... [truncated 1183 characters] ... nternal-isystem" "/usr/local/include" "-internal-isystem"
libcudacxx/test/libcudacxx/cuda/execution/max_total_num_items.fail.cpp
libcudacxx/test/libcudacxx/cuda/execution/max_total_num_items.fail.cpp:11:10: fatal error: 'cuda/execution.max_total_num_items.h' file not found ... [truncated 2200 characters] ... rc/clang/cFrontend_errors.ml", line 48, characters 6-141
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1cc66350-916c-4788-a194-ec55f2aa4233
📒 Files selected for processing (9)
libcudacxx/include/cuda/__execution/guarantee.h
libcudacxx/include/cuda/__execution/max_total_num_items.h
libcudacxx/include/cuda/execution
libcudacxx/include/cuda/execution.guarantee.h
libcudacxx/include/cuda/execution.max_total_num_items.h
libcudacxx/test/libcudacxx/cuda/execution/guarantee.fail.cpp
libcudacxx/test/libcudacxx/cuda/execution/guarantee.pass.cpp
libcudacxx/test/libcudacxx/cuda/execution/max_total_num_items.fail.cpp
libcudacxx/test/libcudacxx/cuda/execution/max_total_num_items.pass.cpp
😬 CI Workflow Results
🟥 Finished in 1h 20m: Pass: 87%/115 | Total: 20h 06m | Max: 53m 56s | Hits: 99%/266428
Closes #9279
Description
Adds `cuda::execution::guarantee` together with its first guarantee, `cuda::execution::max_total_num_items`. Where `require` lets a caller demand properties from an algorithm, `guarantee` lets a caller promise properties of the problem that an algorithm may exploit. Guarantees are bundled with `guarantee(...)` and surfaced through a dedicated `__get_guarantees` query, mirroring `require`.

`max_total_num_items` communicates an upper bound on the total number of items processed (e.g. the combined size of all segments in `cub::DeviceBatchedTopK`), which an algorithm can use to size intermediate offset types. Since this bound information may not be attachable to a specific parameter (e.g., in `DeviceBatchedTopK`, and similarly in segmented algorithms), we decided it should go into the guarantees API.

Design decisions
- `max_total_num_items` first, `min_total_num_items` later. Lower bounds are presumably rare in practice, so we optimize for the common case and keep the two as separate, composable guarantees (`guarantee(max_total_num_items<N>(), min_total_num_items<M>())`) instead of one combined lower+upper guarantee.
- `max_total_num_items<N>()` (static), `max_total_num_items(n)` (runtime), and `max_total_num_items<N>(n)` (static bound + runtime refinement, asserting `n <= N`).
- `int64_t`: a 32-bit bound stays 32-bit instead of widening to 64-bit, so that `max_total_num_items(1000000)` still provides an int32 static upper bound. Narrower types can be requested explicitly (`max_total_num_items<cuda::std::int16_t{1000}>()`).

Example