Allow explicit data transfers to GPUs#156620

Draft
ZuseZ4 wants to merge 14 commits into
rust-lang:mainfrom
ZuseZ4:offload-explicit-datatransfer

@ZuseZ4 ZuseZ4 commented May 15, 2026

Member

So far, our offload intrinsics have handled data movement to and from the GPU automatically.
That is convenient (and reasonably fast once our LLVM opts land). However, Rust generally also allows being explicit. Explicit transfers might give perf benefits (where our LLVM opts fail), and they could also be useful for modelling: data can be passed around while CPU-side code is still prevented from accessing it.
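
To make the "explicit but still inaccessible from the CPU" idea concrete, here is a minimal sketch in plain stable Rust. It is not the API from this PR: `MockDevice`, `preload_mut` (with this signature), `scale_on_device`, and `demo` are hypothetical names, and the "device" is just a host-side map keyed by the CPU address. The sketch only assumes what the PR's comments describe: a guard that stores a raw pointer as a lookup key, and a `Drop` that performs the copy back.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::marker::PhantomData;

/// Hypothetical stand-in for the offload runtime: maps a host address
/// to the "device" copy of the data.
#[derive(Default)]
pub struct MockDevice {
    buffers: RefCell<HashMap<usize, Vec<f32>>>,
}

impl MockDevice {
    /// Number of buffers currently resident on the mock device.
    pub fn resident(&self) -> usize {
        self.buffers.borrow().len()
    }
}

/// Guard representing data that lives on the device while the guard exists.
/// The raw host pointer is only used as a key and for the final copy back.
pub struct PreloadMut<'a> {
    host: *mut f32,
    len: usize,
    dev: &'a MockDevice,
    // Keeps the exclusive borrow of the host data alive for 'a.
    _borrow: PhantomData<&'a mut [f32]>,
}

/// Explicit host-to-device transfer (hypothetical signature).
pub fn preload_mut<'a>(dev: &'a MockDevice, data: &'a mut [f32]) -> PreloadMut<'a> {
    dev.buffers
        .borrow_mut()
        .insert(data.as_mut_ptr() as usize, data.to_vec());
    PreloadMut { host: data.as_mut_ptr(), len: data.len(), dev, _borrow: PhantomData }
}

impl PreloadMut<'_> {
    /// Stand-in for a kernel launch: touches only the device copy.
    pub fn scale_on_device(&mut self, factor: f32) {
        let mut buffers = self.dev.buffers.borrow_mut();
        if let Some(buf) = buffers.get_mut(&(self.host as usize)) {
            for v in buf.iter_mut() {
                *v *= factor;
            }
        }
    }
}

impl Drop for PreloadMut<'_> {
    /// Explicit device-to-host transfer when the guard goes out of scope.
    fn drop(&mut self) {
        if let Some(buf) = self.dev.buffers.borrow_mut().remove(&(self.host as usize)) {
            // SAFETY: `host` and `len` come from a `&'a mut [f32]` that this
            // guard has borrowed exclusively for its whole lifetime.
            let host = unsafe { std::slice::from_raw_parts_mut(self.host, self.len) };
            host.copy_from_slice(&buf);
        }
    }
}

/// Usage: the host slice is unreachable while the guard is alive.
pub fn demo() -> Vec<f32> {
    let dev = MockDevice::default();
    let mut data = vec![1.0_f32, 2.0, 3.0];
    {
        let mut on_gpu = preload_mut(&dev, &mut data);
        on_gpu.scale_on_device(2.0);
        // `data` cannot be read here: the guard holds the exclusive borrow.
    }
    data
}
```

In this sketch the borrow checker, not a runtime check, is what keeps CPU code away from the data between the two transfers. Whether the real implementation exposes the same shape is an open question in this PR.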

@ZuseZ4 ZuseZ4 added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. F-gpu_offload `#![feature(gpu_offload)]` labels May 15, 2026
@rustbot rustbot added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 15, 2026
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from e475c46 to da102aa Compare May 15, 2026 22:37

ZuseZ4 commented May 15, 2026

Member Author

Vendoring llvm/llvm-project#198033 for now.

@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from abc274d to 1d8d1e7 Compare May 29, 2026 01:47
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 1d8d1e7 to a94ef31 Compare May 29, 2026 02:58
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch 2 times, most recently from 4b77bad to 319ef7d Compare May 31, 2026 00:47
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from fba7eb2 to 358171b Compare May 31, 2026 02:35
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 2f1d614 to bbe3882 Compare May 31, 2026 20:03
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from bbe3882 to d290591 Compare May 31, 2026 20:32
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from d290591 to e8ad696 Compare May 31, 2026 21:59
@ZuseZ4 ZuseZ4 force-pushed the offload-explicit-datatransfer branch from 4f5c325 to 6c8bec9 Compare June 1, 2026 01:36

Comment thread library/core/src/offload/mod.rs Outdated
Comment thread library/core/src/offload/mod.rs Outdated
Comment on lines +39 to +41
// This exists so MIR creates Drop terminators for PreloadMut.
// rustc codegen intercepts those terminators and emits the
// offload return mapper.

@oli-obk oli-obk Jun 1, 2026

Contributor

why is this not just an intrinsic call here?

Member Author

Partly just experimenting, and partly because intrinsics recently changed a bit: they were updated for more explicit Place handling, which I didn't want to think about for my MVP. I'll switch these to intrinsics after my deadline.

Comment thread compiler/rustc_codegen_ssa/src/mir/block.rs Outdated

#[lang = "preload"]
#[unstable(feature = "offload", issue = "124509")]
pub fn preload<'a, T: ?Sized>(x: &'a T) -> Preload<'a, T> {

@oli-obk oli-obk Jun 1, 2026

Contributor

Yea I think these should just be intrinsics instead of catching lang item calls during codegen of call terminators.

Member Author

"Which lang items?" (asks my code, which fails to catch the call terminator once it has been inlined :D)

Member Author

ok, with an intrinsic it actually seems to work in release.
Not sure if we want one intrinsic with 2 arguments (mut/const, init/drop) or 4 intrinsics. Right now I have two.
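
The trade-off between one parameterised entry point and four dedicated ones can be sketched in ordinary Rust. Everything below is hypothetical: `Access`, `Phase`, `Transfer`, and all function names are illustration only and are not the intrinsics in this PR. The sketch also assumes that read-only data needs no copy back on drop, which the thread does not confirm.

```rust
/// Whether the guarded data may be written on the device (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Const,
    Mut,
}

/// Which end of the guard's lifetime is being lowered (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Init,
    Drop,
}

/// What the runtime would be asked to do (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Transfer {
    ToDevice,
    FromDevice,
    ReleaseOnly,
}

/// Option A: a single entry point with two selector arguments.
pub fn offload_transfer(access: Access, phase: Phase) -> Transfer {
    match (access, phase) {
        // Both kinds of guard copy host data to the device when created.
        (_, Phase::Init) => Transfer::ToDevice,
        // Only mutable data has to be copied back (assumption, see above).
        (Access::Mut, Phase::Drop) => Transfer::FromDevice,
        (Access::Const, Phase::Drop) => Transfer::ReleaseOnly,
    }
}

/// Option B: four dedicated entry points, one per combination.
pub fn preload_const_init() -> Transfer {
    offload_transfer(Access::Const, Phase::Init)
}

pub fn preload_const_drop() -> Transfer {
    offload_transfer(Access::Const, Phase::Drop)
}

pub fn preload_mut_init() -> Transfer {
    offload_transfer(Access::Mut, Phase::Init)
}

pub fn preload_mut_drop() -> Transfer {
    offload_transfer(Access::Mut, Phase::Drop)
}
```

Option A keeps the surface small but moves the case split into the callee. Option B makes each call site self-describing at the cost of four near-identical declarations. A two-function middle ground would split along one of the two axes and pass the other as an argument.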

@rust-log-analyzer

Collaborator

The job tidy failed!

Possible cause of the failure (guessed by this bot):
fmt check
Diff in /checkout/compiler/rustc_codegen_llvm/src/builder/gpu_helper.rs:1:
-use crate::SimpleCx;
-use crate::builder::Builder;
-use crate::intrinsic::TransferType;
-use crate::llvm;
-use crate::llvm::{Type, Value};
 use rustc_abi::Align;
 use rustc_codegen_ssa::MemFlags;
 use rustc_codegen_ssa::common::TypeKind;
Diff in /checkout/compiler/rustc_codegen_llvm/src/builder/gpu_helper.rs:9:
 use rustc_codegen_ssa::traits::{BaseTypeCodegenMethods, BuilderMethods};
 use rustc_middle::bug;
 use rustc_middle::ty::offload_meta::{OffloadMetadata, OffloadSize};
+
+use crate::builder::Builder;
+use crate::intrinsic::TransferType;
+use crate::llvm::{Type, Value};
+use crate::{SimpleCx, llvm};
 
 pub(crate) fn scalar_width<'ll>(cx: &'ll SimpleCx<'_>, ty: &'ll Type) -> u64 {
     match cx.type_kind(ty) {
Diff in /checkout/library/core/src/offload/mod.rs:1:
 // offload module
 #[unstable(feature = "gpu_offload", issue = "131513")]
 pub use crate::macros::builtin::offload_kernel;
+use crate::marker::PhantomData;
 #[unstable(feature = "gpu_offload", issue = "131513")]
 pub use crate::offload;
-
-use crate::marker::PhantomData;
 
 // We store a raw pointer instead of a reference, since the real location of the data will be on a
 // GPU, at a different address. We only use the CPU pointer as a key to our runtime cpu-gpu pointer
fmt: checked 6902 files
Bootstrap failed while executing `test src/tools/tidy tidyselftest --extra-checks=py,cpp,js,spellcheck`
Build completed unsuccessfully in 0:00:48
  local time: Wed Jun 10 22:41:33 UTC 2026
  network time: Wed, 10 Jun 2026 22:41:33 GMT

@Sa4dUs Sa4dUs mentioned this pull request Jun 18, 2026

rust-bors Bot commented Jun 25, 2026

Contributor

☔ The latest upstream changes (presumably #158416) made this pull request unmergeable. Please resolve the merge conflicts by rebasing.
