Skip to content

x64: report an error instead of panicking on stack frames larger than i32::MAX#13783

Open
soumik15630m wants to merge 2 commits into
bytecodealliance:mainfrom
soumik15630m:fix-x64-frame-size-overflow
Open

x64: report an error instead of panicking on stack frames larger than i32::MAX#13783
soumik15630m wants to merge 2 commits into
bytecodealliance:mainfrom
soumik15630m:fix-x64-frame-size-overflow

Conversation

@soumik15630m

@soumik15630m soumik15630m commented Jul 1, 2026

Copy link
Copy Markdown

Summary

x64 codegen panics with TryFromIntError when a function's stack frame
is large enough that offsets relative to it no longer fit in a 32-bit
immediate. This affects three places: the inline stack-probe loop
(guard_size * probe_count), clobber save/restore (RSP-relative
offsets), and incoming-argument access (RBP-relative offsets via
SyntheticAmode::IncomingArg::finalize).

This was reported downstream in
rust-lang/rustc_codegen_cranelift#1656, where a ~2GB stack allocation
([0; 536870787_usize], a [i32; N] array) triggers an ICE:
thread 'optimize module ...' panicked at cranelift/codegen/src/isa/x64/inst/emit.rs:373:18:
guard_size * probe_count is too large to fit in a 32-bit immediate: TryFromIntError(PosOverflow)

What this does

Adds ABIMachineSpec::validate_frame_layout, a new trait method with a
no-op default, overridden for x64. It's called from VCode::emit right
after compute_frame_layout, before any offset encoding takes place,
and checks:

On overflow, this returns CodegenError::Unsupported instead of
panicking. Only x64 overrides the new trait method; other backends are
unaffected (aarch64/riscv64/s390x/pulley needed a one-line ?
each to propagate emit's now-fallible return type, no behavior
change).

Testing

  • Added unit tests in isa/x64/abi.rs covering: a normal frame
    (accepted), an oversized fixed_frame_storage_size (rejected), an
    oversized tail_args_size via the RBP-relative path (rejected), and
    a frame that's under i32::MAX on its own but pushed over by the
    guard-page rounding margin (rejected) — this last case specifically
    reproduces the original bug.

Known follow-up (not included here)

bjorn3's comment mentions "all stack load/store instruction
lowerings" more broadly. I traced and covered the two offset-encoding
paths that are actually reachable from the repro (gen_clobber_save/
gen_clobber_restore and SyntheticAmode::IncomingArg::finalize), but
I have not exhaustively audited every stack-relative offset computation
in the x64 backend for the same class of narrowing. Happy to widen this
if there's a specific path you'd like checked.

Separately, rustc_codegen_cranelift currently panics when it receives
any Err from Context::compile at
src/base.rs:224
rather than converting it to a rustc diagnostic. I have open a PR rust-lang/rustc_codegen_cranelift/pull/1669 there to handle CodegenError::Unsupported the same way
the existing CodegenError::Verifier arm already does.

@soumik15630m soumik15630m requested review from a team as code owners July 1, 2026 02:13
@soumik15630m soumik15630m requested review from alexcrichton and removed request for a team July 1, 2026 02:13
@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Jul 1, 2026
Comment thread cranelift/codegen/src/isa/x64/abi.rs Outdated
if rsp_relative.saturating_add(probe_margin) > i32::MAX as u64
|| rbp_relative > i32::MAX as u64
{
return Err(crate::CodegenError::Unsupported(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably use ImplLimitExceeded instead.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, thanks. ImplLimitExceeded doesn't carry a message though ...so is it okay?Happy to switch either way; want to make sure I'm not throwing away information you'd want to keep.

insts
}

fn validate_frame_layout(frame_layout: &FrameLayout, guard_size: u32) -> CodegenResult<()> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation should probably be done in generic code (maybe the caller of compute_frame_layout?). The FrameLayout type uses 32bit ints, so the 4GB stack frame size applies to all targets.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure ... will push a commit soon

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit adding a generic check in compute_frame_layout.

I kept the x64-specific validate_frame_layout check as well, since it's
catching a different, tighter condition: x64 encodes these offsets as
signed 32-bit immediates (i32::MAX, not u32::MAX), and the inline
probestack loop rounds the frame size up to the next guard-page boundary
before encoding it - so a frame just under i32::MAX can get pushed
over it by that rounding. The original repro (~2.147GB) is under
u32::MAX (~4.295GB), so the generic check alone doesn't catch it -
only the x64-specific one does.

Happy to remove the x64-specific check if you'd rather keep this simpler
and generic-only, but wanted to flag that it currently exists to catch
a real case the generic check can't - unless other backends share x64's
signed-immediate constraint, in which case maybe it'd make sense to
generalize the tighter check too rather than keep it x64-only. Let me
know which direction you'd prefer.

@soumik15630m soumik15630m force-pushed the fix-x64-frame-size-overflow branch from 48d5999 to b94de39 Compare July 1, 2026 12:31
@soumik15630m soumik15630m requested a review from bjorn3 July 1, 2026 12:31
@alexcrichton

Copy link
Copy Markdown
Member

Thanks for the PR, but personally I'm pretty wary of adding a parallel path that calculates information about a frame that duplicates what's already in each backend. Frame handling/layout is pretty precise and requires extremely careful handling, and I wouldn't be confident that it could be accurately mirrored elsewhere in the system.

What I think should happen here is either (a) to propagate a result upwards at the time of a try_into or a checked_add happens (e.g. the try_into in this case, or (b) the case in question is updated. Given the line number it looks like this is failing to generate a stack probe, but we could enhance that logic to better support the stack probe loop with larger frame sizes.

@soumik15630m

soumik15630m commented Jul 2, 2026

Copy link
Copy Markdown
Author

@alexcrichton
Thanks for the detailed feedback.
Dug into this in detail. There are actually two different kinds of
32-bit narrowing happening here, and I think they need different
treatment:

1. RSP arithmetic (the subq_mi/addq_mi adjustments in the probe
loop and clobber save/restore) - this is genuinely fixable by widening.
r11/r15 are already reserved as scratch registers in this ABI, and
movabsq_oi already exists to materialize a 64-bit value into one. So
for these sites, we could fall back to "materialize into a scratch reg,
then register-to-register add/sub" when the value doesn't fit in i32.

2. Stack-relative memory addressing (Amode::imm_reg(offset, rsp),
used for touching probe pages, clobber load/store, incoming-arg access)

  • this hits an actual x86-64 ISA limit, not a Cranelift shortcut: the
    addressing mode's displacement field is a hard 32-bit signed value at
    the instruction-encoding level. The fix is computing the absolute
    effective address into a register first, then addressing off that with
    a zero displacement.

One thing worth flagging on option (a): MachInstEmit::emit returns
() for all 5 backends, with no error path anywhere in MachBuffer.
So propagating a Result from the actual try_from/.expect() sites
in emit.rs would mean changing that trait's signature - which is
called once per instruction, everywhere, across every backend. That's
a much bigger surface than I first thought, and probably bigger than
(b), which stays contained to x64 frame setup/teardown.

Given that, (b) seems like the more proportionate direction, and I'm
happy to take it on. Let me know if that matches your thinking or if
I'm missing a narrower way to do (a).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants