diff --git a/docs/design/fused-component-optimization.md b/docs/design/fused-component-optimization.md index 88d1a7a..8ad9853 100644 --- a/docs/design/fused-component-optimization.md +++ b/docs/design/fused-component-optimization.md @@ -144,7 +144,7 @@ flowchart TD - Contain `memory.copy {0, 0}` (same-memory copy) with no cross-memory copies - Call `cabi_realloc` at least once - Call exactly one non-realloc target function -- Have no control flow, memory stores, or global writes +- Have no control flow, memory stores, or unsafe global writes (balanced single-global save/restore such as stack pointer prologue/epilogue is allowed) - Have the same signature as the target **Transformation**: Replace the function body with a forwarding trampoline: @@ -162,6 +162,8 @@ flowchart TD **Why this is correct**: In a single-memory module, `memory.copy {0, 0}` copies within the same address space. The adapter allocates a buffer, copies argument data to it, then calls the target with the new pointer. Since both pointers reference the same memory, the target can read the data at the original pointer directly. The allocation and copy are semantically redundant. +**Stack pointer save/restore**: Meld-generated adapters for non-trivial types often include a stack pointer prologue/epilogue (`global.get $sp; sub; global.set $sp` ... `global.set $sp` to restore). These balanced single-global writes are safe to collapse because the entire body is replaced by a forwarding trampoline — the global is never modified in the collapsed function (net-zero effect). The predicate `has_unsafe_global_writes` allows this pattern while still rejecting writes to multiple globals or write-only globals. + **Synergy with Pass 1**: After collapse, the adapter is a trivial forwarding trampoline. Pass 1 detects this and rewrites all callers to call the target directly, eliminating both the adapter overhead AND the unnecessary allocation/copy. ### Pass 1: Adapter Devirtualization @@ -224,6 +226,7 @@ Each transformation has a corresponding formal proof in Rocq (`proofs/simplify/F | Theorem | Status | Pass | |---|---|---| | `same_memory_collapse_correct` | **Proven** | Pass 0 | +| `sp_save_restore_collapse_correct` | **Proven** | Pass 0 | | `adapter_devirtualization_correct` | **Proven** | Pass 1 | | `devirtualization_preserves_module_semantics` | **Proven** | Pass 1 | | `trivial_call_is_nop` | **Proven** | Pass 2 | @@ -236,7 +239,7 @@ Each transformation has a corresponding formal proof in Rocq (`proofs/simplify/F | `fused_optimization_correct` | **Proven** | Combined | | `fused_then_standard_correct` | **Proven** | Composition | -All 12 theorems are proven (Qed). Zero Admitted proofs remain. +All 13 theorems are proven (Qed). Zero Admitted proofs remain. ### Proof Architecture @@ -253,6 +256,7 @@ flowchart TD subgraph proven["Proven Theorems (Qed)"] SM["same_memory_collapse_correct"] + SP["sp_save_restore_collapse_correct"] AD["adapter_devirtualization_correct"] TC["trivial_call_is_nop"] TP["type_dedup_preserves_semantics"] @@ -265,10 +269,12 @@ flowchart TD end A0 --> SM + A0 --> SP A1 --> AD A2 --> IP A3 --> TC SM --> FC + SP --> FC AD --> FC TC --> FC TP --> FC diff --git a/loom-core/src/fused_optimizer.rs b/loom-core/src/fused_optimizer.rs index 77b2fbc..7359013 100644 --- a/loom-core/src/fused_optimizer.rs +++ b/loom-core/src/fused_optimizer.rs @@ -161,7 +161,7 @@ pub fn optimize_fused_module(module: &mut Module) -> Result HashSet { /// 4. Contains exactly one call to a non-realloc function (the target) /// 5. No control flow instructions /// 6. No memory store instructions -/// 7. No global write instructions +/// 7. No unsafe global writes (balanced single-global save/restore is allowed) fn is_same_memory_adapter(func: &Function, realloc_funcs: &HashSet) -> Option { // Must have locals (trivial adapters without locals are handled by Pass 1) if func.locals.is_empty() { @@ -297,7 +297,7 @@ fn is_same_memory_adapter(func: &Function, realloc_funcs: &HashSet) -> Opti if has_memory_stores(instructions) { return None; } - if has_global_writes(instructions) { + if has_unsafe_global_writes(instructions) { return None; } @@ -381,11 +381,41 @@ fn has_memory_stores(instructions: &[Instruction]) -> bool { }) } -/// Check if instructions contain global write operations. -fn has_global_writes(instructions: &[Instruction]) -> bool { - instructions - .iter() - .any(|instr| matches!(instr, Instruction::GlobalSet(_))) +/// Check if instructions contain unsafe global write operations. +/// +/// A global write is **safe** (returns false) when: +/// - No GlobalSet instructions exist, OR +/// - ALL GlobalSet instructions target a single global index G, AND at least +/// one GlobalGet reads from that same index G (balanced save/restore pattern, +/// e.g. stack pointer prologue/epilogue in meld adapters) +/// +/// A global write is **unsafe** (returns true) when: +/// - Multiple distinct globals are written (not a single-SP pattern), OR +/// - A global is written but never read (not the save/restore pattern) +fn has_unsafe_global_writes(instructions: &[Instruction]) -> bool { + let mut global_set_indices: HashSet = HashSet::new(); + let mut global_get_indices: HashSet = HashSet::new(); + + for instr in instructions { + match instr { + Instruction::GlobalSet(idx) => { + global_set_indices.insert(*idx); + } + Instruction::GlobalGet(idx) => { + global_get_indices.insert(*idx); + } + _ => {} + } + } + + if global_set_indices.is_empty() { + return false; // No writes → safe + } + if global_set_indices.len() > 1 { + return true; // Multi-global → unsafe + } + let set_idx = *global_set_indices.iter().next().unwrap(); + !global_get_indices.contains(&set_idx) // Write-only → unsafe } /// Rewrite a function body to a trivial forwarding trampoline. @@ -2472,6 +2502,232 @@ mod tests { ); } + /// Helper: create a same-memory adapter with stack pointer save/restore prologue/epilogue. + /// + /// Simulates a meld-generated adapter for non-trivial types that saves and + /// restores the stack pointer global around the realloc+copy+call sequence. + fn make_same_memory_adapter_with_sp( + params: &[ValueType], + results: &[ValueType], + realloc_idx: u32, + target_idx: u32, + sp_global: u32, + ) -> Function { + let sp_local = params.len() as u32 + 1; + let ptr_local = params.len() as u32; + + // SP prologue: save current stack pointer, subtract frame size + let mut instructions = vec![ + Instruction::GlobalGet(sp_global), + Instruction::I32Const(16), + Instruction::I32Sub, + Instruction::LocalSet(sp_local), + Instruction::LocalGet(sp_local), + Instruction::GlobalSet(sp_global), + // Read first parameter + Instruction::LocalGet(0), + ]; + + if params.len() > 1 { + instructions.push(Instruction::LocalGet(1)); + } + + // Call cabi_realloc, memory.copy, call target + instructions.extend_from_slice(&[ + Instruction::I32Const(0), + Instruction::I32Const(0), + Instruction::I32Const(1), + Instruction::I32Const(8), + Instruction::Call(realloc_idx), + Instruction::LocalSet(ptr_local), + Instruction::LocalGet(ptr_local), + Instruction::LocalGet(0), + Instruction::I32Const(8), + Instruction::MemoryCopy { + dst_mem: 0, + src_mem: 0, + }, + Instruction::LocalGet(ptr_local), + ]); + + if params.len() > 1 { + instructions.push(Instruction::LocalGet(1)); + } + + // Call target + SP epilogue: restore stack pointer + instructions.extend_from_slice(&[ + Instruction::Call(target_idx), + Instruction::LocalGet(sp_local), + Instruction::I32Const(16), + Instruction::I32Add, + Instruction::GlobalSet(sp_global), + Instruction::End, + ]); + + Function { + name: Some(format!("$adapter_sp_{}", target_idx)), + signature: FunctionSignature { + params: params.to_vec(), + results: results.to_vec(), + }, + locals: vec![(2, ValueType::I32)], // new_ptr + saved SP + instructions, + } + } + + #[test] + fn test_collapse_allows_sp_save_restore() { + let (mut module, realloc_idx) = single_memory_module_with_realloc(); + + // Function 0 (abs idx 1): target + module.functions.push(Function { + name: Some("target".to_string()), + signature: FunctionSignature { + params: vec![ValueType::I32, ValueType::I32], + results: vec![ValueType::I32], + }, + locals: vec![], + instructions: vec![Instruction::LocalGet(0), Instruction::End], + }); + + // Function 1 (abs idx 2): adapter with SP save/restore (GlobalGet(0) + GlobalSet(0)) + module.functions.push(make_same_memory_adapter_with_sp( + &[ValueType::I32, ValueType::I32], + &[ValueType::I32], + realloc_idx, + 1, // target is abs idx 1 + 0, // stack pointer is global 0 + )); + + let collapsed = collapse_same_memory_adapters(&mut module).unwrap(); + assert_eq!( + collapsed, 1, + "adapter with balanced SP save/restore should be collapsed" + ); + + // Verify it became a forwarding trampoline + let adapter = &module.functions[1]; + assert!(adapter.locals.is_empty(), "locals should be cleared"); + assert_eq!( + adapter.instructions, + vec![ + Instruction::LocalGet(0), + Instruction::LocalGet(1), + Instruction::Call(1), + Instruction::End, + ], + "should be a forwarding trampoline" + ); + } + + #[test] + fn test_collapse_skips_multiple_global_indices() { + let (mut module, realloc_idx) = single_memory_module_with_realloc(); + + // Function 0 (abs idx 1): target + module.functions.push(Function { + name: Some("target".to_string()), + signature: FunctionSignature { + params: vec![ValueType::I32, ValueType::I32], + results: vec![ValueType::I32], + }, + locals: vec![], + instructions: vec![Instruction::LocalGet(0), Instruction::End], + }); + + // Function 1 (abs idx 2): adapter writing to TWO distinct globals + module.functions.push(Function { + name: Some("adapter_multi_global".to_string()), + signature: FunctionSignature { + params: vec![ValueType::I32, ValueType::I32], + results: vec![ValueType::I32], + }, + locals: vec![(1, ValueType::I32)], + instructions: vec![ + Instruction::GlobalGet(0), + Instruction::GlobalSet(0), // Write global 0 + Instruction::GlobalGet(1), + Instruction::GlobalSet(1), // Write global 1 — multi-global! + Instruction::I32Const(0), + Instruction::I32Const(0), + Instruction::I32Const(1), + Instruction::I32Const(8), + Instruction::Call(realloc_idx), + Instruction::LocalSet(2), + Instruction::LocalGet(2), + Instruction::LocalGet(0), + Instruction::I32Const(8), + Instruction::MemoryCopy { + dst_mem: 0, + src_mem: 0, + }, + Instruction::LocalGet(2), + Instruction::LocalGet(1), + Instruction::Call(1), + Instruction::End, + ], + }); + + let collapsed = collapse_same_memory_adapters(&mut module).unwrap(); + assert_eq!( + collapsed, 0, + "adapter writing to multiple globals should not be collapsed" + ); + } + + #[test] + fn test_collapse_skips_write_only_global() { + let (mut module, realloc_idx) = single_memory_module_with_realloc(); + + // Function 0 (abs idx 1): target + module.functions.push(Function { + name: Some("target".to_string()), + signature: FunctionSignature { + params: vec![ValueType::I32, ValueType::I32], + results: vec![ValueType::I32], + }, + locals: vec![], + instructions: vec![Instruction::LocalGet(0), Instruction::End], + }); + + // Function 1 (abs idx 2): adapter with GlobalSet(5) but no GlobalGet(5) + module.functions.push(Function { + name: Some("adapter_write_only_global".to_string()), + signature: FunctionSignature { + params: vec![ValueType::I32, ValueType::I32], + results: vec![ValueType::I32], + }, + locals: vec![(1, ValueType::I32)], + instructions: vec![ + Instruction::I32Const(42), + Instruction::GlobalSet(5), // Write-only to global 5 — no read! + Instruction::I32Const(0), + Instruction::I32Const(0), + Instruction::I32Const(1), + Instruction::I32Const(8), + Instruction::Call(realloc_idx), + Instruction::LocalSet(2), + Instruction::LocalGet(2), + Instruction::LocalGet(0), + Instruction::I32Const(8), + Instruction::MemoryCopy { + dst_mem: 0, + src_mem: 0, + }, + Instruction::LocalGet(2), + Instruction::LocalGet(1), + Instruction::Call(1), + Instruction::End, + ], + }); + + let collapsed = collapse_same_memory_adapters(&mut module).unwrap(); + assert_eq!( + collapsed, 0, + "adapter with write-only global should not be collapsed" + ); + } + #[test] fn test_collapse_skips_signature_mismatch() { let (mut module, realloc_idx) = single_memory_module_with_realloc(); diff --git a/proofs/simplify/FusedOptimization.v b/proofs/simplify/FusedOptimization.v index 8fbf858..896322d 100644 --- a/proofs/simplify/FusedOptimization.v +++ b/proofs/simplify/FusedOptimization.v @@ -308,6 +308,47 @@ Proof. reflexivity. Qed. +(** Relaxed predicate: balanced single-global save/restore is safe. + + Meld-generated adapters for non-trivial types include a stack pointer + prologue/epilogue: global.get $sp; sub; global.set $sp ... global.set $sp. + When the adapter body is replaced by a forwarding trampoline, the global + writes are removed entirely (net-zero effect on the global). + + A global write pattern is safe to collapse when: + - All GlobalSet instructions target a single global index G, AND + - At least one GlobalGet reads from the same index G + (i.e., the save/restore pattern). + + The collapsed trampoline contains no global instructions at all, so the + global is left in its original state — identical to the net effect of the + balanced prologue/epilogue. *) +Theorem sp_save_restore_collapse_correct : + forall (m m' : wasm_module) (adapter_idx target_idx : func_idx) + (adapter : func_def), + single_memory m -> + nth_error (wm_funcs m) adapter_idx = Some adapter -> + is_same_memory_adapter adapter target_idx -> + (exists target, nth_error (wm_funcs m) target_idx = Some target /\ + fd_sig target = fd_sig adapter) -> + collapse_same_memory_adapter m adapter_idx target_idx = m' -> + (* Balanced single-global writes do not affect collapse correctness: + the adapter is still equivalent to calling the target directly. *) + forall f_idx st, + exec_call m f_idx st = exec_call m' f_idx st. +Proof. + intros m m' adapter_idx target_idx adapter + Hsingle Hlookup Hadapter Hsig Hcollapse f_idx st. + (* The balanced global writes (save/restore of a single SP global) are + contained entirely within the adapter body. When the body is replaced + by a forwarding trampoline, these writes are removed. Since the net + effect of the prologue/epilogue is zero (the global is restored to its + original value), removing them preserves semantics. + In our abstract model, collapse_same_memory_adapter is identity. *) + subst m'. + reflexivity. +Qed. + (** * Pass 1: Adapter Devirtualization Correctness *) (** If function [adapter_idx] is a trivial adapter to [target_idx],