Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,10 @@ members = [
[workspace.package]
version = "0.3.0"
authors = ["PulseEngine <https://github.com/pulseengine>"]
edition = "2021"
edition = "2024"
license = "Apache-2.0"
repository = "https://github.com/pulseengine/loom"
rust-version = "1.75"
rust-version = "1.85"

[workspace.dependencies]
# WebAssembly parsing and encoding
Expand Down
72 changes: 54 additions & 18 deletions docs/design/fused-component-optimization.md
Original file line number Diff line number Diff line change
Expand Up @@ -125,8 +125,9 @@ Multiple components may import the same external function. These are merged and

```mermaid
flowchart TD
Input[Fused Module from meld] --> P0
P0["Pass 0: Same-Memory Adapter Collapse"] --> P1
Input[Fused Module from meld] --> P0a
P0a["Pass 0a: Memory Import Dedup"] --> P0b
P0b["Pass 0b: Same-Memory Adapter Collapse"] --> P1
P1["Pass 1: Adapter Devirtualization"] --> P2
P2["Pass 2: Trivial Call Elimination"] --> P3
P3["Pass 3: Function Type Deduplication"] --> P4
Expand All @@ -135,13 +136,26 @@ flowchart TD
Output[Cleaned Module] --> Standard["loom 12-Phase Pipeline"]
```

### Pass 0: Same-Memory Adapter Collapse
### Pass 0a: Memory Import Deduplication

**What**: Merge identical memory imports and remap all memory references to index 0.

**When**: Meld fuses components that share the same host memory, producing a module with multiple memory imports that all reference the same underlying linear memory (identical `(module, name)` pair). Per WASM spec §2.5.10, the same import key resolves to the same binding.

**Transformation**: Remove duplicate memory imports, remap all memory-indexed instructions (`memory.copy`, `memory.size`, `memory.grow`, `memory.fill`, `memory.init`), data segment memory indices, and memory exports to index 0.

**Safety (all-or-nothing)**: This pass only fires when ALL memory imports are identical and there are no local memories. Partial dedup could leave references to removed memory indices, so all-or-nothing is the only safe strategy.

**Synergy with Pass 0b**: After dedup, a previously multi-memory module becomes single-memory, enabling Pass 0b (same-memory adapter collapse) to fire on adapters that were previously skipped.

### Pass 0b: Same-Memory Adapter Collapse

**What**: Collapse same-memory adapters (realloc + memory.copy within one linear memory) into trivial forwarding trampolines.

**Pattern detected**: Functions in single-memory modules that:
**Pattern detected**: Functions where all memory operations target a single consistent memory index:
- Have locals (temporary pointers for the copy)
- Contain `memory.copy {0, 0}` (same-memory copy) with no cross-memory copies
- Contain `memory.copy {N, N}` (same-memory copy, any consistent index N) with no cross-memory copies
- All load/store instructions target the same memory index N
- Call `cabi_realloc` at least once
- Call exactly one non-realloc target function
- Have no unsafe control flow (null-guard `If` blocks with empty/nop else bodies and safe-to-discard then bodies are allowed; `Block`, `Loop`, `Br`, `BrIf`, `BrTable` are rejected; balanced single-global save/restore is allowed; memory stores are allowed — they target the discarded realloc'd buffer)
Expand All @@ -160,14 +174,16 @@ flowchart TD
call $target)
```

**Why this is correct**: In a single-memory module, `memory.copy {0, 0}` copies within the same address space. The adapter allocates a buffer, copies argument data to it, then calls the target with the new pointer. Since both pointers reference the same memory, the target can read the data at the original pointer directly. The allocation and copy are semantically redundant.
**Why this is correct**: When all memory operations target the same linear memory (index N), `memory.copy {N, N}` copies within a single address space. The adapter allocates a buffer, copies argument data to it, then calls the target with the new pointer. Since both pointers reference the same memory, the target can read the data at the original pointer directly. The allocation and copy are semantically redundant. This holds for any memory index N, not just 0 — the key invariant is consistency within the adapter.

**Stack pointer save/restore**: Meld-generated adapters for non-trivial types often include a stack pointer prologue/epilogue (`global.get $sp; sub; global.set $sp` ... `global.set $sp` to restore). These balanced single-global writes are safe to collapse because the entire body is replaced by a forwarding trampoline — the global is never modified in the collapsed function (net-zero effect). The predicate `has_unsafe_global_writes` allows this pattern while still rejecting writes to multiple globals or write-only globals.

**Memory stores**: Meld-generated adapters for complex types (records, lists) perform field-by-field copying via load+store pairs alongside `memory.copy`. These stores target the freshly realloc'd buffer, which is discarded when the adapter body is replaced by a forwarding trampoline. The target reads from the original pointer in the same address space, so the stores are semantically redundant.

**Null-guard If blocks**: Meld-generated adapters for optional/nullable types wrap the realloc+copy in a null-guard `if` block. These are safe to collapse because: if the condition is true (Some), the copy is same-memory redundant — skipping it is equivalent; if false (None), the copy never ran anyway. The then_body must contain only safe-to-discard instructions (locals, constants, arithmetic, loads, stores, realloc calls, `memory.copy {0, 0}`). Nested `If` inside `If`, non-empty else bodies, `Block`, `Loop`, `Br`, `BrIf`, and `BrTable` remain rejected.

**Cross-memory adapters**: Adapters where memory operations target different indices (e.g., `memory.copy {0, 1}` or loads from memory 0 mixed with stores to memory 1) are NOT collapsed. These represent genuine cross-component data transfer between distinct linear memories, where the copy is semantically necessary. Such adapters are detected and counted as `cross_memory_adapters_detected` for diagnostic visibility, but no optimization is applied.

**Synergy with Pass 1**: After collapse, the adapter is a trivial forwarding trampoline. Pass 1 detects this and rewrites all callers to call the target directly, eliminating both the adapter overhead AND the unnecessary allocation/copy.

### Pass 1: Adapter Devirtualization
Expand Down Expand Up @@ -229,10 +245,12 @@ Each transformation has a corresponding formal proof in Rocq (`proofs/simplify/F

| Theorem | Status | Pass |
|---|---|---|
| `same_memory_collapse_correct` | **Proven** | Pass 0 |
| `sp_save_restore_collapse_correct` | **Proven** | Pass 0 |
| `null_guard_collapse_correct` | **Proven** | Pass 0 |
| `store_discard_collapse_correct` | **Proven** | Pass 0 |
| `memory_import_dedup_preserves_semantics` | **Proven** | Pass 0a |
| `same_memory_collapse_correct` | **Proven** | Pass 0b |
| `sp_save_restore_collapse_correct` | **Proven** | Pass 0b |
| `null_guard_collapse_correct` | **Proven** | Pass 0b |
| `store_discard_collapse_correct` | **Proven** | Pass 0b |
| `generalized_same_memory_collapse_correct` | **Proven** | Pass 0b |
| `adapter_devirtualization_correct` | **Proven** | Pass 1 |
| `devirtualization_preserves_module_semantics` | **Proven** | Pass 1 |
| `trivial_call_is_nop` | **Proven** | Pass 2 |
Expand All @@ -245,11 +263,11 @@ Each transformation has a corresponding formal proof in Rocq (`proofs/simplify/F
| `fused_optimization_correct` | **Proven** | Combined |
| `fused_then_standard_correct` | **Proven** | Composition |

All 15 theorems are proven (Qed). Zero Admitted proofs remain.
All 17 theorems are proven (Qed). Zero Admitted proofs remain.

### Proof Architecture

The proofs rely on four well-justified semantic axioms from the WASM spec:
The proofs rely on five well-justified semantic axioms from the WASM spec:

```mermaid
flowchart TD
Expand All @@ -258,13 +276,17 @@ flowchart TD
A1["trivial_adapter_equiv\n(Spec §4.4.8: call semantics)"]
A2["identical_import_equiv\n(Spec §2.5.10: import resolution)"]
A3["trivial_call_nop\n(Spec §4.4.8: void call = no-op)"]
A4["identical_memory_import_equiv\n(Spec §2.5.10: import resolution)"]
A4b["same_memory_adapter_general_equiv\n(Generalized: any consistent memory N)"]
end

subgraph proven["Proven Theorems (Qed)"]
MD["memory_import_dedup_preserves"]
SM["same_memory_collapse_correct"]
SP["sp_save_restore_collapse_correct"]
NG["null_guard_collapse_correct"]
SD["store_discard_collapse_correct"]
GS["generalized_same_memory_collapse_correct"]
AD["adapter_devirtualization_correct"]
TC["trivial_call_is_nop"]
TP["type_dedup_preserves_semantics"]
Expand All @@ -276,17 +298,21 @@ flowchart TD
FS["fused_then_standard_correct"]
end

A4 --> MD
A0 --> SM
A0 --> SP
A0 --> NG
A0 --> SD
A4b --> GS
A1 --> AD
A2 --> IP
A3 --> TC
MD --> FC
SM --> FC
SP --> FC
NG --> FC
SD --> FC
GS --> FC
AD --> FC
TC --> FC
TP --> FC
Expand Down Expand Up @@ -316,13 +342,14 @@ flowchart TD
Input --> Fused

subgraph Fused["Fused Component Optimization"]
F0["0. Same-memory adapter collapse"]
F0a["0a. Memory import dedup"]
F0b["0b. Same-memory adapter collapse"]
F1["1. Adapter devirtualization"]
F2["2. Trivial call elimination"]
F3["3. Type deduplication"]
F4["4. Dead function elimination"]
F5["5. Import deduplication"]
F0 --> F1 --> F2 --> F3 --> F4 --> F5
F0a --> F0b --> F1 --> F2 --> F3 --> F4 --> F5
end

Fused --> Standard
Expand Down Expand Up @@ -371,6 +398,7 @@ let mut module: Module = parse_wasm(&bytes)?;

// Apply fused-specific optimizations first
let stats: FusedOptimizationStats = optimize_fused_module(&mut module)?;
println!("Memory imports deduplicated: {}", stats.memory_imports_deduplicated);
println!("Same-memory adapters collapsed: {}", stats.same_memory_adapters_collapsed);
println!("Adapters devirtualized: {}", stats.calls_devirtualized);
println!("Trivial calls eliminated: {}", stats.trivial_calls_eliminated);
Expand All @@ -390,20 +418,22 @@ optimize_module(&mut module)?;

| Feature | Status | Coverage |
|---|---|---|
| Same-memory adapter collapse | Done | Single-memory modules with realloc+copy adapters |
| Memory import deduplication | Done | Identical memory imports (aliased host memory) |
| Same-memory adapter collapse | Done | Any module with same-memory realloc+copy adapters (any consistent memory index) |
| Trivial adapter devirtualization | Done | All direct adapters |
| Trivial call elimination | Done | () -> () no-op functions |
| Function type deduplication | Done | Basic types (skips GC) |
| Dead function elimination | Done | With element segment parsing |
| Function import deduplication | Done | Function imports only |
| Correctness proofs | Done | All 15 theorems proven (Qed) |
| Cross-memory adapter diagnostics | Done | Detection and counting (not collapsed) |
| Correctness proofs | Done | All 17 theorems proven (Qed) |

### Planned

```mermaid
flowchart TD
subgraph tier1["Tier 1: Next Optimizations"]
T1["Memory-crossing adapter simplification"]
T1["Scalar return elision for cross-memory adapters"]
end

subgraph tier2["Tier 2: Advanced"]
Expand All @@ -416,7 +446,7 @@ flowchart TD

| Feature | Priority | Impact | Effort |
|---|---|---|---|
| Memory-crossing adapter simplification | High | Eliminates entire adapters in shared-memory | High |
| Scalar return elision (cross-memory) | Medium | Avoids copy-back for scalar returns in cross-memory adapters | Medium |
| String transcoding detection | Low | Rare but high savings when hit | Very High |
| Multi-memory adapter inlining | Low | Reduces trampoline overhead in multi-memory mode | High |

Expand All @@ -442,6 +472,12 @@ flowchart TD

**Rationale**: Every import affects all function indices (local functions are numbered after imports). Removing duplicate imports shifts indices, which is cleaner to do once before the standard pipeline runs multiple analysis passes.

### Why Memory Import Dedup Before Adapter Collapse?

**Decision**: Run memory import deduplication (Pass 0a) before same-memory adapter collapse (Pass 0b).

**Rationale**: When meld fuses components sharing the same host memory, it emits multiple identical memory imports. While Pass 0b now handles multi-memory modules (collapsing adapters that use a consistent memory index), running Pass 0a first normalizes memory indices and may expose additional same-memory adapters. For example, adapters using `memory.copy {1, 1}` that are aliased to memory 0 become `memory.copy {0, 0}` after dedup, simplifying the index space for all subsequent passes.

### Why Only Function Import Deduplication?

**Decision**: Only deduplicate function imports, not memory/table/global imports.
Expand Down
6 changes: 3 additions & 3 deletions loom-cli/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
//!
//! Command-line tool for optimizing WebAssembly modules

use anyhow::{anyhow, Context, Result};
use anyhow::{Context, Result, anyhow};
use clap::{Parser, Subcommand};
use std::fs;
use std::path::Path;
Expand Down Expand Up @@ -374,7 +374,7 @@ fn optimize_command(
let should_run = |pass_name: &str| -> bool {
enabled_passes
.as_ref()
.map_or(true, |passes| passes.contains(&pass_name))
.is_none_or(|passes| passes.contains(&pass_name))
};

// Helper to track pass stats
Expand Down Expand Up @@ -563,7 +563,7 @@ fn optimize_command(
/// Run property-based verification on the module
fn run_verification(original: &loom_core::Module, optimized: &loom_core::Module) -> Result<()> {
use loom_core::Instruction;
use loom_isle::{iadd32, iconst32, simplify, Imm32, ValueData};
use loom_isle::{Imm32, ValueData, iadd32, iconst32, simplify};

// First, run Z3 SMT-based formal verification (if feature enabled)
#[cfg(feature = "verification")]
Expand Down
2 changes: 1 addition & 1 deletion loom-core/benches/optimization_benchmarks.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
//!
//! Run with: cargo bench

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use criterion::{BenchmarkId, Criterion, black_box, criterion_group, criterion_main};
use loom_core::{optimize, parse};

/// Benchmark constant folding performance
Expand Down
2 changes: 1 addition & 1 deletion loom-core/src/component_executor.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
//! It validates that optimization preserves component structure, exports, and canonical functions
//! by parsing and analyzing the component binary format without requiring runtime instantiation.

use anyhow::{anyhow, Context, Result};
use anyhow::{Context, Result, anyhow};

/// Execution result for a component after optimization
#[derive(Debug, Clone)]
Expand Down
12 changes: 10 additions & 2 deletions loom-core/src/component_optimizer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@
//! println!("Size reduction: {:.1}%", stats.reduction_percentage());
//! ```

use anyhow::{anyhow, Context, Result};
use anyhow::{Context, Result, anyhow};
use wasmparser::{Encoding, Parser, Payload};

/// Statistics about component optimization
Expand Down Expand Up @@ -218,16 +218,24 @@ fn optimize_core_module(module_bytes: &[u8]) -> Result<Vec<u8>> {
|| fused_stats.dead_functions_eliminated > 0
|| fused_stats.imports_deduplicated > 0
|| fused_stats.trivial_calls_eliminated > 0
|| fused_stats.memory_imports_deduplicated > 0
{
eprintln!(
" Fused optimization: {} adapters devirtualized, {} trivial calls eliminated, {} types deduped, {} dead funcs removed, {} imports deduped",
" Fused optimization: {} mem imports deduped, {} adapters devirtualized, {} trivial calls eliminated, {} types deduped, {} dead funcs removed, {} imports deduped",
fused_stats.memory_imports_deduplicated,
fused_stats.calls_devirtualized,
fused_stats.trivial_calls_eliminated,
fused_stats.types_deduplicated,
fused_stats.dead_functions_eliminated,
fused_stats.imports_deduplicated,
);
}
if fused_stats.cross_memory_adapters_detected > 0 {
eprintln!(
" Cross-memory adapters detected (not collapsed): {}",
fused_stats.cross_memory_adapters_detected,
);
}

// Validate after fused optimization
let bytes = crate::encode::encode_wasm(&module)?;
Expand Down
Loading