Follow-on to #428. The synth#428 levers took gale's dissolved gust_mix 2.81×→1.81× vs native LLVM. Reaching parity (~1.0×) is the remaining #428/#471 work (leaf-prologue shrink + an algebraic fold of 256*x>>8 → x, which synth currently emits literally). This issue is about beating LLVM — target 0.70× — using information LLVM structurally lacks.
(a) Proof-carrying specialization. gale primitives ship machine-checked invariants (Verus/Rocq/Kani): value ranges, no-overflow, no-alias. If synth consumes these as premises (carried in the IR by loom — see loom issue), it can specialize per-composition. Worked example: gust_mix is clamp(ch+476, 1000, 2000); a composition that proves ch ∈ [524,1524] makes both clamp branches dead → the function collapses to add r0,#476; bx lr. LLVM never emits that — it never had the bound. That single elision goes below 1.0×.
(b) Optimal register allocation for functions under a size threshold (the common case for dissolved primitives) — exact/exhaustive coloring where LLVM bails to linear-scan heuristics. The dissolve guarantees small, whole-known functions; exploit it.
The claim this enables: verified code faster than the unverified native build, because the proof is an optimization input. Full analysis + the tiered path + kill-criterion (gust_codegen_bench ratio <1.0 then ≤0.70, correctness bit-identical — elision sound only under the proven bound): gale benches/gust/optimization/BEAT-LLVM-0.7x.md (PR pulseengine/gale#119). Pairs with loom + meld issues (fact-passing contract across the pipeline).
Follow-on to #428. The synth#428 levers took gale's dissolved
gust_mix2.81×→1.81× vs native LLVM. Reaching parity (~1.0×) is the remaining #428/#471 work (leaf-prologue shrink + an algebraic fold of256*x>>8 → x, which synth currently emits literally). This issue is about beating LLVM — target 0.70× — using information LLVM structurally lacks.(a) Proof-carrying specialization. gale primitives ship machine-checked invariants (Verus/Rocq/Kani): value ranges, no-overflow, no-alias. If synth consumes these as premises (carried in the IR by loom — see loom issue), it can specialize per-composition. Worked example:
gust_mixisclamp(ch+476, 1000, 2000); a composition that provesch ∈ [524,1524]makes both clamp branches dead → the function collapses toadd r0,#476; bx lr. LLVM never emits that — it never had the bound. That single elision goes below 1.0×.(b) Optimal register allocation for functions under a size threshold (the common case for dissolved primitives) — exact/exhaustive coloring where LLVM bails to linear-scan heuristics. The dissolve guarantees small, whole-known functions; exploit it.
The claim this enables: verified code faster than the unverified native build, because the proof is an optimization input. Full analysis + the tiered path + kill-criterion (
gust_codegen_benchratio <1.0 then ≤0.70, correctness bit-identical — elision sound only under the proven bound): gale benches/gust/optimization/BEAT-LLVM-0.7x.md (PR pulseengine/gale#119). Pairs with loom + meld issues (fact-passing contract across the pipeline).