Optimized-path byte-size estimator is incomplete (Cmp/Cmn/Adds/Subs/Popcnt) — latent branch-displacement drift #498

Description

@avrabe

Summary

Follow-up to #483. The byte_offsets size estimator in optimizer_bridge.rs::ir_to_arm (a hand-maintained per-op size table that drives branch-displacement resolution) is missing correct arms for several variable-width ops it emits. They fall through to _ => 2, but the encoder can emit a 4-byte .w form:

  • Cmp / Cmn: cmp r0,#0 is 2 bytes, but cmp.w r8,#0 (high reg or imm>255) is 4.
  • Adds / Subs: 2 bytes for low regs/small imm, 4 for high regs/large imm.
  • Popcnt: a multi-instruction pseudo-op expansion (~50 bytes), sized as 2.
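
As a rough illustration of what explicit arms for these ops could look like, here is a minimal sketch. The `Op` enum, `estimated_size`, and `POPCNT_EXPANSION_BYTES` are hypothetical names, not the actual types in optimizer_bridge.rs, and the narrow/wide rule is the simplified one from the list above (low register and imm <= 255), not the full Thumb-2 encoding rules.

```rust
/// Hypothetical, simplified op type; the real IR will differ.
#[derive(Debug, Clone, Copy)]
pub enum Op {
    Cmp { rn: u8, imm: u32 },
    Adds { rd: u8, rn: u8, imm: u32 },
    Popcnt,
    Nop,
}

/// Placeholder based on the "~50 bytes" estimate above; not a measured value.
pub const POPCNT_EXPANSION_BYTES: usize = 50;

/// r0..r7 are the "low" registers reachable by narrow encodings.
fn is_low(reg: u8) -> bool {
    reg < 8
}

/// Estimated encoded size in bytes, with no catch-all `_ => 2` arm,
/// so adding a new op variant forces a compile error here.
pub fn estimated_size(op: Op) -> usize {
    match op {
        Op::Cmp { rn, imm } => {
            if is_low(rn) && imm <= 255 { 2 } else { 4 }
        }
        Op::Adds { rd, rn, imm } => {
            if is_low(rd) && is_low(rn) && imm <= 255 { 2 } else { 4 }
        }
        Op::Popcnt => POPCNT_EXPANSION_BYTES,
        Op::Nop => 2,
    }
}
```

Dropping the wildcard arm is a deliberate choice in this sketch: an exhaustive match makes the compiler flag any op that has no size entry, instead of silently sizing it as 2.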

#483 fixed the Strh/Strb/Ldrh/Ldrsh/Ldrb/Ldrsb arms (always 4 in the optimized path). The remaining gaps only corrupt a branch displacement when a mis-sized op sits between a branch and its target, so they are not triggered by the #483 repro — but they are the same latent class.
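
To make the "between a branch and its target" condition concrete, the following sketch computes displacements from a list of per-instruction sizes. `byte_offsets` here is a standalone stand-in that only borrows the name used in this issue, and `displacement` is a hypothetical helper; neither is the real code.

```rust
/// Byte offset of each instruction, plus the total length as the final entry.
pub fn byte_offsets(sizes: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(sizes.len() + 1);
    let mut pos = 0;
    for &size in sizes {
        offsets.push(pos);
        pos += size;
    }
    offsets.push(pos);
    offsets
}

/// Raw byte distance from instruction `from` to instruction `to`.
/// The PC bias is ignored because it cancels when two layouts are compared.
pub fn displacement(sizes: &[usize], from: usize, to: usize) -> isize {
    let offsets = byte_offsets(sizes);
    offsets[to] as isize - offsets[from] as isize
}
```

For a four-instruction sequence where instruction 1 is a cmp.w sized as 2 but encoded as 4, `displacement(&[2, 2, 2, 2], 0, 3)` gives 6 while `displacement(&[2, 4, 2, 2], 0, 3)` gives 8, so a branch at index 0 lands 2 bytes short. A branch at index 2 targeting index 3 gets 2 in both layouts, because the mis-sized op is not inside its span.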

Why it persists structurally

synth-synthesis cannot depend on synth-backend (the dependency is the other way), so the estimator cannot call the real encoder and instead duplicates its size logic — which drifts. The robust fix is to remove the duplication: expose instruction sizes from a shared location, or have the backend return per-instruction sizes the optimizer can consume. Belongs to the VCR-* program (#242) — correctness from construction rather than a hand-maintained mirror.
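
One possible shape for the "shared location" option, sketched under assumptions: a small trait that both crates can see, implemented on the backend side by measuring what the encoder actually emits. `InstrSizer`, `layout`, `Instr`, `EncoderSizer`, and `encode` are all hypothetical names, and `encode` is a dummy that only produces byte vectors of the right length.

```rust
/// Would live where both synth-synthesis and synth-backend can depend on it.
pub trait InstrSizer<I> {
    fn size_of(&self, instr: &I) -> usize;
}

/// Optimizer-side layout: offsets come from the sizer, not a local table.
pub fn layout<I, S: InstrSizer<I>>(sizer: &S, instrs: &[I]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(instrs.len() + 1);
    let mut pos = 0;
    for instr in instrs {
        offsets.push(pos);
        pos += sizer.size_of(instr);
    }
    offsets.push(pos);
    offsets
}

/// Stand-in instruction type for the sketch.
pub enum Instr {
    CmpImm { rn: u8, imm: u32 },
    Nop,
}

/// Dummy encoder: the byte values are meaningless, only the lengths matter.
fn encode(instr: &Instr) -> Vec<u8> {
    match instr {
        Instr::CmpImm { rn, imm } if *rn < 8 && *imm <= 255 => vec![0; 2],
        Instr::CmpImm { .. } => vec![0; 4],
        Instr::Nop => vec![0; 2],
    }
}

/// Backend-side implementation: size is whatever the encoder emits.
pub struct EncoderSizer;

impl InstrSizer<Instr> for EncoderSizer {
    fn size_of(&self, instr: &Instr) -> usize {
        encode(instr).len()
    }
}
```

With this arrangement the optimizer is generic over `InstrSizer` and never names the backend, so the dependency direction stays as it is today while the size logic exists in one place.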

Acceptance

A block/br_if (or loop) fixture whose body places a cmp.w/adds.w/popcnt between a branch and its target produces the same result as wasmtime when run through the optimized path under unicorn. Ideally, the estimator also stops duplicating encoder size logic.

Labels: bug
