ZJIT: Shrink Insn and InsnId #986

@tekknolagi

Right now Insn is 120 bytes (gasp!) and InsnId is 64 bits. ruby#16685 does some boxing and bit packing to get it down to 48 bytes but a) that's still pretty big and b) it's still a very ad-hoc representation.

enum Insn {
    Const { val: Const },
    Param,
    LoadArg { idx: u32, id: ID, val_type: Type },
    Entries { targets: Vec<BlockId> },
    StringCopy { val: InsnId, chilled: bool, state: InsnId },
    StringIntern { val: InsnId, state: InsnId },
    // ...
    FixnumAdd  { left: InsnId, right: InsnId, state: InsnId },
    // ...
}

If we copy a little bit of e.g. Cranelift's homework, we can instead get both a more regular representation and a much smaller Insn (say, 24 bytes, assuming InsnId is u32):

pub struct InsnData {
    opcode: Opcode,
    args:   [InsnId; 4],
    data:   DataRef,
}
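As a quick sanity check on the 24-byte figure, here is a minimal sketch of that layout. The `u32` newtypes, the `#[repr(u8)]` on `Opcode`, the short opcode list, and the `insn_data_size` helper are all assumptions made for illustration, not the actual ZJIT definitions.

```rust
use std::mem::size_of;

/// Hypothetical 32-bit instruction handle (the text above assumes InsnId is u32).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InsnId(pub u32);

/// Hypothetical opcode enum; only a handful of the 133 opcodes are listed.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Opcode {
    Const,
    Param,
    StringIntern,
    FixnumAdd,
}

/// Hypothetical index into a side table; which table depends on the opcode.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DataRef(pub u32);

pub struct InsnData {
    pub opcode: Opcode,    // 1 byte, padded up to the struct's 4-byte alignment
    pub args: [InsnId; 4], // 16 bytes
    pub data: DataRef,     // 4 bytes
}

/// Size of the fixed-width instruction record, in bytes.
pub fn insn_data_size() -> usize {
    size_of::<InsnData>()
}
```

Under these assumptions the fields add up to 21 bytes, which rounds up to 24 at 4-byte alignment. A 16-bit opcode would fit in the same padding.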

We can get a lot out of this representation. We have 133 opcodes (wow), of which:

  • 14 have no InsnId at all (10%)
  • 24 hold 1 InsnId (18%)
  • 45 hold 3 InsnId (33%)
  • 1 holds 4 InsnId (<1%)
  • 1 has an optional InsnId
  • 24 are variadic (18%)
  • 59 have some kind of other data attached (44%)

The majority (81%) of all opcodes could store their operands inline in this new representation, with the variadic instructions referencing an operand pool.
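One possible shape for the operand pool, as a rough sketch: variadic instructions store a small (start, len) handle and the operands themselves live in one shared `Vec`. The names `OperandPool`, `OperandList`, `push_list`, and `get` are made up here; Cranelift's actual list pool is more elaborate.

```rust
/// Hypothetical 32-bit instruction handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InsnId(pub u32);

/// Hypothetical handle to a run of operands stored in an `OperandPool`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OperandList {
    start: u32,
    len: u32,
}

/// Append-only storage shared by all variadic instructions in a function.
#[derive(Default)]
pub struct OperandPool {
    operands: Vec<InsnId>,
}

impl OperandPool {
    /// Copies `args` to the end of the pool and returns a handle to them.
    pub fn push_list(&mut self, args: &[InsnId]) -> OperandList {
        let start = self.operands.len() as u32;
        self.operands.extend_from_slice(args);
        OperandList { start, len: args.len() as u32 }
    }

    /// Resolves a handle back into a slice of operands.
    pub fn get(&self, list: OperandList) -> &[InsnId] {
        let start = list.start as usize;
        &self.operands[start..start + list.len as usize]
    }
}
```

The handle is 8 bytes here, so it could live in two of the inline `args` slots or be squeezed into `data`, depending on how the final layout shakes out.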

Data-holding instructions would reference separate typed data pools (imagine interning Types, a CCallData pool, an Invariant pool, ...).
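To make the typed-pool idea a bit more concrete, here is a sketch of a generic interning pool that hands out 32-bit indices. `Pool`, `intern`, and `get` are hypothetical names, and the two-variant `Type` is only a stand-in for the real (much richer) ZJIT `Type`.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical interning pool: equal values share one 32-bit index.
pub struct Pool<T> {
    items: Vec<T>,
    index: HashMap<T, u32>,
}

impl<T: Clone + Eq + Hash> Pool<T> {
    pub fn new() -> Self {
        Pool { items: Vec::new(), index: HashMap::new() }
    }

    /// Returns the index of `value`, inserting it the first time it is seen.
    pub fn intern(&mut self, value: T) -> u32 {
        if let Some(&idx) = self.index.get(&value) {
            return idx;
        }
        let idx = self.items.len() as u32;
        self.items.push(value.clone());
        self.index.insert(value, idx);
        idx
    }

    /// Looks up a previously interned value by index.
    pub fn get(&self, idx: u32) -> &T {
        &self.items[idx as usize]
    }
}

/// Stand-in for ZJIT's `Type`, for illustration only.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Type {
    Fixnum,
    StringExact,
}
```

An instruction would then keep only the `u32` in its `data` field, with the opcode deciding which pool that index refers to. Pools whose entries are never shared (a CCallData pool, say) could skip the `HashMap` and just push.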

Some APIs become cleaner: the for_each_... variants no longer need to go case by case, and the Display impl could probably get cleaned up.
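For example, an operand visitor could become a loop over the inline slots rather than a match on every variant. This sketch assumes unused slots hold a sentinel (`InsnId::NONE`, a made-up name); the `opcode` and `data` fields are left out to keep it short, and `for_each_operand` is a hypothetical name.

```rust
/// Hypothetical 32-bit instruction handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InsnId(pub u32);

impl InsnId {
    /// Hypothetical sentinel marking an unused operand slot.
    pub const NONE: InsnId = InsnId(u32::MAX);
}

/// Trimmed-down instruction record: only the inline operand slots.
pub struct InsnData {
    pub args: [InsnId; 4],
}

impl InsnData {
    /// Visits every used operand without matching on the opcode.
    pub fn for_each_operand(&self, mut f: impl FnMut(InsnId)) {
        for &arg in self.args.iter().filter(|&&a| a != InsnId::NONE) {
            f(arg);
        }
    }
}
```

The same sentinel would also cover the one opcode with an optional InsnId. Variadic instructions would additionally need to walk their operand-pool entries.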

Now, there are some problems with this:

  • We have a lot of ergonomic niceties from having enum-of-structs. We match a lot.
  • The migration would be rough. That being said, we could probably make it incremental by adding an expand instruction that gives us the enum variant back as a temporary holdover.
  • It's a big experimental change. I imagine it might improve compile times and shrink native stack frames. I imagine it might make value numbering easier. But I'm not sure.
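The incremental-migration idea might look something like the sketch below, reading "expand" as a method that rebuilds an enum-shaped view on demand so existing `match` sites keep working. The three opcodes and their field order are taken from the `Insn` enum above; everything else (the `Opcode` enum, the trimmed `InsnData`, the `expand` name) is assumed for illustration.

```rust
/// Hypothetical 32-bit instruction handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InsnId(pub u32);

/// Hypothetical opcode enum covering three of the existing variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Opcode {
    Param,
    StringIntern,
    FixnumAdd,
}

/// Trimmed-down compact record (no data pool reference in this sketch).
pub struct InsnData {
    pub opcode: Opcode,
    pub args: [InsnId; 4],
}

/// Enum-of-structs view, shaped like today's `Insn`, built on demand.
#[derive(Debug, PartialEq, Eq)]
pub enum Insn {
    Param,
    StringIntern { val: InsnId, state: InsnId },
    FixnumAdd { left: InsnId, right: InsnId, state: InsnId },
}

impl InsnData {
    /// Rebuilds the enum variant from the compact form.
    pub fn expand(&self) -> Insn {
        let [a, b, c, _] = self.args;
        match self.opcode {
            Opcode::Param => Insn::Param,
            Opcode::StringIntern => Insn::StringIntern { val: a, state: b },
            Opcode::FixnumAdd => Insn::FixnumAdd { left: a, right: b, state: c },
        }
    }
}
```

Call sites could then keep writing `match insn.expand() { Insn::FixnumAdd { left, right, .. } => ..., _ => ... }` and be ported to the compact form one at a time.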

I'm not wed to any particular new representation but I do have an interest in shrinking Insn and InsnId and generally speeding up the compiler.

This could probably be applied to LIR too, which suffers from similar problems (the LIR Insn is 192 bytes).
