Skip to content

Internal arrays (counter arrays) - optimizations #222

@cardillan

Description

@cardillan

A specific new optimizer for array access will be created.

Note: optimizations for speed should be attempted even when the goal is size, but should only be performed when the cost is zero or less.

Array access inlining

A specific jump table will be generated for given array access instruction. That way the return address will be baked in (so no need to set .array*rret/.array*wret), and the transfer variable will be replaced directly by the value being read/written. Reduces cost to 4 instructions per access.

Note: inlining should preferentially be applied inside loops/functions that access the array at multiple indexes. A single read-then-write access in a loop will be optimized to nearly the same performance in the un-inlined form, and won't bloat the code if there's another access to the same array at a different place.

Array access fusing

If two or more array access instructions (of different arrays) are used together with the same index, they can be replaced by a fused instruction, saving the costs of out-of-line calls of all accesses except the first. Always performed together with inlining. Reduces cost to 3+n instructions, where n is the number of instructions fused. Complex, as the resulting instruction needs to be aware of all input/output variables.

Further improvements

Not planned for the first iteration.

A non-inlined fused jump table could be used as a jump table for accessing the last item in the array, saving instruction space.

More complex fusing is possible, e.g. fusing access to two adjacent elements of the same array: optimizes cases like array[i] + array[i + 1].

Short array optimizations

Performed for arrays up of up to 3 elements. For arrays of length 4 or more, the jump table performs similarly well and can be shared.

Short array optimizations cause out-of-bounds accesses to always target one of the elements (the last one). Using out-of-bounds index can't derail the program execution. Nevertheless, runtime checks still get generated when prescribed by the compiler profile.

These optimizations get performed by the Array Expander for all array accesses when the optimizer is active. The optimized version is always smaller and/or faster than the jump table.

Arrays of length 1

Each element access gets converted to the access to the single element.

Arrays of length 2

Should be performed after array access fusing, to use the same jump for multiple element access.

a[x] = b; gets converted to if x == 0 then a[0] = b; else a[1] = b; end;.

b = a[x] gets converted to b = x == 0 ? a[0] : a[1];

Instruction size: 3/4, execution time: 2/3 steps.

Arrays of length 3

Should be performed after array access fusing, to use the same jump for multiple element access.

a[x] = b gets converted to

if x == 0 then
    a[0] = b;
elsif x == 1 then
    a[1] = b;
else
    a[2] = b;
end;

Analogically for the reverse assignments.

Instruction size: 7 (8 with jump table), execution time: 3/4 steps - 4 when accessing the middle element. Takes always 4 with the jump table, so it is faster in two thirds of cases.

Array code injection

Not planned for the first iteration.

When a value is read, updated and written again (think array[i]++), the update can be injected to the jump table. Saves 3 instructions.

More complex injection is possible, e.g. comparing and swapping two adjacent elements of the array (bubblesort?).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions