Unalign PackedFingerprint on all hosts, not just x86 and x86-64#152710
Unalign PackedFingerprint on all hosts, not just x86 and x86-64#152710Zalathar wants to merge 1 commit intorust-lang:mainfrom
PackedFingerprint on all hosts, not just x86 and x86-64#152710Conversation
|
r? @mati865 rustbot has assigned @mati865. Use Why was this reviewer chosen?The reviewer was selected based on:
|
|
Perhaps we could ask on Zulip whether somebody with an access to AArch64 could run the benchmarks? |
|
rustc-perf should also get an arm machine soon |
|
@rustbot reroll |
|
Do we want to block this PR on such a perf run? |
|
Personally, I don't think it's necessary to block this PR on an aarch64 perf run. There are other places in the compiler where we unalign things to reduce padding, without worrying about the effect it might have on non-x86 hosts. And I don't think the original decision to make this packing x86-only was based on any concrete benchmarks. Even if we had a perf run, we would probably need to rely on cycle-count and wall-clock time, and even those would only tell us the behaviour of the benchmark machine's CPU, not the behaviour of other chips in the aarch64 ecosystem. |
Back in #78646,
DepNodewas modified to store an unalignedPackedFingerprintinstead of an 8-byte-alignedFingerprint. That reduced the size of DepNode from 24 bytes to 17 bytes (nowadays 18 bytes), resulting in considerable memory savings in incremental builds.See #152695 (comment) for a benchmark demonstrating the impact of removing that optimization.
At the time (and today), the unaligning was only performed on x86 and x86-64 hosts, because those CPUs are known to generally have low overhead for unaligned memory accesses. Hosts with other CPU architectures would continue to use an 8-byte-aligned fingerprint and a 24-byte DepNode.
Given the subsequent rise of aarch64 (especially on macOS) and other architectures, it's a shame that some commonly-used builds of rustc don't get those memory-size benefits, based on a decision made several years ago under different circumstances.
We don't have benchmarks to show the actual effect of unaligning DepNode fingerprints on various non-x86 hosts, but it seems very likely to be a good idea on Apple chips, and I have no particular reason to think that it will be catastrophically bad on other hosts. And we don't typically perform this kind of speculative pessimization in other parts of the compiler.