rustc_codegen_ssa: Make upstream monomorphizations representation sparse#149313
rustc_codegen_ssa: Make upstream monomorphizations representation sparse#149313osiewicz wants to merge 1 commit intorust-lang:mainfrom
Conversation
That is a lot. I wonder if |
|
That's worth a try I guess. |
|
Looks like it indeed. And |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…mean, r=<try> rustc_codegen_ssa: Make upstream monomorphizations representation sparse
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (4fcb6e6): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 0.9%, secondary -3.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 472.579s -> 470.697s (-0.40%) |
|
☔ The latest upstream changes (presumably #146348) made this pull request unmergeable. Please resolve the merge conflicts. |
This comment has been minimized.
This comment has been minimized.
9e1740f to
4d2d113
Compare
This comment has been minimized.
This comment has been minimized.
4d2d113 to
1b2e745
Compare
|
rustbot has assigned @JonathanBrouwer. Use Why was this reviewer chosen?The reviewer was selected based on:
|
|
Fair disclosure: I have used Claude Opus to generate code submitted in this PR. |
This comment has been minimized.
This comment has been minimized.
1b2e745 to
9c3a781
Compare
This comment has been minimized.
This comment has been minimized.
Upstream monomorphisations are a blessing and a curse of Zed's build performance. On one hand, we do benefit from it, as builds with share-generics disabled are slower for us (plus we don't really want to use nightly anyways). On the other, deserializing query results takes *a lot* of time. For some crates close to the end of our compilation pipeline, it's over 400ms per crate. To make matters worse, I've measured a hit ratio of upstream generics. A sample of such measurement goes as follows: ``` upstream_monomorphization returned None for 28501 distinct monomorphizations. upstream_monomorphization returned Some 2518 times. Results came from 163 distinct CrateNums. In total, there are 619731 instantiations ``` This is horrid for us, as we're using a very small percentage of the map that we spend so much time deserializing from. This commit tries to (rather clumsily) move us towards a sparse representation of upstream_monomorphizations. Instead of storing <DefId, (GenericArgsRef<'_>)> which is rather heavy to deserialize, we'll resort to storing Hashes of Instances. This commit reduces a `touch crates/editor/src/editor.rs` scenario in Zed for me from 14.5s to 11s.
9c3a781 to
67cb41f
Compare
|
Not my area of expertise, sorry I missed this PR, oops |
There was a problem hiding this comment.
IIUC, most of the gains from this PR avoid decoding exported_generic_symbols. Is that so?
Do you have more detailed statistics about cases where upstream_monomorphization returns None? Typical structure of the instances?
This PR introduces a heavy machinery to filter calls to exported_generic_symbols. Did you try manual filters?
For instance, early return None if the generic arguments contain a type from the current crate?
|
|
||
| match self.def { | ||
| InstanceKind::Item(def) => tcx | ||
| .upstream_monomorphizations_for(def) |
There was a problem hiding this comment.
Is this query still used anywhere?
| InstanceKind::Item(def) => tcx | ||
| .upstream_monomorphizations_for(def) | ||
| .and_then(|monos| monos.get(&self.args).cloned()), | ||
| InstanceKind::DropGlue(_, Some(_)) => tcx.upstream_drop_glue_for(self.args), |
There was a problem hiding this comment.
Is this query still used anywhere?
| InstanceKind::AsyncDropGlue(_, _) => None, | ||
| InstanceKind::FutureDropPollShim(_, _, _) => None, | ||
| InstanceKind::AsyncDropGlueCtorShim(_, _) => { | ||
| tcx.upstream_async_drop_glue_for(self.args) |
There was a problem hiding this comment.
Is this query still used anywhere?
|
@cjgillot yes, the gist is to not decode The stats provided in the PRs description were useful to visualize hit ratio on this beefy upstream monomorphizations map. If it had hits on most of the queries OR the set of unique used crates was almost fully overlapping with the set of all dependencies, then there would be little to no point to using a sparse representation for it. Yes, what this PR proposes is quite complex and I wish we could help it - guidance is most welcome. I did explore local-checks approach before abandoning it due to the "Vec dillema" described above. We need the I did however add local checks to the construction of I can grab the stats for you if you wish. Footnotes
|
Upstream monomorphisations are a blessing and a curse of Zed's build performance.
On one hand, we do benefit from it, as builds with share-generics disabled are slower for us (plus we don't really want to use nightly anyways).
On the other, deserializing query results takes a lot of time. For some crates close to the end of our compilation pipeline, it's over 400ms per crate.
To make matters worse, I've measured a hit ratio of upstream generics. A sample of such measurement goes as follows:
This is horrid for us, as we're using a very small percentage of the map that we spend so much time deserializing from.
This commit tries to (rather clumsily) move us towards a sparse representation of upstream_monomorphizations. Instead of storing <DefId, (GenericArgsRef<'_>)> which is rather heavy to deserialize, we'll resort to storing Hashes of Instances. I plan to make this more foolproof, hence this commit is marked as WIP.
For one, we should probably keep the projection queries. Also, it might be worthwhile to store index pointing at entry within exported_generics of target crate in order to remedy a potential for collisions.
This commit reduces a
touch crates/editor/src/editor.rsscenario in Zed for me from 14.5s to 11s.