Do not deduplicate captured args while expanding format_args!#149926
Do not deduplicate captured args while expanding format_args!#149926ShoyuVanilla wants to merge 1 commit intorust-lang:mainfrom
format_args!#149926Conversation
|
Some changes occurred in src/tools/clippy cc @rust-lang/clippy Some changes occurred in compiler/rustc_ast_lowering/src/format.rs cc @m-ou-se |
|
r? @spastorino rustbot has assigned @spastorino. Use |
This comment has been minimized.
This comment has been minimized.
|
@rustbot author |
fb523ee to
3ebdaa4
Compare
|
@rustbot ready |
|
Nominating as per #145739 (comment) |
|
It'd be worth adding a test for the drop behavior. |
3ebdaa4 to
af89685
Compare
|
Given that this makes more sense for the language, along with the clean crater results and the intuition that it'd be surprising if anything actually leaned on this, I propose: @rfcbot fcp merge lang |
|
Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns. |
|
I don't think we should do this. It will make the generated code for I don't want to end up in a situation where it would make sense for Clippy to suggest something like: Adding @rust-rfcbot concern equivalence |
|
Deduplication feels wrong to me in the context of #145739, but m-ou-se arguments seem right to me here. Do I understand correctly that this introduces a tension with potential future extensions like: println!("{a.b} {a.b}");because deduplicating these would require deduplicating expressions, not just identifiers? My guess is that stopping deduplication opens the door to such extensions, and sticking to deduplication instead makes them less realistic. (although I am having a hard time reasoning about my sentence here with all these stacked negations) I am okay with a future without |
Or alternatively, defining increasingly complex rules like "we deduplicate the We would have to define those because they'd affect the semantics of the program. If we ensure that we only deduplicate as an as-if optimization that isn't semantically visible, the user doesn't have to care about deduplication as a complicating factor, they can just treat it as an optimization.
The ability to format |
|
As the author of this PR, I approached this work under the assumption that it was a fairly straightforward and uncontroversial change, based on the discussions in #145739 I didn't expect the topic to be this debatable, and had I known that upfront, I might have reconsidered taking this on, since I'm not deeply familiar with the formatting. For that reason, I've mostly stayed quiet so far. I don't feel I have particularly strong insights to add beyond what's already been discussed. Apologies if this PR caused extra discussion or confusion; that wasn't my intention 😢 |
|
@ShoyuVanilla you don't have to be sorry! You just had the bad luck of stumbling into a superficially simple problem that has some deep subtle issues. This happens occasionally when one builds a complicated system such as a programming language, but we can't predict which issues it affects. It's nor your fault that these issues exist. :) I'm sorry this turned out to be heavier than you thought. |
There's also the only-slightly more complex rule of Also, to be clear: the above rule would allow us to move forward with implementing (and even stabilizing) |
|
Do we expect |
I've got a half-baked PR that does the optimization for captured formatted arguments: #152480. It's missing the equivalent optimization for captured width and precision arguments for now, since I think that would be more invasive. Plus, when attempting to preserve the behavior of writing the args out separately, I'm not sure this is the right thing to do, especially given that width/precision messiness. Hopefully the draft PR makes things a bit clearer at least? |
You didn't do anything wrong here. We're having a vigorous discussion about whether we should do this, but none of that was caused by your PR; we appreciate you doing the work on this. |
|
Giving a bit of additional context here, which may not be clear because while this has been discussed in several different Rust team meetings, the discussion and motivations haven't all made it onto GitHub threads, or has been spread across several threads. I've heard from a couple of folks that that context wasn't at all clear, and that can be frustrating and make it feel like things are happening for unclear or insufficiently motivated reasons. This issue was originally discovered by @theemathas in #145739 (comment) . This came up further in discussion of RFC 3626, but this is not a dependency of that RFC and that RFC is not the motivation for making the change; it is reasonably straightforward to make that RFC handle either approach. The primary motivation for many has been the consistency of the language, and avoiding what appears to be a user trap or potential source of confusion, caused by the interaction of several features, some of which users don't commonly think about. (For instance, several people reacted to the use of a nullary type constructor as a captured argument with "you can do that?".) When looking at it, folks on both lang and opsem felt it didn't make sense that That, alone, provided strong enough motivation to look into changing this behavior. The initial reaction was to consider if there'd be substantive performance impact for just removing the behavior entirely. However, feedback on this and other threads surfaced that it's reasonably common to write multiple instances of the same capture in the same format string, and that this was thus an important optimization; that led us to hesitate a bit. We wondered, at that point, if it'd be possible to do an as-if optimization rather than a user-visible optimization. @dianne came to the rescue, suggesting that an as-if optimization for many common cases should be possible. (A draft of this optimization has now been posted at #152480 .) Several folks have weighed in on this, including @RalfJung, regarding the value and desirability of the semantic change. That feedback applies to the consistency of the current Rust language, leaving aside any potential future changes we might make that would benefit from it. I'm hoping that that makes the motivations and some of the thinking behind this proposed change more explicitly clear. In particular, I hope that makes it sufficiently clear that this change is not driven primarily by the desire to make further language changes, and multiple people across multiple teams have expressed a desire to make this change even if we don't add any further extensions to the formatting machinery. |
|
Thanks for the summary! FWIW, I think the current behavior, while surprising, does at least have a "low-entropy explanation" via the desugaring to a single named argument. But with struct fields entering the picture, the behavior that includes user-visible deduplication becomes quite messy. |
|
@RalfJung Sure, it's definitely explainable. But a "low-entropy explanation" for different behavior is never going to beat a "no-entropy explanation" (namely "it works the same as it does everywhere else"). |
|
So, I filed the concern We've now seen the optimization. I no longer have this concern myself, and don't want to maintain this concern by proxy. If someone else wishes to file a concern, they should do so. @rfcbot resolve lets-see-the-optimization |
|
🔔 This is now entering its final comment period, as per the review above. 🔔 |
|
@rfcbot concern just-does-not-seem-worth-it So I've been kind of sitting this out, and I hate to do this, but I am going to raise a concern for a bit more discussion. The bottom line is that I don't really buy the "consistency" motivation for this change. I see no reason to expect that It seems like the consistency concern comes up when you look at something like But I see a rather consistent mental model which is that embedded I gather that the main argument in FAVOR of this chang is that it'd be simpler if the embedded I have a kind of internal heuristic which is like: don't break compatibility if you don't have to. Right now, if this were a fresh design, I suppose I'd be on the fence because of Am I missing something? Is there a more complex argument? If I do have this right, can somebody summarize the perf/cost hit and point me at the comments about the complexity of the proposed optimization? (The optimization smacks to me "sufficiently smart compiler" when a rather "dumb compiler hack" would do the trick...) EDIT: I read over more the comments. I can see that the |
|
Actually, what would help me is to write-out the precise desugaring expected for EDIT: I guess that based on this comment...
...it's clear enough. |
What about between
I think the best case for the simple argument is demonstrated by the earnest surprise that @theemathas (in #145739) and @Jules-Bertholet (in rust-lang/rfcs#3626 (comment)) had about the behavior. Obviously both are Rust experts. As @Jules-Bertholet said:
The simple argument really is that simple. From lang, we've pushed a consistent evaluation opsem for consts, the interpolation behavior is inconsistent with that, and that surprises people. The slightly-more-complex argument, in an RFC 3626 context, goes as follows. Let's say we want to desugar this: fn main() {
println!("{X.f} {X.f}"); //~ Format string to desugar.
}
// Feel free to ignore this scaffolding...
use core::{fmt, sync::atomic::{AtomicU8, Ordering::Relaxed}};
const X: W = W(Y);
const Y: U = U { f: V(AtomicU8::new(0)) };
struct W(U);
struct U { f: V }
struct V(AtomicU8);
impl fmt::Display for V {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let x = self.0.fetch_add(1, Relaxed);
write!(f, "displayV({x})")
}
}
impl Drop for W {
fn drop(&mut self) {
let x = self.0.f.0.fetch_add(1, Relaxed);
println!("dropW({x})");
}
}
impl core::ops::Deref for W {
type Target = U;
fn deref(&self) -> &Self::Target {
let x = self.0.f.0.fetch_add(1, Relaxed);
println!("derefW({x})");
&self.0
}
}There are three ways that come to mind: fn main() {
//println!("{X.f} {X.f}"); //~ Format string to desugar.
println!("-- 1:");
println!("{} {}", X.f, X.f); //~ Desugaring 1.
println!("-- 2:");
{ let x = X; println!("{} {}", x.f, x.f) }; //~ Desugaring 2.
println!("-- 3:");
println!("{} {0:}", X.f); //~ Desugaring 3.
}These produce outputs: I find the output of desugaring 1 satisfying, given our intended opsem expectations for consts. I find the output of desugarings 2 and 3 various degrees of unsatisfying. In particular, the idea that with For my part, if we were not to clean this up, then I'd become more skeptical of whether we'd want to expand our interpolation syntax at all. Is it worth it? I really can't imagine there being meaningful breakage here. Regarding the performance, I'm hopeful that as-if optimizations can cover the common cases, but maybe there would be some cost. Maybe in some cases people should indeed prefer to write |
|
Thinking on this. I think that looking at particular examples isn't that fruitful for me. I get a lot more value out of thinking about the desugaring, making sure it's sensible, that it does the right thing in the common cases, and letting the semanticscs of edge cases fall out from the desugaring. I think this aligns well with the way people learn: first learn the easy stuff, then learn the details, then let those details guide their intutions (the Terrance Tao More to math than rigor process). So, let's assume that the backwards compatibility is not an issue here for the moment. Then I think we are arguing about the desugarings. TC is proposing that a nice desugaring is going to be: There are other options you could take, e.g., it behaves differently in the particular case of a single variable, or, we create one argument based on the literal bytes of the expression, or more complex things. I do also think that when we look forward to To me, there are several conflicting design axioms here. One of them is "Rust is straightforward" or something like that, but another is "idiomatic nice Rust does the right thing modulo a specific set of compiler optimizations" (and should we include this particular thing in that case). I am curious: do we have any data about how often it occurs in practice that the same variable is repeated? (I don't recall.) Thinking about it, I can certainly see the argument that "why should it work differently than repeating variable references in any other context, if that's an efficiency problem, shouldn't we address it holistically". |
|
The previous few posts make me ponder this: Is there an existing complexity we can re-use for this? Like what if we imagine that f"{a.b.foo} {a.b.bar}"desugared to something more like tuple_closure_format!("{} {}", || (&a.b.foo, &a.b.bar))Then we could say "well obviously that follows the same capture rules as closures", using the less-straight-forward-but-more-useful capturing rules we've already defined for closures. (And by not needing to do it on tokens we could have much smarter rules -- maybe even things smart about For clarity, this is currently a thought exercise, not a concrete "we should change it to work that way" proposal. There might well be reasons this doesn't work at all; I haven't thought through it in detail. |
|
OK, I've given this some more thought -- but before I write anything, I want to confirm something: The optimization we've been discussing, is the point here to remove the "extra fields" from the format args struct? Can someone link me to the PR or description? That sounds like a pretty tailored and complex optimization, is it intended to be special purpose for the format-args struct or something that would apply more generally? |
|
The PR with the optimization is #152480. |
|
@RalfJung that's a big PR, is there a shorter description of what it does and how? |
|
@nikomatsakis the relevant part is the changes to AST→HIR lowering, https://github.com/rust-lang/rust/pull/152480/changes#diff-8b601779832d29e42b24bb88ccc3a5762217d72f148e8070767ee787b271190c. the way it's implemented there is as a transformation on the AST representation of |
View all comments
Resolves #145739
I ran crater with #149291.
While there are still a few seemingly flaky, spurious results, no crates appear to be affected by this breaking change.
The only hit from the lint was
https://github.com/multiversx/mx-sdk-rs/blob/813927c03a7b512a3c6ef9a15690eaf87872cc5c/framework/meta-lib/src/tools/rustc_version_warning.rs#L19-L30,
which performs formatting on consts of type
::semver::Version. These constants contain a nested::semver::Identifier(Version.pre.identifier) that has a custom destructor. However, this case is not impacted by the change, so no breakage is expected.