Skip to content

Do not deduplicate captured args while expanding format_args!#149926

Open
ShoyuVanilla wants to merge 1 commit intorust-lang:mainfrom
ShoyuVanilla:no-dedup-fmt
Open

Do not deduplicate captured args while expanding format_args!#149926
ShoyuVanilla wants to merge 1 commit intorust-lang:mainfrom
ShoyuVanilla:no-dedup-fmt

Conversation

@ShoyuVanilla
Copy link
Member

@ShoyuVanilla ShoyuVanilla commented Dec 12, 2025

View all comments

Resolves #145739

I ran crater with #149291.
While there are still a few seemingly flaky, spurious results, no crates appear to be affected by this breaking change.

The only hit from the lint was
https://github.com/multiversx/mx-sdk-rs/blob/813927c03a7b512a3c6ef9a15690eaf87872cc5c/framework/meta-lib/src/tools/rustc_version_warning.rs#L19-L30,
which performs formatting on consts of type ::semver::Version. These constants contain a nested ::semver::Identifier (Version.pre.identifier) that has a custom destructor. However, this case is not impacted by the change, so no breakage is expected.

@rustbot
Copy link
Collaborator

rustbot commented Dec 12, 2025

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

Some changes occurred in compiler/rustc_ast_lowering/src/format.rs

cc @m-ou-se

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-clippy Relevant to the Clippy team. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 12, 2025
@rustbot
Copy link
Collaborator

rustbot commented Dec 12, 2025

r? @spastorino

rustbot has assigned @spastorino.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rust-log-analyzer

This comment has been minimized.

@ShoyuVanilla
Copy link
Member Author

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 13, 2025
@ShoyuVanilla
Copy link
Member Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 13, 2025
@theemathas theemathas added the I-lang-nominated Nominated for discussion during a lang team meeting. label Dec 14, 2025
@theemathas
Copy link
Contributor

Nominating as per #145739 (comment)

@traviscross traviscross added P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang T-lang Relevant to the language team needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. labels Dec 14, 2025
@traviscross
Copy link
Contributor

traviscross commented Dec 14, 2025

It'd be worth adding a test for the drop behavior.

@traviscross
Copy link
Contributor

traviscross commented Dec 14, 2025

Given that this makes more sense for the language, along with the clean crater results and the intuition that it'd be surprising if anything actually leaned on this, I propose:

@rfcbot fcp merge lang

@rust-rfcbot
Copy link
Collaborator

rust-rfcbot commented Dec 14, 2025

Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rust-rfcbot rust-rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Dec 14, 2025
@m-ou-se m-ou-se assigned m-ou-se and unassigned spastorino Dec 17, 2025
@m-ou-se
Copy link
Member

m-ou-se commented Dec 24, 2025

I don't think we should do this. It will make the generated code for println!("{x} {x}"); less efficient, as it will get two separate arguments instead of one.

I don't want to end up in a situation where it would make sense for Clippy to suggest something like:

warning: using the same placeholder multiple times is inefficient as of Rust 1.94.0
 --> src/main.rs:3:5
  |
3 |     println!("{x} {x}");
  |     ^^^^^^^^^^^^^^^^^^^
  |
help: change this to
  |
3 -     println!("{x} {x}");
3 +     println!("{x} {x}", x = x);
  |

Adding , x = x shouldn't make a difference. If adding that makes the resulting code more efficient, I strongly feel like we've done something wrong.

@rust-rfcbot concern equivalence

@rust-rfcbot rust-rfcbot removed the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Feb 10, 2026
@iago-lito
Copy link
Contributor

Deduplication feels wrong to me in the context of #145739, but m-ou-se arguments seem right to me here.

Do I understand correctly that this introduces a tension with potential future extensions like:

println!("{a.b} {a.b}");

because deduplicating these would require deduplicating expressions, not just identifiers?

My guess is that stopping deduplication opens the door to such extensions, and sticking to deduplication instead makes them less realistic.

(although I am having a hard time reasoning about my sentence here with all these stacked negations)

I am okay with a future without format!("{field.access.paths}") and only format!("{strict_formatting_identifiers}"), if this makes sense to Rust. The former is much comfortable but it actually does feel a bit "too magic" to me.

@joshtriplett
Copy link
Member

joshtriplett commented Feb 10, 2026

because deduplicating these would require deduplicating expressions, not just identifiers?

Or alternatively, defining increasingly complex rules like "we deduplicate the a but not the a.b", factoring out common prefixes...

We would have to define those because they'd affect the semantics of the program. If we ensure that we only deduplicate as an as-if optimization that isn't semantically visible, the user doesn't have to care about deduplication as a complicating factor, they can just treat it as an optimization.

I am okay with a future without format!("{field.access.paths}") and only format!("{strict_formatting_identifiers}"), if this makes sense to Rust. The former is much comfortable but it actually does feel a bit "too magic" to me.

The ability to format {self.field} is a very common request.

@ShoyuVanilla
Copy link
Member Author

As the author of this PR, I approached this work under the assumption that it was a fairly straightforward and uncontroversial change, based on the discussions in #145739
. The main challenge I anticipated was identifying the affected crates.

I didn't expect the topic to be this debatable, and had I known that upfront, I might have reconsidered taking this on, since I'm not deeply familiar with the formatting. For that reason, I've mostly stayed quiet so far. I don't feel I have particularly strong insights to add beyond what's already been discussed.

Apologies if this PR caused extra discussion or confusion; that wasn't my intention 😢

@RalfJung
Copy link
Member

@ShoyuVanilla you don't have to be sorry! You just had the bad luck of stumbling into a superficially simple problem that has some deep subtle issues. This happens occasionally when one builds a complicated system such as a programming language, but we can't predict which issues it affects. It's nor your fault that these issues exist. :) I'm sorry this turned out to be heavier than you thought.

@jackh726
Copy link
Member

because deduplicating these would require deduplicating expressions, not just identifiers?

Or alternatively, defining increasingly complex rules like "we deduplicate the a but not the a.b", factoring out common prefixes...

There's also the only-slightly more complex rule of format_args!("{a} {a}") is deduplicated, but format_args!("{a.b} {a.b}") is not, under the logic that you can factor out the former into format_args!("{a} {a}", a=a) but not the latter into format_args!("{a.b} {a.b}", a.b=a.b).

Also, to be clear: the above rule would allow us to move forward with implementing (and even stabilizing) format_args!("{a.b} {a.b}") without even needing to worry about whether or not we want to change the deduplication of format_args!("{a} {a}"). I don't think anybody is arguing that doing any set of deduplication on format_args!("{a.b} {a.b}") makes much sense, but even less so for format_args!("{a.b.c} {a.b.c}").

@m-ou-se
Copy link
Member

m-ou-se commented Feb 10, 2026

Do we expect format_args!("{p.name}", p=get_person()) to work? If so, format_args!("{a.b}") could simply desugar to format_args!("{a.b}", a=a).

@traviscross traviscross added the P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang label Feb 11, 2026
@dianne
Copy link
Contributor

dianne commented Feb 11, 2026

@dianne, would you be able to show us a PR for the optimization?

I've got a half-baked PR that does the optimization for captured formatted arguments: #152480. It's missing the equivalent optimization for captured width and precision arguments for now, since I think that would be more invasive. Plus, when attempting to preserve the behavior of writing the args out separately, {:w$} only seems about as de-duplicable as {a.b}; both reasonably should be pure, but can execute user code via Deref impls (though that's probably much less likely from coercing a width to &usize than from a field access).

I'm not sure this is the right thing to do, especially given that width/precision messiness. Hopefully the draft PR makes things a bit clearer at least?

@joshtriplett
Copy link
Member

As the author of this PR, I approached this work under the assumption that it was a fairly straightforward and uncontroversial change, based on the discussions in #145739 . The main challenge I anticipated was identifying the affected crates.

I didn't expect the topic to be this debatable, and had I known that upfront, I might have reconsidered taking this on, since I'm not deeply familiar with the formatting. For that reason, I've mostly stayed quiet so far. I don't feel I have particularly strong insights to add beyond what's already been discussed.

Apologies if this PR caused extra discussion or confusion; that wasn't my intention 😢

You didn't do anything wrong here. We're having a vigorous discussion about whether we should do this, but none of that was caused by your PR; we appreciate you doing the work on this.

@joshtriplett
Copy link
Member

joshtriplett commented Feb 16, 2026

Giving a bit of additional context here, which may not be clear because while this has been discussed in several different Rust team meetings, the discussion and motivations haven't all made it onto GitHub threads, or has been spread across several threads. I've heard from a couple of folks that that context wasn't at all clear, and that can be frustrating and make it feel like things are happening for unclear or insufficiently motivated reasons.

This issue was originally discovered by @theemathas in #145739 (comment) .

This came up further in discussion of RFC 3626, but this is not a dependency of that RFC and that RFC is not the motivation for making the change; it is reasonably straightforward to make that RFC handle either approach.

The primary motivation for many has been the consistency of the language, and avoiding what appears to be a user trap or potential source of confusion, caused by the interaction of several features, some of which users don't commonly think about. (For instance, several people reacted to the use of a nullary type constructor as a captured argument with "you can do that?".)

When looking at it, folks on both lang and opsem felt it didn't make sense that f(something, something) would evaluate something twice (just as something + something does) but format!("{something} {something}") would not; that difference felt like surprising behavior for people to deal with. There were a couple of different ways to make that evaluation difference apparent, but the reactions generally seemed more like "are those obscure enough that it's not too much of a hazard to leave it this way", rather than any feeling that the current behavior was the right behavior on principle.

That, alone, provided strong enough motivation to look into changing this behavior. The initial reaction was to consider if there'd be substantive performance impact for just removing the behavior entirely. However, feedback on this and other threads surfaced that it's reasonably common to write multiple instances of the same capture in the same format string, and that this was thus an important optimization; that led us to hesitate a bit. We wondered, at that point, if it'd be possible to do an as-if optimization rather than a user-visible optimization. @dianne came to the rescue, suggesting that an as-if optimization for many common cases should be possible. (A draft of this optimization has now been posted at #152480 .)

Several folks have weighed in on this, including @RalfJung, regarding the value and desirability of the semantic change. That feedback applies to the consistency of the current Rust language, leaving aside any potential future changes we might make that would benefit from it.

I'm hoping that that makes the motivations and some of the thinking behind this proposed change more explicitly clear. In particular, I hope that makes it sufficiently clear that this change is not driven primarily by the desire to make further language changes, and multiple people across multiple teams have expressed a desire to make this change even if we don't add any further extensions to the formatting machinery.

@RalfJung
Copy link
Member

Thanks for the summary!

FWIW, I think the current behavior, while surprising, does at least have a "low-entropy explanation" via the desugaring to a single named argument. But with struct fields entering the picture, the behavior that includes user-visible deduplication becomes quite messy.

@joshtriplett
Copy link
Member

joshtriplett commented Feb 17, 2026

@RalfJung Sure, it's definitely explainable. But a "low-entropy explanation" for different behavior is never going to beat a "no-entropy explanation" (namely "it works the same as it does everywhere else").

@joshtriplett
Copy link
Member

So, I filed the concern lets-see-the-optimization for exactly the reason it says on the tin: libs-api wanted to see and consider the optimization before deciding this, to evaluate the maintainability concerns.

We've now seen the optimization. I no longer have this concern myself, and don't want to maintain this concern by proxy. If someone else wishes to file a concern, they should do so.

@rfcbot resolve lets-see-the-optimization

@rust-rfcbot rust-rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Feb 25, 2026
@rust-rfcbot
Copy link
Collaborator

🔔 This is now entering its final comment period, as per the review above. 🔔

@rust-rfcbot rust-rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Feb 25, 2026
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Feb 25, 2026

@rfcbot concern just-does-not-seem-worth-it

So I've been kind of sitting this out, and I hate to do this, but I am going to raise a concern for a bit more discussion. The bottom line is that I don't really buy the "consistency" motivation for this change. I see no reason to expect that format!("{X}, {X}") could be consistent with foo(X, X). In fact, I would much more expect consistency between format!("{X}, {X}") and format!("{X}, {X}", X=X).

It seems like the consistency concern comes up when you look at something like format!("{expr()}, {expr()}")?

But I see a rather consistent mental model which is that embedded {} blocks introduce an argument which can be named (in the case of X or a field chain like x.y.z) or anonymous (all other kinds of expressions).

I gather that the main argument in FAVOR of this chang is that it'd be simpler if the embedded {} blocks desugared to always a fresh argument. And this is rather stronger with f strings, i.e., f"foo({X},{X})" doesn't have that "desugars to X=X" to lean on, rather you must explain in terms of format!.

I have a kind of internal heuristic which is like: don't break compatibility if you don't have to. Right now, if this were a fresh design, I suppose I'd be on the fence because of f strings, but given that (a) there is code in the wild; (b) the desugaring is still easy to understand either way; and (c) it has real-world impact on performance and code-size, I don't understand why we would make this change.

Am I missing something? Is there a more complex argument?

If I do have this right, can somebody summarize the perf/cost hit and point me at the comments about the complexity of the proposed optimization?

(The optimization smacks to me "sufficiently smart compiler" when a rather "dumb compiler hack" would do the trick...)

EDIT: I read over more the comments. I can see that the f"" strings and the case with more fields does make the desugaring more complex to think about. I'll ponder a bit.

@rust-rfcbot rust-rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Feb 25, 2026
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Feb 25, 2026

Actually, what would help me is to write-out the precise desugaring expected for format!("{X.f}, {X.f}"). @m-ou-se you had a comment but skimming it I didn't quite see what desugaring you expected there?

EDIT: I guess that based on this comment...

Do we expect format_args!("{p.name}", p=get_person()) to work? If so, format_args!("{a.b}") could simply desugar to format_args!("{a.b}", a=a).

...it's clear enough.

@traviscross
Copy link
Contributor

The bottom line is that I don't really buy the "consistency" motivation for this change. I see no reason to expect that format!("{X}, {X}") could be consistent with foo(X, X). In fact, I would much more expect consistency between format!("{X}, {X}") and format!("{X}, {X}", X=X).

What about between format!("{X} {X}") and format!("{} {}", X, X)?

Am I missing something? Is there a more complex argument?

I think the best case for the simple argument is demonstrated by the earnest surprise that @theemathas (in #145739) and @Jules-Bertholet (in rust-lang/rfcs#3626 (comment)) had about the behavior. Obviously both are Rust experts. As @Jules-Bertholet said:

const Z: SomeStruct = SomeStruct::new();
println!("{Z.field1} {Z.field2} {Z.field3}");

If Z has interior mutability, then it has to be evaluated several times, no? That’s how consts work everywhere else in Rust. If you want only one instance, use a static

The simple argument really is that simple. From lang, we've pushed a consistent evaluation opsem for consts, the interpolation behavior is inconsistent with that, and that surprises people.


The slightly-more-complex argument, in an RFC 3626 context, goes as follows.

Let's say we want to desugar this:

fn main() {
    println!("{X.f} {X.f}"); //~ Format string to desugar.
}

// Feel free to ignore this scaffolding...
use core::{fmt, sync::atomic::{AtomicU8, Ordering::Relaxed}};

const X: W = W(Y);
const Y: U = U { f: V(AtomicU8::new(0)) };

struct W(U);
struct U { f: V }
struct V(AtomicU8);

impl fmt::Display for V {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let x = self.0.fetch_add(1, Relaxed);
        write!(f, "displayV({x})")
    }
}

impl Drop for W {
    fn drop(&mut self) {
        let x = self.0.f.0.fetch_add(1, Relaxed);
        println!("dropW({x})");
    }
}

impl core::ops::Deref for W {
    type Target = U;
    fn deref(&self) -> &Self::Target {
        let x = self.0.f.0.fetch_add(1, Relaxed);
        println!("derefW({x})");
        &self.0
    }
}

There are three ways that come to mind:

fn main() {
    //println!("{X.f} {X.f}"); //~ Format string to desugar.
    println!("-- 1:");
    println!("{} {}", X.f, X.f); //~ Desugaring 1.
    println!("-- 2:");
    { let x = X; println!("{} {}", x.f, x.f) }; //~ Desugaring 2.
    println!("-- 3:");
    println!("{} {0:}", X.f); //~ Desugaring 3.
}

Playground link

These produce outputs:

-- 1:
derefW(0)
derefW(0)
displayV(1) displayV(1)
dropW(2)
dropW(2)
-- 2:
derefW(0)
derefW(1)
displayV(2) displayV(3)
dropW(4)
-- 3:
derefW(0)
displayV(1) displayV(2)
dropW(3)

I find the output of desugaring 1 satisfying, given our intended opsem expectations for consts. I find the output of desugarings 2 and 3 various degrees of unsatisfying. In particular, the idea that with println!("{X.f} {X.f}") we would get (with desugaring 2) two derefs but only one drop would just seem to be a real wart and a likely-continual source of surprise. But desugaring 3 is no picnic either, especially when we start thinking about how we'd want to handle {x.y} {x.y.z}.

For my part, if we were not to clean this up, then I'd become more skeptical of whether we'd want to expand our interpolation syntax at all.

Is it worth it? I really can't imagine there being meaningful breakage here. Regarding the performance, I'm hopeful that as-if optimizations can cover the common cases, but maybe there would be some cost. Maybe in some cases people should indeed prefer to write format!("{x} {x}", x=X.f) rather than format!("{X.f} {X.f}") -- unless we went with desugaring 3, this would have, if nothing else, fewer derefs. But I can live with that. In the rest of the language, there are plenty of reasons that people should similarly write let x = EXPR; /* Uses x twice */ rather than /* Uses EXPR twice */. That's the cost of our language not being pure.

@nikomatsakis
Copy link
Contributor

Thinking on this. I think that looking at particular examples isn't that fruitful for me. I get a lot more value out of thinking about the desugaring, making sure it's sensible, that it does the right thing in the common cases, and letting the semanticscs of edge cases fall out from the desugaring.

I think this aligns well with the way people learn: first learn the easy stuff, then learn the details, then let those details guide their intutions (the Terrance Tao More to math than rigor process).

So, let's assume that the backwards compatibility is not an issue here for the moment.

Then I think we are arguing about the desugarings. TC is proposing that a nice desugaring is going to be: {$EXPR} becomes {} and $EXPR in the arguments.

There are other options you could take, e.g., it behaves differently in the particular case of a single variable, or, we create one argument based on the literal bytes of the expression, or more complex things.

I do also think that when we look forward to f"" notation, which I really desperately want (in fact I'd make it the default form of strings in some new edition, myself), the question also gets interesting, because then format! is kind of this weird thing people will not be using, and there's no obvious way to do "named arguments" in f"" syntax -- but I guess learning about format! is fine.

To me, there are several conflicting design axioms here. One of them is "Rust is straightforward" or something like that, but another is "idiomatic nice Rust does the right thing modulo a specific set of compiler optimizations" (and should we include this particular thing in that case).

I am curious: do we have any data about how often it occurs in practice that the same variable is repeated? (I don't recall.)

Thinking about it, I can certainly see the argument that "why should it work differently than repeating variable references in any other context, if that's an efficiency problem, shouldn't we address it holistically".

@scottmcm
Copy link
Member

scottmcm commented Mar 4, 2026

The previous few posts make me ponder this: Is there an existing complexity we can re-use for this?

Like what if we imagine that

f"{a.b.foo} {a.b.bar}"

desugared to something more like

tuple_closure_format!("{} {}", || (&a.b.foo, &a.b.bar))

Then we could say "well obviously that follows the same capture rules as closures", using the less-straight-forward-but-more-useful capturing rules we've already defined for closures.

(And by not needing to do it on tokens we could have much smarter rules -- maybe even things smart about Copy.)


For clarity, this is currently a thought exercise, not a concrete "we should change it to work that way" proposal. There might well be reasons this doesn't work at all; I haven't thought through it in detail.

@nikomatsakis
Copy link
Contributor

OK, I've given this some more thought -- but before I write anything, I want to confirm something:

The optimization we've been discussing, is the point here to remove the "extra fields" from the format args struct? Can someone link me to the PR or description? That sounds like a pretty tailored and complex optimization, is it intended to be special purpose for the format-args struct or something that would apply more generally?

@RalfJung
Copy link
Member

RalfJung commented Mar 4, 2026

The PR with the optimization is #152480.

@nikomatsakis
Copy link
Contributor

@RalfJung that's a big PR, is there a shorter description of what it does and how?

@dianne
Copy link
Contributor

dianne commented Mar 4, 2026

@nikomatsakis the relevant part is the changes to AST→HIR lowering, https://github.com/rust-lang/rust/pull/152480/changes#diff-8b601779832d29e42b24bb88ccc3a5762217d72f148e8070767ee787b271190c. the way it's implemented there is as a transformation on the AST representation of format_args! before it's lowered to HIR. it scans through the formatting placeholders looking for captures of identifiers that refer to places; if it finds any duplicates, it de-duplicates them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. I-lang-nominated Nominated for discussion during a lang team meeting. I-lang-radar Items that are on lang's radar and will need eventual work or consideration. I-libs-api-nominated Nominated for discussion during a libs-api team meeting. needs-fcp This change is insta-stable, or significant enough to need a team FCP to proceed. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-clippy Relevant to the Clippy team. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

format_args deduplicates consts with interior mutability or destructor