Skip to content

Use token::Lit in ast::ExprKind::Lit.#102944

Merged
bors merged 3 commits intorust-lang:masterfrom
nnethercote:ast-Lit-third-time-lucky
Nov 17, 2022
Merged

Use token::Lit in ast::ExprKind::Lit.#102944
bors merged 3 commits intorust-lang:masterfrom
nnethercote:ast-Lit-third-time-lucky

Conversation

@nnethercote
Copy link
Contributor

Instead of ast::Lit.

Literal lowering now happens at two different times. Expression literals are lowered when HIR is crated. Attribute literals are lowered during parsing.

r? @petrochenkov

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 12, 2022
@rustbot
Copy link
Collaborator

rustbot commented Oct 12, 2022

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 12, 2022
@nnethercote
Copy link
Contributor Author

Another alternative to the literal changes in #101562 and #101885.

I don't feel entirely happy with this approach. Various things:

  • The output change to src/test/ui/lint/lint-impl-fn.rs is bogus, but I don't understand why it happens.
  • rustc_ast_lowering now depends on rustc_parse, for report_lit_error, because lit lowering now happens in both of those crates.
  • ast::Lit still has redundancy, which two literal "kinds" within it.
  • There are some almost duplicate paths in the parser:
    • parse_opt_token_lit / parse_opt_ast_lit
    • parse_token_lit / parse_ast_lit`

@nnethercote
Copy link
Contributor Author

Also, it's not yet complete. I haven't updated clippy and rustfmt. I'll do that if we decide that this is the right path forward.

@rust-log-analyzer

This comment has been minimized.

@petrochenkov
Copy link
Contributor

rustc_ast_lowering now depends on rustc_parse, for report_lit_error, because lit lowering now happens in both of those crates.

The literal lowering logic, including error reporting, is independent of the rest of the parser and can be moved to some common dependency of AST->HIR lowering and Attribute -> MetaItem lowering, or passed through a function pointer like nt_to_tokenstream before #97251.

ast::Lit still has redundancy, which two literal "kinds" within it.

I'd expect it to be renamed to something like MetaItemLit, and the token_lit part to be removed (not in this PR).

@petrochenkov
Copy link
Contributor

petrochenkov commented Oct 13, 2022

Change description for the lang team.

This PR changes some errors related to literals from syntactic-but-not-in-macros to semantic.
These are the error names in the compiler:

    InvalidSuffix,
    InvalidIntSuffix,
    InvalidFloatSuffix,
    NonDecimalFloat(u32),
    IntTooLarge,

Examples:

"string"any_suffix
10u123
10.0f123
0b10f32
999340282366920938463463374607431768211455999

All these examples are valid tokens, that can be passed to macros without errors, but previously they were reported as errors when parsing proper Rust code, including code that will be later removed by cfg or attribute macros.

// Before
sink! {
    "string"any_suffix // OK
    10u123 // OK
    10.0f123 // OK
    0b10f32 // OK
    999340282366920938463463374607431768211455999 // OK
}

#[cfg(FALSE)]
fn configured_out() {
    "string"any_suffix // ERROR
    10u123 // ERROR
    10.0f123 // ERROR
    0b10f32 // ERROR
    999340282366920938463463374607431768211455999 // ERROR
}

fn main() {
    "string"any_suffix // ERROR
    10u123 // ERROR
    10.0f123 // ERROR
    0b10f32 // ERROR
    999340282366920938463463374607431768211455999 // ERROR
}

This PR makes them fully semantic errors so they are only reported after all macros are expanded.

// After
sink! {
    "string"any_suffix // OK
    10u123 // OK
    10.0f123 // OK
    0b10f32 // OK
    999340282366920938463463374607431768211455999 // OK
}

#[cfg(FALSE)]
fn configured_out() {
    "string"any_suffix // OK
    10u123 // OK
    10.0f123 // OK
    0b10f32 // OK
    999340282366920938463463374607431768211455999 // OK
}

fn main() {
    "string"any_suffix // ERROR
    10u123 // ERROR
    10.0f123 // ERROR
    0b10f32 // ERROR
    999340282366920938463463374607431768211455999 // ERROR
}

This way

  • we make behavior of function-like and attribute-like macros more consistent
  • we make other literals more consistent with floating point literals which are already converted from strings to floats lately in the compiler (probably due to expensiveness of the conversion)
  • we avoid unnecessary work in the compiler - literals don't need to be checked/converted unless they are really used, and size of expression AST nodes can be shrank.

@petrochenkov petrochenkov added T-lang Relevant to the language team S-waiting-on-team I-lang-nominated Nominated for discussion during a lang team meeting. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 13, 2022
@nnethercote
Copy link
Contributor Author

we make other literals more consistent with floating point literals which are already converted from strings to floats lately in the compiler (probably due to expensiveness of the conversion)

Floats are stored as Symbols rather than f64 in ast::LitKind so that ast::LitKind can implement Eq and Hash. I think this is the real reason they are converted later, rather than cost.

@BennoLossin
Copy link
Contributor

0x10.0 // OK

This is currently not accepted by the compiler: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=316edfc43c4ad955ee7b667883a10287

@petrochenkov
Copy link
Contributor

petrochenkov commented Oct 18, 2022

@y86-dev
Ah, you are right, 0x10.0 -> 0b10f32 in all the examples.
UPD: I've updated the examples in #102944 (comment).

@nnethercote
Copy link
Contributor Author

@scottmcm: The meeting notes indicate some support for this, but also that a firm conclusion wasn't reached. Should we do an FCP?

@bors

This comment was marked as resolved.

@scottmcm
Copy link
Member

@nnethercote I've been trying to distill my thoughts here into something less fuzzy. I think I see three different situations here.

  1. Things that match currently-existing token forms, but are hitting arbitrary length restrictions.

For example, decimal literals that are obviously lexically fine, like

#[cfg(nope)]
pub fn foo() {
    let x = 1111111111222222222233333333334444444444;
}

For that kind, I think we had lang consensus that they definitely shouldn't be a tokenizing nor parsing time error, but rather a semantic error later. So making those work for macros and cfg'd-out code as I understand this PR would do makes great sense to me.

  1. Extensions of existing lexical forms to other widths.

I don't think we're likely to add u1024 any time soon, but if we ever did I think it's very obvious that 0_u1024 would be a valid value of the type, as would 0xFFFF_u1024 and the other obvious examples. Similarly for 0.0_f16, or other such things with different bitwidths. (0.0_f11 is massively unlikely, but I still don't think it's something we'd need to block.)

I think lang also felt positive about supporting these, though I don't think we talked about them as much. Personally I'm definitely in favour.

  1. Things where the combination has no meaning today.

This is where I'm not so confident about things. When it explicitly rejects hexadecimal and binary floating point, it's not at all obvious to me that 0b0f32 is something we want to support for situations like this. What does that even mean, especially when 0x0f32 does something completely related to what an f32 suffix means near a .? Similarly, it's not obvious to me at all that "hello"match should even be one token. I guess we can't break that in macros until an edition, so it's stuck being passable there, but that doesn't mean it's something that makes sense to me to allow through the parser.


Hopefully that's understandable 🤞

How do you feel about those distinctions? Is the line between cases 1&2 and case 3 something that you think is meaningful? How obnoxious would it be to separate those cases in the code? Am I missing something about how Rust is set up lexically/syntactically?

cc @rust-lang/lang for comments in hopes this makes more sense than my rambling in the meeting.

@nnethercote
Copy link
Contributor Author

I have no strong opinions about the language design issues here. I'm just interested in shrinking the size of ast::Literal, in order to shrink ast::Expr, in order to reduce compile times.

@petrochenkov: what do think about Scott's comments?

@petrochenkov
Copy link
Contributor

@scottmcm

How do you feel about those distinctions? Is the line between cases 1&2 and case 3 something that you think is meaningful?

I wouldn't make any of these distinctions.
For me the important part is that all of them are considered valid tokens (accepted as tt or proc macro input), so it's too late to make the distinctions and they should all be accepted by Rust parser as well.
If you want to make distinctions, then perhaps the moral of this story is that we need to pay more attention when stabilizing something as a valid token.

How obnoxious would it be to separate those cases in the code? Am I missing something about how Rust is set up lexically/syntactically?

  • "string"any_suffix - less obnoxious, the suffix check is cheap and AST expressions can still keep token::Lit, although it may be annoying to not report this error twice.
  • everything else - more obnoxious, AST either needs to keep lowered literals, or the compiler will have to lower literals twice doing more work instead of less work.

@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/lang meeting, and we decided that we're willing to evaluate this via FCP. We're not thrilled that these were handled in the first place, but given that they work in macros, we should let them work in cfg(FALSE). That doesn't stop us from giving them language-level meanings in the parser later.

@joshtriplett
Copy link
Member

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Oct 25, 2022

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Oct 25, 2022
@joshtriplett joshtriplett removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Oct 25, 2022
@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Oct 26, 2022
Instead of `ast::Lit`.

Literal lowering now happens at two different times. Expression literals
are lowered when HIR is crated. Attribute literals are lowered during
parsing.

This commit changes the language very slightly. Some programs that used
to not compile now will compile. This is because some invalid literals
that are removed by `cfg` or attribute macros will no longer trigger
errors. See this comment for more details:
rust-lang#102944 (comment)
@nnethercote nnethercote force-pushed the ast-Lit-third-time-lucky branch from 38e4045 to 358a603 Compare November 15, 2022 22:41
@nnethercote
Copy link
Contributor Author

I rebased.

@bors r=petrochenkov

@bors
Copy link
Collaborator

bors commented Nov 15, 2022

📌 Commit 358a603 has been approved by petrochenkov

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 15, 2022
@Manishearth
Copy link
Member

@bors p=1

@bors
Copy link
Collaborator

bors commented Nov 16, 2022

⌛ Testing commit 358a603 with merge bebd57a...

@bors
Copy link
Collaborator

bors commented Nov 17, 2022

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing bebd57a to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 17, 2022
@bors bors merged commit bebd57a into rust-lang:master Nov 17, 2022
@rustbot rustbot added this to the 1.67.0 milestone Nov 17, 2022
@nnethercote nnethercote deleted the ast-Lit-third-time-lucky branch November 17, 2022 01:59
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (bebd57a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 0.7%] 16
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.2% [-0.2%, -0.2%] 1
Improvements ✅
(secondary)
-0.5% [-1.0%, -0.3%] 10
All ❌✅ (primary) 0.3% [-0.2%, 0.7%] 17

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.5% [0.5%, 0.5%] 1
Regressions ❌
(secondary)
2.5% [1.6%, 3.1%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.1% [-3.1%, -3.1%] 1
All ❌✅ (primary) 0.5% [0.5%, 0.5%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

@nnethercote
Copy link
Contributor Author

As above:

The wins and losses roughly balance out. Also, this will enable some additional improvements by shrinking the size of ast::Expr.

@rustbot label: +perf-regression-triaged

@rylev
Copy link
Member

rylev commented Nov 22, 2022

Note from perf triage: the regressions are largely concentrated in primary benchmarks while the improvements are in secondary. So while the regressions and improvements balance out when treating all benchmarks equally, I do think that primary regressions should be considered to have more weight than secondary improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-lang Relevant to the language team

Projects

None yet

Development

Successfully merging this pull request may close these issues.