Use `token::Lit` in `ast::ExprKind::Lit`. by nnethercote · Pull Request #102944 · rust-lang/rust

nnethercote · 2022-10-12T04:09:45Z

Instead of ast::Lit.

Literal lowering now happens at two different times. Expression literals are lowered when HIR is crated. Attribute literals are lowered during parsing.

r? @petrochenkov

rustbot · 2022-10-12T04:09:48Z

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

nnethercote · 2022-10-12T04:11:54Z

Another alternative to the literal changes in #101562 and #101885.

I don't feel entirely happy with this approach. Various things:

The output change to src/test/ui/lint/lint-impl-fn.rs is bogus, but I don't understand why it happens.
rustc_ast_lowering now depends on rustc_parse, for report_lit_error, because lit lowering now happens in both of those crates.
ast::Lit still has redundancy, which two literal "kinds" within it.
There are some almost duplicate paths in the parser:
- parse_opt_token_lit / parse_opt_ast_lit
- parse_token_lit / parse_ast_lit`

nnethercote · 2022-10-12T04:12:26Z

Also, it's not yet complete. I haven't updated clippy and rustfmt. I'll do that if we decide that this is the right path forward.

petrochenkov · 2022-10-13T11:11:36Z

rustc_ast_lowering now depends on rustc_parse, for report_lit_error, because lit lowering now happens in both of those crates.

The literal lowering logic, including error reporting, is independent of the rest of the parser and can be moved to some common dependency of AST->HIR lowering and Attribute -> MetaItem lowering, or passed through a function pointer like nt_to_tokenstream before #97251.

ast::Lit still has redundancy, which two literal "kinds" within it.

I'd expect it to be renamed to something like MetaItemLit, and the token_lit part to be removed (not in this PR).

petrochenkov · 2022-10-13T11:41:42Z

Change description for the lang team.

This PR changes some errors related to literals from syntactic-but-not-in-macros to semantic.
These are the error names in the compiler:

    InvalidSuffix,
    InvalidIntSuffix,
    InvalidFloatSuffix,
    NonDecimalFloat(u32),
    IntTooLarge,

Examples:

"string"any_suffix
10u123
10.0f123
0b10f32
999340282366920938463463374607431768211455999

All these examples are valid tokens, that can be passed to macros without errors, but previously they were reported as errors when parsing proper Rust code, including code that will be later removed by cfg or attribute macros.

// Before
sink! {
    "string"any_suffix // OK
    10u123 // OK
    10.0f123 // OK
    0b10f32 // OK
    999340282366920938463463374607431768211455999 // OK
}

#[cfg(FALSE)]
fn configured_out() {
    "string"any_suffix // ERROR
    10u123 // ERROR
    10.0f123 // ERROR
    0b10f32 // ERROR
    999340282366920938463463374607431768211455999 // ERROR
}

fn main() {
    "string"any_suffix // ERROR
    10u123 // ERROR
    10.0f123 // ERROR
    0b10f32 // ERROR
    999340282366920938463463374607431768211455999 // ERROR
}

This PR makes them fully semantic errors so they are only reported after all macros are expanded.

// After
sink! {
    "string"any_suffix // OK
    10u123 // OK
    10.0f123 // OK
    0b10f32 // OK
    999340282366920938463463374607431768211455999 // OK
}

#[cfg(FALSE)]
fn configured_out() {
    "string"any_suffix // OK
    10u123 // OK
    10.0f123 // OK
    0b10f32 // OK
    999340282366920938463463374607431768211455999 // OK
}

fn main() {
    "string"any_suffix // ERROR
    10u123 // ERROR
    10.0f123 // ERROR
    0b10f32 // ERROR
    999340282366920938463463374607431768211455999 // ERROR
}

This way

we make behavior of function-like and attribute-like macros more consistent
we make other literals more consistent with floating point literals which are already converted from strings to floats lately in the compiler (probably due to expensiveness of the conversion)
we avoid unnecessary work in the compiler - literals don't need to be checked/converted unless they are really used, and size of expression AST nodes can be shrank.

nnethercote · 2022-10-14T01:09:21Z

we make other literals more consistent with floating point literals which are already converted from strings to floats lately in the compiler (probably due to expensiveness of the conversion)

Floats are stored as Symbols rather than f64 in ast::LitKind so that ast::LitKind can implement Eq and Hash. I think this is the real reason they are converted later, rather than cost.

BennoLossin · 2022-10-18T19:04:03Z

0x10.0 // OK

This is currently not accepted by the compiler: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=316edfc43c4ad955ee7b667883a10287

petrochenkov · 2022-10-18T19:17:25Z

@y86-dev
Ah, you are right, 0x10.0 -> 0b10f32 in all the examples.
UPD: I've updated the examples in #102944 (comment).

nnethercote · 2022-10-20T21:40:37Z

@scottmcm: The meeting notes indicate some support for this, but also that a firm conclusion wasn't reached. Should we do an FCP?

scottmcm · 2022-10-23T06:35:50Z

@nnethercote I've been trying to distill my thoughts here into something less fuzzy. I think I see three different situations here.

Things that match currently-existing token forms, but are hitting arbitrary length restrictions.

For example, decimal literals that are obviously lexically fine, like

#[cfg(nope)]
pub fn foo() {
    let x = 1111111111222222222233333333334444444444;
}

For that kind, I think we had lang consensus that they definitely shouldn't be a tokenizing nor parsing time error, but rather a semantic error later. So making those work for macros and cfg'd-out code as I understand this PR would do makes great sense to me.

Extensions of existing lexical forms to other widths.

I don't think we're likely to add u1024 any time soon, but if we ever did I think it's very obvious that 0_u1024 would be a valid value of the type, as would 0xFFFF_u1024 and the other obvious examples. Similarly for 0.0_f16, or other such things with different bitwidths. (0.0_f11 is massively unlikely, but I still don't think it's something we'd need to block.)

I think lang also felt positive about supporting these, though I don't think we talked about them as much. Personally I'm definitely in favour.

Things where the combination has no meaning today.

This is where I'm not so confident about things. When it explicitly rejects hexadecimal and binary floating point, it's not at all obvious to me that 0b0f32 is something we want to support for situations like this. What does that even mean, especially when 0x0f32 does something completely related to what an f32 suffix means near a .? Similarly, it's not obvious to me at all that "hello"match should even be one token. I guess we can't break that in macros until an edition, so it's stuck being passable there, but that doesn't mean it's something that makes sense to me to allow through the parser.

Hopefully that's understandable 🤞

How do you feel about those distinctions? Is the line between cases 1&2 and case 3 something that you think is meaningful? How obnoxious would it be to separate those cases in the code? Am I missing something about how Rust is set up lexically/syntactically?

cc @rust-lang/lang for comments in hopes this makes more sense than my rambling in the meeting.

nnethercote · 2022-10-24T22:12:45Z

I have no strong opinions about the language design issues here. I'm just interested in shrinking the size of ast::Literal, in order to shrink ast::Expr, in order to reduce compile times.

@petrochenkov: what do think about Scott's comments?

petrochenkov · 2022-10-25T09:27:14Z

@scottmcm

How do you feel about those distinctions? Is the line between cases 1&2 and case 3 something that you think is meaningful?

I wouldn't make any of these distinctions.
For me the important part is that all of them are considered valid tokens (accepted as tt or proc macro input), so it's too late to make the distinctions and they should all be accepted by Rust parser as well.
If you want to make distinctions, then perhaps the moral of this story is that we need to pay more attention when stabilizing something as a valid token.

How obnoxious would it be to separate those cases in the code? Am I missing something about how Rust is set up lexically/syntactically?

"string"any_suffix - less obnoxious, the suffix check is cheap and AST expressions can still keep token::Lit, although it may be annoying to not report this error twice.
everything else - more obnoxious, AST either needs to keep lowered literals, or the compiler will have to lower literals twice doing more work instead of less work.

joshtriplett · 2022-10-25T18:40:48Z

We discussed this in today's @rust-lang/lang meeting, and we decided that we're willing to evaluate this via FCP. We're not thrilled that these were handled in the first place, but given that they work in macros, we should let them work in cfg(FALSE). That doesn't stop us from giving them language-level meanings in the parser later.

joshtriplett · 2022-10-25T18:40:55Z

@rfcbot merge

rfcbot · 2022-10-25T18:40:56Z

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Instead of `ast::Lit`. Literal lowering now happens at two different times. Expression literals are lowered when HIR is crated. Attribute literals are lowered during parsing. This commit changes the language very slightly. Some programs that used to not compile now will compile. This is because some invalid literals that are removed by `cfg` or attribute macros will no longer trigger errors. See this comment for more details: rust-lang#102944 (comment)

nnethercote · 2022-11-15T23:00:53Z

I rebased.

@bors r=petrochenkov

bors · 2022-11-15T23:00:55Z

📌 Commit 358a603 has been approved by petrochenkov

It is now in the queue for this repository.

Manishearth · 2022-11-16T17:44:52Z

@bors p=1

bors · 2022-11-16T23:03:17Z

⌛ Testing commit 358a603 with merge bebd57a...

bors · 2022-11-17T01:56:18Z

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing bebd57a to master...

rust-timer · 2022-11-17T03:16:01Z

Finished benchmarking commit (bebd57a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.7%]	16
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.2%, -0.2%]	1
Improvements ✅ (secondary)	-0.5%	[-1.0%, -0.3%]	10
All ❌✅ (primary)	0.3%	[-0.2%, 0.7%]	17

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.5%, 0.5%]	1
Regressions ❌ (secondary)	2.5%	[1.6%, 3.1%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.1%	[-3.1%, -3.1%]	1
All ❌✅ (primary)	0.5%	[0.5%, 0.5%]	1

Cycles

This benchmark run did not return any relevant results for this metric.

nnethercote · 2022-11-17T04:29:01Z

As above:

The wins and losses roughly balance out. Also, this will enable some additional improvements by shrinking the size of ast::Expr.

@rustbot label: +perf-regression-triaged

rylev · 2022-11-22T14:36:04Z

Note from perf triage: the regressions are largely concentrated in primary benchmarks while the improvements are in secondary. So while the regressions and improvements balance out when treating all benchmarks equally, I do think that primary regressions should be considered to have more weight than secondary improvements.

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 12, 2022

rust-highfive assigned petrochenkov Oct 12, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 12, 2022

This comment has been minimized.

Sign in to view

This comment was marked as resolved.

Sign in to view

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Oct 25, 2022

joshtriplett removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Oct 25, 2022

rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Oct 26, 2022

nnethercote force-pushed the ast-Lit-third-time-lucky branch from 38e4045 to 358a603 Compare November 15, 2022 22:41

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 15, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 17, 2022

bors merged commit bebd57a into rust-lang:master Nov 17, 2022

rustbot added this to the 1.67.0 milestone Nov 17, 2022

bors mentioned this pull request Nov 17, 2022

Match and enforce crate and slug names #104120

Merged

nnethercote deleted the ast-Lit-third-time-lucky branch November 17, 2022 01:59

bors mentioned this pull request Nov 17, 2022

incremental: migrate diagnostics #100754

Merged

This was referenced Nov 17, 2022

Shrink ast::Expr harder #101562

Merged

Shrink ast::Expr to 88 bytes #102000

Closed

mattheww mentioned this pull request Nov 19, 2022

Update literal suffix docs for rust-lang/rust#102944 rust-lang/reference#1304

Closed

jruderman mentioned this pull request Nov 19, 2022

ICE "token_lit in from_mac_args: InvalidIntSuffix" #104620

Closed

mattheww mentioned this pull request Nov 20, 2022

Update literal suffix docs for rust-lang/rust#102944 rust-lang/reference#1305

Merged

nnethercote mentioned this pull request Nov 20, 2022

Remove ast::Lit::kind #101885

Closed

jruderman mentioned this pull request Nov 23, 2022

ICE: unreachable match arm in concat_bytes::invalid_type_err #104769

Closed

ehuss mentioned this pull request Nov 27, 2022

Update books #104993

Closed

jruderman mentioned this pull request Nov 28, 2022

ICE when invalid string literal used with macro #105011

Closed

rustbot mentioned this pull request Dec 5, 2022

Update books #105313

Merged

est31 mentioned this pull request Mar 10, 2023

Implement RFC 3348, c"foo" literals #108801

Merged

nnethercote mentioned this pull request Dec 12, 2023

Delay literal unescaping #118699

Closed

Uh oh!

Conversation

nnethercote commented Oct 12, 2022

Uh oh!

rustbot commented Oct 12, 2022

Uh oh!

nnethercote commented Oct 12, 2022

Uh oh!

nnethercote commented Oct 12, 2022

Uh oh!

This comment has been minimized.

petrochenkov commented Oct 13, 2022

Uh oh!

petrochenkov commented Oct 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented Oct 14, 2022

Uh oh!

BennoLossin commented Oct 18, 2022

Uh oh!

petrochenkov commented Oct 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented Oct 20, 2022

Uh oh!

This comment was marked as resolved.

scottmcm commented Oct 23, 2022

Uh oh!

nnethercote commented Oct 24, 2022

Uh oh!

petrochenkov commented Oct 25, 2022

Uh oh!

joshtriplett commented Oct 25, 2022

Uh oh!

joshtriplett commented Oct 25, 2022

Uh oh!

rfcbot commented Oct 25, 2022 • edited by cramertj Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercote commented Nov 15, 2022

Uh oh!

bors commented Nov 15, 2022

Uh oh!

Manishearth commented Nov 16, 2022

Uh oh!

bors commented Nov 16, 2022

Uh oh!

bors commented Nov 17, 2022

Uh oh!

rust-timer commented Nov 17, 2022

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

Uh oh!

nnethercote commented Nov 17, 2022

Uh oh!

rylev commented Nov 22, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

16 participants

petrochenkov commented Oct 13, 2022 •

edited

Loading

petrochenkov commented Oct 18, 2022 •

edited

Loading

rfcbot commented Oct 25, 2022 •

edited by cramertj

Loading