Identifiers may contain `$`, `?` or `|`, but not `/`. They may not begin with `$` or `?`. The identifier `|` on its own is not valid (since it is the `VBAR` token).
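A minimal OCaml sketch of the rule above, written as a plain predicate rather than as a sedlex regexp. The name `is_valid_ident` is hypothetical. The sketch assumes that every other non-blank character is allowed. The actual lexer restricts the base alphabet further, so this only illustrates the constraints stated in this PR.

```ocaml
(* Hypothetical predicate sketching the identifier rule of this PR.
   Assumption: besides the checks below, any non-blank character is allowed. *)

(* '/' is forbidden anywhere; blanks separate tokens, so they cannot occur. *)
let forbidden = [ '/'; ' '; '\t'; '\n'; '\r' ]

let is_valid_ident (s : string) : bool =
  String.length s > 0
  (* may not begin with '$' or '?' *)
  && s.[0] <> '$'
  && s.[0] <> '?'
  (* a lone '|' is the VBAR token, not an identifier *)
  && s <> "|"
  && not (List.exists (String.contains s) forbidden)

let () =
  List.iter
    (fun s -> Printf.printf "%s -> %b\n" s (is_valid_ident s))
    [ "x$?"; "a|b"; "$x"; "?y"; "|"; "a/b" ]
```

With these inputs, `x$?` and `a|b` are accepted, while `$x`, `?y`, `|` and `a/b` are rejected.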
### Changed

- Identifiers may contain `$` or `?` but may not begin with them. They may contain `|` as well. Identifiers must not contain `/`.
If they can contain VBAR, this may conflict with constructor declarations in inductive type definitions.
Why not allow `/` as a (single-character) identifier?
I don't think there would be a conflict with constructors: a constructor is declared with `| Foo`, so you parse the VBAR and then `Foo`. We require a space between the VBAR and the identifier, which is rather natural since tokens are separated by spaces.
But what if the user forgets a space and writes `a|b`?
Well, then it's incorrect! Tokens are separated by spaces, so `a|b` is no more `a | b` than `symbolfoo` is `symbol foo`, is it?
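To illustrate the argument about spaces, here is a hypothetical whitespace-based tokenizer in OCaml. The `token` type and the `tokenize` function are made up for this sketch, and only the space character is treated as a separator. A `|` surrounded by spaces becomes `VBAR`, whereas `|Foo` and `a|b` each remain a single word.

```ocaml
(* Hypothetical token type: a lone "|" is VBAR, anything else is a word. *)
type token = VBAR | WORD of string

(* Split on spaces only (simplifying assumption), dropping the empty
   pieces produced by consecutive spaces. *)
let tokenize (src : string) : token list =
  String.split_on_char ' ' src
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> if w = "|" then VBAR else WORD w)

let show = function VBAR -> "VBAR" | WORD w -> Printf.sprintf "WORD %S" w

let () =
  List.iter
    (fun src ->
      Printf.printf "%S -> [%s]\n" src
        (String.concat "; " (List.map show (tokenize src))))
    [ "| Foo"; "|Foo"; "a|b"; "a | b" ]
```

Here `| Foo` yields `[VBAR; WORD "Foo"]`, `|Foo` and `a|b` each yield a single `WORD`, and `a | b` yields three tokens.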
Maybe we could follow standards and do as Rust does (https://doc.rust-lang.org/reference/identifiers.html), that is, use the standard sets of code points defined in Unicode Standard Annex #31 (https://www.unicode.org/reports/tr31/tr31-33.html).
Unfortunately, math symbols are not included in these identifier sets. But according to this source, it is reasonable to keep identifiers and math symbols as separate classes, which we can do: an identifier is then either `xid_start, Star xid_continue` or `Plus math`.
So it seems we want to go with something along these lines:
```ocaml
let ident = [%sedlex.regexp? xid_start, Star (xid_continue | Chars "'")]
let regid = [%sedlex.regexp? ident | Plus math]
```

One of the main advantages is that others have already taken care of removing ambiguous characters from these sets (such as the duplication between the Greek letter mu μ and the micro sign µ). We can also rely on the Unicode standard to keep the sets up to date, and our specification is largely delegated to the standard (plus or minus some implementation-specific details, which shouldn't be numerous).
Very fun things can appear with this signature:
What do you mean?