
parser: add (?:...) non-capturing and (?<name>...) named capture group syntax #4

Open
robertream wants to merge 14 commits into 2kai2kai2:main from robertream:add-ui-compile-fail-named-captures


@robertream robertream commented Mar 6, 2026

Extends the POSIX ERE parser with two non-standard but widely-used group syntaxes:

  • (?:...) — non-capturing group: parsed and matched but produces no StartCapture/EndCapture transitions, so N stays smaller
  • (?<name>...) — named capture group: behaves identically to an unnamed group at the engine level; name is available via ERE::group_names()

New parser variants: NonCapturingSubexpression, NamedSubexpression.
New error: InvalidGroupName (empty or unclosed name).
New method: ERE::group_names() — depth-first walk returning Vec<Option<String>> in group-number order, mirroring the simplified tree traversal.
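To illustrate the numbering rule behind `ERE::group_names()`, here is a minimal sketch over a hypothetical, simplified tree type. `Node` and its variants are invented for illustration; they are not ere-core's real `ERE`/`EREExpression` types. The sketch assumes group 0 is the implicit whole-match group and reports it as unnamed. A pre-order walk numbers groups by their opening parenthesis, and `(?:...)` nodes are descended into without consuming a number.

```rust
/// Hypothetical, simplified regex tree; only what matters for group numbering.
#[derive(Debug)]
pub enum Node {
    Literal(char),
    Concat(Vec<Node>),
    Alternation(Vec<Node>),
    /// `( ... )` or `(?<name> ... )`
    Capture { name: Option<String>, inner: Box<Node> },
    /// `(?: ... )`: groups its contents but never gets a number
    NonCapture(Box<Node>),
}

impl Node {
    fn collect_group_names(&self, names: &mut Vec<Option<String>>) {
        match self {
            Node::Literal(_) => {}
            Node::Concat(children) | Node::Alternation(children) => {
                for child in children {
                    child.collect_group_names(names);
                }
            }
            Node::Capture { name, inner } => {
                // Pre-order: a group's number is fixed by its opening paren,
                // so record it before visiting any groups nested inside it.
                names.push(name.clone());
                inner.collect_group_names(names);
            }
            // Descend, but do not consume a group number.
            Node::NonCapture(inner) => inner.collect_group_names(names),
        }
    }
}

/// Entry i is the name of capture group i, or None if that group is unnamed.
/// Assumption: group 0 is the implicit whole-match group and has no name.
pub fn group_names(root: &Node) -> Vec<Option<String>> {
    let mut names = vec![None];
    root.collect_group_names(&mut names);
    names
}
```

For a tree shaped like `(?<year>a)(-|=)(?:b)(?<month>c)` this yields `[None, Some("year"), None, Some("month")]`: the unnamed separator group still takes number 2, while `(?:b)` takes none.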

attr-macro: add named struct field binding with #[group(N)]

Extends __compile_regex_attr to handle Fields::Named in addition to the existing Fields::Unnamed (tuple struct) path:

  • Fields without #[group(N)]: matched to capture groups by field name via ERE::group_names() — field order is independent of regex group order
  • Fields with #[group(N)]: bound to explicit capture group index N
  • #[group(N)] attributes are stripped from the emitted struct so the compiler does not see an unknown attribute
  • Out-of-bounds #[group(N)] index emits a clear compile error instead of panicking
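A minimal sketch of the field-binding rules above, written as plain Rust over a hypothetical `FieldSpec` model rather than the `syn` types the macro actually works with. `resolve_bindings` and the out-of-bounds message text are illustrative assumptions; the unknown-name message matches the one quoted later in this thread.

```rust
/// A struct field as the macro might see it (hypothetical model, not `syn` types).
pub struct FieldSpec<'a> {
    pub name: &'a str,
    /// `Some(n)` when the field carries `#[group(n)]`.
    pub explicit_group: Option<usize>,
}

/// Maps each field to a capture-group index. `group_names[i]` is the name of
/// group i, or None if that group is unnamed.
pub fn resolve_bindings(
    fields: &[FieldSpec<'_>],
    group_names: &[Option<String>],
) -> Result<Vec<(String, usize)>, String> {
    let mut bindings = Vec::with_capacity(fields.len());
    for field in fields {
        let index = match field.explicit_group {
            // An explicit #[group(N)] takes precedence over name matching.
            Some(n) if n < group_names.len() => n,
            // Out of range: report an error instead of panicking later.
            Some(n) => {
                return Err(format!(
                    "#[group({n})] is out of bounds: the regex has only {} groups",
                    group_names.len()
                ));
            }
            // No attribute: look the field name up among the named groups,
            // so field order does not need to follow group order.
            None => group_names
                .iter()
                .position(|g| g.as_deref() == Some(field.name))
                .ok_or_else(|| {
                    format!(
                        "No capture group named `{}` found in the regular expression.",
                        field.name
                    )
                })?,
        };
        bindings.push((field.name.to_string(), index));
    }
    Ok(bindings)
}
```

Declaring `month` before `year` still binds each field to the right group, because lookup is by name rather than by position.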

Also adds a doc comment for Regex::exec describing the [Option<&str>; N] return shape and when None is returned.

Capture group validation

  • All capture groups (named and unnamed) must be bound to a struct field — unbound groups produce a compile error pointing to the regex literal
  • Named groups: "Named capture group 'x' has no corresponding field in the struct."
  • Unnamed groups: "Capture group N has no corresponding field in the struct. Add a field like '#[group(N)] captured: &'a str'."

tests: add coverage for named groups, non-capturing groups, and #[group(N)]

  • compile_regex_group_extensions: verifies mixed (?<name>...)/(?:...) pattern compiles as Regex<3> across all four engines (non-capturing group excluded from N, named groups count same as unnamed)
  • non_capturing_group, named_capture_group_tuple_struct, named_field_struct: attr-macro smoke tests for new syntax with tuple and named structs
  • compile_fail tests via trybuild: unknown_field_name, group_index_out_of_bounds, missing_named_capture_field, missing_unnamed_capture_group_binding confirm macro emits clear compile errors

docs: document named groups, non-capturing groups, and #[group(N)] (with doc tests)

  • README: add "Named Groups and Non-Capturing Groups" section with live code examples for (?:...) and (?<name>...), plus a rust,ignore example for #[group(N)] (feature-gated, so not doc-tested by default)
  • ROADMAP: mark (?:...) non-capturing groups as [x] shipped; add [x] entry for (?<name>...) named capture groups
  • ere-macros: extend #[regex(...)] docs with two live doc-test examples — named struct with field-order independence, and optional named group mapping to Option<&'a str>

Cleanup: remove pre-existing dead code

Removed orphaned nfa_static serialization pipeline, unused U8NFA construction helpers (superseded by build()), duplicate make_label, unused Run::start_state field, Atom::serialize_check, and SimplifiedTreeNode::from_ere_no_group0. Suppressed warnings for intentionally kept symmetric API (with_offset methods), in-progress visualization scaffolding, and QuantifierType::min/max.

Validation

  • cargo build — zero warnings
  • cargo test — all pass
  • cargo test -F unstable-attr-regex — all pass (including 4 compile-fail tests)

Closes #2

…p syntax

Extends the POSIX ERE parser with two non-standard but widely-used group
syntaxes:

- `(?:...)` — non-capturing group: parsed and matched but produces no
  StartCapture/EndCapture transitions, so N stays smaller
- `(?<name>...)` — named capture group: behaves identically to an unnamed
  group at the engine level; name is available via ERE::group_names()

New parser variants: NonCapturingSubexpression, NamedSubexpression.
New error: InvalidGroupName (empty or unclosed name).
New method: ERE::group_names() — depth-first walk returning Vec<Option<String>>
in group-number order, mirroring the simplified tree traversal.

attr-macro: add named struct field binding with #[group(N)]

Extends __compile_regex_attr to handle Fields::Named in addition to the
existing Fields::Unnamed (tuple struct) path:

- Fields without #[group(N)]: matched to capture groups by field name
  via ERE::group_names() — field order is independent of regex group order
- Fields with #[group(N)]: bound to explicit capture group index N
- #[group(N)] attributes are stripped from the emitted struct so the
  compiler does not see an unknown attribute
- Out-of-bounds #[group(N)] index emits a clear compile error instead
  of panicking

Also adds a doc comment for Regex::exec describing the [Option<&str>; N]
return shape and when None is returned.

tests: add coverage for named groups, non-capturing groups, and #[group(N)]

- compile_regex_group_extensions: verifies mixed (?<name>...)/(?:...) pattern
  compiles as Regex<3> across all four engines (non-capturing group excluded
  from N, named groups count same as unnamed)
- non_capturing_group, named_capture_group_tuple_struct, named_field_struct:
  attr-macro smoke tests for new syntax with tuple and named structs
- compile_fail tests via trybuild: unknown_field_name and
  group_index_out_of_bounds confirm macro emits clear compile errors

docs: document named groups, non-capturing groups, and #[group(N)] (with doc tests)

- README: add 'Named Groups and Non-Capturing Groups' section with live
  code examples for (?:...) and (?<name>...), plus a rust,ignore example
  for #[group(N)] (feature-gated, so not doc-tested by default)
- ROADMAP: mark (?:...) non-capturing groups as [x] shipped; add [x] entry
  for (?<name>...) named capture groups
- ere-macros: extend #[regex(...)] doc with two live doc-test examples —
  named struct with field-order independence, and optional named group
  mapping to Option<&'a str>
@robertream robertream changed the title Add UI compile-fail coverage for named capture bindings parser: add (?:...) non-capturing and (?<name>...) named capture group syntax Mar 6, 2026

@jtmoon79 jtmoon79 left a comment

I'm curious: was this AI-created?

Either way, LGTM!

@robertream
Author

@jtmoon79 of course, who even has the time to write artisanal handwritten code anymore? That's sooo 2025! :)

@jtmoon79

On second thought, this is emitting a number of warnings during the build:

‣ cargo build
warning: unused variable: `accept_state`
   --> ere-core/src/engines/flat_lockstep_nfa.rs:296:17
    |
296 |             let accept_state = ImplVMStateLabel(state_count - 1);
    |                 ^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_accept_state`
    |
    = note: `#[warn(unused_variables)]` (part of `#[warn(unused)]`) on by default

warning: unused variable: `accept_state`
   --> ere-core/src/engines/flat_lockstep_nfa_u8.rs:308:17
    |
308 |             let accept_state = ImplVMStateLabel(state_count - 1);
    |                 ^^^^^^^^^^^^ help: if this is intentional, prefix it with an underscore: `_accept_state`

warning: associated function `serialize_as_token_stream` is never used
  --> ere-core/src/engines/nfa_static.rs:94:8
   |
89 | impl NFATransitionStatic {
   | ------------------------ associated function in this implementation
...
94 |     fn serialize_as_token_stream(transition: &WorkingTransition) -> proc_macro2::TokenStream {
   |        ^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(dead_code)]` (part of `#[warn(unused)]`) on by default

warning: associated function `serialize_as_token_stream` is never used
   --> ere-core/src/engines/nfa_static.rs:127:8
    |
117 | impl NFAStateStatic {
    | ------------------- associated function in this implementation
...
127 |     fn serialize_as_token_stream(state: &WorkingState) -> proc_macro2::TokenStream {
    |        ^^^^^^^^^^^^^^^^^^^^^^^^^

warning: function `serialize_nfa_as_token_stream` is never used
   --> ere-core/src/engines/nfa_static.rs:228:15
    |
228 | pub(crate) fn serialize_nfa_as_token_stream(nfa: &WorkingNFA) -> proc_macro2::TokenStream {
    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: field `start_state` is never read
   --> ere-core/src/engines/one_pass_u8.rs:157:5
    |
156 | struct Run {
    |        --- field in this struct
157 |     start_state: usize,
    |     ^^^^^^^^^^^
    |
    = note: `Run` has a derived impl for the trait `Clone`, but this is intentionally ignored during dead code analysis

warning: methods `group_names` and `collect_group_names` are never used
   --> ere-core/src/parse_tree.rs:100:19
    |
 69 | impl ERE {
    | -------- methods in this implementation
...
100 |     pub(crate) fn group_names(&self) -> Vec<Option<String>> {
    |                   ^^^^^^^^^^^
...
105 |     fn collect_group_names(&self, names: &mut Vec<Option<String>>) {
    |        ^^^^^^^^^^^^^^^^^^^

warning: method `collect_group_names` is never used
   --> ere-core/src/parse_tree.rs:259:8
    |
258 | impl EREExpression {
    | ------------------ method in this implementation
259 |     fn collect_group_names(&self, names: &mut Vec<Option<String>>) {
    |        ^^^^^^^^^^^^^^^^^^^

warning: methods `min` and `max` are never used
   --> ere-core/src/parse_tree.rs:291:14
    |
288 | impl QuantifierType {
    | ------------------- methods in this implementation
...
291 |     const fn min(&self) -> u32 {
    |              ^^^
...
302 |     const fn max(&self) -> Option<u32> {
    |              ^^^

warning: method `serialize_check` is never used
   --> ere-core/src/parse_tree.rs:542:19
    |
465 | impl Atom {
    | --------- method in this implementation
...
542 |     pub(crate) fn serialize_check(&self) -> TokenStream {
    |                   ^^^^^^^^^^^^^^^

warning: associated function `from_ere_no_group0` is never used
   --> ere-core/src/simplified_tree.rs:263:19
    |
164 | impl SimplifiedTreeNode {
    | ----------------------- associated function in this implementation
...
263 |     pub(crate) fn from_ere_no_group0(value: &ERE, config: &Config) -> (SimplifiedTreeNode, usize) {
    |                   ^^^^^^^^^^^^^^^^^^

warning: field `graph` is never read
  --> ere-core/src/visualization/layout.rs:18:5
   |
15 | pub struct DAGLayout<'a> {
   |            --------- field in this struct
...
18 |     graph: &'a LatexGraph,
   |     ^^^^^

warning: enum `LayerNode` is never used
  --> ere-core/src/visualization/layout.rs:94:14
   |
94 |         enum LayerNode {
   |              ^^^^^^^^^

warning: method `with_offset` is never used
  --> ere-core/src/working_nfa.rs:48:25
   |
41 | impl EpsilonTransition {
   | ---------------------- method in this implementation
...
48 |     pub(crate) const fn with_offset(self, offset: usize) -> EpsilonTransition {
   |                         ^^^^^^^^^^^

warning: method `with_offset` is never used
  --> ere-core/src/working_nfa.rs:94:12
   |
90 | impl WorkingTransition {
   | ---------------------- method in this implementation
...
94 |     pub fn with_offset(mut self, offset: usize) -> WorkingTransition {
   |            ^^^^^^^^^^^

warning: function `make_label` is never used
   --> ere-core/src/working_u8_dfa.rs:657:12
    |
657 |         fn make_label(nfa_indices: &[SubNFAStateID]) -> String {
    |            ^^^^^^^^^^

warning: method `with_offset` is never used
  --> ere-core/src/working_u8_nfa.rs:69:12
   |
65 | impl U8Transition {
   | ----------------- method in this implementation
...
69 |     pub fn with_offset(mut self, offset: usize) -> U8Transition {
   |            ^^^^^^^^^^^

warning: multiple associated functions are never used
   --> ere-core/src/working_u8_nfa.rs:155:8
    |
153 | impl U8NFA {
    | ---------- associated functions in this implementation
154 |     /// Makes an NFA that matches with zero length.
155 |     fn nfa_empty() -> U8NFA {
    |        ^^^^^^^^^
...
165 |     fn nfa_symbol_char(c: char) -> U8NFA {
    |        ^^^^^^^^^^^^^^^
...
225 |     fn nfa_union(nodes: &[U8NFA]) -> U8NFA {
    |        ^^^^^^^^^
...
250 |     fn nfa_capture(nfa: &U8NFA, group_num: usize) -> U8NFA {
    |        ^^^^^^^^^^^
...
289 |     fn nfa_repeat(nfa: &U8NFA, times: usize) -> U8NFA {
    |        ^^^^^^^^^^
...
293 |     fn nfa_upto(nfa: &U8NFA, times: usize, longest: bool) -> U8NFA {
    |        ^^^^^^^^
...
359 |     fn nfa_start() -> U8NFA {
    |        ^^^^^^^^^
...
367 |     fn nfa_end() -> U8NFA {
    |        ^^^^^^^
...
375 |     fn nfa_never() -> U8NFA {
    |        ^^^^^^^^^

warning: `ere-core` (lib) generated 18 warnings (run `cargo fix --lib -p ere-core` to apply 2 suggestions)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s


robertream and others added 3 commits March 29, 2026 12:51
Validate all capture groups (not just named ones) have corresponding
struct fields, and point error spans to the regex attribute literal
instead of the field list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove vestigial used_named_groups (replaced by used_groups), and
cfg-gate group_names/collect_group_names behind unstable-attr-regex
since they are only called from that feature's code path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove orphaned nfa_static serialization pipeline, unused U8NFA
construction helpers (superseded by build()), duplicate make_label,
unused Run::start_state field, Atom::serialize_check, and
SimplifiedTreeNode::from_ere_no_group0. Suppress warnings for
intentionally kept symmetric API (with_offset methods), in-progress
visualization scaffolding, and QuantifierType::min/max.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@robertream
Author

robertream commented Mar 29, 2026

I used Codex on the initial PR; I haven't figured out a workflow with Codex that is as competent and thorough as Claude yet. There were build warnings with the original PR. I included fixes for them here, but Claude deleted some dead code. I'm not sure if that's what you would want, but it's git, so you can always have Claude go back and patch based on history. :)

@jtmoon79

jtmoon79 commented Mar 29, 2026

Thanks for the update, @robertream!

Some feedback:

The changes from 21fbb45 now don't allow unnamed capture groups to be undeclared in the struct.

For example, I want to match on this unusual date string with these contrived potential formats: YYYY-MM-DD or YYYY-=MM-=DD.

Previously this worked:

#[derive(Debug, PartialEq)]
#[regex(r"^(?<year>[21][0-9]{3})(-|-=)(?<month>0[1-9]|1[0-2])(-|-=)(?<day>[0123][0-9])")]
struct ERERegex3<'a> {
    #[group(0)]
    matched: &'a str,
    year: &'a str,
    month: &'a str,
    day: &'a str,
}

but now I get an error

Capture group 2 has no corresponding field in the struct. Add a field like `#[group(2)] captured: &'a str`.

I could declare the unnamed groups by group index

#[derive(Debug, PartialEq)]
#[regex(r"^(?<year>[21][0-9]{3})(-|-=)(?<month>0[1-9]|1[0-2])(-|-=)(?<day>[0123][0-9])")]
struct ERERegex3<'a> {
    #[group(0)]
    matched: &'a str,
    year: &'a str,
    #[group(2)]
    _a: &'a str, // for the first separator
    month: &'a str,
    #[group(4)]
    _b: &'a str, // for the second separator
    day: &'a str,
}

But I really liked the prior behavior where I could just ignore declaring the unnamed groups. I am dealing with some very long regexes, and having to declare the uninteresting capture groups is tedious.
Is it possible to keep the prior behavior?


What would be ideal is giving the user the option of

  • strict checking (all capture groups must have a corresponding struct field)
  • named checking (only named captured groups must have a corresponding struct field; unnamed captured groups not declared are dropped).
  • no checking (only struct fields declared are used; anything else is dropped).

Unnamed capture groups (e.g. `(-|-=)`) no longer require corresponding
struct fields when using `#[regex]` on a named struct. Only named capture
groups must be bound. This restores the prior behavior where users could
ignore uninteresting groups in long regexes without adding tedious
`#[group(N)]` fields.

Named groups and tuple struct validation remain unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jtmoon79

But I really liked the prior behavior where I could just ignore declaring the unnamed groups.

8d50693 fixed this.

@jtmoon79

Just curious, @robertream: since you are so far into this with your AI code assistant, what about this change?

What would be ideal is giving the user the option of

  • strict checking (all capture groups must have a corresponding struct field)
  • named checking (only named captured groups must have a corresponding struct field; unnamed captured groups not declared are dropped).
  • no checking (only struct fields declared are used; anything else is dropped).

I'm just thinking: if it's only a little more churning by your trained AI, could it be an easy feature addition?
If it's not feasible or too much feature creep then no worries, this PR is great as-is.

@jtmoon79

On second thought, @robertream, do you know why so much code in ere-core/src/working_u8_nfa.rs was deleted?

@robertream
Author

It was causing dead-code warnings.

…ation

Adds `bind = Strict | Named | None` optional parameter to `#[regex(...)]`:

- Strict: all capture groups (named and unnamed) must have struct fields
- Named: only named groups must be bound, unnamed are silently skipped (default)
- None: no groups are required, only declared fields are populated

Default is `Named`, preserving backwards compatibility. Example:

  #[regex(r"^(?<year>[0-9]{4})(-|/)(?<month>[0-9]{2})$", bind = None)]
  struct YearOnly<'a> {
      #[group(0)]
      matched: &'a str,
      year: &'a str,
  }

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
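The three `bind` modes can be sketched as one check over the group names and the set of group indices that some field is bound to. `Bind`, `unbound_group_errors`, and the choice to never require group 0 are assumptions for illustration; the message strings follow the ones listed in the PR description.

```rust
/// Models the three values of the `bind` parameter (not the macro's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Bind {
    Strict,
    Named,
    None,
}

/// Returns one error per capture group that `mode` requires to be bound but
/// that no field is bound to. `group_names[i]` is the name of group i.
pub fn unbound_group_errors(
    mode: Bind,
    group_names: &[Option<String>],
    bound: &[usize],
) -> Vec<String> {
    let mut errors = Vec::new();
    // Assumption: group 0 (the whole match) is never required.
    for (i, name) in group_names.iter().enumerate().skip(1) {
        if bound.contains(&i) {
            continue;
        }
        match (mode, name) {
            // bind = None: nothing is required.
            (Bind::None, _) => {}
            // bind = Named (the default): unnamed groups may stay unbound.
            (Bind::Named, None) => {}
            (Bind::Strict | Bind::Named, Some(n)) => errors.push(format!(
                "Named capture group '{n}' has no corresponding field in the struct."
            )),
            (Bind::Strict, None) => errors.push(format!(
                "Capture group {i} has no corresponding field in the struct. \
                 Add a field like '#[group({i})] captured: &'a str'."
            )),
        }
    }
    errors
}
```

With groups `year` (1), an unnamed separator (2) and `month` (3), and only `year` bound, `Named` reports `month`, `Strict` reports both 2 and 3, and `None` reports nothing.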
@jtmoon79

376c681 amazing!

@jtmoon79

jtmoon79 commented Mar 29, 2026

I suggest modifying the docstring for pub fn regex. One of the docstring examples should use the bind parameter, just so users know it's there. Otherwise it's difficult to discover.

@jtmoon79

Okay, one more request @robertream . Of course, I know you are working for free, so no worries if "no".

Is it possible to also allow selecting the underlying engine, i.e. an engine parameter passed to regex!? Reading https://docs.rs/ere/0.2.4/ere/#macros, there are a few underlying engines. It'd be really cool if one could be selected explicitly. Passing no engine parameter would default to the current behavior.

Shows bind = Named (default, skips unnamed groups with a full timestamp
regex), bind = Strict (all groups must have fields), and bind = None
(only declared fields populated). Includes note on compile error
behavior when required groups are missing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@robertream
Author

I have found that Claude does so well with Rust; this didn't take me more than a handful of iterations with Claude. I recommend you at least get the $100 subscription; it is at least a 5x productivity power boost. It's made programming so much more fun again. :)

Adds `engine = Auto | OnePassU8 | DfaU8 | FlatLockstepNfaU8 |
FlatLockstepNfa | FixedOffset` optional parameter to `#[regex(...)]`.
Default is Auto (existing pick_engine behavior). Engines that cannot
handle the regex produce compile errors pointing at the regex literal.

Also refactors the attribute parser to a loop supporting multiple
optional params in any order with trailing comma support, and
consolidates unit tests — doctests now serve as primary tests for
bind and engine, with unit tests only for edge cases not in docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
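The "optional params in any order with trailing comma support" loop can be sketched over a plain string instead of a `syn` token stream. The real macro parses tokens; `parse_options`, `Options`, and the duplicate-parameter error are hypothetical. Defaults follow the commit messages: `bind = Named` and `engine = Auto`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Bind { Strict, Named, None }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Engine { Auto, OnePassU8, DfaU8, FlatLockstepNfaU8, FlatLockstepNfa, FixedOffset }

#[derive(Debug, PartialEq, Eq)]
pub struct Options {
    pub bind: Bind,
    pub engine: Engine,
}

/// Parses the optional `key = Value` pairs that follow the regex literal,
/// e.g. `engine = DfaU8, bind = None,`.
pub fn parse_options(tail: &str) -> Result<Options, String> {
    let mut bind: Option<Bind> = None;
    let mut engine: Option<Engine> = None;
    // Empty pieces (from a trailing comma or an empty tail) are skipped.
    for item in tail.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let (key, value) = item
            .split_once('=')
            .map(|(k, v)| (k.trim(), v.trim()))
            .ok_or_else(|| format!("expected `key = value`, found `{item}`"))?;
        match key {
            // Each parameter may appear at most once, in any position.
            "bind" if bind.is_some() => return Err("duplicate `bind` parameter".to_string()),
            "engine" if engine.is_some() => return Err("duplicate `engine` parameter".to_string()),
            "bind" => {
                bind = Some(match value {
                    "Strict" => Bind::Strict,
                    "Named" => Bind::Named,
                    "None" => Bind::None,
                    other => return Err(format!("unknown bind mode `{other}`")),
                })
            }
            "engine" => {
                engine = Some(match value {
                    "Auto" => Engine::Auto,
                    "OnePassU8" => Engine::OnePassU8,
                    "DfaU8" => Engine::DfaU8,
                    "FlatLockstepNfaU8" => Engine::FlatLockstepNfaU8,
                    "FlatLockstepNfa" => Engine::FlatLockstepNfa,
                    "FixedOffset" => Engine::FixedOffset,
                    other => return Err(format!("unknown engine `{other}`")),
                })
            }
            other => return Err(format!("unknown parameter `{other}`")),
        }
    }
    Ok(Options {
        bind: bind.unwrap_or(Bind::Named),
        engine: engine.unwrap_or(Engine::Auto),
    })
}
```

Collecting into `Option`s first and applying defaults at the end is what makes the order of parameters irrelevant.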
@jtmoon79

jtmoon79 commented Mar 30, 2026

cargo build fails for 0cb148b @robertream

‣ cargo build
   Compiling ere-core v0.2.4 (/home/user/Projects/ere/ere-core)
error[E0425]: cannot find type `RegexAttr` in this scope
   --> ere-core/src/lib.rs:342:82
    |
342 |     let RegexAttr { ere_litstr, bind, engine } = syn::parse_macro_input!(attr as RegexAttr);
    |                                                                                  ^^^^^^^^^ not found in this scope
    |
...

(many more errors)

The #[cfg(feature = "unstable-attr-regex")] gate was accidentally
dropped when the Engine enum was inserted above the function,
causing build failures without the feature flag enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
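An outer attribute such as `#[cfg(...)]` applies only to the single item that follows it, so inserting a new item directly below the attribute moves the gate onto the new item and leaves the original function ungated. One way to make that harder to repeat is to gate a whole module; the module and function names below are hypothetical.

```rust
// Buggy shape (sketch): the attribute now binds to `enum Engine` only, so
// `expand_attr` is compiled even without the feature and cannot find the
// types it uses that are still gated.
//
//     #[cfg(feature = "unstable-attr-regex")]
//     enum Engine { ... }
//     fn expand_attr(...) { ... }

/// Gating the module keeps every item that depends on the feature together.
#[cfg(feature = "unstable-attr-regex")]
pub mod attr_regex {
    pub enum Engine {
        Auto,
        DfaU8,
    }

    pub fn default_engine() -> Engine {
        Engine::Auto
    }
}

/// True only when the crate was built with the feature enabled.
pub fn attr_regex_enabled() -> bool {
    cfg!(feature = "unstable-attr-regex")
}
```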
@robertream
Author

I should have had it running the CI build locally before commit/push. Is there a reason why the CI build isn't running on PR?

@jtmoon79

Is there a reason why the CI build isn't running on PR?

IIUC, it's because GitHub changed the default behavior for workflows like this one

name: Cargo Test
on: [push, pull_request]
jobs:
  test:
    name: Cargo Test
    ...

to only run for PRs from approved users. Project owners have to change their project policy to allow unapproved PRs to run the workflows.

It was due to a security flaw where attackers would clone a project and open a PR, the project workflows would run, and malicious workflows in the PR would dump secrets to the logs.

@jtmoon79

Tested out 0cb148b and 29ca423. Works great! 🚀🚀🚀

@jtmoon79

jtmoon79 commented Mar 30, 2026

In case @robertream or @2kai2kai2 is curious, here are my benchmark results.

Background

First some explanation and code snippets:

I benchmarked the crates regex, regex-automata, and ere. The benchmark names below should make it clear which is which.

const PATTERN: &str = r"^[\[\(](?<year>[0-9]{4})[/-](?<month>0[1-9]|1[0-2])[/-](?<day>[0-9]{2}) (?<hour>[0-9]{2}):(?<minute>[0-9]{2}):(?<second>[0-9]{2})(?<fractional>(\.|,)[0-9]{6})(\]|\))";

The following four haystacks are matched against; the last two fail to match:

const HAYSTACK1: &[u8] =
    b"[2001/01/01 11:21:12.111222] ../source3/smbd/oplock.c:1340(init_oplocks)";

const HAYSTACK2: &[u8] =
    b"(2003-03-04 23:34:44,333444) ../source3/smbd/oplock.c:1340(init_oplocks)";

const HAYSTACK3: &[u8] =
    b"2005-05-06 05:06:56.555666 ../source3/smbd/oplock.c:1340(init_oplocks)";

const HAYSTACK4: &[u8] =
    b"[2007/07/08 17:18:58,777888 ../source3/smbd/oplock.c:1340(init_oplocks)";

(those haystack strings are from smbd log messages)


Each benchmark function:

  1. for each group name (year, month, etc.)
    1. finds the (start, end) offsets of captured data within the passed haystack
    2. stores that pair in a Vec created with Vec::with_capacity(7)
  2. returns the vec
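Since `Regex::exec` returns `[Option<&str>; N]` rather than offsets, a benchmark like this one has to recover `(start, end)` from each captured slice. A minimal sketch, assuming every capture is a sub-slice of the haystack; `offsets_in` and `collect_offsets` are hypothetical helpers, not part of ere.

```rust
/// Converts a captured slice back into `(start, end)` byte offsets within
/// `haystack`. Returns None if the capture is absent or does not lie inside
/// `haystack` (for example, a copy that lives in a different allocation).
pub fn offsets_in(haystack: &str, capture: Option<&str>) -> Option<(usize, usize)> {
    let cap = capture?;
    let base = haystack.as_ptr() as usize;
    // Pointer difference gives the start offset; reject slices before the haystack.
    let start = (cap.as_ptr() as usize).checked_sub(base)?;
    let end = start.checked_add(cap.len())?;
    // Reject slices that run past the end of the haystack.
    (end <= haystack.len()).then_some((start, end))
}

/// Collects offsets for the requested groups, skipping groups that did not
/// participate in the match.
pub fn collect_offsets(
    haystack: &str,
    captures: &[Option<&str>],
    groups: &[usize],
) -> Vec<(usize, usize)> {
    let mut out = Vec::with_capacity(groups.len());
    for &g in groups {
        if let Some(pair) = offsets_in(haystack, captures.get(g).copied().flatten()) {
            out.push(pair);
        }
    }
    out
}
```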

Results

Benchmark names are benchmark__<crate>__<interface_used>.

‣ cargo bench --quiet  -- --discard-baseline --quiet
benchmark__regex_automata__pikevm_searcher
                        time:   [2.2943 µs 2.2981 µs 2.3022 µs]

benchmark__regex_automata__regex_captures_iter
                        time:   [613.22 ns 615.76 ns 618.58 ns]

benchmark__regex_automata__regex_search_captures_with
                        time:   [527.42 ns 528.46 ns 529.62 ns]

benchmark__regex_automata__dfa_onepass_custom_config
                        time:   [504.48 ns 506.07 ns 507.76 ns]

benchmark__regex_automata__dfa_onepass_default_config
                        time:   [529.13 ns 530.78 ns 532.44 ns]

benchmark__regex__bytes time:   [559.56 ns 562.85 ns 565.64 ns]

benchmark__ere__regex   time:   [381.28 ns 382.21 ns 383.37 ns]

benchmark__ere__regex_dfa_u8
                        time:   [171.29 ns 171.63 ns 171.96 ns]

benchmark__ere__regex_one_pass_u8
                        time:   [361.81 ns 362.25 ns 362.73 ns]

benchmark__ere__regex_lock_step_nfa_u8
                        time:   [5.4362 µs 5.4606 µs 5.4839 µs]

ere::Engine::DfaU8 is the clear winner!
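For a rough sense of scale, dividing the middle estimate of each Criterion interval above by the `DfaU8` estimate gives these ratios (simple arithmetic on the reported times, not a statistical comparison):

```latex
\begin{aligned}
\frac{562.85\ \text{ns}}{171.63\ \text{ns}} &\approx 3.28 && \text{(benchmark\_\_regex\_\_bytes)}\\
\frac{506.07\ \text{ns}}{171.63\ \text{ns}} &\approx 2.95 && \text{(benchmark\_\_regex\_automata\_\_dfa\_onepass\_custom\_config)}\\
\frac{382.21\ \text{ns}}{171.63\ \text{ns}} &\approx 2.23 && \text{(benchmark\_\_ere\_\_regex)}
\end{aligned}
```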

@jtmoon79

jtmoon79 commented Apr 5, 2026

@robertream, it looks like bind=None fails:

#[regex(
    r"^(?<year>[12][0-9]{3})",
    bind = None
)]
struct ERERegex__try6<'a> {
    #[group(0)]
    matched: &'a str,
    year: &'a str,
    month: &'a str,
}

The error is

No capture group named `month` found in the regular expression.

Wasn't bind=None supposed to allow the non-matching month field?

I also tried declaring month with an Option

#[regex(
    r"^(?<year>[12][0-9]{3})",
    bind = None
)]
struct ERERegex__try6<'a> {
    #[group(0)]
    matched: &'a str,
    year: &'a str,
    month: Option<&'a str>,
}

which failed with the same error.



... after reviewing the code diff ...

Oh! Whoops, this was not supported.

@robertream (feature request incoming!), could that arrangement ☝ be allowed? That is, the declared struct has fields that may never be filled, when those fields are declared as Option<&'a str> and bind=None is set.

Another acceptable arrangement would be an additional attribute like #[bind_optional], meaning bind checks for that field are bypassed?
e.g.

#[regex(
    r"^(?<year>[12][0-9]{3})"
)]
struct ERERegex__try6<'a> {
    #[group(0)]
    matched: &'a str,
    year: &'a str,
    #[bind_optional]
    month: Option<&'a str>,
}

robertream and others added 3 commits April 5, 2026 15:40
…ive ASCII matching

Transforms the parse tree so ASCII letters match both cases. Handles
NormalChar, bracket expression Singles, and Ranges (including mixed-type
ranges like [0-F] where only the alphabetic portion is folded).

POSIX character classes like [:lower:] and [:upper:] are intentionally
not affected, as documented in the docstring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
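The rule for mixed-type ranges such as `[0-F]` can be sketched as: keep the original range, and for each part of it that overlaps `A-Z` or `a-z`, add the same part flipped to the other case. `fold_range` is a hypothetical helper over `(char, char)` pairs, not the parse-tree transformation in the commit.

```rust
/// An inclusive character range from a bracket expression, e.g. `0-F`.
pub type CharRange = (char, char);

/// Returns the ranges that match `range` ASCII-case-insensitively: the
/// original range plus the opposite-case copy of whatever part of it
/// overlaps `A-Z` or `a-z`. Non-letters such as `0-9` are left alone.
pub fn fold_range(range: CharRange) -> Vec<CharRange> {
    let (lo, hi) = range;
    let mut out = vec![range];
    for (first, last) in [('A', 'Z'), ('a', 'z')] {
        // Intersect the range with this block of letters.
        let start = lo.max(first);
        let end = hi.min(last);
        if start <= end {
            // ASCII letters differ from their other case only in bit 0x20.
            let swap = |c: char| ((c as u8) ^ 0x20) as char;
            out.push((swap(start), swap(end)));
        }
    }
    out
}
```

So `[0-F]` becomes `[0-F]` plus `[a-f]`, and `[0-9]` is unchanged. Overlap with the original range (as in `[A-z]`) is possible but harmless for matching.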
…t `bind` on tuple structs

With `bind=None`, named struct fields without a matching capture group
are assigned `None` (field must be `Option<T>`). This enables structs
that declare fields for future use or shared across multiple regexes.

Also: `bind` on tuple structs now produces a compile error instead of
being silently ignored. And `#[group(N)]` now emits explicit errors
for malformed values instead of silently falling through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s, and bind validation

- ascii_case_insensitive with bracket expressions and mixed-type ranges
- unbound Option fields with bind=None
- compile-fail: unbound field rejected in default bind=Named mode
- restore unbound_named_field.stderr to original error

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jtmoon79

jtmoon79 commented Apr 5, 2026

Thanks @robertream 😄

This is okay

#[regex(
    r"^(?<year>[12][0-9]{3})",
    bind = None
)]
struct ERERegex__try6<'a> {
    #[group(0)]
    matched: &'a str,
    year: &'a str,
    month: Option<&'a str>,
}

But when I remove the Option from month it emits an error

#[regex(
    r"^(?<year>[12][0-9]{3})",
    bind = None
)]
struct ERERegex__try6<'a> {
    #[group(0)]
    matched: &'a str,
    year: &'a str,
    month: &'a str,
}

error

mismatched types
expected reference `&str`
        found enum `Option<_>`

Perfect! 🚀

@jtmoon79

jtmoon79 commented Apr 5, 2026

@robertream may I get a modification to the prior change to allow even more flexibility?

Currently, this code results in an error

#[regex(
    r"^(?<year>[12][0-9]{3})",
    bind = None
)]
struct ERERegex__try6<'a> {
    #[group(0)]
    matched: &'a str,
    year: Option<&'a str>
}

The error is

mismatched types
  expected enum `Option<&str>`
found reference `&str`

It is due to the field year being an Option<_>.

When bind=None can the typing be relaxed such that year: Option<&'a str> would compile?

…one mode

Uses .into() on non-optional captures so the field can be either &str
(identity) or Option<&str> (via From<T> for Option<T>), without needing
type detection in the macro. Also shortens the internal .expect() message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
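A minimal sketch of the `.into()` trick from this commit, with hypothetical names (`bind_capture`, `Strict`, `Relaxed`): the same conversion expression compiles for both field types because of the standard-library impls `impl<T> From<T> for T` and `impl<T> From<T> for Option<T>`.

```rust
/// Converts an always-present capture into the field's type. The same call
/// compiles for `&str` fields (reflexive `impl<T> From<T> for T`) and for
/// `Option<&str>` fields (`impl<T> From<T> for Option<T>`).
pub fn bind_capture<'a, T: From<&'a str>>(cap: Option<&'a str>) -> T {
    cap.expect("capture group always participates in a match").into()
}

// Hypothetical user structs; only the field types differ.
pub struct Strict<'a> {
    pub year: &'a str,
}

pub struct Relaxed<'a> {
    pub year: Option<&'a str>,
    pub month: Option<&'a str>,
}

pub fn build_strict<'a>(caps: [Option<&'a str>; 2]) -> Strict<'a> {
    Strict { year: bind_capture(caps[1]) }
}

pub fn build_relaxed<'a>(caps: [Option<&'a str>; 2]) -> Relaxed<'a> {
    Relaxed {
        year: bind_capture(caps[1]),
        // With bind = None, a field that matches no group is just `None`,
        // which only type-checks when the field is an Option.
        month: None,
    }
}
```

The plain `None` for an unbound field is also why declaring `month: &'a str` without a matching group fails with the mismatched-types error shown earlier in the thread.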
@jtmoon79
Copy link
Copy Markdown

jtmoon79 commented Apr 6, 2026

a199e6a works! 😄

@robertream
Author

a199e6a works! 😄


@robertream-m42

I am considering using this functionality in a project I'm working on. It would be helpful if you could merge this PR soon. :)


Development

Successfully merging this pull request may close these issues.

support named capture groups

3 participants