A complete grep implementation, including a regex engine written from scratch — no re, no external libraries. It parses the pattern into an AST of matcher nodes and evaluates matches with a continuation-passing search so backtracking, groups, and backreferences fall out naturally.
Built in under 48 hours with Claude Code while reaching CodeCrafters Python leaderboard rank #13.
If you searched for "build your own grep", "regex engine in Python", "how regular expressions work", or "backreferences from scratch" — this repo is a compact, readable reference.
$ echo "apple 123 pie" | ./your_program.sh -E "(\w+) (\d+)"
apple 123 pie
$ ./your_program.sh -r -E "TODO.*" src/
$ ./your_program.sh --color=always -E "^(\w+)@(\w+\.\w+)$" emails.txt
- Literal characters and escapes
- Character classes ([abc], [^abc]), predefined (\d, \w), wildcard (.)
- Anchors: ^ start, $ end
- Quantifiers: ?, +, * (greedy)
- Alternation: foo|bar
- Capturing groups ( ... ) and backreferences \1, \2, …
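As an illustration of how the bracket syntax listed above ([abc], [^abc]) might be read from a pattern string, here is a minimal sketch. The names parse_char_class and class_matches are hypothetical and are not taken from app/main.py. It handles only literal members, since ranges such as a-z are not among the listed features.

```python
def parse_char_class(pattern, i):
    """Parse a bracket expression that starts at pattern[i] == '['.

    Returns (negated, chars, next_index), where next_index points just
    past the closing ']'. Hypothetical helper for illustration only.
    """
    if i >= len(pattern) or pattern[i] != "[":
        raise ValueError("expected '['")
    i += 1
    # A leading '^' flips the class into a negated one.
    negated = i < len(pattern) and pattern[i] == "^"
    if negated:
        i += 1
    chars = set()
    while i < len(pattern) and pattern[i] != "]":
        chars.add(pattern[i])
        i += 1
    if i == len(pattern):
        raise ValueError("unterminated character class")
    return negated, chars, i + 1


def class_matches(negated, chars, ch):
    # Membership must hold for [abc] and must fail for [^abc].
    return (ch in chars) != negated
```

A parser built along these lines would call parse_char_class whenever it meets '[' and wrap the result in a matcher node.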
- Read from stdin or one or more files
- Recursive search with -r
- Color highlighting (--color=auto|always|never)
- Exit code 0 on match, 1 on no-match (POSIX-compliant)
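The color modes and exit codes listed above could be wired up roughly as follows. This is a sketch under assumptions, not the repo's code: should_color, highlight, and exit_code are hypothetical names, "auto" is assumed to mean "color only when writing to a terminal", and the bold-red ANSI escape is an assumed style.

```python
import sys


def should_color(mode, stream=None):
    """Decide whether to emit ANSI colors for a --color mode string."""
    if mode == "always":
        return True
    if mode == "never":
        return False
    # Assumed meaning of "auto": color only when the output is a terminal.
    stream = stream if stream is not None else sys.stdout
    return stream.isatty()


def highlight(line, start, end, use_color):
    """Wrap line[start:end] in a bold-red escape sequence when enabled."""
    if not use_color:
        return line
    return line[:start] + "\033[01;31m" + line[start:end] + "\033[m" + line[end:]


def exit_code(match_count):
    # 0 when at least one line matched, 1 otherwise.
    return 0 if match_count > 0 else 1
```

A main function would then print each matching line through highlight and finish with sys.exit(exit_code(matches)).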
- Parser lowers the regex into a list of matcher objects (Literal, Digit, Word, CharClass, Group, Repeat, Backref, …)
- Each node exposes match(text, pos, caps, cont): the continuation threads through groups and repetition, and returning None naturally triggers backtracking
- find_match walks the text trying each starting position
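To make the continuation idea above concrete, here is a minimal sketch of what such nodes might look like. It borrows the names Literal, Digit, Group, Repeat, Backref, and find_match and the match(text, pos, caps, cont) signature from the description. The bodies, the match_seq helper, the lo/hi fields, and the choice to store captures as (start, end) pairs are all assumptions and may differ from app/main.py.

```python
from dataclasses import dataclass
from typing import Optional


def match_seq(nodes, i, text, pos, caps, cont):
    """Match nodes[i:] in order, then hand the final position to cont."""
    if i == len(nodes):
        return cont(pos, caps)
    rest = lambda p, c: match_seq(nodes, i + 1, text, p, c, cont)
    return nodes[i].match(text, pos, caps, rest)


@dataclass
class Literal:
    ch: str

    def match(self, text, pos, caps, cont):
        if pos < len(text) and text[pos] == self.ch:
            return cont(pos + 1, caps)
        return None  # None tells the caller to try something else


@dataclass
class Digit:
    def match(self, text, pos, caps, cont):
        if pos < len(text) and text[pos] in "0123456789":
            return cont(pos + 1, caps)
        return None


@dataclass
class Group:
    index: int
    nodes: list

    def match(self, text, pos, caps, cont):
        # Record the capture only once the group body has matched.
        def after(end, c):
            return cont(end, {**c, self.index: (pos, end)})
        return match_seq(self.nodes, 0, text, pos, caps, after)


@dataclass
class Repeat:
    node: object
    lo: int
    hi: Optional[int]  # None means unbounded

    def match(self, text, pos, caps, cont, count=0):
        # Greedy: try one more iteration before falling back to cont.
        if self.hi is None or count < self.hi:
            def again(p, c):
                if p == pos:  # refuse zero-width iterations (no infinite loop)
                    return None
                return self.match(text, p, c, cont, count + 1)
            result = self.node.match(text, pos, caps, again)
            if result is not None:
                return result
        # Backtrack: stop repeating here if enough iterations were seen.
        return cont(pos, caps) if count >= self.lo else None


@dataclass
class Backref:
    index: int

    def match(self, text, pos, caps, cont):
        if self.index not in caps:
            return None
        start, end = caps[self.index]
        captured = text[start:end]
        if text.startswith(captured, pos):
            return cont(pos + len(captured), caps)
        return None


def find_match(nodes, text):
    """Try every starting position; return (start, end) or None."""
    for start in range(len(text) + 1):
        end = match_seq(nodes, 0, text, start, {}, lambda p, c: p)
        if end is not None:
            return start, end
    return None


# Hand-built equivalent of the pattern (\d+)-\1
pattern = [Group(1, [Repeat(Digit(), 1, None)]), Literal("-"), Backref(1)]
print(find_match(pattern, "x12-123"))  # (1, 6)
```

Because every node calls cont only after it succeeds, a failure later in the pattern simply returns None up the call stack, and the nearest Repeat tries a shorter match. No explicit backtracking stack is needed in this sketch.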
All ~420 lines in a single file (app/main.py).
echo "hello 42 world" | ./your_program.sh -E "\w+\s\d+"
Requires uv and Python 3.14.
The regex engine is more than a toy: groups, alternation, greedy quantifiers, and backreferences all cooperate through continuations, which keeps the design compact and arguably easier to follow than a hand-rolled NFA. It's a good read if you have only ever used regexes and never built one.
Part of the CodeCrafters "Build Your Own grep" challenge.