Skip to content

Latest commit

 

History

History
66 lines (42 loc) · 2.51 KB

File metadata and controls

66 lines (42 loc) · 2.51 KB

progress-banner

Build Your Own grep in Python — a regex engine from scratch

A complete grep implementation, including a regex engine written from scratch — no re, no external libraries. It parses the pattern into an AST of matcher nodes and evaluates matches with a continuation-passing search so backtracking, groups, and backreferences fall out naturally.

Built in under 48 hours with Claude Code while reaching CodeCrafters Python leaderboard rank #13.

If you searched for "build your own grep", "regex engine in Python", "how regular expressions work", or "backreferences from scratch" — this repo is a compact, readable reference.


What works

$ echo "apple 123 pie" | ./your_program.sh -E "(\w+) (\d+)"
apple 123 pie
$ ./your_program.sh -r -E "TODO.*" src/
$ ./your_program.sh --color=always -E "^(\w+)@(\w+\.\w+)$" emails.txt

Regex features implemented

  • Literal characters and escapes
  • Character classes ([abc], [^abc]), predefined (\d, \w), wildcard (.)
  • Anchors: ^ start, $ end
  • Quantifiers: ?, +, * (greedy)
  • Alternation: foo|bar
  • Capturing groups ( ... ) and backreferences \1, \2, …

Grep CLI features

  • Read from stdin or one or more files
  • Recursive search with -r
  • Color highlighting (--color=auto|always|never)
  • Exit code 0 on match, 1 on no-match (POSIX-compliant)

Architecture

  • Parser lowers the regex into a list of matcher objects (Literal, Digit, Word, CharClass, Group, Repeat, Backref, …)
  • Each node exposes match(text, pos, caps, cont) — the continuation threads through groups and repetition, and returning None naturally triggers backtracking
  • find_match walks the text trying each starting position

All ~420 lines in a single file (app/main.py).


Run it

echo "hello 42 world" | ./your_program.sh -E "\w+\s\d+"

Requires uv and Python 3.14.


Why this repo is worth reading

The regex engine isn't a toy — groups, alternation, greedy quantifiers, and backreferences all cooperate via continuations, which is a genuinely elegant design pattern and far clearer than a hand-rolled NFA. It's a great read if you've only ever used regexes.


Credits

Part of the CodeCrafters "Build Your Own grep" challenge.