
Align generalized Pratt parser, add Caml Light coverage, and polish demo #479

Open
chengluyu wants to merge 49 commits into hkust-taco:hkmc2 from chengluyu:codex/parser-alignment


@chengluyu
Member

@chengluyu chengluyu commented Apr 25, 2026

Note

The commits in this PR are mostly generated by GPT 5.5.

TLDR

This PR aligns the parser implementation with the paper, expands Caml Light coverage, and makes the web demo usable for regression/demo purposes.

  • Refactors the rule model: simplifies binding power, removes Siding, adds afterRef, separates term/type parsing paths, and represents symbolic infix parsing as Choice.Infix.
  • Adds paper-style extension support: implements the #keyword / #extend directives and replacement-expression-based extension rules.
  • Adds dedicated pattern parsing: parses pattern syntax separately from terms, including as-aliases (p as x) and alternatives inside one branch.
  • Cleans up parser behavior: builds top-level let definitions directly instead of repairing a parsed let term afterwards, and narrows prefix-operator parsing to the fixed Caml Light set -, -., and !.
  • Fixes bugs: adds support for mutable record labels, fixes the lexer's misinterpretation of identifiers such as in_channel, adds the previously missing #open directive, and allows reserved words such as val as field names.
  • Adds more examples: ports 10 Caml Light examples and 2 extensible parser examples with golden syntax-tree output and per-example docs.
  • Improves the web demo: imports all examples, isolates parser state per parse, refreshes diagrams/precedence output, shows visible error dialogs with source locations, and adds tree/token/trace/diagram tabs.
  • Validates the full phase: focused parser/web tests and hkmc2AllTests/test passed, with no leftover generated golden-output changes.
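To make the "binding power" and "fixed prefix set" vocabulary above concrete, here is a minimal Pratt-parser sketch in TypeScript. It is not the PR's ParseRule/Choice model: the token format, the binding-power numbers, and the names parse/show are illustrative assumptions. Only -, -., and ! are accepted in prefix position, and ! is given a higher binding power than unary minus, mirroring the usual OCaml ordering.

```typescript
// Hypothetical sketch: a tiny Pratt parser over pre-split string tokens.
type Expr = string | { op: string; args: Expr[] };

// Illustrative infix binding powers [left, right]; left < right gives left associativity.
const INFIX_BP = new Map<string, [number, number]>([
  ["+", [10, 11]], ["-", [10, 11]], ["+.", [10, 11]], ["-.", [10, 11]],
  ["*", [20, 21]], ["*.", [20, 21]],
]);
// Fixed prefix set: unary minus, float negation, and dereference (binds tightest).
const PREFIX_BP = new Map<string, number>([["-", 30], ["-.", 30], ["!", 40]]);

function parse(tokens: string[]): Expr {
  let pos = 0;
  const next = (): string => {
    const t = tokens[pos++];
    if (t === undefined) throw new Error("unexpected end of input");
    return t;
  };

  function expr(minBp: number): Expr {
    const tok = next();
    const prefixBp = PREFIX_BP.get(tok);
    let lhs: Expr;
    if (tok === "(") {
      lhs = expr(0);
      if (next() !== ")") throw new Error("expected )");
    } else if (prefixBp !== undefined) {
      // Only operators in the fixed prefix set reach this branch.
      lhs = { op: tok, args: [expr(prefixBp)] };
    } else if (/^[A-Za-z0-9_']+$/.test(tok)) {
      lhs = tok; // identifier or integer literal
    } else {
      throw new Error(`${tok} is not allowed in prefix position`);
    }
    // Infix loop: extend lhs while the next operator binds at least as tightly as minBp.
    for (;;) {
      const op = tokens[pos];
      if (op === undefined) break;
      const bp = INFIX_BP.get(op);
      if (bp === undefined || bp[0] < minBp) break;
      pos++;
      lhs = { op, args: [lhs, expr(bp[1])] };
    }
    return lhs;
  }

  const result = expr(0);
  if (pos !== tokens.length) throw new Error(`unexpected token ${tokens[pos]}`);
  return result;
}

// Render a tree as an S-expression for easy comparison.
function show(e: Expr): string {
  return typeof e === "string" ? e : `(${e.op} ${e.args.map(show).join(" ")})`;
}
```

In this sketch, - is read as unary minus at the start of an operand and as subtraction after one, while an operator such as * in prefix position is rejected.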

@LPTK
Copy link
Copy Markdown
Contributor

LPTK commented Apr 25, 2026

So... you probably want to remove the markdown slop documents before we merge, right?

Contributor

Copilot AI left a comment


Pull request overview

Aligns the generalized Pratt parser + extension mechanism with the paper model, expands Caml Light parsing coverage (incl. patterns/operators/directives), and upgrades the parsing web demo into a more reliable regression/demo surface.

Changes:

  • Refactors core parsing rule representation (single bp on Ref, adds Choice.Infix, reworks siding as a helper) and updates AST/token naming (Tree.Reference/Tree.Application, Token.Reference/Token.Symbol).
  • Adds/extends Caml Light-facing parsing features (dedicated pattern kind, try ... with, more operators, character literals, mutable labels) and updates golden tests.
  • Improves the web demo (state reset per parse, tabs for tree/tokens/trace/diagrams, precedence table, error dialog with source locations, provenance for examples) and adds minimal browser shims for Node built-ins.

Reviewed changes

Copilot reviewed 58 out of 58 changed files in this pull request and generated 6 comments.

Summary per file

| File | Description |
| --- | --- |
| hkmc2/shared/src/test/mlscript/apps/parsing/TypeTest.mls | Updates golden trees for renamed AST nodes / precedence adjustments. |
| hkmc2/shared/src/test/mlscript/apps/parsing/RulesTest.mls | Updates expected keyword/rule displays to match new rule model + added constructs. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ParserTest.mls | Expands parser coverage (top-level let, patterns, prefix ops) and updates golden output. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ParserStateReset.mls | Adds a regression ensuring extension state resets correctly. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ParserErrorTest.mls | Extends error assertions (token kinds + error locations) and updates golden output. |
| hkmc2/shared/src/test/mlscript/apps/parsing/LoopExpressions.mls | Updates golden trees to new AST/token constructors. |
| hkmc2/shared/src/test/mlscript/apps/parsing/LexerTest.mls | Updates lexer golden output for Reference/Symbol tokens. |
| hkmc2/shared/src/test/mlscript/apps/parsing/LeftRecursion.mls | Migrates directive syntax to #keyword/#extend and updates rule display. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ExtensibleParserExamples.mls | Adds concise extensible-parser examples + golden outputs. |
| hkmc2/shared/src/test/mlscript/apps/parsing/DirectiveTest.mls | Migrates directive tests to paper-style #keyword/#extend. |
| hkmc2/shared/src/test/mlscript/apps/parsing/CamlLightTest.mls | Updates Caml Light golden outputs after parser/AST changes. |
| hkmc2/shared/src/test/mlscript/apps/parsing-web-demo/ExamplesTest.mls | Ensures the demo’s imported examples parse in test runs. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/url.mjs | Adds browser shim for url.fileURLToPath. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/process.mjs | Adds browser shim for process. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/path.mjs | Adds browser shim for selected path helpers. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/fs.mjs | Adds browser shim for fs (throws on unavailable APIs). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/TreeHelpers.mls | Updates tree rendering helpers for new AST cases (Try, Reference, etc.). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Tree.mls | Renames AST nodes and adds Try + character literal printing/precedence tweaks. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Token.mls | Splits tokens into Reference and Symbol + adds character literal kind. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Test.mls | Updates parsing test harness for new token constructors. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Rules.mls | Implements new rule structure (pattern kind, infix choice usage, resetExtensions). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/RecursiveDescent.mls | Updates sample parser to use Reference/Symbol tokens. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/PrattParsing.mls | Updates sample Pratt parser to use Reference/Symbol tokens. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/ParseRuleVisualizer.mls | Updates diagram renderer for Choice.Ref(bp) + Choice.Infix. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/ParseRule.mls | Refactors Choice model (adds Infix, flattens choices, helper siding, adds afterRef). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Lexer.mls | Adds character literal lexing and adapts to Reference/Symbol token split. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Keywords.mls | Updates keyword set/precedences; adds try, mutable, named ops, prefix filtering, resetExtensions. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Extension.mls | Adds parser-state reset + paper-style #extend replacement/substitution handling. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing-web-demo/main.mls | Makes demo robust: reset per parse, tabs, token/trace output, precedence table, error dialog. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing-web-demo/index.html | Updates UI layout + importmap for browser shims + error dialog + tabs styling. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing-web-demo/Examples.mls | Imports many examples + provenance metadata + extensible examples. |
| hkmc2/shared/src/test/mlscript-compile/TreeTracer.mls | Captures trace messages for UI display (in addition to console). |
| doc/parser-terminal-kinds-proposal.md | Adds proposal doc for terminal kinds (needs naming updates to match current code). |
| doc/parser-reserved-future-work.md | Records deferred parser/web work items. |
| doc/parser-project-follow-up-review.md | Follow-up review doc (currently contains outdated “still deferred” claims vs this PR). |
| doc/parser-next-phase-todo.md | Adds/updates next-phase board tracking. |
| doc/inconsistency-resolution.md | Adds doc capturing earlier implementation/paper inconsistencies. |
| doc/generalized-pratt-parser-notes.md | Adds design note (currently includes local path + outdated naming vs this PR). |
| doc/caml-light-prefix-operators.md | Documents Caml Light prefix operator precedence investigation. |
| doc/caml-light-parser-demo/web-demo-example-imports.md | Documents demo example import/verification steps. |
| doc/caml-light-parser-demo/web-demo-error-dialog.md | Documents adding an error dialog to the demo. |
| doc/caml-light-parser-demo/source-discovery.md | Records Caml Light source discovery (currently includes machine-specific /tmp paths). |
| doc/caml-light-parser-demo/extensible-example-02.md | Memo for extensible example 2. |
| doc/caml-light-parser-demo/extensible-example-01.md | Memo for extensible example 1. |
| doc/caml-light-parser-demo/example-10.md | Memo for Caml Light example 10. |
| doc/caml-light-parser-demo/example-09.md | Memo for Caml Light example 9. |
| doc/caml-light-parser-demo/example-08.md | Memo for Caml Light example 8. |
| doc/caml-light-parser-demo/example-07.md | Memo for Caml Light example 7. |
| doc/caml-light-parser-demo/example-06.md | Memo for Caml Light example 6. |
| doc/caml-light-parser-demo/example-05.md | Memo for Caml Light example 5. |
| doc/caml-light-parser-demo/example-04.md | Memo for Caml Light example 4. |
| doc/caml-light-parser-demo/example-03.md | Memo for Caml Light example 3. |
| doc/caml-light-parser-demo/example-02.md | Memo for Caml Light example 2. |
| doc/caml-light-parser-demo/example-01.md | Memo for Caml Light example 1. |
| doc/caml-light-parser-demo-todo.md | Tracks the Caml Light demo phase tasks and status. |
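The browser-shim rows (url.mjs, fs.mjs) describe a common pattern: re-implement the few Node built-ins the demo actually touches and make everything else fail loudly. A rough TypeScript sketch of that pattern follows; it assumes the demo only needs fileURLToPath for POSIX-style file: URLs and that readFileSync/writeFileSync are the fs calls worth stubbing. Both are assumptions, and the actual .mjs shims may differ.

```typescript
// Hypothetical url shim: accepts only file: URLs and returns the
// percent-decoded POSIX path. Simplified compared with Node's version:
// no Windows drive-letter handling and no rejection of encoded slashes.
function fileURLToPath(input: string | URL): string {
  const url = typeof input === "string" ? new URL(input) : input;
  if (url.protocol !== "file:") {
    throw new TypeError(`The URL must be of scheme file: ${url.href}`);
  }
  return decodeURIComponent(url.pathname);
}

// Hypothetical fs shim: APIs the browser cannot provide throw a named error
// instead of silently returning undefined. In a real shim module these
// functions would be exported.
function unavailable(name: string): never {
  throw new Error(`fs.${name} is not available in the browser`);
}
const readFileSync = (..._args: unknown[]): never => unavailable("readFileSync");
const writeFileSync = (..._args: unknown[]): never => unavailable("writeFileSync");
```

Throwing instead of returning undefined makes an accidental file-system dependency surface as an immediate, named error in the browser console.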


Comment on lines +54 to 58
fun shouldHaveFunction(options, key: Str, defaultValue, callerName: Str) = if
let func = options.(key)
typeof(func) === "function" then func
func is undefined then defaultValue
else throw TypeError(callerName + ": `" + key + "` is not a string")

Copilot AI Apr 25, 2026


Choice.shouldHaveFunction throws TypeError(... is not a string) when the provided value is not a function. This makes the diagnostic misleading; it should say the key is not a function (or callable) instead of not a string.
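For illustration, here is a TypeScript analog of the quoted helper with the message corrected. The original is MLscript; the Callback type and the test values are assumptions made for this sketch. It keeps the same three cases: return the option if it is a function, fall back to the default if the key is absent, and otherwise throw a TypeError that names the expected type.

```typescript
// Hypothetical TypeScript analog of Choice.shouldHaveFunction.
type Callback = (...args: unknown[]) => unknown;

function shouldHaveFunction(
  options: Record<string, unknown>,
  key: string,
  defaultValue: Callback,
  callerName: string,
): Callback {
  const func = options[key];
  if (typeof func === "function") return func as Callback;
  if (func === undefined) return defaultValue;
  // Report what was actually expected: a function, not a string.
  throw new TypeError(`${callerName}: \`${key}\` is not a function`);
}
```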

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +33
| Kind | Token accepted | Tree produced |
| --- | --- | --- |
| `ident` | non-keyword word identifier | `Tree.Ident(name, false)` |
| `typevar` | word identifier starting with `'` | `Tree.Ident(name, false)` |
| `literal` | any `Token.Literal(kind, value)` | `Tree.Literal(kind, value)` |
| `string-literal` | `Token.Literal(Token.LiteralKind.String, value)` | `Tree.Literal(Token.LiteralKind.String, value)` |
| `integer-literal` | `Token.Literal(Token.LiteralKind.Integer, value)` | `Tree.Literal(Token.LiteralKind.Integer, value)` |
| `decimal-literal` | `Token.Literal(Token.LiteralKind.Decimal, value)` | `Tree.Literal(Token.LiteralKind.Decimal, value)` |
| `boolean-literal` | `Token.Literal(Token.LiteralKind.Boolean, value)` | `Tree.Literal(Token.LiteralKind.Boolean, value)` |

Copilot AI Apr 25, 2026


This table/document still refers to the pre-refactor AST (Tree.Ident) even though the code in this PR uses Tree.Reference/Tree.Application and Token.Reference/Token.Symbol. Updating these names here will prevent readers from implementing terminal kinds against outdated constructors.

Comment on lines +5 to +6
described by the generalized Pratt parsing paper source at
`/Users/chengluyu/Developer/generalized-pratt-parsing/paper.tex`, and the main

Copilot AI Apr 25, 2026


This doc hard-codes a local absolute path (/Users/.../paper.tex). Repo docs should avoid machine-specific paths; consider linking to a repo-relative location or an external URL (or omit the path entirely) to keep the document portable.

Suggested change (replaces the two quoted lines above):
described by the generalized Pratt parsing paper source, and the main

Comment on lines +121 to +123
The implementation uses `ParseRule`, `Tree.Ident`, `Tree.App`, and a single
`Token.Identifier(name, symbolic)` constructor for both word identifiers and
symbolic tokens.

Copilot AI Apr 25, 2026


This bullet claims the implementation uses Tree.Ident, Tree.App, and Token.Identifier(name, symbolic), but this PR appears to have renamed these to Tree.Reference/Tree.Application and Token.Reference/Token.Symbol. Updating this list (or marking it as historical) will keep the doc consistent with the code.

Suggested change (replaces the three quoted lines above):
The implementation uses `ParseRule`, `Tree.Reference`,
`Tree.Application`, `Token.Reference`, and `Token.Symbol`.

Comment on lines +16 to +19
- Temporary workspace: `/tmp/caml-light-parser-demo.aWVE6K`
- Git clone: `/tmp/caml-light-parser-demo.aWVE6K/camllight`
- Example archive unpacked at:
`/tmp/caml-light-parser-demo.aWVE6K/camllight-examples-0.75`

Copilot AI Apr 25, 2026


The “Local Temporary Files” section records a specific /tmp/... workspace path. These machine-specific paths aren’t reproducible for other devs and tend to rot; consider removing them or replacing with a generic description of the process (e.g., “cloned into a temporary folder”).

Suggested change (replaces the four quoted lines above):
- Temporary workspace: cloned into a temporary folder for local inspection
- Git clone: GitHub mirror cloned into that temporary workspace
- Example archive unpacked into a temporary folder for comparison with the
  repository sources

Comment on lines +35 to +39
| Dedicated `pattern` syntax kind with dedicated pattern rules | `parser-alignment-track-board.md` PA-12; `inconsistency-resolution.md`; example 02/03/07/08 memos | Still deferred. `Rules.mls` has helper categories such as `pattern-list`, but they parse patterns through `term`, so pattern aliases and alternatives were avoided in examples. | High |
| Proper pattern alias support, such as `p as x` | Example 02, 03, 07 memos | Not implemented as parser support. Ports rewrote aliases by reconstructing values in branch bodies. This is part of the dedicated-pattern work. | High |
| Pattern alternatives inside one branch, such as a space-character case sharing a branch with a tab-character case | Example 08 memo | Not implemented. The port split one branch into two branches. This is also part of dedicated-pattern work. | High |
| Top-level `let` cleanup | `inconsistency-resolution.md`; `generalized-pratt-parser-notes.md` | Still manual. `Parser.mls` parses a `let` term and then repairs `Tree.LetIn(bindings, Tree.Empty)` into a top-level `Tree.Define`. | Medium |
| Conservative fixed prefix-operator rule for Caml Light `-`, `-.`, and `!` | `caml-light-prefix-operators.md` | Not implemented as a narrow rule. Current parser behavior is broader: symbolic operators with known precedence can be parsed in prefix position, while infix excludes `!` and `~`. The note says no immediate change was needed, but the stricter Caml Light rule remains a possible cleanup. | Low |

Copilot AI Apr 25, 2026


The “Current State” entries here say pattern parsing and top-level let/prefix-operator cleanup are still deferred/manual, but the code changes in this PR add a dedicated pattern kind/rules, build top-level let as Define, and implement the fixed prefix set. Please update this table so it reflects the current implementation state.
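For reference, the two pattern items in the quoted table (as-aliases such as p as x, and alternatives inside one branch) need only a small dedicated grammar once patterns are parsed separately from terms. Below is a minimal TypeScript sketch, not the PR's Rules.mls implementation; the grammar follows OCaml's convention that as binds more loosely than |, and the token format and the names parsePattern/showPat are hypothetical.

```typescript
// Hypothetical grammar:
//   pattern ::= alt ("as" name)*
//   alt     ::= atom ("|" atom)*
//   atom    ::= "(" pattern ")" | any other non-reserved token
type Pat =
  | { kind: "atom"; text: string }
  | { kind: "alt"; left: Pat; right: Pat }
  | { kind: "alias"; pat: Pat; name: string };

const RESERVED = new Set(["as", "|", "(", ")"]);

function parsePattern(tokens: string[]): Pat {
  let pos = 0;
  const expect = (t: string): void => {
    if (tokens[pos] !== t) throw new Error(`expected ${t}`);
    pos++;
  };

  function atom(): Pat {
    const tok = tokens[pos];
    if (tok === "(") {
      pos++;
      const p = pattern();
      expect(")");
      return p;
    }
    if (tok === undefined || RESERVED.has(tok)) throw new Error(`unexpected ${tok ?? "end of input"}`);
    pos++;
    return { kind: "atom", text: tok };
  }

  // Alternatives are left-associative: a | b | c parses as (a | b) | c.
  function alt(): Pat {
    let p = atom();
    while (tokens[pos] === "|") {
      pos++;
      p = { kind: "alt", left: p, right: atom() };
    }
    return p;
  }

  // "as" binds loosest, so it wraps a whole group of alternatives.
  function pattern(): Pat {
    let p = alt();
    while (tokens[pos] === "as") {
      pos++;
      const name = tokens[pos];
      if (name === undefined || RESERVED.has(name)) throw new Error("expected a name after as");
      pos++;
      p = { kind: "alias", pat: p, name };
    }
    return p;
  }

  const result = pattern();
  if (pos !== tokens.length) throw new Error(`unexpected ${tokens[pos]}`);
  return result;
}

// Fully parenthesized rendering, to make grouping visible.
function showPat(p: Pat): string {
  if (p.kind === "atom") return p.text;
  if (p.kind === "alt") return `(${showPat(p.left)} | ${showPat(p.right)})`;
  return `(${showPat(p.pat)} as ${p.name})`;
}
```

With this precedence, Leaf | Empty as t groups as (Leaf | Empty) as t, so one branch can both match several shapes and bind the whole value.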

@LPTK
Contributor

LPTK commented Apr 27, 2026

Copilot seems to have a bunch of valid points 👀


3 participants