Align generalized Pratt parser, add Caml Light coverage, and polish demo #479
chengluyu wants to merge 49 commits into hkust-taco:hkmc2
So... you probably want to remove the markdown slop documents before we merge, right?
Pull request overview
Aligns the generalized Pratt parser + extension mechanism with the paper model, expands Caml Light parsing coverage (incl. patterns/operators/directives), and upgrades the parsing web demo into a more reliable regression/demo surface.
Changes:
- Refactors core parsing rule representation (single `bp` on `Ref`, adds `Choice.Infix`, reworks `siding` as a helper) and updates AST/token naming (`Reference`/`Application`, `Reference`/`Symbol`).
- Adds/extends Caml Light-facing parsing features (dedicated `pattern` kind, `try ... with`, more operators, character literals, mutable labels) and updates golden tests.
- Improves the web demo (state reset per parse, tabs for tree/tokens/trace/diagrams, precedence table, error dialog w/ locations, provenance for examples) and adds minimal browser shims for Node built-ins.
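
To make the terms in the change list concrete, here is a minimal Pratt-style expression parser. It is only a sketch in TypeScript, not the MLscript implementation in this PR: the names `INFIX_BP`, `PREFIX_OPS`, `PREFIX_BP`, `parse`, and `show`, the token set, and every binding-power value are assumptions made for illustration. It shows how a pair of binding powers per infix operator drives precedence and associativity, how a fixed prefix set (here Caml Light's `-`, `-.`, and `!`) restricts which operators may appear in prefix position, and how keeping the cursor local to `parse` gives every parse a fresh state.

```typescript
// Hypothetical tree shape, for illustration only.
type Tree =
  | { kind: "num"; value: number }
  | { kind: "prefix"; op: string; arg: Tree }
  | { kind: "infix"; op: string; lhs: Tree; rhs: Tree };

// [left, right] binding powers per infix operator (assumed values).
// left < right gives left associativity; left > right gives right associativity.
const INFIX_BP: Record<string, [number, number]> = {
  "+": [10, 11],
  "-": [10, 11],
  "*": [20, 21],
  "^": [31, 30],
};

// Fixed set of operators accepted in prefix position (assumed binding power).
const PREFIX_OPS = new Set(["-", "-.", "!"]);
const PREFIX_BP = 25;

// Splits the input into number and operator tokens; any other character is ignored.
function tokenize(src: string): string[] {
  return src.match(/\d+|-\.|[-+*^!()]/g) ?? [];
}

function parse(src: string): Tree {
  const tokens = tokenize(src);
  let pos = 0; // parser state is local, so each call starts from scratch

  function expr(minBp: number): Tree {
    const tok = tokens[pos++];
    if (tok === undefined) throw new SyntaxError("unexpected end of input");
    let lhs: Tree;
    if (/^\d+$/.test(tok)) {
      lhs = { kind: "num", value: Number(tok) };
    } else if (tok === "(") {
      lhs = expr(0);
      if (tokens[pos++] !== ")") throw new SyntaxError("expected ')'");
    } else if (PREFIX_OPS.has(tok)) {
      lhs = { kind: "prefix", op: tok, arg: expr(PREFIX_BP) };
    } else {
      throw new SyntaxError("unexpected token '" + tok + "'");
    }
    // Keep absorbing infix operators that bind at least as tightly as minBp.
    for (;;) {
      const op = tokens[pos];
      if (op === undefined) break;
      const bp = INFIX_BP[op];
      if (bp === undefined || bp[0] < minBp) break;
      pos++;
      lhs = { kind: "infix", op, lhs, rhs: expr(bp[1]) };
    }
    return lhs;
  }

  const tree = expr(0);
  if (pos < tokens.length) {
    throw new SyntaxError("unexpected token '" + tokens[pos] + "'");
  }
  return tree;
}

// Renders a tree with explicit parentheses so the grouping is visible.
function show(t: Tree): string {
  switch (t.kind) {
    case "num":
      return String(t.value);
    case "prefix":
      return "(" + t.op + " " + show(t.arg) + ")";
    case "infix":
      return "(" + show(t.lhs) + " " + t.op + " " + show(t.rhs) + ")";
  }
}
```

With these assumed tables, `show(parse("1+2*3"))` yields `(1 + (2 * 3))` and `show(parse("2^3^2"))` yields `(2 ^ (3 ^ 2))`, while `parse("+1")` throws because `+` is not in the prefix set.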
Reviewed changes
Copilot reviewed 58 out of 58 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| hkmc2/shared/src/test/mlscript/apps/parsing/TypeTest.mls | Updates golden trees for renamed AST nodes / precedence adjustments. |
| hkmc2/shared/src/test/mlscript/apps/parsing/RulesTest.mls | Updates expected keyword/rule displays to match new rule model + added constructs. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ParserTest.mls | Expands parser coverage (top-level let, patterns, prefix ops) and updates golden output. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ParserStateReset.mls | Adds a regression ensuring extension state resets correctly. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ParserErrorTest.mls | Extends error assertions (token kinds + error locations) and updates golden output. |
| hkmc2/shared/src/test/mlscript/apps/parsing/LoopExpressions.mls | Updates golden trees to new AST/token constructors. |
| hkmc2/shared/src/test/mlscript/apps/parsing/LexerTest.mls | Updates lexer golden output for Reference/Symbol tokens. |
| hkmc2/shared/src/test/mlscript/apps/parsing/LeftRecursion.mls | Migrates directive syntax to #keyword/#extend and updates rule display. |
| hkmc2/shared/src/test/mlscript/apps/parsing/ExtensibleParserExamples.mls | Adds concise extensible-parser examples + golden outputs. |
| hkmc2/shared/src/test/mlscript/apps/parsing/DirectiveTest.mls | Migrates directive tests to paper-style #keyword/#extend. |
| hkmc2/shared/src/test/mlscript/apps/parsing/CamlLightTest.mls | Updates Caml Light golden outputs after parser/AST changes. |
| hkmc2/shared/src/test/mlscript/apps/parsing-web-demo/ExamplesTest.mls | Ensures the demo’s imported examples parse in test runs. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/url.mjs | Adds browser shim for url.fileURLToPath. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/process.mjs | Adds browser shim for process. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/path.mjs | Adds browser shim for selected path helpers. |
| hkmc2/shared/src/test/mlscript-compile/browser-shims/fs.mjs | Adds browser shim for fs (throws on unavailable APIs). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/TreeHelpers.mls | Updates tree rendering helpers for new AST cases (Try, Reference, etc.). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Tree.mls | Renames AST nodes and adds Try + character literal printing/precedence tweaks. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Token.mls | Splits tokens into Reference and Symbol + adds character literal kind. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Test.mls | Updates parsing test harness for new token constructors. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Rules.mls | Implements new rule structure (pattern kind, infix choice usage, resetExtensions). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/RecursiveDescent.mls | Updates sample parser to use Reference/Symbol tokens. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/PrattParsing.mls | Updates sample Pratt parser to use Reference/Symbol tokens. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/ParseRuleVisualizer.mls | Updates diagram renderer for Choice.Ref(bp) + Choice.Infix. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/ParseRule.mls | Refactors Choice model (adds Infix, flattens choices, helper siding, adds afterRef). |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Lexer.mls | Adds character literal lexing and adapts to Reference/Symbol token split. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Keywords.mls | Updates keyword set/precedences; adds try, mutable, named ops, prefix filtering, resetExtensions. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Extension.mls | Adds parser-state reset + paper-style #extend replacement/substitution handling. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing-web-demo/main.mls | Makes demo robust: reset per parse, tabs, token/trace output, precedence table, error dialog. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing-web-demo/index.html | Updates UI layout + importmap for browser shims + error dialog + tabs styling. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing-web-demo/Examples.mls | Imports many examples + provenance metadata + extensible examples. |
| hkmc2/shared/src/test/mlscript-compile/TreeTracer.mls | Captures trace messages for UI display (in addition to console). |
| doc/parser-terminal-kinds-proposal.md | Adds proposal doc for terminal kinds (needs naming updates to match current code). |
| doc/parser-reserved-future-work.md | Records deferred parser/web work items. |
| doc/parser-project-follow-up-review.md | Follow-up review doc (currently contains outdated “still deferred” claims vs this PR). |
| doc/parser-next-phase-todo.md | Adds/updates next-phase board tracking. |
| doc/inconsistency-resolution.md | Adds doc capturing earlier implementation/paper inconsistencies. |
| doc/generalized-pratt-parser-notes.md | Adds design note (currently includes local path + outdated naming vs this PR). |
| doc/caml-light-prefix-operators.md | Documents Caml Light prefix operator precedence investigation. |
| doc/caml-light-parser-demo/web-demo-example-imports.md | Documents demo example import/verification steps. |
| doc/caml-light-parser-demo/web-demo-error-dialog.md | Documents adding an error dialog to the demo. |
| doc/caml-light-parser-demo/source-discovery.md | Records Caml Light source discovery (currently includes machine-specific /tmp paths). |
| doc/caml-light-parser-demo/extensible-example-02.md | Memo for extensible example 2. |
| doc/caml-light-parser-demo/extensible-example-01.md | Memo for extensible example 1. |
| doc/caml-light-parser-demo/example-10.md | Memo for Caml Light example 10. |
| doc/caml-light-parser-demo/example-09.md | Memo for Caml Light example 9. |
| doc/caml-light-parser-demo/example-08.md | Memo for Caml Light example 8. |
| doc/caml-light-parser-demo/example-07.md | Memo for Caml Light example 7. |
| doc/caml-light-parser-demo/example-06.md | Memo for Caml Light example 6. |
| doc/caml-light-parser-demo/example-05.md | Memo for Caml Light example 5. |
| doc/caml-light-parser-demo/example-04.md | Memo for Caml Light example 4. |
| doc/caml-light-parser-demo/example-03.md | Memo for Caml Light example 3. |
| doc/caml-light-parser-demo/example-02.md | Memo for Caml Light example 2. |
| doc/caml-light-parser-demo/example-01.md | Memo for Caml Light example 1. |
| doc/caml-light-parser-demo-todo.md | Tracks the Caml Light demo phase tasks and status. |
    fun shouldHaveFunction(options, key: Str, defaultValue, callerName: Str) = if
      let func = options.(key)
      typeof(func) === "function" then func
      func is undefined then defaultValue
      else throw TypeError(callerName + ": `" + key + "` is not a string")
Choice.shouldHaveFunction throws TypeError(... is not a string) when the provided value is not a function. This makes the diagnostic misleading; it should say the key is not a function (or callable) instead of not a string.
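
The corrected diagnostic could look like the following. This is a TypeScript analogue of the MLscript helper, written only to illustrate the suggested wording; the generic signature and the `Record<string, unknown>` options type are assumptions, not the PR's code.

```typescript
// TypeScript analogue of the quoted MLscript helper (illustrative, not the PR's code).
// Returns options[key] when it is callable, the default when the key is absent,
// and otherwise throws with a message that names the actual expectation.
function shouldHaveFunction<F extends (...args: any[]) => unknown>(
  options: Record<string, unknown>,
  key: string,
  defaultValue: F,
  callerName: string,
): F {
  const func = options[key];
  if (typeof func === "function") return func as F;
  if (func === undefined) return defaultValue;
  throw new TypeError(callerName + ": `" + key + "` is not a function");
}
```

The only behavioral difference from the quoted snippet is the final message, which now says "is not a function" instead of "is not a string".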
| Kind | Token accepted | Tree produced |
| --- | --- | --- |
| `ident` | non-keyword word identifier | `Tree.Ident(name, false)` |
| `typevar` | word identifier starting with `'` | `Tree.Ident(name, false)` |
| `literal` | any `Token.Literal(kind, value)` | `Tree.Literal(kind, value)` |
| `string-literal` | `Token.Literal(Token.LiteralKind.String, value)` | `Tree.Literal(Token.LiteralKind.String, value)` |
| `integer-literal` | `Token.Literal(Token.LiteralKind.Integer, value)` | `Tree.Literal(Token.LiteralKind.Integer, value)` |
| `decimal-literal` | `Token.Literal(Token.LiteralKind.Decimal, value)` | `Tree.Literal(Token.LiteralKind.Decimal, value)` |
| `boolean-literal` | `Token.Literal(Token.LiteralKind.Boolean, value)` | `Tree.Literal(Token.LiteralKind.Boolean, value)` |
This table/document still refers to the pre-refactor AST (Tree.Ident) even though the code in this PR uses Tree.Reference/Tree.Application and Token.Reference/Token.Symbol. Updating these names here will prevent readers from implementing terminal kinds against outdated constructors.
    described by the generalized Pratt parsing paper source at
    `/Users/chengluyu/Developer/generalized-pratt-parsing/paper.tex`, and the main
This doc hard-codes a local absolute path (/Users/.../paper.tex). Repo docs should avoid machine-specific paths; consider linking to a repo-relative location or an external URL (or omit the path entirely) to keep the document portable.
Suggested change:

    - described by the generalized Pratt parsing paper source at
    - `/Users/chengluyu/Developer/generalized-pratt-parsing/paper.tex`, and the main
    + described by the generalized Pratt parsing paper source, and the main
    The implementation uses `ParseRule`, `Tree.Ident`, `Tree.App`, and a single
    `Token.Identifier(name, symbolic)` constructor for both word identifiers and
    symbolic tokens.
This bullet claims the implementation uses Tree.Ident, Tree.App, and Token.Identifier(name, symbolic), but this PR appears to have renamed these to Tree.Reference/Tree.Application and Token.Reference/Token.Symbol. Updating this list (or marking it as historical) will keep the doc consistent with the code.
Suggested change:

    - The implementation uses `ParseRule`, `Tree.Ident`, `Tree.App`, and a single
    - `Token.Identifier(name, symbolic)` constructor for both word identifiers and
    - symbolic tokens.
    + The implementation uses `ParseRule`, `Tree.Reference`,
    + `Tree.Application`, `Token.Reference`, and `Token.Symbol`.
    - Temporary workspace: `/tmp/caml-light-parser-demo.aWVE6K`
    - Git clone: `/tmp/caml-light-parser-demo.aWVE6K/camllight`
    - Example archive unpacked at:
      `/tmp/caml-light-parser-demo.aWVE6K/camllight-examples-0.75`
The “Local Temporary Files” section records a specific /tmp/... workspace path. These machine-specific paths aren’t reproducible for other devs and tend to rot; consider removing them or replacing with a generic description of the process (e.g., “cloned into a temporary folder”).
Suggested change:

    - - Temporary workspace: `/tmp/caml-light-parser-demo.aWVE6K`
    - - Git clone: `/tmp/caml-light-parser-demo.aWVE6K/camllight`
    - - Example archive unpacked at:
    -   `/tmp/caml-light-parser-demo.aWVE6K/camllight-examples-0.75`
    + - Temporary workspace: cloned into a temporary folder for local inspection
    + - Git clone: GitHub mirror cloned into that temporary workspace
    + - Example archive unpacked into a temporary folder for comparison with the
    +   repository sources
    | Dedicated `pattern` syntax kind with dedicated pattern rules | `parser-alignment-track-board.md` PA-12; `inconsistency-resolution.md`; example 02/03/07/08 memos | Still deferred. `Rules.mls` has helper categories such as `pattern-list`, but they parse patterns through `term`, so pattern aliases and alternatives were avoided in examples. | High |
    | Proper pattern alias support, such as `p as x` | Example 02, 03, 07 memos | Not implemented as parser support. Ports rewrote aliases by reconstructing values in branch bodies. This is part of the dedicated-pattern work. | High |
    | Pattern alternatives inside one branch, such as a space-character case sharing a branch with a tab-character case | Example 08 memo | Not implemented. The port split one branch into two branches. This is also part of dedicated-pattern work. | High |
    | Top-level `let` cleanup | `inconsistency-resolution.md`; `generalized-pratt-parser-notes.md` | Still manual. `Parser.mls` parses a `let` term and then repairs `Tree.LetIn(bindings, Tree.Empty)` into a top-level `Tree.Define`. | Medium |
    | Conservative fixed prefix-operator rule for Caml Light `-`, `-.`, and `!` | `caml-light-prefix-operators.md` | Not implemented as a narrow rule. Current parser behavior is broader: symbolic operators with known precedence can be parsed in prefix position, while infix excludes `!` and `~`. The note says no immediate change was needed, but the stricter Caml Light rule remains a possible cleanup. | Low |
The “Current State” entries here say pattern parsing and top-level let/prefix-operator cleanup are still deferred/manual, but the code changes in this PR add a dedicated pattern kind/rules, build top-level let as Define, and implement the fixed prefix set. Please update this table so it reflects the current implementation state.
Copilot seems to have a bunch of valid points 👀
Note
The commits in this PR are mostly generated by GPT 5.5.
TLDR
This PR aligns the parser implementation with the paper, expands Caml Light coverage, and makes the web demo usable for regression/demo purposes.
- […] `Siding`, adds `afterRef`, separates term/type parsing paths, and represents symbolic infix parsing as `Choice.Infix`.
- […] `#keyword`/`#extend` directives and replacement-expression based extension rules.
- […] `as` aliases and branch alternatives.
- […] `let` definitions directly and narrows fixed prefix parsing to Caml Light `-`, `-.`, and `!`.
- […] `mutable` record labels, `Lexer`'s misinterpretation on identifiers like `in_channel`, missing `#open` directives, and support reserved field names such as `val`.
- `hkmc2AllTests/test` passed, with no leftover generated golden-output changes.