I sing of golden-throned Hera whom Rhea bare. Queen of the Immortals is she, surpassing all in beauty: she is the sister and wife of loud-thundering Zeus,--the glorious one whom all the blessed throughout high Olympos reverence and honour even as Zeus who delights in thunder.
— Homeric Hymn 12 to Hera (trans. Evelyn-White) (Greek epic C7th to 4th B.C.)
The mother of all parsers.
- Parsing Expression Grammars: A Recognition-Based Syntactic Foundation
- Wikipedia
- Developer Guide
- CHANGELOG.md
npm install @danielx/hera
Example grammar that counts the number of 'a's followed by the number of 'b's, then returns the difference:
# parser.hera
Program
  A:a B:b ->
    return a.length - b.length
A
  "a"*
B
  "b"*
Import the parser into your JS/TS files:
// index.mjs
import { parse } from "./parser.hera"
console.log(parse("aaaa"), parse("bbb"), parse("aabb")) // 4 -3 0
parse("c") // throws error
Run on the command line:
node --import '@danielx/hera/register' index.mjs
The handler blocks in your grammar can also be written in TypeScript or Civet syntax. In that case, use the appropriate loader:
node --import '@danielx/hera/register/tsc' index.mjs
# or
node --import '@danielx/hera/register/civet' index.mjs
Use the Hera esbuild plugin to bundle:
import esbuild from "esbuild"
import heraPlugin from "@danielx/hera/esbuild"
await esbuild.build({
  entryPoints: ['index.mjs'],
  bundle: true,
  outfile: 'out.js',
  plugins: [heraPlugin()],
})
If your handler blocks use TypeScript or Civet syntax,
pass language: "typescript" or language: "civet" into heraPlugin(...).
Hera uses Parsing Expression Grammars (PEGs) to create parsers for programming languages.
Hera grammars are indentation based, with each rule consisting of a name, indented choices that the name could expand to, and an optional further-indented code block (handler) for each choice:
RuleName
  Choice1
  Choice2 ->
    ...code...
Parsing makes heavy use of the built-in regular expression capabilities of JavaScript. Terminals are either literal strings or regular expressions. Rules are composed of choices or sequences of other rules and terminals.
The first rule listed in the grammar is the starting point. Each choice for the rule is checked in order, returning on the first match.
Rule: A named production. The name is written on one unindented line by itself, and the choices (possible expansions) are written on separate lines with common indentation. For example:
RuleName
  Choice1
  Choice2
Choices are attempted in order, and the first one to succeed wins. Note that this property is recursive, so may involve backtracking. Each choice can be any expression, as defined below, together with an optional handler.
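The ordered-choice behaviour described above can be sketched in plain JavaScript. This is only an illustration of the idea, not Hera's implementation: the helper names `literal` and `choice`, and the `{ value, pos }` result shape, are assumptions made up for this sketch.

```javascript
// Minimal sketch of ordered choice; helper names are hypothetical.
// A "parser" is a function (input, pos) => { value, pos }, or null on failure.
const literal = (text) => (input, pos) =>
  input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null

const choice = (...alternatives) => (input, pos) => {
  for (const alternative of alternatives) {
    // Every alternative restarts from the same position (backtracking).
    const result = alternative(input, pos)
    if (result !== null) return result // the first success wins
  }
  return null
}

// Order matters: "in" succeeds first, so "instanceof" is never tried.
const keyword = choice(literal("in"), literal("instanceof"))
console.log(keyword("instanceof", 0)) // { value: 'in', pos: 2 }
```

Because the first successful choice wins, longer or more specific choices usually need to be listed before shorter ones that match a prefix of the same input.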
Expression: An expression can be a sequence, choice expression, or repetition of terminals, rule names, or expressions (recursive sequences, choice expressions, or repetitions). When mixing sequences, choice expressions, and repetitions, use parentheses to separate them. For example, Part ( "," Part )* is a sequence of a rule name and a repetition of a terminal and a rule name, representing one or more Parts separated by commas.
Choice expression (/): A short inline way to specify a choice between two or more subchoices. For example, This / That / Other matches This or That or Other, whichever succeeds first. This is equivalent to AnonymousRule, where:
AnonymousRule
  This
  That
  Other
Sequence ( ): One thing after another, separated by spaces. For example, "(" Expr ")" matches the character "(" followed by a match of Expr followed by the character ")". Sequences with more than one part return an array of the parts.
Terminal ("...", /.../):
A string literal (surrounded by double quotes)
or a regular expression (normally surrounded by forward slashes).
Simple regular expressions consisting of just . or character classes
like [A-Z][a-z]* do not need surrounding slashes.
In any case, the entire terminal must be matched at the exact position.
(For regular expressions, this is as if the expressions started with ^
and it was applied to the rest of the string.)
Terminals return a string when they match.
If the entire choice of a rule is a regular expression, then
the groups of the regular expression are available as $1, $2, ...
and the matching string is available as $0.
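One way to get "match at the exact position" behaviour in JavaScript is the sticky (`y`) regular expression flag. The sketch below assumes that approach for illustration; whether Hera uses it internally is not stated here, and `matchAt` is a hypothetical helper, not part of Hera's API.

```javascript
// Sketch: match a regular expression at an exact position using the sticky flag.
// `matchAt` is a hypothetical helper for illustration only.
function matchAt(regex, input, pos) {
  const flags = regex.flags.includes("y") ? regex.flags : regex.flags + "y"
  const sticky = new RegExp(regex.source, flags)
  sticky.lastIndex = pos // a sticky regex only matches starting exactly here
  const match = sticky.exec(input)
  if (match === null) return null
  // match[0] plays the role of $0; match[1], match[2], ... play $1, $2, ...
  return { groups: Array.from(match), pos: pos + match[0].length }
}

const phone = /(\d{3})-(\d{4})/
console.log(matchAt(phone, "call 555-1234", 5))
// { groups: [ '555-1234', '555', '1234' ], pos: 13 }
console.log(matchAt(phone, "call 555-1234", 0)) // null
```

The second call fails even though the pattern occurs later in the string, which is the same effect as anchoring the expression with ^ and applying it to the rest of the input.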
Repetition (*, +): ...* means "zero or more expansions of ...", and ...+ means "one or more expansions of ...". Repetitions return an array of the matches.
Optional (?): ...? means "zero or one expansion of ...". If ... matches, ...? returns that value directly. Otherwise it succeeds without consuming input and returns undefined. Unlike * and +, ? does not wrap its result in an array.
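The difference in return shapes between repetition and optional can be sketched in plain JavaScript. The helpers `literal`, `many`, and `optional`, and the `{ value, pos }` result shape, are hypothetical names for this illustration and are not Hera's API.

```javascript
// Sketch of repetition (*, +) and optional (?) return values; names are hypothetical.
// A "parser" is a function (input, pos) => { value, pos }, or null on failure.
const literal = (text) => (input, pos) =>
  input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null

// min = 0 behaves like *, min = 1 behaves like +.
const many = (parser, min = 0) => (input, pos) => {
  const values = []
  let current = pos
  for (;;) {
    const result = parser(input, current)
    // Stop on failure, or when nothing was consumed (avoids an infinite loop).
    if (result === null || result.pos === current) break
    values.push(result.value)
    current = result.pos
  }
  return values.length >= min ? { value: values, pos: current } : null
}

// Succeeds either way; returns the inner value directly, or undefined.
const optional = (parser) => (input, pos) =>
  parser(input, pos) ?? { value: undefined, pos }

console.log(many(literal("a"))("aab", 0)) // { value: [ 'a', 'a' ], pos: 2 }
console.log(many(literal("a"), 1)("bbb", 0)) // null
console.log(optional(literal("a"))("abc", 0)) // { value: 'a', pos: 1 }
console.log(optional(literal("a"))("bbb", 0)) // { value: undefined, pos: 0 }
```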
Lookahead predicates (&, !): &... and !... assert the existence or non-existence, respectively, of a match of ..., without advancing the position or consuming any input. For example, &/\s/ is like the look-ahead regular expression /(?=\s)/.
Both & and ! predicates return true, so that matching vs. not matching can be distinguished when the predicate is marked optional with ? (e.g. (&Pattern)?).
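As a rough illustration, lookahead predicates can be sketched as wrappers that run a parser and then discard its progress. The helper names `literal`, `and`, and `not` are hypothetical and chosen only for this sketch.

```javascript
// Sketch of & and ! predicates; names are hypothetical.
// A "parser" is a function (input, pos) => { value, pos }, or null on failure.
const literal = (text) => (input, pos) =>
  input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null

// &...: succeeds if the inner parser matches; the position is left unchanged.
const and = (parser) => (input, pos) =>
  parser(input, pos) !== null ? { value: true, pos } : null

// !...: succeeds if the inner parser does NOT match; the position is left unchanged.
const not = (parser) => (input, pos) =>
  parser(input, pos) === null ? { value: true, pos } : null

console.log(and(literal("a"))("abc", 0)) // { value: true, pos: 0 }
console.log(not(literal("a"))("abc", 0)) // null
console.log(not(literal("a"))("xyz", 0)) // { value: true, pos: 0 }
```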
Stringify ($): $... matches ... but returns just the string of the input that matched, instead of the computed return value from the matching process (from handlers and the arrays from sequences and repetitions).
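To make the contrast concrete, here is a small plain-JavaScript sketch of a sequence (which returns an array of its parts) next to a stringified sequence (which returns the matched input text). The helpers `literal`, `sequence`, and `stringify` are hypothetical names for this illustration, not Hera's API.

```javascript
// Sketch of sequence vs. stringify ($); names are hypothetical.
// A "parser" is a function (input, pos) => { value, pos }, or null on failure.
const literal = (text) => (input, pos) =>
  input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null

// Runs each part in turn and collects the values into an array.
const sequence = (...parts) => (input, pos) => {
  const values = []
  let current = pos
  for (const part of parts) {
    const result = part(input, current)
    if (result === null) return null
    values.push(result.value)
    current = result.pos
  }
  return { value: values, pos: current }
}

// Ignores the computed value and returns the slice of input that was matched.
const stringify = (parser) => (input, pos) => {
  const result = parser(input, pos)
  return result === null
    ? null
    : { value: input.slice(pos, result.pos), pos: result.pos }
}

const parenthesized = sequence(literal("("), literal("x"), literal(")"))
console.log(parenthesized("(x)", 0).value) // [ '(', 'x', ')' ]
console.log(stringify(parenthesized)("(x)", 0).value) // (x)
```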
Handler: A mapping from the matched choice to a language primitive.
Handlers are attached to rule choices by adding -> after the choice.
Optionally, -> can be preceded by a return type annotation of the form ::type.
The most general handler is JavaScript, TypeScript, or Civet code indented
beneath the choice, which returns the desired value for the matched choice.
Alternatively, a single expression can start on the same line as ->, and it will be implicitly returned.
In either case, the handler code can refer to the default value
(strings for terminals, arrays for sequences or repetitions) via $0,
which is what the choice returns if you don't provide a handler.
The nth matching item in the topmost sequence can be accessed via $n;
each item in the topmost sequence can also be named via a :name suffix
(for example, Block:name), and then the code can also refer to it as name.
If the expansion is a single regular expression,
$n instead refers to the nth group in the regex
(and $0 is the full matching string).
The handler code can return the special value $skip
to indicate a failed match.
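The idea of a handler that can reject a match can be sketched in plain JavaScript as follows. This is an illustration under assumptions, not Hera's implementation: `SKIP` stands in for $skip, and `digits` and `withHandler` are hypothetical helpers.

```javascript
// Sketch of a handler that can veto a match; all names are hypothetical.
const SKIP = Symbol("skip") // stands in for Hera's $skip in this sketch

// Matches one or more digits at the exact position and returns the matched string.
const digits = (input, pos) => {
  const regex = /[0-9]+/y
  regex.lastIndex = pos
  const match = regex.exec(input)
  return match === null ? null : { value: match[0], pos: pos + match[0].length }
}

// Maps the default value ($0) through the handler; SKIP turns a match into a failure.
const withHandler = (parser, handler) => (input, pos) => {
  const result = parser(input, pos)
  if (result === null) return null
  const value = handler(result.value)
  return value === SKIP ? null : { value, pos: result.pos }
}

// Accept only numbers that fit in a byte.
const byte = withHandler(digits, ($0) => {
  const n = parseInt($0, 10)
  return n <= 255 ? n : SKIP
})

console.log(byte("200", 0)) // { value: 200, pos: 3 }
console.log(byte("300", 0)) // null
```

Treating a skipped handler as an ordinary failure means the parser can go on to try the next choice of the rule, just as if the expression itself had not matched.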
Comment (#...):
Outside of handlers, lines starting with # (after possible indentation) are treated as comments.
Inside handlers, use the handler language's comments, e.g., // or /*...*/.
You can use three backticks to create a code block that is inserted directly into the compiled file, using the same language as handlers. These are useful for creating utility functions or adding exports.
```
function toInt(n) {
  return parseInt(n)
}
```
Number
  [0-9]+ ->
    return toInt($0)
If these demos are not interactive then view this page at https://danielx.net/hera/docs/README.html
URL Parser https://tools.ietf.org/html/rfc3986
#! hera url
Math example.
#! hera math
Hera is self generating:
#! hera hera
Token location example
#! hera
Grammar
  Punctuation? A+ Punctuation? ->
    return [$1, ...$2, $3].filter((token) => token !== undefined)
A
  ("a" / "A") ->
    return {type: "A", loc: $loc, value: $1}
Punctuation
  "!" / "." / "?" ->
    return {type: "Punctuation", loc: $loc, value: $1}
Regex Groups
#! hera
Phone
  /1-(\d{3})-(\d{3})-(\d{4})/ -> [$1, $2, $3]
#! hera Grammar NamedMapping NamedMapping NamedMapping Punctuation -> ["P", $0] Punctuation "."
Compiling parsers to TypeScript.
EOS - End of statement
EOF - End of file/input
Easier way to output a string from a portion of a matching sequence. Maybe add a caret/select prefix operator.
Optimize option, sequence, and repetition of regexes (combine together) to reduce calls to invoke.
Splat in mapping and other convenience mappings.
Named arguments to handlers.
Reduce backtracking on common subsequence:
RuleBody
  Indent Sequence EOS (Indent ^Sequence EOS)+ -> ["/", [2, 4...]]
  Indent Sequence EOS -> 2
The above rule could be made efficient: because both choices share a common subsequence, the parser should be able to reuse the work already done instead of backtracking all the way to the beginning.
One alternative is to make it one rule with an optional section and add logic into the handler, but that seems crude.
To enable direct import or require of .hera files,
use the appropriate loader for your handler language:
- @danielx/hera/register: Loads .hera files that compile to JavaScript.
- @danielx/hera/register/tsc: Loads .hera files that compile to TypeScript. Attempts to load the typescript npm module to transpile the resulting TypeScript to JavaScript.
- @danielx/hera/register/civet: Loads .hera files that compile to Civet. Attempts to load the @danielx/civet npm module to transpile the resulting Civet to JavaScript.
See loader-examples/hera-custom
for examples of passing in options to the Hera compiler.
Supported Hera compiler options are documented in Options.
Instead of using --import @danielx/hera/register, register the @danielx/hera/register/esm hooks and pass them the compiler options.
E.g.
// loader.js
const { register } = require("node:module")
const { pathToFileURL } = require("node:url")
const heraOptions = { ... }
register("@danielx/hera/register/esm", pathToFileURL(__filename), { data: heraOptions })
node --import ./loader.js my-script.mjs
Instead of using --require @danielx/hera/register, require and set options on the @danielx/hera/register/cjs module.
E.g.
require("@danielx/hera/register/cjs").options.hera = { inlineMap: false }
const { parse } = require("./my-typed-grammar.hera")
See loader-examples/hera-custom/register.js
for an example of simultaneous registration of both ESM and CJS hooks with options.
See loader-examples/tsc-custom
or loader-examples/civet-custom
for examples of passing in options to the TypeScript or Civet compiler.
Instead of using --import @danielx/hera/register/tsc or --import @danielx/hera/register/civet, register the register/esm and register/tsc/esm or register/civet/esm modules yourself and pass the options you want.
(Note that you still need to register the base register/esm module, because the ESM loaders chain together.)
E.g.
// custom-loader.js
const { register } = require("node:module")
const { pathToFileURL } = require("node:url")
register("@danielx/hera/register/esm", pathToFileURL(__filename), {
  data: heraOptions
})
register("@danielx/hera/register/tsc/esm", pathToFileURL(__filename), {
  data: tscCompilerOptions,
})
// OR
register("@danielx/hera/register/civet/esm", pathToFileURL(__filename), {
  data: civetCompilerOptions,
})
node --import ./custom-loader.js ./my-script.mjs
Instead of using --require @danielx/hera/register/tsc or --require @danielx/hera/register/civet, require and set options on the @danielx/hera/register/tsc/cjs or @danielx/hera/register/civet/cjs module.
E.g.
const loader = require("@danielx/hera/register/tsc/cjs")
loader.options.hera = heraOptions
loader.options.tsc = tscCompilerOptions
const { parse } = require("./typed-grammar.hera")

const loader = require("@danielx/hera/register/civet/cjs")
loader.options.hera = heraOptions
loader.options.civet = civetCompilerOptions
const { parse } = require("./civet-grammar.hera")
See loader-examples/tsc-custom/register.js
and loader-examples/civet-custom/register.js
for examples of simultaneous registration of both ESM and CJS hooks with options.
Use heraPlugin() to import .hera files in esbuild builds.
If handler bodies or code blocks use TypeScript or Civet, set the plugin language explicitly:
heraPlugin({ language: "typescript" })
heraPlugin({ language: "civet" })
The plugin chooses the appropriate esbuild loader automatically,
though you can override it with the loader option.
The VS Code extension and language server support hera.language to control
how handler bodies are interpreted.
In VS Code settings:
{
  "hera.language": "civet"
}
Or in your workspace package.json:
{
  "hera": {
    "language": "civet"
  }
}
See lsp/README.md.
ESM:
import { compile } from "@danielx/hera"
CJS:
const { compile } = require("@danielx/hera")
Compiles a Hera grammar into parser source code.
- Pass in raw grammar source as a String.
- Returns a code string by default.
- Returns { code, sourceMap } when options.sourceMap is true.
compile() accepts these options:
- language?: "javascript" | "typescript" | "civet": Language used for handler bodies and code blocks. Default: "javascript".
- types?: boolean: Generate TypeScript-flavored parser output. Default: false when language is unspecified or "javascript", and true when language is "typescript" or "civet".
- module?: boolean: Emit ESM instead of CJS output. Default: false.
- filename?: string: Filename used in generated source maps. Default: "anonymous".
- source?: string: Original grammar source text. Required when inlineMap or sourceMap is enabled.
- inlineMap?: boolean: Append an inline source map comment to the generated output. Default: false.
- sourceMap?: boolean: Return { code, sourceMap } instead of a code string. Default: false.
- libPath?: string: Import path for the Hera runtime library used by generated parsers. Default: "@danielx/hera/lib".