This project implements a Pratt Parser in Java, an efficient parsing technique for mathematical expressions with support for operator precedence and associativity. Inspired by the article Simple but powerful Pratt parsing, this parser was built from scratch using an object-oriented architecture with good design practices. The parser output follows the Lisp-style S-expressions notation.
The repository contains two PDF documents that explain in detail the code flow of both the lexer and the parser.
prat-parser/
├── src/
│ ├── main/
│ │ ├── java/
│ │ │ └── parser/
│ │ │ └── prat_parser/
│ │ │ ├── App.java
│ │ │ ├── PrattParser.java
│ │ │ ├── lexer/
│ │ │ │ ├── Lexer.java
│ │ │ │ ├── LexerFactory.java
│ │ │ │ ├── LexerState.java
│ │ │ │ └── Rule.java
│ │ │ └── model/
│ │ │ ├── Atom.java
│ │ │ ├── Cons.java
│ │ │ ├── Expression.java
│ │ │ ├── Token.java
│ │ │ └── TokenType.java
│ └── main/
│ └── resources/
│
└── src/
└── test/
└── java/
└── parser/
└── prat_parser/
├── LexerTest.java
└── PrattParserTest.java
The parser is composed of two main stages:
- Lexical Analysis (Lexer): Converts the input string into a sequence of tokens.
- Parsing (Parser): Builds an abstract syntax tree (AST) using the operator precedence technique (Pratt Parsing).
The core of the parser is in the PrattParser.java class, which analyzes tokens based on their prefix, infix, and postfix precedence levels.
This project followed the SOLID principles whenever possible:
Each class has a single responsibility:
Lexer: tokenizes the input.Rule: defines how to recognize and produce tokens.PrattParser: performs syntactic analysis.Expression,Atom,Cons: represent the abstract syntax tree (AST).
The parser can be extended with new types of expressions or tokens without modifying existing code — simply add new rules (Rule) or expressions (Expression).
Subclasses of Expression (such as Atom and Cons) can be used interchangeably without breaking expected behavior.
Interfaces like LexerState are cohesive and specific, avoiding forcing implementations to depend on methods they do not use.
Classes depend on abstractions, such as LexerState and Expression, promoting low coupling between components.
The project includes two test classes (in src/test/java/parser/prat_parser) that validate both tokenization and AST construction:
LexerTest.java– Validates the behavior of the lexical analyzer.PrattParserTest.java– Verifies parsing of basic, intermediate, and advanced expressions. The tests directly compare the structure of the syntax tree (AST) usingAtomandConsobjects, ensuring greater robustness and precision than string-based comparisons.
The tests follow the JUnit standard and can be executed with:
mvn testLet’s use a simple expression to illustrate how the Pratt Parser builds an Abstract Syntax Tree (AST). Consider the expression:
1 + 2 * 3
In a Pratt Parser, analysis takes operator precedence into account. We know that multiplication (*) has higher precedence than addition (+). Therefore, the expression is interpreted as:
1 + (2 * 3)
The corresponding AST would be:
[+]
/ \
[1] [*]
/ \
[2] [3]
+: The root node of the AST, representing the addition operation.1: The left operand of the addition.*: The right operand of the addition, representing the multiplication operation.2and3: The operands of the multiplication.
The Pratt Parser builds this tree while respecting operator precedence, ensuring that multiplication is evaluated before addition.
The current parser recognizes and evaluates expressions with:
- Numeric literals and variables (
1,x) - Infix operators (
+,-,*,/,.,=,? :) - Prefix operators (
-x,+x) - Postfix operators (
!,[ ]) - Nested parentheses
- Support for expressions with functions and chained calls
- Validation and error handling with more descriptive messages
- Support for parsing boolean and logical expressions
This project was heavily inspired by the work of matklad, and adapts the same idea to the object-oriented paradigm in Java.