Pratt Parser in Java

This project implements a Pratt Parser in Java, an efficient parsing technique for mathematical expressions with support for operator precedence and associativity. Inspired by the article Simple but powerful Pratt parsing, this parser was built from scratch using an object-oriented architecture with good design practices. The parser output follows the Lisp-style S-expressions notation.

The repository contains two PDF documents that explain in detail the code flow of both the lexer and the parser.

Directory Structure

prat-parser/
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── parser/
│   │   │       └── prat_parser/
│   │   │           ├── App.java
│   │   │           ├── PrattParser.java
│   │   │           ├── lexer/
│   │   │           │   ├── Lexer.java
│   │   │           │   ├── LexerFactory.java
│   │   │           │   ├── LexerState.java
│   │   │           │   └── Rule.java
│   │   │           └── model/
│   │   │               ├── Atom.java
│   │   │               ├── Cons.java
│   │   │               ├── Expression.java
│   │   │               ├── Token.java
│   │   │               └── TokenType.java
│   └── main/
│       └── resources/
│
└── src/
    └── test/
        └── java/
            └── parser/
                └── prat_parser/
                    ├── LexerTest.java
                    └── PrattParserTest.java

🔍 Overview

The parser is composed of two main stages:

Lexical Analysis (Lexer): Converts the input string into a sequence of tokens.
Parsing (Parser): Builds an abstract syntax tree (AST) using the operator precedence technique (Pratt Parsing).

The core of the parser is in the PrattParser.java class, which analyzes tokens based on their prefix, infix, and postfix precedence levels.

✅ Applied SOLID Principles

This project followed the SOLID principles whenever possible:

🔹 SRP (Single Responsibility Principle)

Each class has a single responsibility:

Lexer: tokenizes the input.
Rule: defines how to recognize and produce tokens.
PrattParser: performs syntactic analysis.
Expression, Atom, Cons: represent the abstract syntax tree (AST).

🔹 OCP (Open/Closed Principle)

The parser can be extended with new types of expressions or tokens without modifying existing code — simply add new rules (Rule) or expressions (Expression).

🔹 LSP (Liskov Substitution Principle)

Subclasses of Expression (such as Atom and Cons) can be used interchangeably without breaking expected behavior.

🔹 ISP (Interface Segregation Principle)

Interfaces like LexerState are cohesive and specific, avoiding forcing implementations to depend on methods they do not use.

🔹 DIP (Dependency Inversion Principle)

Classes depend on abstractions, such as LexerState and Expression, promoting low coupling between components.

🧪 Tests

The project includes two test classes (in src/test/java/parser/prat_parser) that validate both tokenization and AST construction:

LexerTest.java – Validates the behavior of the lexical analyzer.
PrattParserTest.java – Verifies parsing of basic, intermediate, and advanced expressions. The tests directly compare the structure of the syntax tree (AST) using Atom and Cons objects, ensuring greater robustness and precision than string-based comparisons.

The tests follow the JUnit standard and can be executed with:

mvn test

AST Construction

Let’s use a simple expression to illustrate how the Pratt Parser builds an Abstract Syntax Tree (AST). Consider the expression:

1 + 2 * 3

In a Pratt Parser, analysis takes operator precedence into account. We know that multiplication (*) has higher precedence than addition (+). Therefore, the expression is interpreted as:

1 + (2 * 3)

The corresponding AST would be:

      [+]
     /   \
   [1]   [*]
         / \
       [2] [3]

+: The root node of the AST, representing the addition operation.
1: The left operand of the addition.
*: The right operand of the addition, representing the multiplication operation.
2 and 3: The operands of the multiplication.

The Pratt Parser builds this tree while respecting operator precedence, ensuring that multiplication is evaluated before addition.

🛠️ Current Features

The current parser recognizes and evaluates expressions with:

Numeric literals and variables (1, x)
Infix operators (+, -, *, /, ., =, ? :)
Prefix operators (-x, +x)
Postfix operators (!, [ ])
Nested parentheses

📌 Development Roadmap

Support for expressions with functions and chained calls
Validation and error handling with more descriptive messages
Support for parsing boolean and logical expressions

📖 Technical Reference

This project was heavily inspired by the work of matklad, and adapts the same idea to the object-oriented paradigm in Java.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.settings		.settings
src		src
.classpath		.classpath
.gitignore		.gitignore
.project		.project
README.md		README.md
entendendo_funcionamento_do_lexer.pdf		entendendo_funcionamento_do_lexer.pdf
entendendo_o_funcionamento_do_parser.pdf		entendendo_o_funcionamento_do_parser.pdf
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pratt Parser in Java

Directory Structure

🔍 Overview

✅ Applied SOLID Principles

🔹 SRP (Single Responsibility Principle)

🔹 OCP (Open/Closed Principle)

🔹 LSP (Liskov Substitution Principle)

🔹 ISP (Interface Segregation Principle)

🔹 DIP (Dependency Inversion Principle)

🧪 Tests

AST Construction

🛠️ Current Features

📌 Development Roadmap

📖 Technical Reference

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Pratt Parser in Java

Directory Structure

🔍 Overview

✅ Applied SOLID Principles

🔹 SRP (Single Responsibility Principle)

🔹 OCP (Open/Closed Principle)

🔹 LSP (Liskov Substitution Principle)

🔹 ISP (Interface Segregation Principle)

🔹 DIP (Dependency Inversion Principle)

🧪 Tests

AST Construction

🛠️ Current Features

📌 Development Roadmap

📖 Technical Reference

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages