Skip to content

rgiovann/pratt_parser

Repository files navigation

Pratt Parser in Java

This project implements a Pratt Parser in Java, an efficient parsing technique for mathematical expressions with support for operator precedence and associativity. Inspired by the article Simple but powerful Pratt parsing, this parser was built from scratch using an object-oriented architecture with good design practices. The parser output follows the Lisp-style S-expressions notation.

The repository contains two PDF documents that explain in detail the code flow of both the lexer and the parser.

Directory Structure

prat-parser/
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── parser/
│   │   │       └── prat_parser/
│   │   │           ├── App.java
│   │   │           ├── PrattParser.java
│   │   │           ├── lexer/
│   │   │           │   ├── Lexer.java
│   │   │           │   ├── LexerFactory.java
│   │   │           │   ├── LexerState.java
│   │   │           │   └── Rule.java
│   │   │           └── model/
│   │   │               ├── Atom.java
│   │   │               ├── Cons.java
│   │   │               ├── Expression.java
│   │   │               ├── Token.java
│   │   │               └── TokenType.java
│   └── main/
│       └── resources/
│
└── src/
    └── test/
        └── java/
            └── parser/
                └── prat_parser/
                    ├── LexerTest.java
                    └── PrattParserTest.java

🔍 Overview

The parser is composed of two main stages:

  • Lexical Analysis (Lexer): Converts the input string into a sequence of tokens.
  • Parsing (Parser): Builds an abstract syntax tree (AST) using the operator precedence technique (Pratt Parsing).

The core of the parser is in the PrattParser.java class, which analyzes tokens based on their prefix, infix, and postfix precedence levels.

✅ Applied SOLID Principles

This project followed the SOLID principles whenever possible:

🔹 SRP (Single Responsibility Principle)

Each class has a single responsibility:

  • Lexer: tokenizes the input.
  • Rule: defines how to recognize and produce tokens.
  • PrattParser: performs syntactic analysis.
  • Expression, Atom, Cons: represent the abstract syntax tree (AST).

🔹 OCP (Open/Closed Principle)

The parser can be extended with new types of expressions or tokens without modifying existing code — simply add new rules (Rule) or expressions (Expression).

🔹 LSP (Liskov Substitution Principle)

Subclasses of Expression (such as Atom and Cons) can be used interchangeably without breaking expected behavior.

🔹 ISP (Interface Segregation Principle)

Interfaces like LexerState are cohesive and specific, avoiding forcing implementations to depend on methods they do not use.

🔹 DIP (Dependency Inversion Principle)

Classes depend on abstractions, such as LexerState and Expression, promoting low coupling between components.

🧪 Tests

The project includes two test classes (in src/test/java/parser/prat_parser) that validate both tokenization and AST construction:

  • LexerTest.java – Validates the behavior of the lexical analyzer.
  • PrattParserTest.java – Verifies parsing of basic, intermediate, and advanced expressions. The tests directly compare the structure of the syntax tree (AST) using Atom and Cons objects, ensuring greater robustness and precision than string-based comparisons.

The tests follow the JUnit standard and can be executed with:

mvn test

AST Construction

Let’s use a simple expression to illustrate how the Pratt Parser builds an Abstract Syntax Tree (AST). Consider the expression:

1 + 2 * 3

In a Pratt Parser, analysis takes operator precedence into account. We know that multiplication (*) has higher precedence than addition (+). Therefore, the expression is interpreted as:

1 + (2 * 3)

The corresponding AST would be:

      [+]
     /   \
   [1]   [*]
         / \
       [2] [3]
  • +: The root node of the AST, representing the addition operation.
  • 1: The left operand of the addition.
  • *: The right operand of the addition, representing the multiplication operation.
  • 2 and 3: The operands of the multiplication.

The Pratt Parser builds this tree while respecting operator precedence, ensuring that multiplication is evaluated before addition.

🛠️ Current Features

The current parser recognizes and evaluates expressions with:

  • Numeric literals and variables (1, x)
  • Infix operators (+, -, *, /, ., =, ? :)
  • Prefix operators (-x, +x)
  • Postfix operators (!, [ ])
  • Nested parentheses

📌 Development Roadmap

  • Support for expressions with functions and chained calls
  • Validation and error handling with more descriptive messages
  • Support for parsing boolean and logical expressions

📖 Technical Reference

This project was heavily inspired by the work of matklad, and adapts the same idea to the object-oriented paradigm in Java.

About

A Java implementation of a Pratt parser for parsing mathematical expressions with proper operator precedence and associativity.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages