CLAUDE.md (88 additions & 7 deletions)
@@ -4,12 +4,12 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

 ## Project Overview

-High-performance hand-written recursive descent SQL parser for ProxySQL. Supports MySQL and PostgreSQL dialects via compile-time templating (`Parser<Dialect::MySQL>` / `Parser<Dialect::PostgreSQL>`). Designed for sub-microsecond latency on the proxy hot path.
+High-performance hand-written recursive descent SQL parser and query engine for ProxySQL. Supports MySQL and PostgreSQL dialects via compile-time templating (`Parser<Dialect::MySQL>` / `Parser<Dialect::PostgreSQL>`). Designed for sub-microsecond latency on the proxy hot path. The query engine takes parsed ASTs and executes them through a Volcano-model operator pipeline.

 ## Build Commands

 ```bash
-make -f Makefile.new all # Build library + run all 430 tests
+make -f Makefile.new all # Build library + run all 871 tests
 make -f Makefile.new lib # Build only libsqlparser.a
 make -f Makefile.new test # Build + run tests
 make -f Makefile.new bench # Build + run benchmarks
 ```
**Note:** The old `Makefile` (no `.new`) is for the legacy Flex/Bison parser — do not use it for new code.

-## Architecture
+## Parser Architecture

 ### Three-layer pipeline

@@ -68,18 +68,99 @@ Everything is in `namespace sql_parser`. All templates are parameterized on `Dia
 - Digest mode (`EmitMode::DIGEST`): literals→`?`, IN collapsing, keyword uppercasing
 - Bindings mode: materializes `?` placeholders with bound parameter values
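
The literal-to-`?` part of digest mode can be illustrated with a small text-level sketch. This is not the project's emitter, whose code is not shown in this diff: `digest_literals` is a hypothetical helper that works on raw SQL text, and it skips IN collapsing and keyword uppercasing entirely.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the project): replaces numeric and
// single-quoted string literals with '?', leaving everything else untouched.
std::string digest_literals(const std::string& sql) {
    std::string out;
    std::size_t i = 0;
    const std::size_t n = sql.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(sql[i]);
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: copy it whole so digits in names like t1 survive.
            while (i < n && (std::isalnum(static_cast<unsigned char>(sql[i])) || sql[i] == '_')) {
                out += sql[i++];
            }
        } else if (std::isdigit(c)) {
            // Numeric literal: digits with an optional fractional part.
            while (i < n && (std::isdigit(static_cast<unsigned char>(sql[i])) || sql[i] == '.')) {
                ++i;
            }
            out += '?';
        } else if (c == '\'') {
            // Single-quoted string; a doubled quote ('') is an escaped quote.
            ++i;
            while (i < n) {
                if (sql[i] == '\'') {
                    if (i + 1 < n && sql[i + 1] == '\'') {
                        i += 2;
                        continue;
                    }
                    ++i;
                    break;
                }
                ++i;
            }
            out += '?';
        } else {
            out += sql[i++];
        }
    }
    return out;
}
```

With this sketch, `SELECT a FROM t1 WHERE id = 42 AND name = 'it''s'` becomes `SELECT a FROM t1 WHERE id = ? AND name = ?`, so queries that differ only in their literals share one digest.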

-### Tests
+## Query Engine

-Google Test. 430 tests across 16 test files. Validated against 86K+ external queries (PostgreSQL regression, MySQL MTR, CockroachDB, Vitess, TiDB, sqlparser-rs, SQLGlot).
+### Architecture
+
+The engine follows a five-component pipeline:
+
+1. **Type System** (`types.h`, `value.h`) — `SqlType` describes column types (30+ kinds). `Value` is a 14-tag discriminated union for runtime values (null, bool, int64, uint64, double, decimal, string, bytes, date, time, datetime, timestamp, interval, json).
+2. **Expression Evaluator** (`expression_eval.h`) — Recursively evaluates AST expression nodes against a row. Handles arithmetic, comparisons, boolean logic (three-valued), BETWEEN, IN, LIKE, CASE/WHEN, function calls. Uses `CoercionRules<D>` for type promotion and `null_semantics` for NULL propagation.
+3. **Catalog** (`catalog.h`, `in_memory_catalog.h`) — Abstract interface for table/column metadata. `InMemoryCatalog` is the hash-map implementation. `CatalogResolver` creates column-resolve callbacks from catalog + table + row.
+4. **Plan Builder** (`plan_builder.h`) — Translates a SELECT AST into a `PlanNode` tree. Translation order: FROM (Scan/Join) → WHERE (Filter) → GROUP BY (Aggregate) → HAVING (Filter) → SELECT list (Project) → DISTINCT → ORDER BY (Sort) → LIMIT.
+5. **Executor** (`plan_executor.h`) — Converts a `PlanNode` tree into an `Operator` tree (Volcano model: open/next/close). Pulls rows through the tree and collects them into a `ResultSet`.
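
The open/next/close contract from step 5 can be sketched as follows. Everything here is a stand-in chosen for illustration: `DemoRow`, `DemoOperator`, `DemoScan`, `DemoFilter`, `DemoLimit`, and `demo_execute` are hypothetical names, and the real interfaces in `plan_executor.h` may look quite different (for example, the real `Row` is arena-allocated rather than a `std::vector`).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the engine's Row type.
using DemoRow = std::vector<std::int64_t>;

// Volcano-style iterator: open(), then next() until it returns false, then close().
struct DemoOperator {
    virtual ~DemoOperator() = default;
    virtual void open() = 0;
    virtual bool next(DemoRow& out) = 0;  // false means the input is exhausted
    virtual void close() = 0;
};

// Leaf operator: scans an in-memory table.
class DemoScan : public DemoOperator {
public:
    explicit DemoScan(std::vector<DemoRow> rows) : rows_(std::move(rows)) {}
    void open() override { pos_ = 0; }
    bool next(DemoRow& out) override {
        if (pos_ >= rows_.size()) return false;
        out = rows_[pos_++];
        return true;
    }
    void close() override {}
private:
    std::vector<DemoRow> rows_;
    std::size_t pos_ = 0;
};

// Filter: pulls from its child and forwards only rows that satisfy the predicate.
class DemoFilter : public DemoOperator {
public:
    DemoFilter(std::unique_ptr<DemoOperator> child, std::function<bool(const DemoRow&)> pred)
        : child_(std::move(child)), pred_(std::move(pred)) {}
    void open() override { child_->open(); }
    bool next(DemoRow& out) override {
        while (child_->next(out)) {
            if (pred_(out)) return true;
        }
        return false;
    }
    void close() override { child_->close(); }
private:
    std::unique_ptr<DemoOperator> child_;
    std::function<bool(const DemoRow&)> pred_;
};

// Limit: stops pulling from its child once `limit` rows have been emitted.
class DemoLimit : public DemoOperator {
public:
    DemoLimit(std::unique_ptr<DemoOperator> child, std::size_t limit)
        : child_(std::move(child)), limit_(limit) {}
    void open() override { emitted_ = 0; child_->open(); }
    bool next(DemoRow& out) override {
        if (emitted_ >= limit_ || !child_->next(out)) return false;
        ++emitted_;
        return true;
    }
    void close() override { child_->close(); }
private:
    std::unique_ptr<DemoOperator> child_;
    std::size_t limit_;
    std::size_t emitted_ = 0;
};

// Drives the root operator and collects every row it produces.
std::vector<DemoRow> demo_execute(DemoOperator& root) {
    std::vector<DemoRow> result;
    DemoRow row;
    root.open();
    while (root.next(row)) result.push_back(row);
    root.close();
    return result;
}
```

Because each operator only pulls from its child on demand, a `DemoLimit` at the root stops the scan early instead of materializing the whole table first, which is the main appeal of the pull-based model.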
+
+### Key types
+
+- `Value` — 14-tag discriminated union. Constructors: `value_null()`, `value_int(i)`, `value_string(s)`, etc.
+- `SqlType` — Column type descriptor with `Kind` enum, precision, scale, unsigned flag, timezone flag.
+- `Row` — `{Value* values, uint16_t column_count}`. Arena-allocated via `make_row()`.
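
A reduced sketch of the tagged-union idea behind `Value` and `Row`, under stated assumptions: only four of the 14 tags are modelled, the definitions live in a hypothetical `sketch` namespace, and `value_bool`, `value_double`, `Row::get`, and `sql_and` are illustrative names rather than confirmed project API. The `sql_and` helper shows why a dedicated null tag matters for three-valued logic.

```cpp
#include <cstdint>

namespace sketch {

// Four tags for illustration; the real Value is described as having 14.
enum class Tag : std::uint8_t { Null, Bool, Int64, Double };

struct Value {
    Tag tag = Tag::Null;
    union {
        bool b;
        std::int64_t i;
        double d;
    };
    Value() : i(0) {}
};

inline Value value_null() { return Value{}; }
inline Value value_bool(bool b) { Value v; v.tag = Tag::Bool; v.b = b; return v; }
inline Value value_int(std::int64_t i) { Value v; v.tag = Tag::Int64; v.i = i; return v; }
inline Value value_double(double d) { Value v; v.tag = Tag::Double; v.d = d; return v; }

// Non-owning view over a run of values, mirroring {Value* values, uint16_t column_count}.
struct Row {
    Value* values;
    std::uint16_t column_count;
    const Value& get(std::uint16_t idx) const { return values[idx]; }
};

// SQL three-valued AND, assuming both inputs are Bool or Null:
// FALSE dominates, otherwise NULL propagates, otherwise TRUE.
inline Value sql_and(const Value& a, const Value& b) {
    const bool a_false = a.tag == Tag::Bool && !a.b;
    const bool b_false = b.tag == Tag::Bool && !b.b;
    if (a_false || b_false) return value_bool(false);
    if (a.tag == Tag::Null || b.tag == Tag::Null) return value_null();
    return value_bool(true);
}

}  // namespace sketch
```

Keeping the payload in a union with a one-byte tag keeps each cell small and trivially copyable, which fits a layout where rows are plain arrays of values.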
+2. Add a new `PlanNodeType` enum value in `plan_node.h` and a corresponding union member in `PlanNode`
+3. Add `build_xxx()` in `PlanExecutor` (in `plan_executor.h`) and wire it into `build_operator()` switch
+4. Add translation logic in `PlanBuilder::build_select()` (in `plan_builder.h`)
+5. Include the new header in `plan_executor.h`
+6. Write tests in `tests/test_operators.cpp` or a new test file
+
+### How to add a new SQL function
+
+1. Write the function in the appropriate file under `include/sql_engine/functions/` (arithmetic.h, string.h, comparison.h, or cast.h). Signature: `Value fn(const Value* args, uint16_t arg_count, Arena& arena)`.
+2. Register it in `src/sql_engine/function_registry.cpp` inside `register_builtins()`:
+`TableInfo` must outlive any queries that reference it. `ColumnInfo::ordinal` must match the column position in rows returned by the corresponding `DataSource`.
+
+### Engine namespace
+
+Everything is in `namespace sql_engine`. Templates are parameterized on `Dialect D` where dialect-specific behavior applies (coercion rules, `||` semantics, LIKE matching).
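
As an illustration of dialect-parameterized behavior, the sketch below resolves `||` at compile time. It relies on two facts about the databases themselves: in PostgreSQL `||` concatenates strings, while in MySQL's default SQL mode (without `PIPES_AS_CONCAT`) it is logical OR. The `Dialect` enum here is a local stand-in, and `eval_pipes` and `mysql_truthy` are hypothetical helpers that work on plain strings and ignore NULL handling.

```cpp
#include <cstdlib>
#include <string>

// Local stand-in for the project's Dialect enum.
enum class Dialect { MySQL, PostgreSQL };

// Approximates MySQL's string-to-number truthiness using the leading numeric prefix.
inline bool mysql_truthy(const std::string& s) {
    return std::strtod(s.c_str(), nullptr) != 0.0;
}

// Hypothetical evaluator for `lhs || rhs`; the branch is selected at compile time,
// so no dialect check remains in the generated code.
template <Dialect D>
std::string eval_pipes(const std::string& lhs, const std::string& rhs) {
    if constexpr (D == Dialect::PostgreSQL) {
        return lhs + rhs;  // string concatenation
    } else {
        return (mysql_truthy(lhs) || mysql_truthy(rhs)) ? "1" : "0";  // logical OR
    }
}
```

Calling `eval_pipes<Dialect::PostgreSQL>("ab", "cd")` yields `abcd`, whereas `eval_pipes<Dialect::MySQL>("a", "b")` yields `0` because neither operand converts to a non-zero number.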
+
+## Tests
+
+Google Test. 871 tests across 30 test files. Validated against 86K+ external queries (PostgreSQL regression, MySQL MTR, CockroachDB, Vitess, TiDB, sqlparser-rs, SQLGlot).

 Run a single test: `./run_tests --gtest_filter="*SetTest*"`
-A high-performance, hand-written recursive descent SQL parser for [ProxySQL](https://github.com/sysown/proxysql). Supports both MySQL and PostgreSQL dialects with compile-time dispatch — zero runtime overhead for dialect selection.
+A high-performance, hand-written recursive descent SQL parser and composable query engine for [ProxySQL](https://github.com/sysown/proxysql). Supports both MySQL and PostgreSQL dialects with compile-time dispatch — zero runtime overhead for dialect selection. The parser produces an AST that feeds directly into the query engine's plan builder and executor pipeline.

 ## Performance

-All operations run in sub-microsecond latency on modern hardware:
+All parser operations run in sub-microsecond latency on modern hardware:

 | Operation | Latency | Notes |
 |---|---|---|
@@ -26,19 +26,10 @@ Compared to other parsers on the same queries:

 See [docs/benchmarks/](docs/benchmarks/) for full results and [REPRODUCING.md](docs/benchmarks/REPRODUCING.md) for reproduction instructions.
 ```
// row.get(0) = name (Value), row.get(1) = age (Value)
+}
+
+parser.reset();
 ```

-Requires: `g++` or `clang++` with C++17 support. No external dependencies for the parser itself. Google Test and Google Benchmark are vendored in `third_party/`.
Requires: `g++` or `clang++` with C++17 support. No external dependencies for the parser itself. Google Test and Google Benchmark are vendored in `third_party/`.