Commit c26adcc

docs: add sqlengine reference and cross-link from architecture doc
The architecture overview only mentioned sqlengine in passing, which left readers (including the maintainer) without a real picture of the project's primary interactive entry point. This adds a focused reference covering: the two modes (in-memory / backend-connected), exact CLI syntax, REPL behaviour, the two undocumented behaviours (MySQL-only schema discovery, hard-coded MySQL dialect), seven copy-paste recipes, the honest gap list (2PC isn't demoable here today), companion-tool table, and a "first 10 minutes" demo script.
1 parent 7f11af5 commit c26adcc

2 files changed

Lines changed: 291 additions & 1 deletion

docs/architecture-and-status.md

Lines changed: 2 additions & 1 deletion
@@ -18,7 +18,7 @@ This repository is a **production-targeted SQL parsing and distributed query eng
1818
| **Query engine** | Functional, growing | ~30 | 15 operators, 13 value tags, ~50 builtins, 3 optimizer rules. Volcano model. |
1919
| **Distributed layer** | Functional | ~3 | Sharding, scatter/gather, merge-aggregate, merge-sort, distributed DML, SSL, pooling. |
2020
| **Transaction layer** | Functional | ~3 | 2PC for MySQL (XA) and PostgreSQL (PREPARE TRANSACTION); durable WAL + auto-compaction; recovery. |
21-
| **Tools / wire server** | Functional | n/a | `sqlengine`, `mysql_server`, `bench_distributed`, `engine_stress_test`, `corpus_test`. |
21+
| **Tools / wire server** | Functional | n/a | `sqlengine` (see [docs/sqlengine.md](sqlengine.md)), `mysql_server`, `bench_distributed`, `engine_stress_test`, `corpus_test`. |
2222

2323
### What is in flight (uncommitted in the working tree, 2026-04-17)
2424

@@ -660,3 +660,4 @@ The order below reflects the explicit `docs/issues/` priorities and the newest s
660660
- **`docs/superpowers/specs/`** — historical design specs for parser, type system, evaluator, optimizer, executor, distributed planner, transactions, subqueries, backend connections, and the newest compound-value design.
661661
- **`docs/superpowers/plans/`** — historical implementation plans matching those specs, plus the two newest (2026-04-15 distributed-2PC-safe-pinning, 2026-04-16 compound-value-support).
662662
- **`docs/benchmarks/`** — benchmark outputs and reproduction notes.
663+
- **`docs/sqlengine.md`** — reference for the interactive `sqlengine` CLI: modes, flags, REPL behaviour, recipes for in-memory / single-backend / sharded / SSL / cross-shard JOIN, and what it deliberately does *not* do today.

docs/sqlengine.md

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
1+
# `sqlengine` — Interactive SQL Engine CLI
2+
3+
`sqlengine` is the primary interactive entry point to the SQL engine. It is the binary you reach for when you want to **try the engine**: evaluate an expression, run a query against a real backend, exercise sharding, or step through a local transaction by hand. (2PC is not wired into `sqlengine` today; see §6.)
4+
5+
It is the most useful tool in the repo for demos, blog post screencasts, smoke tests after a build, and answering "wait, does X actually work end-to-end?".
6+
7+
> Source: `tools/sqlengine.cpp` (one file, ~436 lines).
8+
> Build: `make build-sqlengine`. Output: `./sqlengine`.
9+
10+
---
11+
12+
## 1. The two modes
13+
14+
`sqlengine` has exactly two modes; the mode is decided by whether you pass any `--backend` flag.
15+
16+
| Mode | Trigger | What it does |
17+
| ---------------- | ---------------------- | ----------------------------------------------------------------------------- |
18+
| **In-memory** | no `--backend` | Uses `InMemoryCatalog` + `LocalTransactionManager`. Evaluates literal expressions and constant queries. No tables, no remote I/O. |
19+
| **Backend-connected** | one or more `--backend` | Wires up a `ThreadSafeMultiRemoteExecutor` against the listed MySQL/PostgreSQL backends. Optionally adds a `ShardMap` from `--shard` flags. Real distributed query execution. |
20+
21+
Either mode runs the same REPL.
22+
23+
---
24+
25+
## 2. Invocation
26+
27+
```text
28+
./sqlengine [OPTIONS]
29+
30+
Options:
31+
--backend URL Add a backend (mysql://... or pgsql://... or postgres://... or postgresql://...)
32+
--shard SPEC Add a shard config (table:key:shard1,shard2,...)
33+
--help Show built-in help
34+
```
35+
36+
### Backend URL syntax
37+
38+
Parsed by `parse_backend_url` in `src/sql_engine/tool_config_parser.cpp`. Accepted schemes: `mysql`, `pgsql`, `postgres`, `postgresql`.
39+
40+
```text
41+
mysql://USER[:PASSWORD]@HOST[:PORT]/DATABASE?KEY=VALUE&...
42+
```
43+
44+
Required query parameter: `name=` — the logical name used by `--shard` and by the WAL.
45+
46+
Optional query parameters: `ssl_mode`, `ssl_ca`, `ssl_cert`, `ssl_key`.
47+
48+
Example:
49+
50+
```text
51+
mysql://root:test@127.0.0.1:13306/testdb?name=shard1
52+
pgsql://app:secret@db1:5432/orders?name=primary&ssl_mode=REQUIRED&ssl_ca=/etc/ssl/ca.pem
53+
```
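
The URL rules above can be sketched in a few lines of Python. This is an illustration only, not the C++ `parse_backend_url`: the function name `parse_backend_url_sketch`, the returned dictionary layout, and the choice to raise `ValueError` on a missing `name=` are all assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Schemes and SSL keys as listed in this document.
ACCEPTED_SCHEMES = {"mysql", "pgsql", "postgres", "postgresql"}
SSL_KEYS = ("ssl_mode", "ssl_ca", "ssl_cert", "ssl_key")


def parse_backend_url_sketch(url: str) -> dict:
    """Split a backend URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in ACCEPTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")

    # parse_qs returns lists; keep the first value of each key.
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    if "name" not in query:
        # Assumption: a missing name= is treated as a hard error.
        raise ValueError("missing required query parameter: name")

    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,  # None when omitted
        "host": parts.hostname,
        "port": parts.port,          # None when omitted
        "database": parts.path.lstrip("/"),
        "name": query["name"],
        "ssl": {k: query[k] for k in SSL_KEYS if k in query},
    }
```

Calling it on the second example URL would yield `name` set to `primary` and an `ssl` dictionary holding `ssl_mode` and `ssl_ca`.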
54+
55+
### Shard spec syntax
56+
57+
```text
58+
TABLE:SHARD_KEY:BACKEND1,BACKEND2,...
59+
```
60+
61+
Backend names refer to the `name=` value from the backend URLs. A table with one backend is unsharded but pinned. Two or more backends turn on hash-based sharding by `SHARD_KEY`.
62+
63+
Example:
64+
65+
```text
66+
--shard "users:id:shard1,shard2,shard3"
67+
```
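
To make the pinned-versus-sharded distinction concrete, here is a minimal Python sketch of parsing a shard spec and routing a key. The helper names are hypothetical, and the hash (CRC32 of the key's string form, modulo the backend count) is an assumption chosen for the example; this document does not say which hash function the engine uses.

```python
import zlib


def parse_shard_spec(spec: str) -> dict:
    """Parse TABLE:SHARD_KEY:BACKEND1,BACKEND2,... (hypothetical helper)."""
    table, key, backends = spec.split(":", 2)  # ValueError if a part is missing
    names = [b for b in backends.split(",") if b]
    if not table or not key or not names:
        raise ValueError(f"malformed shard spec: {spec!r}")
    return {"table": table, "shard_key": key, "backends": names}


def route(shard: dict, key_value) -> str:
    """Pick the backend for one shard-key value."""
    backends = shard["backends"]
    if len(backends) == 1:
        return backends[0]  # one backend: unsharded but pinned
    # Assumption: CRC32 stands in for whatever hash the engine really uses.
    digest = zlib.crc32(str(key_value).encode("utf-8"))
    return backends[digest % len(backends)]
```

With the example spec, `route(parse_shard_spec("users:id:shard1,shard2,shard3"), 42)` always returns the same one of the three backend names, which is what lets a `WHERE id = 42` query go to a single shard.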
68+
69+
### Multiple flags
70+
71+
`--backend` and `--shard` are repeatable. Order does not matter — backends are registered first, then shards.
72+
73+
---
74+
75+
## 3. REPL behaviour
76+
77+
`sqlengine` reads SQL from stdin. It auto-detects whether stdin is a TTY:
78+
79+
- **Interactive** (TTY): prints a banner, lists connected backends, prompts with `sql> `, exits on `Ctrl+D`, `quit`, `exit`, or `\q`.
80+
- **Piped** (not a TTY): silent — reads to EOF, prints results inline. Good for one-shot demos and scripted tests.
81+
82+
### Statement parsing rules
83+
84+
These are not power-user details — they are rough edges to know about:
85+
86+
- **One statement per line.** Multi-line queries are not supported. A trailing `;` is stripped.
87+
- **Empty lines are skipped.**
88+
- **Line-leading comments are skipped:** `-- ...` and `/* ...`. Inline comments inside a statement are passed through to the parser.
89+
- **Quit tokens:** `quit`, `exit`, `\q`. (No `\help`, no `\d`, no other meta-commands.)
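
The rules above amount to a small per-line filter. The Python sketch below is one way to express them; the function name, the order of the checks, and the case-insensitive match on quit tokens are assumptions, not a description of `tools/sqlengine.cpp`.

```python
QUIT_TOKENS = {"quit", "exit", "\\q"}


def classify_line(line: str) -> tuple:
    """Return ("skip" | "quit" | "statement", text) for one input line."""
    text = line.strip()
    if not text:
        return ("skip", "")                 # empty lines are skipped
    if text.startswith("--") or text.startswith("/*"):
        return ("skip", "")                 # line-leading comments are skipped
    if text.endswith(";"):
        text = text[:-1].rstrip()           # a trailing ';' is stripped
        if not text:
            return ("skip", "")
    if text.lower() in QUIT_TOKENS:         # assumption: case-insensitive
        return ("quit", "")
    # Inline comments are left in place for the parser to deal with.
    return ("statement", text)
```

Because each line is classified on its own, a query split across two lines would be handed to the parser as two separate (and probably invalid) statements.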
90+
91+
### Output format
92+
93+
Queries (SELECT, SHOW, DESCRIBE, EXPLAIN) print a MySQL-style table, plus a row count and elapsed time:
94+
95+
```text
96+
+----+-----------+
97+
| id | name      |
98+
+----+-----------+
99+
| 1  | alice     |
100+
| 2  | bob       |
101+
+----+-----------+
102+
2 rows in set (0.003 sec)
103+
```
104+
105+
DML and transaction-control statements (INSERT, UPDATE, DELETE, BEGIN, COMMIT, …) print one of:
106+
107+
```text
108+
Query OK, 1 row affected (0.012 sec)
109+
ERROR: <message>
110+
```
111+
112+
Parse errors are reported inline with the message from the parser:
113+
114+
```text
115+
ERROR: parse error — unexpected token ',' (0.000 sec)
116+
```
117+
118+
---
119+
120+
## 4. Two important behaviours that are not in `--help`
121+
122+
### 4.1 Automatic schema discovery from the first backend
123+
124+
When you start in backend-connected mode **with at least one `--shard`**, `sqlengine` queries each sharded table's first backend with `SHOW COLUMNS FROM <table>` and registers the result in the local `InMemoryCatalog`. This is what lets queries against sharded tables type-check and plan.
125+
126+
Caveats — flag these in any demo:
127+
128+
- Discovery uses `SHOW COLUMNS` (MySQL syntax). Against a PostgreSQL backend it will silently fail and the table will not appear in the catalog.
129+
- Type mapping is intentionally rough: anything containing `int` becomes `INT`, anything containing `decimal` becomes `DECIMAL(10,2)`, anything containing `date` becomes `DATE`, everything else falls back to `VARCHAR(255)`. Fine for demos; not a reflection of a column's true type.
130+
- Discovery only runs for tables named in a `--shard` flag. Unsharded tables are not auto-registered. (You can still query them via REMOTE_SCAN passthrough; they just won't have catalog metadata locally.)
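
The discovery behaviour and its rough type mapping can be sketched as follows. This is a Python illustration under stated assumptions: `show_columns` is a hypothetical callable standing in for running `SHOW COLUMNS FROM <table>` on a backend, the substring checks are assumed to be case-insensitive and applied in the order listed above, and a failing backend is modelled as an exception that is swallowed.

```python
def map_column_type(backend_type: str) -> str:
    """Rough substring-based mapping, in the order the caveats list it."""
    t = backend_type.lower()  # assumption: matching ignores case
    if "int" in t:
        return "INT"
    if "decimal" in t:
        return "DECIMAL(10,2)"
    if "date" in t:
        return "DATE"
    return "VARCHAR(255)"


def discover_schemas(shards: list, show_columns) -> dict:
    """Build {table: {column: mapped_type}} from each table's first backend.

    `shards` is a list of {"table": ..., "backends": [...]} dictionaries;
    `show_columns(backend, table)` returns (column_name, type) pairs.
    """
    catalog = {}
    for shard in shards:
        first_backend = shard["backends"][0]
        try:
            columns = show_columns(first_backend, shard["table"])
        except Exception:
            continue  # silent failure: the table never reaches the catalog
        catalog[shard["table"]] = {
            name: map_column_type(col_type) for name, col_type in columns
        }
    return catalog
```

Under these assumptions the mapping is visibly lossy: `datetime` becomes `DATE`, `decimal(18,4)` becomes `DECIMAL(10,2)`, and even a spatial `point` column would be registered as `INT` because the name contains `int`.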
131+
132+
### 4.2 The dialect is hard-coded to MySQL
133+
134+
The session is `Session<Dialect::MySQL>`. The MySQL keyword tables, `||` semantics, LIKE rules, and 0-vs-1-based array indexing apply regardless of which backend you are talking to.
135+
136+
This means:
137+
138+
- You can connect to a PostgreSQL backend and queries will be sent to it, but they will be *parsed and rewritten* with MySQL grammar first. PostgreSQL-specific syntax (`PREPARE TRANSACTION`, `RETURNING`, `::` casts, `'string' || 'string'` for concat in some configurations) may not parse.
139+
- Cross-dialect setups in `--backend` are technically allowed but the practical sweet spot today is MySQL backends.
140+
- There is no `--dialect` flag yet.
141+
142+
---
143+
144+
## 5. What you can actually do — recipe book
145+
146+
Each recipe is meant to be runnable as-is. Replace ports / hosts / credentials as needed.
147+
148+
### 5.1 In-memory expression evaluation (no backends)
149+
150+
```bash
151+
echo "SELECT 1 + 2, UPPER('hello'), COALESCE(NULL, 42)" | ./sqlengine
152+
```
153+
154+
Demonstrates: parser, expression evaluator, function registry, NULL handling. Zero infrastructure.
155+
156+
### 5.2 Interactive REPL, in-memory
157+
158+
```bash
159+
./sqlengine
160+
```
161+
162+
```text
163+
sql> SELECT 1 + 2 AS x, UPPER('hi')
164+
sql> SELECT CASE WHEN 1 < 2 THEN 'yes' ELSE 'no' END
165+
sql> SELECT NOW(), CURRENT_DATE
166+
sql> \q
167+
```
168+
169+
### 5.3 Single backend (passthrough)
170+
171+
```bash
172+
./sqlengine \
173+
--backend "mysql://root:test@127.0.0.1:13306/testdb?name=shard1"
174+
```
175+
176+
Then in the REPL:
177+
178+
```text
179+
sql> SELECT 1 + 1
180+
sql> SELECT version()
181+
```
182+
183+
Useful smoke test that the executor and connection pool can talk to a real backend.
184+
185+
### 5.4 Sharded query with scatter/gather
186+
187+
Two backends, one sharded table:
188+
189+
```bash
190+
./sqlengine \
191+
--backend "mysql://root:test@127.0.0.1:13306/testdb?name=shard1" \
192+
--backend "mysql://root:test@127.0.0.1:13307/testdb?name=shard2" \
193+
--shard "users:id:shard1,shard2"
194+
```
195+
196+
```text
197+
sql> SELECT id, name FROM users WHERE id = 42 -- single-shard route
198+
sql> SELECT COUNT(*) FROM users -- scatter + MERGE_AGGREGATE
199+
sql> SELECT name FROM users ORDER BY id LIMIT 10 -- scatter + MERGE_SORT
200+
```
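
The two scatter queries above rely on a merge step after the per-shard results come back. The Python sketch below shows the idea only; the function names are hypothetical, and it assumes each shard returns its rows already sorted by `id` so that a k-way merge is enough. It is not the engine's `MERGE_AGGREGATE` or `MERGE_SORT` operator.

```python
import heapq
import itertools


def merge_count(partial_counts: list) -> int:
    """COUNT(*) over all shards is the sum of the per-shard counts."""
    return sum(partial_counts)


def merge_sorted_limit(shard_rows: list, key, limit: int) -> list:
    """K-way merge of per-shard sorted row lists, cut off at `limit`.

    Assumption: every inner list is already sorted by `key`.
    """
    merged = heapq.merge(*shard_rows, key=key)
    return list(itertools.islice(merged, limit))


# Usage: two shards, rows are (id, name) tuples sorted by id.
shard1 = [(2, "bob"), (4, "dave")]
shard2 = [(1, "alice"), (3, "carol")]
top3 = merge_sorted_limit([shard1, shard2], key=lambda row: row[0], limit=3)
```

Since the merge is lazy, the gather side stops pulling rows as soon as `limit` is reached instead of sorting the full union.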
201+
202+
### 5.5 Cross-shard JOIN
203+
204+
With two sharded tables on the same backends:
205+
206+
```bash
207+
./sqlengine \
208+
--backend "mysql://...@shard1...?name=shard1" \
209+
--backend "mysql://...@shard2...?name=shard2" \
210+
--shard "users:id:shard1,shard2" \
211+
--shard "orders:user_id:shard1,shard2"
212+
```
213+
214+
```text
215+
sql> SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name
216+
```
217+
218+
The planner emits scatter scans, builds a hash table on one side via `HashJoinOperator`, and aggregates with `MERGE_AGGREGATE`.
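
As a rough picture of that pipeline, the Python sketch below joins gathered `users` and `orders` rows with a hash table and counts per name. It is a simplified stand-in, not `HashJoinOperator`: the function name is hypothetical, rows are plain dictionaries, `users.id` is assumed to be unique, and the aggregation is done in one pass rather than as a separate merge step.

```python
from collections import defaultdict


def hash_join_count(users: list, orders: list) -> dict:
    """SELECT u.name, COUNT(o.id) ... JOIN ... GROUP BY u.name, in miniature."""
    # Build phase: hash table on the users side, keyed by the join column.
    build = {u["id"]: u["name"] for u in users}

    # Probe phase: stream orders, look up the matching user, aggregate.
    counts = defaultdict(int)
    for o in orders:
        name = build.get(o["user_id"])
        if name is None:
            continue              # inner join: unmatched orders are dropped
        if o["id"] is not None:   # COUNT(o.id) ignores NULL ids
            counts[name] += 1
    return dict(counts)
```

Users with no orders do not appear in the result, which matches inner-join semantics for the query shown above.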
219+
220+
### 5.6 SSL/TLS to a backend
221+
222+
```bash
223+
./sqlengine --backend "mysql://app:secret@db1:3306/orders?name=primary&ssl_mode=REQUIRED&ssl_ca=/etc/ssl/ca.pem&ssl_cert=/etc/ssl/client.crt&ssl_key=/etc/ssl/client.key"
224+
```
225+
226+
### 5.7 Local transaction (single backend)
227+
228+
```text
229+
sql> BEGIN
230+
sql> INSERT INTO t VALUES (1)
231+
sql> SELECT * FROM t
232+
sql> ROLLBACK
233+
sql> SELECT * FROM t -- empty
234+
```
235+
236+
> Note: `sqlengine` instantiates a `LocalTransactionManager`, not `SingleBackendTransactionManager` or `DistributedTransactionManager`. So today, transaction *semantics* in `sqlengine` follow the local manager — useful for exercising `BEGIN/COMMIT/ROLLBACK/SAVEPOINT` against in-memory data, but **not** the right tool for a 2PC demo. See §6.
237+
238+
---
239+
240+
## 6. What `sqlengine` does **not** do today
241+
242+
These are real gaps to know before you film a demo or ship a deck:
243+
244+
- **No 2PC demos out of the box.** The transaction manager is `LocalTransactionManager`. To exercise the `DistributedTransactionManager` end-to-end you need a small custom harness, or `bench_distributed`, or the integration tests under `tests/test_distributed_real.cpp` and `tests/test_distributed_txn.cpp`.
245+
- **No multi-line statements.** Each statement must fit on one line.
246+
- **No `\` meta-commands** beyond `\q`. No `\d`, no `\h`, no `\timing` toggle (timing is always on).
247+
- **No `--dialect` flag.** Always parses as MySQL.
248+
- **No persistent history file.** Use `rlwrap ./sqlengine` if you want readline-style history and editing.
249+
- **Schema discovery is MySQL-only and intentionally lossy.** See §4.1.
250+
- **No prepared statements over the wire** — the prepared-statement *cache* is on the parser side, but `EXECUTE`/`DEALLOCATE` are Tier 2 extracted statements and not executed against backends.
251+
252+
---
253+
254+
## 7. Companion tools (for context)
255+
256+
`sqlengine` is the interactive front-end. Other tools in the repo cover paths it doesn't:
257+
258+
| Tool | Source | When to reach for it |
259+
| --------------------- | ----------------------------------- | --------------------------------------------------------------------------- |
260+
| `mysql_server` | `tools/mysql_server.cpp` | A MySQL wire-protocol server fronted by the engine. Connect any MySQL client (`mysql` CLI, your app) and the engine handles the query. |
261+
| `bench_distributed` | `tools/bench_distributed.cpp` | Throughput / latency benchmarking of distributed queries. Pipeline breakdown. |
262+
| `engine_stress_test` | `tools/engine_stress_test.cpp` | Multi-threaded direct-API stress harness. No client protocol overhead. |
263+
| `corpus_test` | `tests/corpus_test.cpp` | Parse SQL from stdin/files and report OK/PARTIAL/ERROR. Used for the 86K corpus run. |
264+
265+
All four share the same backend / shard configuration syntax via `tool_config_parser` (in the working tree).
266+
267+
---
268+
269+
## 8. Where to look in the source
270+
271+
- `tools/sqlengine.cpp` — the whole tool, top to bottom. Worth reading once.
272+
- `include/sql_engine/session.h` — the `Session<D>` class that ties parser + plan + optimize + distribute + execute together. `sqlengine` is a thin REPL on top of this.
273+
- `include/sql_engine/multi_remote_executor.h`, `connection_pool.h` — the backend connection layer.
274+
- `include/sql_engine/tool_config_parser.h` — the URL / shard parsing (working tree).
275+
- `include/sql_engine/in_memory_catalog.h`, `catalog.h` — the catalog into which `sqlengine` registers auto-discovered schemas.
276+
277+
---
278+
279+
## 9. Suggested "first 10 minutes" path
280+
281+
If a new contributor or a viewer asks "show me what this thing does", walk through these steps in order:
282+
283+
1. `make build-sqlengine`
284+
2. `echo "SELECT 1 + 2, UPPER('hi')" | ./sqlengine` — proves the engine works in 5 seconds with no setup.
285+
3. `./sqlengine` — interactive REPL, run a `CASE WHEN`, a `COALESCE`, a `NOW()`.
286+
4. Spin up one MySQL backend; run §5.3 — proves real backend integration.
287+
5. Spin up two MySQL backends; run §5.4 — the *distributed query* moment, where the project's value becomes visible.
288+
289+
Steps 1–3 cost nothing and already make a watchable demo. Steps 4–5 are the headline.
