| 1 | +# `sqlengine` — Interactive SQL Engine CLI |
| 2 | + |
| 3 | +`sqlengine` is the primary interactive entry point to the SQL engine. It is the binary you reach for when you want to **try the engine**: evaluate an expression, run a query against a real backend, exercise sharding, or step through a local transaction by hand. (Two-phase commit is not wired into this tool; see §6.) |
| 4 | + |
| 5 | +It is the most useful tool in the repo for demos, blog post screencasts, smoke tests after a build, and answering "wait, does X actually work end-to-end?". |
| 6 | + |
| 7 | +> Source: `tools/sqlengine.cpp` (one file, ~436 lines). |
| 8 | +> Build: `make build-sqlengine`. Output: `./sqlengine`. |
| 9 | + |
| 10 | +--- |
| 11 | + |
| 12 | +## 1. The two modes |
| 13 | + |
| 14 | +`sqlengine` has exactly two modes; the mode is decided by whether you pass any `--backend` flag. |
| 15 | + |
| 16 | +| Mode | Trigger | What it does | |
| 17 | +| ---------------- | ---------------------- | ----------------------------------------------------------------------------- | |
| 18 | +| **In-memory** | no `--backend` | Uses `InMemoryCatalog` + `LocalTransactionManager`. Evaluates literal expressions and constant queries. No tables, no remote I/O. | |
| 19 | +| **Backend-connected** | one or more `--backend` | Wires up a `ThreadSafeMultiRemoteExecutor` against the listed MySQL/PostgreSQL backends. Optionally adds a `ShardMap` from `--shard` flags. Real distributed query execution. | |
| 20 | + |
| 21 | +Either mode runs the same REPL. |
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## 2. Invocation |
| 26 | + |
| 27 | +```text |
| 28 | +./sqlengine [OPTIONS] |
| 29 | + |
| 30 | +Options: |
| 31 | + --backend URL Add a backend (mysql://... or pgsql://... or postgres://... or postgresql://...) |
| 32 | + --shard SPEC Add a shard config (table:key:shard1,shard2,...) |
| 33 | + --help Show built-in help |
| 34 | +``` |
| 35 | + |
| 36 | +### Backend URL syntax |
| 37 | + |
| 38 | +Parsed by `parse_backend_url` in `src/sql_engine/tool_config_parser.cpp`. Accepted schemes: `mysql`, `pgsql`, `postgres`, `postgresql`. |
| 39 | + |
| 40 | +```text |
| 41 | +mysql://USER[:PASSWORD]@HOST[:PORT]/DATABASE?KEY=VALUE&... |
| 42 | +``` |
| 43 | + |
| 44 | +Required query parameter: `name=` — the logical name used by `--shard` and by the WAL. |
| 45 | + |
| 46 | +Optional query parameters: `ssl_mode`, `ssl_ca`, `ssl_cert`, `ssl_key`. |
| 47 | + |
| 48 | +Example: |
| 49 | + |
| 50 | +```text |
| 51 | +mysql://root:test@127.0.0.1:13306/testdb?name=shard1 |
| 52 | +pgsql://app:secret@db1:5432/orders?name=primary&ssl_mode=REQUIRED&ssl_ca=/etc/ssl/ca.pem |
| 53 | +``` |
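
The URL shape above can be illustrated with a small standalone parser. This is a minimal sketch, not the code in `tool_config_parser.cpp`: the `BackendUrl` struct and the `parse_backend_url_sketch` name are hypothetical, percent-decoding is skipped, and a password containing `@` or `/` is not handled.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical result type for illustration; not the struct used by the real parser.
struct BackendUrl {
    std::string scheme, user, password, host, database;
    int port = 0;                                // 0 means "no port given"
    std::map<std::string, std::string> params;   // name, ssl_mode, ssl_ca, ...
};

// Parses SCHEME://USER[:PASSWORD]@HOST[:PORT]/DATABASE?KEY=VALUE&...
// Returns std::nullopt for a malformed URL, an unknown scheme, or a missing name=.
std::optional<BackendUrl> parse_backend_url_sketch(const std::string& url) {
    constexpr auto npos = std::string::npos;
    BackendUrl out;

    const auto scheme_end = url.find("://");
    if (scheme_end == npos) return std::nullopt;
    out.scheme = url.substr(0, scheme_end);
    if (out.scheme != "mysql" && out.scheme != "pgsql" &&
        out.scheme != "postgres" && out.scheme != "postgresql") {
        return std::nullopt;
    }
    std::string rest = url.substr(scheme_end + 3);

    // Split off the query string first so '/' inside values (file paths) is harmless.
    std::string query;
    if (const auto q = rest.find('?'); q != npos) {
        query = rest.substr(q + 1);
        rest.erase(q);
    }

    const auto at = rest.find('@');
    if (at == npos) return std::nullopt;
    const auto slash = rest.find('/', at);
    if (slash == npos) return std::nullopt;
    const std::string cred = rest.substr(0, at);
    const std::string hostport = rest.substr(at + 1, slash - at - 1);
    out.database = rest.substr(slash + 1);

    if (const auto c = cred.find(':'); c != npos) {
        out.user = cred.substr(0, c);
        out.password = cred.substr(c + 1);
    } else {
        out.user = cred;
    }

    if (const auto c = hostport.find(':'); c != npos) {
        out.host = hostport.substr(0, c);
        try {
            out.port = std::stoi(hostport.substr(c + 1));
        } catch (...) {
            return std::nullopt;  // non-numeric port
        }
    } else {
        out.host = hostport;
    }

    // key=value pairs separated by '&'.
    std::size_t pos = 0;
    while (pos < query.size()) {
        auto amp = query.find('&', pos);
        if (amp == npos) amp = query.size();
        const std::string pair = query.substr(pos, amp - pos);
        if (const auto eq = pair.find('='); eq != npos) {
            out.params[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
        pos = amp + 1;
    }
    if (out.params.count("name") == 0) return std::nullopt;  // name= is required
    return out;
}
```

Splitting the query string off before looking for `/` matters for values such as `ssl_ca=/etc/ssl/ca.pem`, which would otherwise be mistaken for the database separator.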
| 54 | + |
| 55 | +### Shard spec syntax |
| 56 | + |
| 57 | +```text |
| 58 | +TABLE:SHARD_KEY:BACKEND1,BACKEND2,... |
| 59 | +``` |
| 60 | + |
| 61 | +Backend names refer to the `name=` value from the backend URLs. A table with one backend is unsharded but pinned to that backend. Two or more backends turn on hash-based sharding by `SHARD_KEY`. |
| 62 | + |
| 63 | +Example: |
| 64 | + |
| 65 | +```text |
| 66 | +--shard "users:id:shard1,shard2,shard3" |
| 67 | +``` |
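
To make the spec format concrete, here is a hedged sketch of splitting a `--shard` value and routing a key to a backend. The names `ShardSpec`, `parse_shard_spec_sketch`, and `route_sketch` are hypothetical, and the modulo routing is only a stand-in: the hash function the engine's `ShardMap` actually uses is not described in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one --shard value.
struct ShardSpec {
    std::string table;
    std::string key;
    std::vector<std::string> backends;  // logical names from the name= URL parameter
};

// Parses TABLE:SHARD_KEY:BACKEND1,BACKEND2,...
std::optional<ShardSpec> parse_shard_spec_sketch(const std::string& spec) {
    constexpr auto npos = std::string::npos;
    const auto c1 = spec.find(':');
    if (c1 == npos) return std::nullopt;
    const auto c2 = spec.find(':', c1 + 1);
    if (c2 == npos) return std::nullopt;

    ShardSpec out;
    out.table = spec.substr(0, c1);
    out.key = spec.substr(c1 + 1, c2 - c1 - 1);

    std::stringstream list(spec.substr(c2 + 1));
    std::string name;
    while (std::getline(list, name, ',')) {
        if (!name.empty()) out.backends.push_back(name);
    }
    if (out.table.empty() || out.key.empty() || out.backends.empty()) {
        return std::nullopt;
    }
    return out;
}

// Assumption for illustration: integer key modulo shard count.
// With a single backend every key lands on it, which matches "unsharded but pinned".
const std::string& route_sketch(const ShardSpec& spec, std::uint64_t key_value) {
    return spec.backends[key_value % spec.backends.size()];
}
```

Under this stand-in rule, `id = 42` with three backends would go to the first one (`42 % 3 == 0`); the real engine may place it elsewhere.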
| 68 | + |
| 69 | +### Multiple flags |
| 70 | + |
| 71 | +`--backend` and `--shard` are repeatable. Order does not matter — backends are registered first, then shards. |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## 3. REPL behaviour |
| 76 | + |
| 77 | +`sqlengine` reads SQL from stdin. It auto-detects whether stdin is a TTY: |
| 78 | + |
| 79 | +- **Interactive** (TTY): prints a banner, lists connected backends, prompts with `sql> `, exits on `Ctrl+D`, `quit`, `exit`, or `\q`. |
| 80 | +- **Piped** (not a TTY): silent — reads to EOF, prints results inline. Good for one-shot demos and scripted tests. |
| 81 | + |
| 82 | +### Statement parsing rules |
| 83 | + |
| 84 | +These are not power-user details — they are rough edges to know about: |
| 85 | + |
| 86 | +- **One statement per line.** Multi-line queries are not supported. A trailing `;` is stripped. |
| 87 | +- **Empty lines are skipped.** |
| 88 | +- **Line-leading comments are skipped:** `-- ...` and `/* ...`. Inline comments inside a statement are passed through to the parser. |
| 89 | +- **Quit tokens:** `quit`, `exit`, `\q`. (No `\help`, no `\d`, no other meta-commands.) |
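
The rules above amount to a small per-line classifier. The following is a sketch of that logic under stated assumptions, not an excerpt from `tools/sqlengine.cpp`: the names are hypothetical, quit tokens are matched exactly and case-sensitively, and surrounding whitespace is trimmed before any check.

```cpp
#include <cassert>
#include <string>

enum class LineAction { Skip, Quit, Execute };

// Hypothetical result of preprocessing one input line.
struct PreparedLine {
    LineAction action;
    std::string sql;  // only meaningful when action == Execute
};

static std::string trim_sketch(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

PreparedLine prepare_line_sketch(const std::string& raw) {
    std::string line = trim_sketch(raw);

    // Empty lines and line-leading comments are skipped.
    if (line.empty()) return {LineAction::Skip, ""};
    if (line.rfind("--", 0) == 0 || line.rfind("/*", 0) == 0) {
        return {LineAction::Skip, ""};
    }

    // Quit tokens (assumed exact match).
    if (line == "quit" || line == "exit" || line == "\\q") {
        return {LineAction::Quit, ""};
    }

    // A single trailing ';' is stripped; everything else goes to the parser,
    // including inline comments.
    if (line.back() == ';') {
        line.pop_back();
        line = trim_sketch(line);
    }
    if (line.empty()) return {LineAction::Skip, ""};
    return {LineAction::Execute, line};
}
```

Because each line is classified on its own, a statement split across two lines would be handed to the parser as two incomplete statements, which is why multi-line queries do not work.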
| 90 | + |
| 91 | +### Output format |
| 92 | + |
| 93 | +Queries (SELECT, SHOW, DESCRIBE, EXPLAIN) print a MySQL-style table, plus a row count and elapsed time: |
| 94 | + |
| 95 | +```text |
| 96 | ++----+-----------+ |
| 97 | +| id | name | |
| 98 | ++----+-----------+ |
| 99 | +| 1 | alice | |
| 100 | +| 2 | bob | |
| 101 | ++----+-----------+ |
| 102 | +2 rows in set (0.003 sec) |
| 103 | +``` |
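
For readers who want to reproduce this layout in their own test harness, a minimal renderer might look like the sketch below. It is an assumption-laden illustration (`render_table_sketch` is a hypothetical name): it sizes each column to its widest cell in bytes and omits the row-count footer, so its padding can differ from what `sqlengine` prints.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Renders a header and rows as a MySQL-style ASCII table.
std::string render_table_sketch(const std::vector<std::string>& header,
                                const std::vector<std::vector<std::string>>& rows) {
    // Column width = widest cell in that column (byte length, not display width).
    std::vector<std::size_t> width(header.size());
    for (std::size_t i = 0; i < header.size(); ++i) width[i] = header[i].size();
    for (const auto& row : rows) {
        for (std::size_t i = 0; i < row.size() && i < width.size(); ++i) {
            width[i] = std::max(width[i], row[i].size());
        }
    }

    // Horizontal rule such as +----+-------+
    std::string rule = "+";
    for (const auto w : width) rule += std::string(w + 2, '-') + "+";
    rule += "\n";

    // One table line; missing cells are rendered as empty strings.
    auto line = [&](const std::vector<std::string>& cells) {
        std::string out = "|";
        for (std::size_t i = 0; i < width.size(); ++i) {
            const std::string cell = i < cells.size() ? cells[i] : std::string{};
            out += " " + cell + std::string(width[i] - cell.size(), ' ') + " |";
        }
        return out + "\n";
    };

    std::string out = rule + line(header) + rule;
    for (const auto& row : rows) out += line(row);
    return out + rule;
}
```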
| 104 | + |
| 105 | +DML and transaction-control statements (INSERT, UPDATE, DELETE, BEGIN, COMMIT, …) print one of: |
| 106 | + |
| 107 | +```text |
| 108 | +Query OK, 1 row affected (0.012 sec) |
| 109 | +ERROR: <message> |
| 110 | +``` |
| 111 | + |
| 112 | +Parse errors are reported inline with the message from the parser: |
| 113 | + |
| 114 | +```text |
| 115 | +ERROR: parse error — unexpected token ',' (0.000 sec) |
| 116 | +``` |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +## 4. Two important behaviours that are not in `--help` |
| 121 | + |
| 122 | +### 4.1 Automatic schema discovery from the first backend |
| 123 | + |
| 124 | +When you start in backend-connected mode **with at least one `--shard`**, `sqlengine` queries each sharded table's first backend with `SHOW COLUMNS FROM <table>` and registers the result in the local `InMemoryCatalog`. This is what lets queries against sharded tables type-check and plan. |
| 125 | + |
| 126 | +Caveats — flag these in any demo: |
| 127 | + |
| 128 | +- Discovery uses `SHOW COLUMNS` (MySQL syntax). Against a PostgreSQL backend it will silently fail and the table will not appear in the catalog. |
| 129 | +- Type mapping is intentionally rough: anything containing `int` becomes `INT`, anything containing `decimal` becomes `DECIMAL(10,2)`, anything containing `date` becomes `DATE`, everything else falls back to `VARCHAR(255)`. Fine for demos; not a reflection of a column's true type. |
| 130 | +- Discovery only runs for tables named in a `--shard` flag. Unsharded tables are not auto-registered. (You can still query them via REMOTE_SCAN passthrough; they just won't have catalog metadata locally.) |
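
The substring-based type mapping described in the caveats can be written out in a few lines. This is a sketch of the rule as stated above, not the tool's code: `map_column_type_sketch` is a hypothetical name, and both the check order (`int`, then `decimal`, then `date`) and the case-insensitive match are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Maps a backend column type (as reported by SHOW COLUMNS) to a coarse catalog type.
std::string map_column_type_sketch(std::string type) {
    // Lower-case first so "BIGINT" and "bigint" behave the same (assumption).
    std::transform(type.begin(), type.end(), type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    auto contains = [&type](const char* needle) {
        return type.find(needle) != std::string::npos;
    };

    if (contains("int")) return "INT";
    if (contains("decimal")) return "DECIMAL(10,2)";
    if (contains("date")) return "DATE";
    return "VARCHAR(255)";  // fallback for everything else
}
```

A rule like this shows why the result is lossy: `bigint(20)` and `tinyint(1)` both collapse to `INT`, `decimal(18,6)` loses its precision, `datetime` becomes `DATE`, and `timestamp` or `text` fall through to `VARCHAR(255)`. A pure substring match would even turn `point` into `INT`.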
| 131 | + |
| 132 | +### 4.2 The dialect is hard-coded to MySQL |
| 133 | + |
| 134 | +The session is `Session<Dialect::MySQL>`. The MySQL keyword tables, `||` semantics, LIKE rules, and 0-vs-1-based array indexing apply regardless of which backend you are talking to. |
| 135 | + |
| 136 | +This means: |
| 137 | + |
| 138 | +- You can connect to a PostgreSQL backend and queries will be sent to it, but they will be *parsed and rewritten* with MySQL grammar first. PostgreSQL-specific syntax (`PREPARE TRANSACTION`, `RETURNING`, `::` casts, `'string' || 'string'` for concat in some configurations) may not parse. |
| 139 | +- Cross-dialect setups in `--backend` are technically allowed but the practical sweet spot today is MySQL backends. |
| 140 | +- There is no `--dialect` flag yet. |
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +## 5. What you can actually do — recipe book |
| 145 | + |
| 146 | +Each recipe is meant to be runnable as-is. Replace ports / hosts / credentials as needed. |
| 147 | + |
| 148 | +### 5.1 In-memory expression evaluation (no backends) |
| 149 | + |
| 150 | +```bash |
| 151 | +echo "SELECT 1 + 2, UPPER('hello'), COALESCE(NULL, 42)" | ./sqlengine |
| 152 | +``` |
| 153 | + |
| 154 | +Demonstrates: parser, expression evaluator, function registry, NULL handling (`COALESCE`). Zero infrastructure. |
| 155 | + |
| 156 | +### 5.2 Interactive REPL, in-memory |
| 157 | + |
| 158 | +```bash |
| 159 | +./sqlengine |
| 160 | +``` |
| 161 | + |
| 162 | +```text |
| 163 | +sql> SELECT 1 + 2 AS x, UPPER('hi') |
| 164 | +sql> SELECT CASE WHEN 1 < 2 THEN 'yes' ELSE 'no' END |
| 165 | +sql> SELECT NOW(), CURRENT_DATE |
| 166 | +sql> \q |
| 167 | +``` |
| 168 | + |
| 169 | +### 5.3 Single backend (passthrough) |
| 170 | + |
| 171 | +```bash |
| 172 | +./sqlengine \ |
| 173 | + --backend "mysql://root:test@127.0.0.1:13306/testdb?name=shard1" |
| 174 | +``` |
| 175 | + |
| 176 | +Then in the REPL: |
| 177 | + |
| 178 | +```text |
| 179 | +sql> SELECT 1 + 1 |
| 180 | +sql> SELECT version() |
| 181 | +``` |
| 182 | + |
| 183 | +Useful smoke test that the executor and connection pool can talk to a real backend. |
| 184 | + |
| 185 | +### 5.4 Sharded query with scatter/gather |
| 186 | + |
| 187 | +Two backends, one sharded table: |
| 188 | + |
| 189 | +```bash |
| 190 | +./sqlengine \ |
| 191 | + --backend "mysql://root:test@127.0.0.1:13306/testdb?name=shard1" \ |
| 192 | + --backend "mysql://root:test@127.0.0.1:13307/testdb?name=shard2" \ |
| 193 | + --shard "users:id:shard1,shard2" |
| 194 | +``` |
| 195 | + |
| 196 | +```text |
| 197 | +sql> SELECT id, name FROM users WHERE id = 42 -- single-shard route |
| 198 | +sql> SELECT COUNT(*) FROM users -- scatter + MERGE_AGGREGATE |
| 199 | +sql> SELECT name FROM users ORDER BY id LIMIT 10 -- scatter + MERGE_SORT |
| 200 | +``` |
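
The two scatter cases above come down to combining per-shard partial results. The sketch below shows the general technique only; it assumes nothing about how the engine's `MERGE_AGGREGATE` and `MERGE_SORT` operators are implemented, and the function names are hypothetical. Partial `COUNT(*)` values are summed, and per-shard lists that are already sorted are combined with a k-way heap merge that stops at the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <queue>
#include <tuple>
#include <vector>

// COUNT(*) across shards: each shard returns a partial count, the merge sums them.
std::int64_t merge_count_sketch(const std::vector<std::int64_t>& partial_counts) {
    return std::accumulate(partial_counts.begin(), partial_counts.end(),
                           std::int64_t{0});
}

// ORDER BY ... LIMIT across shards: each inner vector is one shard's result,
// already sorted ascending. A min-heap picks the smallest head repeatedly.
std::vector<int> merge_sorted_limit_sketch(const std::vector<std::vector<int>>& shards,
                                           std::size_t limit) {
    using Item = std::tuple<int, std::size_t, std::size_t>;  // value, shard, index
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    for (std::size_t s = 0; s < shards.size(); ++s) {
        if (!shards[s].empty()) heap.emplace(shards[s][0], s, std::size_t{0});
    }

    std::vector<int> out;
    while (!heap.empty() && out.size() < limit) {
        const auto [value, shard, index] = heap.top();
        heap.pop();
        out.push_back(value);
        if (index + 1 < shards[shard].size()) {
            heap.emplace(shards[shard][index + 1], shard, index + 1);
        }
    }
    return out;
}
```

With `LIMIT 10`, each shard only needs to return its own first 10 rows in sort order, since no shard can contribute more than 10 rows to the merged result.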
| 201 | + |
| 202 | +### 5.5 Cross-shard JOIN |
| 203 | + |
| 204 | +With two sharded tables on the same backends: |
| 205 | + |
| 206 | +```bash |
| 207 | +./sqlengine \ |
| 208 | + --backend "mysql://...@shard1...?name=shard1" \ |
| 209 | + --backend "mysql://...@shard2...?name=shard2" \ |
| 210 | + --shard "users:id:shard1,shard2" \ |
| 211 | + --shard "orders:user_id:shard1,shard2" |
| 212 | +``` |
| 213 | + |
| 214 | +```text |
| 215 | +sql> SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name |
| 216 | +``` |
| 217 | + |
| 218 | +The planner emits scatter scans, builds a hash table on one side via `HashJoinOperator`, and aggregates with `MERGE_AGGREGATE`. |
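
As a rough illustration of the build-and-probe pattern behind a hash join followed by a grouped count, consider the sketch below. It is not the engine's `HashJoinOperator`: the `User` and `Order` structs and the function name are hypothetical, `users.id` is assumed unique, and rows from all shards are assumed to be already gathered into the two input vectors.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row types for the users and orders tables in the recipe above.
struct User {
    int id;
    std::string name;
};
struct Order {
    int id;
    int user_id;
};

// Inner join on users.id = orders.user_id, then COUNT(o.id) GROUP BY u.name.
std::map<std::string, int> join_and_count_sketch(const std::vector<User>& users,
                                                 const std::vector<Order>& orders) {
    // Build phase: hash the (assumed smaller) users side by join key.
    std::unordered_map<int, std::string> name_by_id;
    for (const auto& u : users) name_by_id.emplace(u.id, u.name);

    // Probe phase: look up each order's user and bump that name's count.
    std::map<std::string, int> counts;
    for (const auto& o : orders) {
        const auto it = name_by_id.find(o.user_id);
        if (it != name_by_id.end()) ++counts[it->second];
    }
    return counts;
}
```

Users without orders do not appear in the result and orders pointing at an unknown user are dropped, as expected for an inner join.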
| 219 | + |
| 220 | +### 5.6 SSL/TLS to a backend |
| 221 | + |
| 222 | +```bash |
| 223 | +./sqlengine --backend "mysql://app:secret@db1:3306/orders?name=primary&ssl_mode=REQUIRED&ssl_ca=/etc/ssl/ca.pem&ssl_cert=/etc/ssl/client.crt&ssl_key=/etc/ssl/client.key" |
| 224 | +``` |
| 225 | + |
| 226 | +### 5.7 Local transaction (single backend) |
| 227 | + |
| 228 | +```text |
| 229 | +sql> BEGIN |
| 230 | +sql> INSERT INTO t VALUES (1) |
| 231 | +sql> SELECT * FROM t |
| 232 | +sql> ROLLBACK |
| 233 | +sql> SELECT * FROM t -- empty |
| 234 | +``` |
| 235 | + |
| 236 | +> Note: `sqlengine` instantiates a `LocalTransactionManager`, not `SingleBackendTransactionManager` or `DistributedTransactionManager`. So today, transaction *semantics* in `sqlengine` follow the local manager — useful for exercising `BEGIN/COMMIT/ROLLBACK/SAVEPOINT` against in-memory data, but **not** the right tool for a 2PC demo. See §6. |
| 237 | + |
| 238 | +--- |
| 239 | + |
| 240 | +## 6. What `sqlengine` does **not** do today |
| 241 | + |
| 242 | +These are real gaps to know before you film a demo or ship a deck: |
| 243 | + |
| 244 | +- **No 2PC demos out of the box.** The transaction manager is `LocalTransactionManager`. To exercise the `DistributedTransactionManager` end-to-end you need a small custom harness, or `bench_distributed`, or the integration tests under `tests/test_distributed_real.cpp` and `tests/test_distributed_txn.cpp`. |
| 245 | +- **No multi-line statements.** Each statement must fit on one line. |
| 246 | +- **No `\` meta-commands** beyond `\q`. No `\d`, no `\h`, no `\timing` toggle (timing is always on). |
| 247 | +- **No `--dialect` flag.** Always parses as MySQL. |
| 248 | +- **No persistent history file.** Use `rlwrap ./sqlengine` if you want readline-style history and editing. |
| 249 | +- **Schema discovery is MySQL-only and intentionally lossy.** See §4.1. |
| 250 | +- **No prepared statements over the wire** — the prepared-statement *cache* is on the parser side, but `EXECUTE`/`DEALLOCATE` are Tier 2 extracted statements and not executed against backends. |
| 251 | + |
| 252 | +--- |
| 253 | + |
| 254 | +## 7. Companion tools (for context) |
| 255 | + |
| 256 | +`sqlengine` is the interactive front-end. Other tools in the repo cover paths it doesn't: |
| 257 | + |
| 258 | +| Tool | Source | When to reach for it | |
| 259 | +| --------------------- | ----------------------------------- | --------------------------------------------------------------------------- | |
| 260 | +| `mysql_server` | `tools/mysql_server.cpp` | A MySQL wire-protocol server fronted by the engine. Connect any MySQL client (`mysql` CLI, your app) and the engine handles the query. | |
| 261 | +| `bench_distributed` | `tools/bench_distributed.cpp` | Throughput / latency benchmarking of distributed queries. Pipeline breakdown. | |
| 262 | +| `engine_stress_test` | `tools/engine_stress_test.cpp` | Multi-threaded direct-API stress harness. No client protocol overhead. | |
| 263 | +| `corpus_test` | `tests/corpus_test.cpp` | Parse SQL from stdin/files and report OK/PARTIAL/ERROR. Used for the 86K corpus run. | |
| 264 | + |
| 265 | +All four share the same backend / shard configuration syntax via `tool_config_parser` (in the working tree). |
| 266 | + |
| 267 | +--- |
| 268 | + |
| 269 | +## 8. Where to look in the source |
| 270 | + |
| 271 | +- `tools/sqlengine.cpp` — the whole tool, top to bottom. Worth reading once. |
| 272 | +- `include/sql_engine/session.h` — the `Session<D>` class that ties parser + plan + optimize + distribute + execute together. `sqlengine` is a thin REPL on top of this. |
| 273 | +- `include/sql_engine/multi_remote_executor.h`, `connection_pool.h` — the backend connection layer. |
| 274 | +- `include/sql_engine/tool_config_parser.h` — the URL / shard parsing (working tree). |
| 275 | +- `include/sql_engine/in_memory_catalog.h`, `catalog.h` — the catalog into which `sqlengine` registers auto-discovered schemas. |
| 276 | + |
| 277 | +--- |
| 278 | + |
| 279 | +## 9. Suggested "first 10 minutes" path |
| 280 | + |
| 281 | +If a new contributor or a viewer asks "show me what this thing does", walk through these steps in order: |
| 282 | + |
| 283 | +1. `make build-sqlengine` |
| 284 | +2. `echo "SELECT 1 + 2, UPPER('hi')" | ./sqlengine` — proves the engine works in 5 seconds with no setup. |
| 285 | +3. `./sqlengine` — interactive REPL, run a `CASE WHEN`, a `COALESCE`, a `NOW()`. |
| 286 | +4. Spin up one MySQL backend; run §5.3 — proves real backend integration. |
| 287 | +5. Spin up two MySQL backends; run §5.4 — the *distributed query* moment, where the project's value becomes visible. |
| 288 | + |
| 289 | +Steps 1–3 cost nothing and already make a watchable demo. Steps 4–5 are the headline. |