Skip to content

Commit 567d8b2

Browse files
committed
Fix spec review issues: shared TableRefParser, compound query layering, UPDATE AST
- Extract TableRefParser from SelectParser as prerequisite refactoring - CompoundQueryParser as separate layer above SelectParser - UPDATE AST uses single NODE_FROM_CLAUSE with positional disambiguation - Document classifier switch updates and is_alias_start blocklist changes - Enumerate digest-mode emit method behavior changes - DEFAULT VALUES uses two-token approach (TK_DEFAULT + TK_VALUES)
1 parent 149b54e commit 567d8b2

1 file changed

Lines changed: 62 additions & 9 deletions

File tree

docs/superpowers/specs/2026-03-24-tier1-promotions-and-digest-design.md

Lines changed: 62 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,33 @@ Extends the SQL parser with full Tier 1 deep parsing for INSERT, UPDATE, and DEL
1818
- All new parsers follow the established pattern: `XxxParser<D>` header-only template, uses `ExpressionParser<D>`, integrated via `parser.cpp`.
1919
- Emitter extended for all new node types + digest mode.
2020

21+
### Prerequisite Refactoring: Shared Table Reference Parsing
22+
23+
The existing `SelectParser<D>` has `parse_from_clause()`, `parse_table_reference()`, `parse_join()`, and `parse_optional_alias()` as private methods. These are needed by `InsertParser` (for INSERT ... SELECT), `UpdateParser` (for MySQL multi-table UPDATE), and `DeleteParser` (for MySQL multi-table DELETE and PostgreSQL USING).
24+
25+
**Solution:** Extract these methods into a shared `TableRefParser<D>` utility class that takes a `Tokenizer<D>&` and `Arena&`. All parsers (SelectParser, InsertParser, UpdateParser, DeleteParser) instantiate it internally. SelectParser's private methods are replaced with calls to TableRefParser.
26+
27+
```
28+
include/sql_parser/
29+
table_ref_parser.h — shared FROM/JOIN/table reference parsing
30+
```
31+
32+
This refactoring is a prerequisite for Plans 7-9 and should be done as the first task of Plan 7.
33+
34+
### Classifier Updates
35+
36+
The classifier switch in `Parser<D>::classify_and_dispatch()` must be updated:
37+
38+
- `TK_INSERT``parse_insert()` (was `extract_insert()`)
39+
- `TK_UPDATE``parse_update()` (was `extract_update()`)
40+
- `TK_DELETE``parse_delete()` (was `extract_delete()`)
41+
- `TK_REPLACE``parse_insert()` with a REPLACE flag (was `extract_replace()`)
42+
- `TK_SELECT` → compound query aware `parse_select()` (existing, updated)
43+
44+
### is_alias_start() Update
45+
46+
The `is_alias_start()` blocklist in SelectParser must be updated to include new clause-starting keywords: `TK_RETURNING`, `TK_INTERSECT`, `TK_EXCEPT`, `TK_CONFLICT`, `TK_DO`, `TK_NOTHING`, `TK_DUPLICATE`.
47+
2148
---
2249

2350
## New NodeType Additions
@@ -140,18 +167,21 @@ UPDATE [ONLY] table_name [[AS] alias]
140167
```
141168
NODE_UPDATE_STMT
142169
├── NODE_STMT_OPTIONS (LOW_PRIORITY, IGNORE)
143-
├── NODE_FROM_CLAUSE (table references, may include JOINs for MySQL)
170+
├── NODE_TABLE_REF (target table — for both dialects, the primary table being updated)
171+
├── NODE_FROM_CLAUSE (MySQL: additional JOINed table refs; PostgreSQL: FROM join source)
172+
│ (distinguished by position: always comes after SET clause for PostgreSQL,
173+
│ comes as part of the initial table refs for MySQL)
174+
│ (MySQL multi-table: flags field has FLAG_UPDATE_TARGET_TABLES = 0x01)
144175
├── NODE_UPDATE_SET_CLAUSE
145176
│ ├── NODE_UPDATE_SET_ITEM (col = expr)
146177
│ └── NODE_UPDATE_SET_ITEM (col = expr)
147178
├── NODE_WHERE_CLAUSE
148179
├── NODE_ORDER_BY_CLAUSE (MySQL only)
149180
├── NODE_LIMIT_CLAUSE (MySQL only)
150-
├── NODE_FROM_CLAUSE (PostgreSQL FROM — second FROM node, distinct from table ref)
151181
└── NODE_RETURNING_CLAUSE (PostgreSQL)
152182
```
153183

154-
For MySQL multi-table UPDATE, the table references (with JOINs) reuse the existing `parse_from_clause()` / `parse_join()` logic from SelectParser. For PostgreSQL, the single table is parsed first, then an optional FROM clause provides joined tables.
184+
For MySQL multi-table UPDATE, the table references (with JOINs) reuse the shared `TableRefParser` methods. For MySQL, the JOINed tables appear as children of the first `NODE_FROM_CLAUSE` (before SET). For PostgreSQL, the single target table is a `NODE_TABLE_REF`, and the optional `FROM` clause (after SET, before WHERE) is a separate `NODE_FROM_CLAUSE` child. The emitter checks the statement type to determine emission order.
155185

156186
---
157187

@@ -251,7 +281,21 @@ NODE_COMPOUND_QUERY
251281

252282
### Integration
253283

254-
The `parse_select()` method in `Parser<D>` is updated: after parsing the first SELECT, it checks for UNION/INTERSECT/EXCEPT. If found, it wraps the result in a compound query. This is transparent to the caller — `parse()` still returns a `ParseResult`.
284+
A new `CompoundQueryParser<D>` class sits above `SelectParser<D>`. The `parse_select()` method in `Parser<D>` is updated to call `CompoundQueryParser` instead of `SelectParser` directly.
285+
286+
`CompoundQueryParser` works as follows:
287+
1. Parse the first operand: if `(`, consume it, parse inner compound recursively, expect `)`. Otherwise, call `SelectParser::parse()` for a single SELECT.
288+
2. Check for set operator (UNION/INTERSECT/EXCEPT). If none, return the single SELECT as-is.
289+
3. If found, enter a Pratt-like precedence loop: parse the operator, parse the next operand, build `NODE_SET_OPERATION` nodes respecting INTERSECT > UNION/EXCEPT precedence.
290+
4. After the compound, parse optional trailing ORDER BY / LIMIT (applies to whole result).
291+
5. Wrap in `NODE_COMPOUND_QUERY` and return.
292+
293+
This layering means `SelectParser` is unchanged — it still parses a single SELECT statement. The compound logic is entirely in `CompoundQueryParser`, which is a separate header-only template.
294+
295+
```
296+
include/sql_parser/
297+
compound_query_parser.h — UNION/INTERSECT/EXCEPT with precedence
298+
```
255299

256300
---
257301

@@ -328,7 +372,15 @@ Emitter(Arena& arena, EmitMode mode = EmitMode::NORMAL,
328372
const ParamBindings* bindings = nullptr);
329373
```
330374

331-
In digest mode, `emit_literal_*` methods write `?` instead of the actual value, keywords are uppercased, and IN lists are collapsed.
375+
In digest mode, the following methods change behavior:
376+
- `emit_value()` / `emit_string_literal()` — for literal nodes (`NODE_LITERAL_INT`, `NODE_LITERAL_FLOAT`, `NODE_LITERAL_STRING`), emit `?` instead of actual value
377+
- `emit_in_list()` — emit `IN (?)` regardless of how many values, collapsing the list
378+
- `emit_values_row()` — emit `(?, ?, ...)` matching column count but with `?` for all values
379+
- `emit_placeholder()` — emit `?` (same as normal mode, already a placeholder)
380+
- All keyword text emitted in uppercase (e.g., `SELECT`, `FROM`, `WHERE`)
381+
- `emit_alias()` — skip aliases in digest mode (aliases don't affect query semantics for routing)
382+
383+
Methods that do NOT change in digest mode: structural emission (FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT, etc.) remains identical since the query structure matters for digest grouping.
332384

333385
---
334386

@@ -348,7 +400,8 @@ TK_ONLY, // already exists in enum, verify in keyword tables
348400
TK_EXCEPT,
349401
TK_INTERSECT,
350402
TK_CONSTRAINT,
351-
TK_DEFAULT_VALUES, // or handle as TK_DEFAULT + TK_VALUES
403+
// Note: DEFAULT VALUES uses existing TK_DEFAULT + TK_VALUES (two-token approach, no compound token needed)
404+
// Note: TK_UNION and TK_OF already exist from Plan 3
352405
```
353406

354407
---
@@ -357,13 +410,13 @@ TK_DEFAULT_VALUES, // or handle as TK_DEFAULT + TK_VALUES
357410

358411
This spec should be implemented across 5 plans:
359412

360-
1. **Plan 7: INSERT deep parser** — INSERT/REPLACE with all syntax, emitter, tests. Closes #5.
413+
1. **Plan 7: Shared table ref parser + INSERT deep parser** Extract TableRefParser from SelectParser, then INSERT/REPLACE with all syntax, emitter, tests. Closes #5.
361414
2. **Plan 8: UPDATE deep parser** — full UPDATE syntax, emitter, tests. Closes #6.
362415
3. **Plan 9: DELETE deep parser** — full DELETE syntax, emitter, tests. Closes #7.
363-
4. **Plan 10: Compound queries** — UNION/INTERSECT/EXCEPT with nesting, emitter, tests. Closes #8.
416+
4. **Plan 10: Compound queries**CompoundQueryParser with UNION/INTERSECT/EXCEPT nesting, emitter, tests. Closes #8.
364417
5. **Plan 11: Query digest** — Digest module with both AST and token-level modes, tests. Closes #9.
365418

366-
Plans 7-9 are independent of each other (can be done in any order). Plan 10 depends on SELECT parser (already done). Plan 11 depends on the emitter (already done) and benefits from Plans 7-9 being complete (more node types to digest), but can work with Tier 2 token-level fallback for unstubbed types.
419+
**Dependencies:** Plan 7 must come first (extracts shared TableRefParser). Plans 8-9 depend on Plan 7's TableRefParser but are independent of each other. Plan 10 is independent of 7-9. Plan 11 benefits from all prior plans being complete but works with Tier 2 token-level fallback.
367420

368421
---
369422

0 commit comments

Comments
 (0)