The existing `SelectParser<D>` has `parse_from_clause()`, `parse_table_reference()`, `parse_join()`, and `parse_optional_alias()` as private methods. These are needed by `InsertParser` (for INSERT ... SELECT), `UpdateParser` (for MySQL multi-table UPDATE), and `DeleteParser` (for MySQL multi-table DELETE and PostgreSQL USING).
+
+**Solution:** Extract these methods into a shared `TableRefParser<D>` utility class that takes a `Tokenizer<D>&` and `Arena&`. All parsers (SelectParser, InsertParser, UpdateParser, DeleteParser) instantiate it internally. SelectParser's private methods are replaced with calls to TableRefParser.
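A minimal sketch of what the extracted utility could look like. Only the names `TableRefParser<D>`, `Tokenizer<D>`, `Arena`, and the three `parse_*` methods come from the spec; the token kinds, the `Node` layout, and the toy `Tokenizer` and `Arena` bodies are stand-ins assumed here so the example compiles on its own. JOIN handling and error reporting are left out.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the project's real types; shapes are assumptions.
enum class Dialect { MySQL, PostgreSQL };
enum TokenKind { TK_IDENT, TK_COMMA, TK_AS, TK_EOF };
struct Token { TokenKind kind; std::string text; };

enum NodeKind { NODE_FROM_CLAUSE, NODE_TABLE_REF, NODE_ALIAS };
struct Node {
    NodeKind kind;
    std::string text;
    std::vector<Node*> children;
};

// Toy arena: owns every node; std::deque keeps addresses stable as it grows.
class Arena {
public:
    Node* make(NodeKind kind, std::string text = "") {
        nodes_.push_back(Node{kind, std::move(text), {}});
        return &nodes_.back();
    }
private:
    std::deque<Node> nodes_;
};

template <Dialect D>
class Tokenizer {
public:
    explicit Tokenizer(std::vector<Token> toks) : toks_(std::move(toks)) {}
    const Token& peek() const { return pos_ < toks_.size() ? toks_[pos_] : eof_; }
    Token next() { Token t = peek(); if (pos_ < toks_.size()) ++pos_; return t; }
private:
    std::vector<Token> toks_;
    std::size_t pos_ = 0;
    Token eof_{TK_EOF, ""};
};

// Shared helper: borrows the caller's tokenizer and arena, owns nothing.
template <Dialect D>
class TableRefParser {
public:
    TableRefParser(Tokenizer<D>& tok, Arena& arena) : tok_(tok), arena_(arena) {}

    // table_ref (',' table_ref)*
    Node* parse_from_clause() {
        Node* from = arena_.make(NODE_FROM_CLAUSE);
        from->children.push_back(parse_table_reference());
        while (tok_.peek().kind == TK_COMMA) {
            tok_.next();
            from->children.push_back(parse_table_reference());
        }
        return from;
    }

    // Takes the next token as the table name (no validation in this sketch).
    Node* parse_table_reference() {
        Node* ref = arena_.make(NODE_TABLE_REF, tok_.next().text);
        if (Node* alias = parse_optional_alias()) ref->children.push_back(alias);
        return ref;
    }

    // [AS] identifier, or nullptr when no alias follows.
    Node* parse_optional_alias() {
        if (tok_.peek().kind == TK_AS) tok_.next();
        if (tok_.peek().kind != TK_IDENT) return nullptr;
        return arena_.make(NODE_ALIAS, tok_.next().text);
    }

private:
    Tokenizer<D>& tok_;
    Arena& arena_;
};
```

Because the helper holds references rather than its own state, a statement parser can construct it on the stack at the point where table references begin and continue with the same tokenizer afterwards.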
The `is_alias_start()` blocklist in SelectParser must be updated to include new clause-starting keywords: `TK_RETURNING`, `TK_INTERSECT`, `TK_EXCEPT`, `TK_CONFLICT`, `TK_DO`, `TK_NOTHING`, `TK_DUPLICATE`.
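The blocklist matters because an implicit alias (one without `AS`) is otherwise indistinguishable from the keyword that starts the next clause: in `DELETE FROM t RETURNING id`, `RETURNING` would be swallowed as an alias for `t`. The sketch below shows the shape of such a check. The `TokenKind` enum is a cut-down stand-in, `TK_WHERE` represents whatever entries the existing blocklist already has, and treating every other token as a valid alias start is a simplification assumed here.

```cpp
// Cut-down stand-in for the project's token enum.
enum TokenKind {
    TK_IDENT, TK_WHERE,
    TK_RETURNING, TK_INTERSECT, TK_EXCEPT,
    TK_CONFLICT, TK_DO, TK_NOTHING, TK_DUPLICATE,
};

// Returns false for tokens that begin a clause and so must not be
// consumed as an implicit table alias.
inline bool is_alias_start(TokenKind kind) {
    switch (kind) {
        case TK_WHERE:       // stand-in for the pre-existing entries
        case TK_RETURNING:   // new entries listed in the spec
        case TK_INTERSECT:
        case TK_EXCEPT:
        case TK_CONFLICT:
        case TK_DO:
        case TK_NOTHING:
        case TK_DUPLICATE:
            return false;
        default:
            return true;
    }
}
```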
```
├── NODE_FROM_CLAUSE (table references, may include JOINs for MySQL)
+├── NODE_TABLE_REF (target table; for both dialects, the primary table being updated)
+├── NODE_FROM_CLAUSE (MySQL: additional JOINed table refs; PostgreSQL: FROM join source)
+│ (distinguished by position: always comes after SET clause for PostgreSQL,
+│ comes as part of the initial table refs for MySQL)
+│ (MySQL multi-table: flags field has FLAG_UPDATE_TARGET_TABLES = 0x01)
├── NODE_UPDATE_SET_CLAUSE
│ ├── NODE_UPDATE_SET_ITEM (col = expr)
│ └── NODE_UPDATE_SET_ITEM (col = expr)
├── NODE_WHERE_CLAUSE
├── NODE_ORDER_BY_CLAUSE (MySQL only)
├── NODE_LIMIT_CLAUSE (MySQL only)
-├── NODE_FROM_CLAUSE (PostgreSQL FROM: second FROM node, distinct from table ref)
└── NODE_RETURNING_CLAUSE (PostgreSQL)
```

-For MySQL multi-table UPDATE, the table references (with JOINs) reuse the existing `parse_from_clause()` / `parse_join()` logic from SelectParser. For PostgreSQL, the single table is parsed first, then an optional FROM clause provides joined tables.
+For MySQL multi-table UPDATE, the table references (with JOINs) reuse the shared `TableRefParser` methods, and the JOINed tables appear as children of the `NODE_FROM_CLAUSE` that precedes SET. For PostgreSQL, the single target table is a `NODE_TABLE_REF`, and the optional `FROM` clause (after SET, before WHERE) is a separate `NODE_FROM_CLAUSE` child. The emitter checks the statement type to determine emission order.
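The position-dependent emission described above can be sketched as follows. This is an illustration only: `UpdateParts` is a hypothetical flattened view with pre-rendered clause text, and branching on a `Dialect` value is an assumption about how the emitter makes the decision.

```cpp
#include <string>

enum class Dialect { MySQL, PostgreSQL };

// Hypothetical flattened view of an UPDATE statement's children.
struct UpdateParts {
    std::string target;  // NODE_TABLE_REF
    std::string from;    // NODE_FROM_CLAUSE contents, empty if absent
    std::string set;     // NODE_UPDATE_SET_CLAUSE
    std::string where;   // NODE_WHERE_CLAUSE, empty if absent
};

inline std::string emit_update(Dialect d, const UpdateParts& u) {
    std::string out = "UPDATE " + u.target;
    // MySQL: JOINed table refs are part of the table list, before SET.
    if (d == Dialect::MySQL && !u.from.empty()) out += " " + u.from;
    out += " SET " + u.set;
    // PostgreSQL: FROM comes after SET and before WHERE.
    if (d == Dialect::PostgreSQL && !u.from.empty()) out += " FROM " + u.from;
    if (!u.where.empty()) out += " WHERE " + u.where;
    return out;
}
```

The same two child nodes therefore round-trip to different clause orders, which is why the AST does not need a second `NODE_FROM_CLAUSE` kind.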

---

@@ -251,7 +281,21 @@ NODE_COMPOUND_QUERY

### Integration

-The `parse_select()` method in `Parser<D>` is updated: after parsing the first SELECT, it checks for UNION/INTERSECT/EXCEPT. If found, it wraps the result in a compound query. This is transparent to the caller: `parse()` still returns a `ParseResult`.
+A new `CompoundQueryParser<D>` class sits above `SelectParser<D>`. The `parse_select()` method in `Parser<D>` is updated to call `CompoundQueryParser` instead of `SelectParser` directly.
+
+`CompoundQueryParser` works as follows:
+1. Parse the first operand: if `(`, consume it, parse inner compound recursively, expect `)`. Otherwise, call `SelectParser::parse()` for a single SELECT.
+2. Check for set operator (UNION/INTERSECT/EXCEPT). If none, return the single SELECT as-is.
+3. If found, enter a Pratt-like precedence loop: parse the operator, parse the next operand, build `NODE_SET_OPERATION` nodes respecting INTERSECT > UNION/EXCEPT precedence.
+4. After the compound, parse optional trailing ORDER BY / LIMIT (applies to whole result).
+5. Wrap in `NODE_COMPOUND_QUERY` and return.
+
+This layering means `SelectParser` is unchanged: it still parses a single SELECT statement. The compound logic is entirely in `CompoundQueryParser`, which is a separate header-only template.
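The precedence loop in steps 1 to 3 can be sketched with a small precedence-climbing parser. Everything here is illustrative: `CompoundSketch`, the `Tok` enum, and the numeric binding powers are assumptions, a single `SELECT_STMT` token stands in for a full `SelectParser::parse()` call, and the result is a parenthesized string rather than `NODE_SET_OPERATION` nodes. Trailing ORDER BY / LIMIT and the `NODE_COMPOUND_QUERY` wrapper are omitted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token model; SELECT_STMT stands in for one whole SELECT.
enum Tok { SELECT_STMT, UNION, INTERSECT, EXCEPT, LPAREN, RPAREN, END };
struct Token { Tok kind; std::string text; };

// INTERSECT binds tighter than UNION/EXCEPT; 0 means "not a set operator".
inline int binding_power(Tok kind) {
    switch (kind) {
        case INTERSECT: return 2;
        case UNION:
        case EXCEPT:    return 1;
        default:        return 0;
    }
}

class CompoundSketch {
public:
    explicit CompoundSketch(std::vector<Token> toks) : toks_(std::move(toks)) {}

    // Returns the tree as a fully parenthesized string so grouping is visible.
    std::string parse(int min_bp = 1) {
        std::string lhs = parse_operand();
        for (;;) {
            int bp = binding_power(peek().kind);
            if (bp == 0 || bp < min_bp) break;
            std::string op = next().text;
            std::string rhs = parse(bp + 1);  // bp + 1 gives left associativity
            lhs = "(" + lhs + " " + op + " " + rhs + ")";
        }
        return lhs;
    }

private:
    std::string parse_operand() {
        if (peek().kind == LPAREN) {
            next();                        // consume '('
            std::string inner = parse(1);  // nested compound query
            next();                        // consume ')' (unchecked in this sketch)
            return inner;
        }
        return next().text;                // a single SELECT
    }
    const Token& peek() const { return pos_ < toks_.size() ? toks_[pos_] : end_; }
    Token next() { Token t = peek(); if (pos_ < toks_.size()) ++pos_; return t; }

    std::vector<Token> toks_;
    std::size_t pos_ = 0;
    Token end_{END, ""};
};
```

With this scheme, `A UNION B INTERSECT C` groups as `A UNION (B INTERSECT C)`, while `A UNION B EXCEPT C` groups left to right as `(A UNION B) EXCEPT C`.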
+
+```
+include/sql_parser/
+  compound_query_parser.h - UNION/INTERSECT/EXCEPT with precedence
In digest mode, `emit_literal_*` methods write `?` instead of the actual value, keywords are uppercased, and IN lists are collapsed.
+In digest mode, the following methods change behavior:
+- `emit_value()` / `emit_string_literal()`: for literal nodes (`NODE_LITERAL_INT`, `NODE_LITERAL_FLOAT`, `NODE_LITERAL_STRING`), emit `?` instead of the actual value
+- `emit_in_list()`: emit `IN (?)` regardless of how many values, collapsing the list
+- `emit_values_row()`: emit `(?, ?, ...)` matching column count but with `?` for all values
+- `emit_placeholder()`: emit `?` (same as normal mode, already a placeholder)
+- All keyword text emitted in uppercase (e.g., `SELECT`, `FROM`, `WHERE`)
+- `emit_alias()`: skip aliases in digest mode (aliases don't affect query semantics for routing)
+
+Methods that do NOT change in digest mode: structural emission (FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT, etc.) remains identical since the query structure matters for digest grouping.
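A compact sketch of three of the digest-mode rules (literal masking, IN-list collapsing, keyword uppercasing). The single `emit()` function, the `Node` struct, and the `NODE_KEYWORD`, `NODE_IDENTIFIER`, and `NODE_IN_LIST` kinds are simplifications assumed for illustration; the real emitter uses the separate `emit_*` methods listed above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// NODE_LITERAL_* come from the spec; the other kinds are hypothetical.
enum NodeKind {
    NODE_KEYWORD, NODE_IDENTIFIER,
    NODE_LITERAL_INT, NODE_LITERAL_STRING, NODE_IN_LIST
};

struct Node {
    NodeKind kind;
    std::string text;                // keyword, identifier, or literal text
    std::vector<std::string> items;  // IN-list values, unused otherwise
};

inline std::string to_upper(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

inline std::string emit(const Node& n, bool digest) {
    switch (n.kind) {
        case NODE_KEYWORD:
            return digest ? to_upper(n.text) : n.text;
        case NODE_LITERAL_INT:
        case NODE_LITERAL_STRING:
            return digest ? std::string("?") : n.text;  // mask the value
        case NODE_IN_LIST: {
            if (digest) return "IN (?)";  // collapse regardless of length
            std::string out = "IN (";
            for (std::size_t i = 0; i < n.items.size(); ++i) {
                if (i > 0) out += ", ";
                out += n.items[i];
            }
            return out + ")";
        }
        default:
            return n.text;  // structural text is emitted unchanged
    }
}
```

Collapsing the IN list means `IN (1, 2)` and `IN (1, 2, 3)` produce the same digest text, so queries that differ only in list length group together.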

---

@@ -348,7 +400,8 @@ TK_ONLY, // already exists in enum, verify in keyword tables
```
TK_EXCEPT,
TK_INTERSECT,
TK_CONSTRAINT,
-TK_DEFAULT_VALUES, // or handle as TK_DEFAULT + TK_VALUES
// Note: TK_UNION and TK_OF already exist from Plan 3
```
---
@@ -357,13 +410,13 @@ TK_DEFAULT_VALUES, // or handle as TK_DEFAULT + TK_VALUES

This spec should be implemented across 5 plans:

-1. **Plan 7: INSERT deep parser**: INSERT/REPLACE with all syntax, emitter, tests. Closes #5.
+1. **Plan 7: Shared table ref parser + INSERT deep parser**: Extract TableRefParser from SelectParser, then INSERT/REPLACE with all syntax, emitter, tests. Closes #5.
2. **Plan 8: UPDATE deep parser**: full UPDATE syntax, emitter, tests. Closes #6.
3. **Plan 9: DELETE deep parser**: full DELETE syntax, emitter, tests. Closes #7.
5. **Plan 11: Query digest**: Digest module with both AST and token-level modes, tests. Closes #9.

-Plans 7-9 are independent of each other (can be done in any order). Plan 10 depends on SELECT parser (already done). Plan 11 depends on the emitter (already done) and benefits from Plans 7-9 being complete (more node types to digest), but can work with Tier 2 token-level fallback for unstubbed types.
+**Dependencies:** Plan 7 must come first (extracts shared TableRefParser). Plans 8-9 depend on Plan 7's TableRefParser but are independent of each other. Plan 10 is independent of 7-9. Plan 11 benefits from all prior plans being complete but works with Tier 2 token-level fallback.