| 1 | +# Compound Value Support Design |
| 2 | + |
| 3 | +**Goal:** Add first-class runtime `Value` support for arrays and tuples/composites so expression evaluation stops collapsing those AST nodes to `NULL`. |
| 4 | + |
| 5 | +**Scope decision:** This design covers local runtime support only: `Value` representation, expression evaluation, and tests. It does not include remote protocol serialization, planner type inference, or a full SQL row/composite type system. |
| 6 | + |
| 7 | +## Problem |
| 8 | + |
| 9 | +The evaluator currently treats `NODE_TUPLE`, bare `NODE_ARRAY_CONSTRUCTOR`, and `NODE_FIELD_ACCESS` as unsupported, and `NODE_ARRAY_SUBSCRIPT` only works when its left-hand side is a literal array AST node. As a result, compound expressions have no runtime representation and cannot flow through the engine as values. |
| 10 | + |
| 11 | +## Chosen Approach |
| 12 | + |
| 13 | +Use first-class `Value` tags for compound values instead of more AST special cases. |
| 14 | + |
| 15 | +- Add `Value::TAG_ARRAY` |
| 16 | +- Add `Value::TAG_TUPLE` |
| 17 | +- Store both through a shared arena-owned descriptor rather than inventing a heap-managed object system |
| 18 | + |
| 19 | +This keeps the runtime model aligned with the rest of the engine: `Value` stays copyable, payload lifetime is arena-bound, and compound values can be nested without a separate ownership framework. |
| 20 | + |
| 21 | +## Runtime Model |
| 22 | + |
| 23 | +Introduce a lightweight sequence descriptor in the value layer: |
| 24 | + |
| 25 | +- `count`: number of elements |
| 26 | +- `elements`: arena-allocated `Value[]` |
| 27 | +- `field_names`: optional parallel `StringRef[]`, present only for named tuple/composite values |
| 28 | + |
| 29 | +`TAG_ARRAY` and `TAG_TUPLE` both point at this descriptor. The tag defines semantics: |
| 30 | + |
| 31 | +- `TAG_ARRAY`: ordered collection addressed by subscript |
| 32 | +- `TAG_TUPLE`: ordered collection that may optionally expose named fields |
| 33 | + |
| 34 | +Compound values remain opaque to arithmetic, comparison, and coercion unless a dedicated operator explicitly handles them. |
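
A minimal sketch of the descriptor layout described above. The `Arena`, `StringRef`, and `Value` definitions here are hypothetical stand-ins (the engine's real types are not shown in this document), and `make_int` and `make_sequence` are illustrative helper names, not existing APIs. The point is only that `Value` stays trivially copyable while the element storage and optional field names live in the arena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the engine's string reference type.
struct StringRef {
    const char* data = nullptr;
    std::size_t size = 0;
    StringRef() = default;
    StringRef(const char* s) : data(s), size(std::strlen(s)) {}
};

// Hypothetical stand-in arena: one heap block per call, freed with the arena.
// A real arena would bump-allocate; only the lifetime rule matters here.
class Arena {
public:
    template <typename T>
    T* alloc_array(std::size_t n) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "this arena never runs destructors");
        blocks_.push_back(std::unique_ptr<unsigned char[]>(
            new unsigned char[sizeof(T) * n + 1]));
        T* first = reinterpret_cast<T*>(blocks_.back().get());
        for (std::size_t i = 0; i < n; ++i) new (first + i) T();
        return first;
    }
private:
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

struct Value;

// Shared descriptor for both TAG_ARRAY and TAG_TUPLE.
struct SequenceDescriptor {
    std::size_t count = 0;                   // number of elements
    const Value* elements = nullptr;         // arena-allocated Value[]
    const StringRef* field_names = nullptr;  // null unless a named tuple
};

// Simplified Value: only the tags needed for this sketch.
struct Value {
    enum Tag : std::uint8_t { TAG_NULL, TAG_INT, TAG_ARRAY, TAG_TUPLE };
    Tag tag = TAG_NULL;
    union {
        std::int64_t int_val = 0;
        const SequenceDescriptor* seq;
    };
};
static_assert(std::is_trivially_copyable<Value>::value,
              "Value must stay cheaply copyable");

inline Value make_int(std::int64_t v) {
    Value out;
    out.tag = Value::TAG_INT;
    out.int_val = v;
    return out;
}

// Copies items (and optional names) into the arena and returns a compound Value.
inline Value make_sequence(Arena& arena, Value::Tag tag,
                           const std::vector<Value>& items,
                           const std::vector<StringRef>& names = {}) {
    assert(tag == Value::TAG_ARRAY || tag == Value::TAG_TUPLE);
    assert(names.empty() || names.size() == items.size());
    Value* elems = arena.alloc_array<Value>(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) elems[i] = items[i];
    SequenceDescriptor* desc = arena.alloc_array<SequenceDescriptor>(1);
    desc->count = items.size();
    desc->elements = elems;
    if (!names.empty()) {
        StringRef* stored = arena.alloc_array<StringRef>(names.size());
        for (std::size_t i = 0; i < names.size(); ++i) stored[i] = names[i];
        desc->field_names = stored;
    }
    Value out;
    out.tag = tag;
    out.seq = desc;
    return out;
}
```

Because a compound `Value` holds only a tag and a descriptor pointer, copying it is shallow, and nesting falls out naturally: an element of one sequence can itself be a `TAG_ARRAY` or `TAG_TUPLE` value pointing at another descriptor in the same arena.
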
| 35 | + |
| 36 | +## Evaluation Semantics |
| 37 | + |
| 38 | +- `NODE_ARRAY_CONSTRUCTOR` evaluates each child and returns `TAG_ARRAY` |
| 39 | +- `NODE_TUPLE` evaluates each child and returns `TAG_TUPLE` |
| 40 | +- `NODE_ARRAY_SUBSCRIPT` evaluates the left-hand side first; if it yields `TAG_ARRAY`, apply existing dialect indexing rules (`0`-based for MySQL, `1`-based for PostgreSQL) |
| 41 | +- `NODE_FIELD_ACCESS` evaluates the left-hand side first; if it yields `TAG_TUPLE` with `field_names`, resolve by case-insensitive field name match |
| 42 | + |
| 43 | +Error semantics stay conservative: |
| 44 | + |
| 45 | +- out-of-bounds subscript -> `NULL` |
| 46 | +- `NULL` index -> `NULL` |
| 47 | +- field name not found -> `NULL` |
| 48 | +- field access on unnamed tuples -> `NULL` |
| 49 | +- field/subscript access on non-compound values -> `NULL` |
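
The access rules and conservative error cases above could be sketched as follows. This is an illustration under assumptions, not the engine's code: `Value`, `SequenceDescriptor`, `StringRef`, `Dialect`, `subscript`, and `field_access` are simplified or hypothetical names; case-insensitive matching is done with ASCII folding; a non-integer index is treated like a `NULL` index; and the first matching field name wins if names repeat. None of those three choices is specified by the design.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the engine's string reference type.
struct StringRef {
    const char* data = nullptr;
    std::size_t size = 0;
    StringRef() = default;
    StringRef(const char* s) : data(s), size(std::strlen(s)) {}
};

struct Value;

struct SequenceDescriptor {
    std::size_t count = 0;
    const Value* elements = nullptr;
    const StringRef* field_names = nullptr;  // null for arrays and unnamed tuples
};

struct Value {
    enum Tag : std::uint8_t { TAG_NULL, TAG_INT, TAG_ARRAY, TAG_TUPLE };
    Tag tag = TAG_NULL;  // a default-constructed Value is NULL
    union {
        std::int64_t int_val = 0;
        const SequenceDescriptor* seq;
    };
};

enum class Dialect { MySQL, PostgreSQL };

inline Value make_int(std::int64_t v) {
    Value out;
    out.tag = Value::TAG_INT;
    out.int_val = v;
    return out;
}

// Subscript on a runtime value: 0-based for MySQL, 1-based for PostgreSQL.
inline Value subscript(const Value& base, const Value& index, Dialect dialect) {
    if (base.tag != Value::TAG_ARRAY || base.seq == nullptr) return Value{};
    if (index.tag != Value::TAG_INT) return Value{};  // NULL (or non-integer) index
    const std::int64_t origin = (dialect == Dialect::PostgreSQL) ? 1 : 0;
    if (index.int_val < origin) return Value{};       // below the first element
    const std::uint64_t pos = static_cast<std::uint64_t>(index.int_val - origin);
    if (pos >= base.seq->count) return Value{};       // past the last element
    return base.seq->elements[pos];
}

// ASCII-only case-insensitive comparison (an assumption of this sketch).
inline bool iequals(StringRef a, StringRef b) {
    if (a.size != b.size) return false;
    for (std::size_t i = 0; i < a.size; ++i) {
        if (std::tolower(static_cast<unsigned char>(a.data[i])) !=
            std::tolower(static_cast<unsigned char>(b.data[i]))) {
            return false;
        }
    }
    return true;
}

// Field access on a runtime value: only named tuples can resolve a field.
inline Value field_access(const Value& base, StringRef name) {
    if (base.tag != Value::TAG_TUPLE || base.seq == nullptr) return Value{};
    if (base.seq->field_names == nullptr) return Value{};  // unnamed tuple
    for (std::size_t i = 0; i < base.seq->count; ++i) {
        if (iequals(base.seq->field_names[i], name)) return base.seq->elements[i];
    }
    return Value{};  // field name not found
}
```

Every failure path returns a default-constructed `Value`, which carries `TAG_NULL`, so callers never need a separate error channel for these cases.
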
| 50 | + |
| 51 | +## Scope Boundaries |
| 52 | + |
| 53 | +Included in this design: |
| 54 | + |
| 55 | +- standalone array and tuple values |
| 56 | +- nested compound values |
| 57 | +- array subscript against evaluated runtime values, not just literal AST arrays |
| 58 | +- named tuple field access once a tuple carries field metadata |
| 59 | + |
| 60 | +Explicitly deferred: |
| 61 | + |
| 62 | +- remote executor transport for arrays/tuples |
| 63 | +- planner or catalog type inference for compound result columns |
| 64 | +- decimal representation redesign |
| 65 | +- full SQL composite producers beyond tuple helpers and evaluator paths |
| 66 | + |
| 67 | +## Implementation Order |
| 68 | + |
| 69 | +1. Extend `Value` with compound tags and descriptor helpers |
| 70 | +2. Add evaluator support for array and tuple construction |
| 71 | +3. Generalize array subscript to work against `TAG_ARRAY` |
| 72 | +4. Add named tuple helpers and `NODE_FIELD_ACCESS` support |
| 73 | +5. Expand tests for standalone values, nested compounds, and access semantics |
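
Steps 2 and 3 can be pictured with a toy recursive evaluator. Everything below is a hypothetical reduction for illustration: the `Node` struct, the `NODE_LITERAL_INT` kind, the deque-backed `Arena`, and the `evaluate` signature are assumptions, and subscripts are fixed to 1-based indexing for brevity instead of consulting a dialect. What it shows is that the subscript operates on whatever `Value` its left-hand side evaluates to, so nested constructors and chained subscripts need no AST special cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

struct Value;

struct SequenceDescriptor {
    std::size_t count = 0;
    const Value* elements = nullptr;
};

struct Value {
    enum Tag : std::uint8_t { TAG_NULL, TAG_INT, TAG_ARRAY, TAG_TUPLE };
    Tag tag = TAG_NULL;
    union {
        std::int64_t int_val = 0;
        const SequenceDescriptor* seq;
    };
};

// Toy arena: std::deque keeps earlier entries at stable addresses, so
// descriptor and element pointers remain valid until the arena is destroyed.
struct Arena {
    std::deque<std::vector<Value>> element_blocks;
    std::deque<SequenceDescriptor> descriptors;
};

enum NodeKind {
    NODE_LITERAL_INT,  // hypothetical literal kind for this sketch
    NODE_ARRAY_CONSTRUCTOR,
    NODE_TUPLE,
    NODE_ARRAY_SUBSCRIPT
};

struct Node {
    NodeKind kind;
    std::int64_t int_val;        // used by NODE_LITERAL_INT only
    std::vector<Node> children;  // elements, or {base, index} for a subscript
};

inline Value evaluate(const Node& node, Arena& arena) {
    switch (node.kind) {
    case NODE_LITERAL_INT: {
        Value v;
        v.tag = Value::TAG_INT;
        v.int_val = node.int_val;
        return v;
    }
    case NODE_ARRAY_CONSTRUCTOR:
    case NODE_TUPLE: {
        // Evaluate every child first, then move the results into the arena.
        std::vector<Value> elems;
        elems.reserve(node.children.size());
        for (const Node& child : node.children) {
            elems.push_back(evaluate(child, arena));
        }
        arena.element_blocks.push_back(std::move(elems));
        SequenceDescriptor desc;
        desc.count = arena.element_blocks.back().size();
        desc.elements = arena.element_blocks.back().data();
        arena.descriptors.push_back(desc);
        Value v;
        v.tag = (node.kind == NODE_TUPLE) ? Value::TAG_TUPLE : Value::TAG_ARRAY;
        v.seq = &arena.descriptors.back();
        return v;
    }
    case NODE_ARRAY_SUBSCRIPT: {
        if (node.children.size() != 2) return Value{};
        // The left-hand side is evaluated, not inspected as an AST literal.
        const Value base = evaluate(node.children[0], arena);
        const Value index = evaluate(node.children[1], arena);
        if (base.tag != Value::TAG_ARRAY || base.seq == nullptr) return Value{};
        if (index.tag != Value::TAG_INT || index.int_val < 1) return Value{};
        const std::uint64_t pos = static_cast<std::uint64_t>(index.int_val - 1);
        if (pos >= base.seq->count) return Value{};
        return base.seq->elements[pos];
    }
    }
    return Value{};
}
```

With this shape, an expression such as a subscript applied to the result of another subscript is handled by plain recursion, since the inner subscript already returns a `TAG_ARRAY` value when the outer array holds arrays.
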
| 74 | + |
| 75 | +## Testing |
| 76 | + |
| 77 | +Add targeted unit coverage in: |
| 78 | + |
| 79 | +- `tests/test_expression_eval.cpp` for tuple/array construction, nested arrays, runtime subscript, and field access |
| 80 | +- `tests/test_value.cpp` if helper constructors or invariants need direct verification |
| 81 | + |
| 82 | +Regression expectations: |
| 83 | + |
| 84 | +- existing array subscript tests continue to pass |
| 85 | +- `TupleReturnsNull` and `ArrayConstructorReturnsNull` become positive-value tests |
| 86 | +- new field access tests use named tuple helpers or resolver-returned tuple values rather than requiring a new SQL producer first |
| 87 | + |
| 88 | +## Non-Goals |
| 89 | + |
| 90 | +- No attempt to make arrays or tuples comparable in this phase |
| 91 | +- No attempt to print or serialize compound values for wire protocols yet |
| 92 | +- No attempt to introduce schema-aware composite typing beyond optional field-name metadata |