
Commit ab8b7fb

fix(distributed): single-shard aggregate schema and use-after-free (issue 08)
Resolves the two stacked bugs surfaced by scripts/test_sqlengine.sh single. Both manifested as the same user-visible symptom (catalog column headers and "?" rendered values for SELECT COUNT(*) / SELECT SUM(...) against a single-shard config), but each had a different root cause and needed an independent fix.

Bug A: wrong column headers (schema)

make_unsharded_aggregate built a remote SQL like SELECT COUNT(*) FROM users, sent it to the one backend, and returned a bare REMOTE_SCAN tagged with the source users table. build_column_names saw the REMOTE_SCAN and emitted all of users.columns[] as the result schema, mis-labelling the projected aggregates with unrelated catalog column names. The multi-shard path dodged this because its MERGE_AGGREGATE node carried explicit output_exprs.

Fix:
- Add output_exprs + output_expr_count to PlanNode::remote_scan, mirroring the same fields on merge_aggregate.
- New helper make_remote_scan_with_outputs in distributed_planner.h populates them.
- make_unsharded_aggregate uses the new helper, attaching the group_by + agg_exprs projection list.
- build_column_names(REMOTE_SCAN) prefers output_exprs (rendered through Emitter) when present and falls back to the table's catalog columns only for SELECT * passthrough.

Bug B: "?" values (use-after-free)

After bug A was fixed and headers became correct, every aggregate value still rendered as "?". Tracing showed Value::tag = 58 arriving at the renderer, an enum value out of range, i.e. uninitialized memory.

PlanExecutor is a stack-local in Session::execute_query. When it goes out of scope, every operator in operators_ is destroyed. RemoteScanOperator owns an internal ResultSet (returned from RemoteExecutor::execute) which heap-owns the Value arrays the rows point into. When the operator died those arrays were freed, leaving the outer ResultSet returned to the caller with rows whose values pointer references freed memory.

This was invisible for any plan tree that wraps the REMOTE_SCAN in a local operator that re-allocates rows in the executor's arena (PROJECT, MERGE_AGGREGATE, etc.). The 1-shard aggregate path is the one shape that yields rows directly from a REMOTE_SCAN, so issue 08 was the only place it surfaced.

A first attempt moved heap arrays and strings out of the operator's ResultSet into the outer ResultSet. That worked for value arrays but corrupted SSO std::string content (every string lost its first byte) because StringRefs into the source string's inline buffer became dangling after the move.

Fix that worked: keep the operators alive as long as the returned ResultSet. New std::vector<std::shared_ptr<void>> backing_lifetimes on ResultSet holds type-erased ownership of operators released from PlanExecutor::operators_. Operators (and all storage they own: heap arrays, owned_strings deque, the inner ResultSet) survive until the caller is done with the ResultSet. Pointers stay valid; nothing moves.

Verification
- scripts/test_sqlengine.sh single: was 8/10, now 10/10.
- scripts/test_sqlengine.sh all: 34/34.

Files changed
- include/sql_engine/plan_node.h
- include/sql_engine/distributed_planner.h
- include/sql_engine/plan_executor.h
- include/sql_engine/result_set.h
- docs/issues/08-... (resolution notes)
- docs/issues/README.md (status flip)
- docs/architecture-and-status.md (issue 08 in §8)

Future work
- Consider extending output_exprs to every REMOTE_SCAN (including projected non-aggregate pushdown) so the renderer emits better column headers across the board.
- Audit other paths where rows might be returned from operator-local heap storage without lifetime extension.

1 parent f0f2915 commit ab8b7fb

7 files changed

Lines changed: 149 additions & 11 deletions


docs/architecture-and-status.md

Lines changed: 2 additions & 2 deletions
@@ -595,8 +595,8 @@ Source: `docs/issues/README.md`. Status reflects the working tree on 2026-04-17.
 *New module `tool_config_parser.{h,cpp}`. Replaces ~135 lines of duplication in each of `sqlengine`, `mysql_server`, `bench_distributed`, `engine_stress_test`, plus `tests/test_ssl_config.cpp`.*
 4. **[04] Close join execution coverage gaps** — 🟦 in working tree.
 *`NestedLoopJoinOperator` now executes RIGHT and FULL outer joins (with right-row match tracking) and resolves qualified column names in join conditions.*
-8. **[08] Aggregate projection schema wrong in single-shard mode** — 📋 open. Surfaced 2026-04-18 by `scripts/test_sqlengine.sh single`.
-*`SELECT COUNT(*)` / `SELECT SUM(salary)` against a 1-shard config show catalog table columns as headers and put the aggregate value in the wrong slot. Two-shard mode is unaffected.*
+8. **[08] Aggregate projection schema wrong in single-shard mode** — ✅ resolved 2026-04-18.
+*Two stacked bugs. (a) `make_unsharded_aggregate` returned a bare `REMOTE_SCAN` whose `build_column_names` defaulted to the source table's columns; fixed by adding `output_exprs` to the remote-scan plan node. (b) After (a), values still rendered as `?` because `RemoteScanOperator`'s heap-owned storage died with the stack-local `PlanExecutor` while the returned rows still pointed into it; fixed by adding `backing_lifetimes` to `ResultSet` so operators outlive the executor's stack frame.*

 ### P2 — deferred

docs/issues/08-aggregate-projection-schema-single-shard.md

Lines changed: 51 additions & 4 deletions
@@ -6,7 +6,7 @@

 ## Status

-Open. Surfaced by `scripts/test_sqlengine.sh single` on 2026-04-18.
+Resolved 2026-04-18.

 ## Problem

@@ -91,7 +91,54 @@ Worth checking which shard-count branch in `distribute_node` runs when there is
 docker rm -f parsersql-single 2>/dev/null
 ./scripts/setup_single_backend.sh
 make build-sqlengine
-./scripts/test_sqlengine.sh single # expect 10/10
-make test # full unit suite
-./scripts/test_sqlengine.sh all # expect all-green
+./scripts/test_sqlengine.sh single # 10/10
+./scripts/test_sqlengine.sh all # 34/34
```
+
+## Resolution Notes
+
+The single bug surfaced by the test was actually two bugs stacked.
+
+### Bug A — wrong column headers (schema)
+
+`make_unsharded_aggregate` in `distributed_planner.h` builds a remote SQL like `SELECT COUNT(*) FROM users`, sends it to the one backend, and returns a bare `REMOTE_SCAN` plan node tagged with the source `users` table. `build_column_names` in `plan_executor.h` saw the `REMOTE_SCAN` and emitted *all of* `users.columns[]` (`id, name, age, dept, salary`) as the result schema. The user saw `id` where the result was actually `COUNT(*)`.
+
+The multi-shard path dodged this because its `MERGE_AGGREGATE` node carried explicit `output_exprs` and had its own `build_column_names` case that used them. The 1-shard path had no equivalent.
+
+**Fix:**
+
+- Added optional `output_exprs` + `output_expr_count` fields to `PlanNode::remote_scan` (mirroring the same fields on `merge_aggregate`).
+- New helper `make_remote_scan_with_outputs(...)` in `distributed_planner.h` populates them.
+- `make_unsharded_aggregate` now uses the new helper, attaching the projection list (`group_by + agg_exprs`) to the REMOTE_SCAN.
+- `build_column_names(REMOTE_SCAN)` prefers the `output_exprs` (rendered through `Emitter`) when present, and falls back to the table's catalog columns only for `SELECT *` passthrough.
+
+### Bug B — `?` rendered values (use-after-free)
+
+After bug A was fixed and headers became correct, every aggregate value still rendered as `?`. Tracing showed `Value::tag = 58` arriving at the renderer — a value far outside the enum range, i.e. uninitialized memory.
+
+`PlanExecutor` is a stack-local in `Session::execute_query`. When it goes out of scope, every operator in `operators_` is destroyed. `RemoteScanOperator` owns an internal `ResultSet` (returned from `RemoteExecutor::execute`) which heap-owns the `Value` arrays the rows point into. When the operator died, those arrays were freed; the outer `ResultSet` returned to the caller had `rows[i].values` pointing at freed memory.
+
+The bug is invisible for any query whose plan tree wraps the `REMOTE_SCAN` in a local operator that re-allocates rows in the executor's arena (PROJECT, MERGE_AGGREGATE, etc.). The 1-shard aggregate path is the one shape that yields rows directly from a `REMOTE_SCAN` to the caller — that's why issue 08 only surfaced there.
+
+A first attempt moved heap arrays and strings out of the operator's `ResultSet` into the outer `ResultSet`. That worked for value arrays but corrupted SSO `std::string` content (every string lost its first byte) because `StringRef`s captured into the source string's inline buffer became dangling after the move.
+
+**Fix that worked:** keep the operators themselves alive as long as the returned `ResultSet`. A new `std::vector<std::shared_ptr<void>> backing_lifetimes` on `ResultSet` holds type-erased ownership of operators released from `PlanExecutor::operators_`. The operators (and all storage they own — heap arrays, the `owned_strings` deque, the inner `ResultSet`) survive until the caller is done with the `ResultSet`. Pointers stay valid; nothing moves.
+
+## Test Coverage
+
+- `scripts/test_sqlengine.sh single` — 10/10 with the fix; the two failing assertions (`single: total user count` finds `10` for `SELECT COUNT(*) FROM users`; `single: SUM(salary) Engineering = 530000` finds the value) now pass.
+- `scripts/test_sqlengine.sh all` — 34/34.
+
+The shell suite is the regression guard. Anyone who reintroduces either bug will see it fail loudly within seconds of running `make test-sqlengine-single`.
+
+## Files Touched
+
+- `include/sql_engine/plan_node.h`: `remote_scan.output_exprs` + `output_expr_count`.
+- `include/sql_engine/distributed_planner.h`: `make_remote_scan_with_outputs`; `make_unsharded_aggregate` uses it.
+- `include/sql_engine/plan_executor.h`: `build_column_names(REMOTE_SCAN)` honours `output_exprs`; `execute()` releases operators into `rs.backing_lifetimes`.
+- `include/sql_engine/result_set.h`: `backing_lifetimes` field; move ctor / move assignment carry it.
+
+## Future Work
+
+- Consider extending `output_exprs` to *every* `REMOTE_SCAN` produced by the planner, including projected (non-aggregate) pushdown, so the renderer can emit better column headers across the board (e.g. `name` instead of catalog ordinal-0).
+- Audit other places where rows might be returned from operator-local heap storage without lifetime extension. `SCAN` against a `MutableDataSource` that returns owned strings could have a similar shape.
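
The failed first attempt in the resolution notes comes down to one property of small-string optimization: a pointer taken into a short string's inline buffer does not follow the string when it is moved. The sketch below isolates that effect. It assumes a standard library with SSO (libstdc++, libc++, and MSVC all have one), and `buffer_follows_move` is a hypothetical helper, not engine code. It shows only the pointer mismatch, not the exact byte corruption the notes describe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: returns true when a pointer taken into `text`'s
// character buffer before a move equals the buffer of the moved-to string.
// The old pointer is only compared, never dereferenced.
bool buffer_follows_move(std::string text) {
    const std::size_t original_size = text.size();
    const char* before = text.data();
    std::string moved = std::move(text);
    assert(moved.size() == original_size);  // the characters themselves arrive intact
    return moved.data() == before;
}
```

With an SSO implementation, a short argument such as `"abc"` is expected to yield `false` (the characters now live inside the new string object), while a long heap-allocated string typically yields `true` because the heap buffer is handed over. Any `StringRef`-style view captured before the move is therefore only safe in the second case, which is consistent with the commit's choice to move nothing at all.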

docs/issues/README.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ This directory is the local working backlog for implementation gaps identified f
 2. [Make 2PC phase timeouts deterministic rather than best-effort](02-distributed-2pc-deterministic-phase-timeouts.md) — implemented in current working tree
 3. [Extract shared backend and shard configuration parsing](03-shared-backend-config-parsing.md) — implemented in current working tree
 4. [Close join execution coverage gaps or reject unsupported joins earlier](04-join-operator-coverage.md) — implemented in current working tree
-8. [Aggregate projection schema wrong in single-shard mode](08-aggregate-projection-schema-single-shard.md) — open; surfaced 2026-04-18 by `scripts/test_sqlengine.sh single`
+8. [Aggregate projection schema wrong in single-shard mode](08-aggregate-projection-schema-single-shard.md) — resolved 2026-04-18 (REMOTE_SCAN.output_exprs + ResultSet.backing_lifetimes; surfaced two stacked bugs: schema and use-after-free)

 ### P2

include/sql_engine/distributed_planner.h

Lines changed: 26 additions & 1 deletion
@@ -503,6 +503,26 @@ class DistributedPlanner {
         node->remote_scan.remote_sql = sql.ptr;
         node->remote_scan.remote_sql_len = static_cast<uint16_t>(sql.len);
         node->remote_scan.table = table;
+        // Caller is responsible for setting output_exprs when the remote SQL
+        // is not a passthrough SELECT *. make_plan_node() already zero-fills
+        // the union, so leaving these unset here means "fall back to the
+        // table's catalog columns".
+        return node;
+    }
+
+    PlanNode* make_remote_scan_with_outputs(
+            const char* backend, sql_parser::StringRef sql, const TableInfo* table,
+            const std::vector<const sql_parser::AstNode*>& output_exprs)
+    {
+        PlanNode* node = make_remote_scan(backend, sql, table);
+        if (!output_exprs.empty()) {
+            uint16_t n = static_cast<uint16_t>(output_exprs.size());
+            auto** arr = static_cast<const sql_parser::AstNode**>(
+                arena_.allocate(sizeof(sql_parser::AstNode*) * n));
+            for (uint16_t i = 0; i < n; ++i) arr[i] = output_exprs[i];
+            node->remote_scan.output_exprs = arr;
+            node->remote_scan.output_expr_count = n;
+        }
         return node;
     }

@@ -624,7 +644,12 @@ class DistributedPlanner {
             projs.data(), static_cast<uint16_t>(projs.size()),
             gb.data(), static_cast<uint16_t>(gb.size()),
             nullptr, nullptr, 0, -1, false);
-        return make_remote_scan(backend, sql, table);
+        // Carry the projection expressions on the REMOTE_SCAN so the result
+        // schema picks them up instead of mis-labelling with the source
+        // table's catalog columns. Without this, "SELECT COUNT(*) FROM users"
+        // against a single-shard config renders as the table's first
+        // column ("id") rather than the aggregate.
+        return make_remote_scan_with_outputs(backend, sql, table, projs);
     }

     void decompose_aggregate(const sql_parser::AstNode* expr,
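
The new helper copies a vector of expression pointers into arena storage and records the count in a 16-bit field. A minimal standalone sketch of that pattern follows. `ToyArena`, `PointerArray`, and `copy_to_arena` are hypothetical stand-ins for the engine's arena and plan node, and the range check on the count is an addition of this sketch; the diff above casts the size without one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical arena: hands out raw storage that lives as long as the arena.
class ToyArena {
public:
    void* allocate(std::size_t bytes) {
        blocks_.push_back(std::make_unique<unsigned char[]>(bytes));
        return blocks_.back().get();
    }
private:
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Hypothetical mirror of the output_exprs / output_expr_count pair.
struct PointerArray {
    const int** items = nullptr;
    uint16_t count = 0;
};

// Copies `src` into arena storage. Returns an empty array when `src` is
// empty or has more elements than the 16-bit count field can represent.
PointerArray copy_to_arena(ToyArena& arena, const std::vector<const int*>& src) {
    PointerArray out;
    if (src.empty() || src.size() > std::numeric_limits<uint16_t>::max()) return out;
    const auto n = static_cast<uint16_t>(src.size());
    auto** arr = static_cast<const int**>(arena.allocate(sizeof(const int*) * n));
    for (uint16_t i = 0; i < n; ++i) arr[i] = src[i];
    out.items = arr;
    out.count = n;
    return out;
}
```

Leaving both fields zeroed for an empty input matches the convention stated in the diff's comment: an unset projection list means "fall back to the table's catalog columns".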

include/sql_engine/plan_executor.h

Lines changed: 44 additions & 1 deletion
@@ -270,6 +270,29 @@ class PlanExecutor {
         }
         root->close();

+        // Extend operator lifetimes to match the returned ResultSet.
+        // PlanExecutor is typically a stack-local in Session::execute_query,
+        // so without this every operator would die when the executor goes
+        // out of scope -- taking with it any heap storage that the rows'
+        // Value* / StringRef pointers reference. RemoteScanOperator is the
+        // canonical case: its rows live inside an internal ResultSet
+        // returned from a remote backend, owned by the operator. Bare
+        // single-shard REMOTE_SCAN paths (e.g. SELECT COUNT(*) FROM t
+        // against a one-backend "shard") have no local operator above
+        // them to copy values into the executor's arena, so without
+        // lifetime extension the caller sees garbage tag values rendered
+        // as "?" and zero-overwritten string starts.
+        for (auto& op : operators_) {
+            if (op) {
+                Operator* raw = op.release();
+                rs.backing_lifetimes.emplace_back(
+                    std::shared_ptr<void>(raw, [](void* p) {
+                        delete static_cast<Operator*>(p);
+                    }));
+            }
+        }
+        operators_.clear();
+
         if (!rs.rows.empty()) {
             rs.column_count = rs.rows[0].column_count;
         }
@@ -1203,7 +1226,27 @@ class PlanExecutor {
                 break;
             }
             case PlanNodeType::REMOTE_SCAN: {
-                if (plan->remote_scan.table) {
+                // Prefer the projection expressions the planner attached
+                // (set by make_remote_scan_with_outputs for aggregate or
+                // projected pushdown). Fall back to the table's catalog
+                // columns only for SELECT * passthrough.
+                if (plan->remote_scan.output_exprs &&
+                    plan->remote_scan.output_expr_count > 0) {
+                    for (uint16_t i = 0; i < plan->remote_scan.output_expr_count; ++i) {
+                        const sql_parser::AstNode* expr = plan->remote_scan.output_exprs[i];
+                        if (expr) {
+                            sql_parser::Emitter<D> emitter(arena_);
+                            emitter.emit(expr);
+                            sql_parser::StringRef ev = emitter.result();
+                            if (ev.ptr && ev.len > 0)
+                                rs.column_names.emplace_back(ev.ptr, ev.len);
+                            else
+                                rs.column_names.push_back("?column?");
+                        } else {
+                            rs.column_names.push_back("?column?");
+                        }
+                    }
+                } else if (plan->remote_scan.table) {
                     for (uint16_t i = 0; i < plan->remote_scan.table->column_count; ++i) {
                         auto& cn = plan->remote_scan.table->columns[i].name;
                         rs.column_names.emplace_back(cn.ptr, cn.len);
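
The lifetime-extension loop in the first hunk can be tried in isolation. The sketch below uses hypothetical miniature types (`Operator`, `ScanOperator`, `ResultSet`, `Executor`) that only mimic the shape of the engine's classes: a result row borrows storage from an operator, and the result set takes over ownership of the operators before the executor goes away.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// All types here are hypothetical stand-ins, not the engine's classes.
struct Operator {
    virtual ~Operator() = default;
};

// An operator that owns the storage its rows point into.
struct ScanOperator : Operator {
    std::vector<int> storage{10, 20, 30};
};

struct ResultSet {
    const int* first_value = nullptr;                      // borrowed from an operator
    std::vector<std::shared_ptr<void>> backing_lifetimes;  // keeps the owners alive
};

struct Executor {
    std::vector<std::unique_ptr<Operator>> operators_;

    ResultSet execute() {
        auto scan = std::make_unique<ScanOperator>();
        ResultSet rs;
        rs.first_value = scan->storage.data();
        operators_.push_back(std::move(scan));
        // Hand every operator to the result set. The deleter restores the
        // static type that shared_ptr<void> erased.
        for (auto& op : operators_) {
            if (!op) continue;
            Operator* raw = op.release();
            rs.backing_lifetimes.emplace_back(std::shared_ptr<void>(
                raw, [](void* p) { delete static_cast<Operator*>(p); }));
        }
        operators_.clear();
        return rs;
    }
};

// Runs a query with a short-lived executor and reads the row afterwards.
int read_after_executor_is_gone() {
    ResultSet rs;
    {
        Executor ex;
        rs = ex.execute();
    }  // executor destroyed here; the operator survives inside rs
    return *rs.first_value;
}
```

Without the hand-over loop, `first_value` would point into storage freed at the closing brace of the inner scope, which is the use-after-free this commit describes.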

include/sql_engine/plan_node.h

Lines changed: 9 additions & 1 deletion
@@ -102,7 +102,15 @@ struct PlanNode {
             const char* backend_name;
             const char* remote_sql;
             uint16_t remote_sql_len;
-            const TableInfo* table; // expected result schema
+            const TableInfo* table; // expected result schema (for SELECT *)
+            // Optional projection expressions used to derive result column
+            // names when the remote SQL is not a passthrough SELECT *. When
+            // non-null, build_column_names() prefers these over the table's
+            // catalog columns. Required for aggregate / projected pushdown
+            // to a single-shard backend; otherwise the catalog's table
+            // columns mis-label the result.
+            const sql_parser::AstNode** output_exprs;
+            uint16_t output_expr_count;
         } remote_scan;

         // Merge operations for distributed aggregation
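
The rule stated in the comment above (prefer `output_exprs`, fall back to catalog columns) can be sketched with simplified types. In this hypothetical version each output expression is already a C string rather than an AST node rendered through an emitter, and `RemoteScanNode` is a stand-in for the real union member.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for PlanNode::remote_scan.
struct RemoteScanNode {
    const char* const* output_exprs = nullptr;                // optional projection list
    uint16_t output_expr_count = 0;
    const std::vector<std::string>* table_columns = nullptr;  // catalog fallback
};

// Derives result column names: the projection list wins when present,
// otherwise the table's catalog columns are used (SELECT * passthrough).
std::vector<std::string> build_column_names(const RemoteScanNode& node) {
    std::vector<std::string> names;
    if (node.output_exprs && node.output_expr_count > 0) {
        for (uint16_t i = 0; i < node.output_expr_count; ++i) {
            const char* text = node.output_exprs[i];
            // A missing or empty rendering gets a placeholder name.
            names.emplace_back((text && *text) ? text : "?column?");
        }
    } else if (node.table_columns) {
        names = *node.table_columns;
    }
    return names;
}
```

For a node carrying the single expression `COUNT(*)` over a five-column table, this returns one name rather than five, which is the header fix for bug A.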

include/sql_engine/result_set.h

Lines changed: 16 additions & 1 deletion
@@ -4,12 +4,15 @@
 #include "sql_engine/row.h"
 #include <vector>
 #include <deque>
+#include <memory>
 #include <string>
 #include <algorithm>
 #include <utility>

 namespace sql_engine {

+class Operator; // forward decl — ResultSet may extend operator lifetime
+
 struct ResultSet {
     std::vector<Row> rows;
     std::vector<std::string> column_names;
@@ -30,6 +33,16 @@ struct ResultSet {
     // existing elements on push_back, so pointers stay valid.
     std::deque<std::string> owned_strings;

+    // Operators that produced rows in this ResultSet may own heap or
+    // arena storage that the rows' Value* / StringRef pointers reference
+    // (notably RemoteScanOperator's internal ResultSet returned from a
+    // remote backend). Without keeping those operators alive past
+    // PlanExecutor's stack scope, rows would dangle the moment
+    // execute_query returned. Keeping them as opaque shared_ptr<void>
+    // here avoids a circular include with operator.h while still
+    // extending their lifetime to match this ResultSet.
+    std::vector<std::shared_ptr<void>> backing_lifetimes;
+
     ResultSet() = default;
     ~ResultSet() {
         for (auto* arr : owned_value_arrays) ::operator delete(arr);
@@ -41,7 +54,8 @@ struct ResultSet {
           column_names(std::move(o.column_names)),
           column_count(o.column_count),
           owned_value_arrays(std::move(o.owned_value_arrays)),
-          owned_strings(std::move(o.owned_strings)) {
+          owned_strings(std::move(o.owned_strings)),
+          backing_lifetimes(std::move(o.backing_lifetimes)) {
         o.column_count = 0;
     }

@@ -53,6 +67,7 @@ struct ResultSet {
             column_count = o.column_count;
             owned_value_arrays = std::move(o.owned_value_arrays);
             owned_strings = std::move(o.owned_strings);
+            backing_lifetimes = std::move(o.backing_lifetimes);
             o.column_count = 0;
         }
         return *this;
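
Because `ResultSet` has hand-written move operations, a new member is not carried along automatically: if it is missing from the move constructor it is default-initialized in the destination, and the owners stay behind in the moved-from object. The sketch below shows the pattern the last two hunks follow, using a hypothetical `MiniResultSet` rather than the engine's struct.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature of a result set with hand-written move operations.
struct MiniResultSet {
    std::vector<const int*> rows;                          // borrowed pointers
    std::vector<std::shared_ptr<void>> backing_lifetimes;  // owners of the pointees

    MiniResultSet() = default;
    ~MiniResultSet() {}  // user-declared, so no move operations are generated

    MiniResultSet(MiniResultSet&& o) noexcept
        : rows(std::move(o.rows)),
          // Leaving this initializer out would strand the owners in `o`.
          backing_lifetimes(std::move(o.backing_lifetimes)) {}

    MiniResultSet& operator=(MiniResultSet&& o) noexcept {
        if (this != &o) {
            rows = std::move(o.rows);
            backing_lifetimes = std::move(o.backing_lifetimes);
        }
        return *this;
    }
};

// Builds a result whose single row points into storage kept alive
// only by backing_lifetimes.
MiniResultSet make_result() {
    auto owner = std::make_shared<int>(42);
    MiniResultSet rs;
    rs.rows.push_back(owner.get());
    rs.backing_lifetimes.push_back(owner);
    return rs;
}

// Moves the result twice, then reads through the borrowed pointer.
int read_through_moves() {
    MiniResultSet a = make_result();
    MiniResultSet b(std::move(a));  // move construction
    MiniResultSet c;
    c = std::move(b);               // move assignment
    return *c.rows[0];
}
```

The read in `read_through_moves` is valid only because both move paths transfer `backing_lifetimes` together with `rows`.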
