
Commit 2466c3f

Add design spec for DML execution — INSERT/UPDATE/DELETE local + distributed
1 parent 6638ddc commit 2466c3f

1 file changed

Lines changed: 308 additions & 0 deletions

# DML Execution (INSERT/UPDATE/DELETE) — Design Specification
3+
## Overview

Extends the query engine with INSERT, UPDATE, and DELETE execution, both locally (against in-memory data sources) and distributed (routed to remote backends). With this, the engine covers all core SQL statements.

Sub-project 9. Depends on: executor (sub-project 7), distributed planner (sub-project 8), and parser (the INSERT/UPDATE/DELETE parsers already exist).

### Goals

- **MutableDataSource interface**: write operations (`insert`, `update_where`, `delete_where`)
- **DML plan nodes**: `INSERT_PLAN`, `UPDATE_PLAN`, `DELETE_PLAN`
- **Local DML execution**: execute against `InMemoryMutableDataSource`
- **Distributed DML**: route INSERTs by shard key; scatter UPDATEs/DELETEs to all shards when the WHERE clause does not pin the shard key
- **DmlResult**: affected row count, success/error status

### Constraints

- C++17, arena-compatible
- No transactions (atomic multi-statement execution is deferred)
- No auto-increment / sequence support yet
- No ON DUPLICATE KEY / ON CONFLICT handling yet (parsed, but not executed)
24+
---

## MutableDataSource

```cpp
#include <cstdint>
#include <functional>

class MutableDataSource : public DataSource {
public:
    // Appends one row; returns false if the insert fails.
    virtual bool insert(const Row& row) = 0;
    // Removes rows matching predicate; returns the number removed.
    virtual uint64_t delete_where(std::function<bool(const Row&)> predicate) = 0;
    // Applies updater to rows matching predicate; returns the number updated.
    virtual uint64_t update_where(std::function<bool(const Row&)> predicate,
                                  std::function<void(Row&)> updater) = 0;
};
```
38+
### InMemoryMutableDataSource

A write-capable counterpart to the existing InMemoryDataSource: it implements the same read interface plus the MutableDataSource write methods (it derives from MutableDataSource, not from InMemoryDataSource).

```cpp
class InMemoryMutableDataSource : public MutableDataSource {
public:
    explicit InMemoryMutableDataSource(const TableInfo* table);

    // DataSource interface (read)
    const TableInfo* table_info() const override;
    void open() override;
    bool next(Row& out) override;
    void close() override;

    // MutableDataSource interface (write)
    bool insert(const Row& row) override;
    uint64_t delete_where(std::function<bool(const Row&)> predicate) override;
    uint64_t update_where(std::function<bool(const Row&)> predicate,
                          std::function<void(Row&)> updater) override;

    // Utility
    size_t row_count() const;
};
```
64+
Internally, rows are stored in a `std::vector<Row>`: INSERT appends, DELETE removes matching rows, and UPDATE modifies matching rows in place.
66+
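A minimal sketch of that vector-backed write path, assuming simplified stand-ins for the engine's types: `Value` is an optional 64-bit integer (empty models SQL NULL) and `Row` is a vector of them. `VectorRowStore` is a hypothetical name, the width check in `insert` is an assumption, and the methods take `std::function` by const reference rather than by value. DELETE uses the erase-remove idiom, which compacts the vector in one pass and keeps surviving rows in order; erasing matches one at a time could degrade to quadratic time. The sketch does not guard against mutation while a scan opened through `open()`/`next()` is still in progress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <optional>
#include <vector>

// Simplified stand-ins (assumption): std::nullopt models SQL NULL.
using Value = std::optional<int64_t>;
using Row = std::vector<Value>;

// Hypothetical vector-backed store mirroring InMemoryMutableDataSource's writes.
class VectorRowStore {
public:
    explicit VectorRowStore(size_t column_count) : column_count_(column_count) {
        assert(column_count > 0 && "a table needs at least one column");
    }

    // INSERT: append a copy; reject rows of the wrong width (assumption).
    bool insert(const Row& row) {
        if (row.size() != column_count_) return false;
        rows_.push_back(row);
        return true;
    }

    // DELETE: one erase-remove pass; survivors keep their relative order.
    uint64_t delete_where(const std::function<bool(const Row&)>& predicate) {
        auto removed_begin = std::remove_if(rows_.begin(), rows_.end(), predicate);
        auto removed = static_cast<uint64_t>(std::distance(removed_begin, rows_.end()));
        rows_.erase(removed_begin, rows_.end());
        return removed;
    }

    // UPDATE: modify matching rows in place; returns how many matched.
    uint64_t update_where(const std::function<bool(const Row&)>& predicate,
                          const std::function<void(Row&)>& updater) {
        uint64_t updated = 0;
        for (Row& row : rows_) {
            if (predicate(row)) {
                updater(row);
                ++updated;
            }
        }
        return updated;
    }

    size_t row_count() const { return rows_.size(); }
    const std::vector<Row>& rows() const { return rows_; }

private:
    size_t column_count_;
    std::vector<Row> rows_;
};
```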
---

## DML Plan Nodes

```cpp
enum class PlanNodeType : uint8_t {
    // ... existing node types ...
    INSERT_PLAN,
    UPDATE_PLAN,
    DELETE_PLAN,
};
```
79+
### INSERT_PLAN

```cpp
struct {
    const TableInfo* table;
    const AstNode** columns;     // column-name AST nodes; nullptr = all columns in table order
    uint16_t column_count;       // 0 when columns is nullptr
    const AstNode** value_rows;  // array of NODE_VALUES_ROW pointers
    uint16_t row_count;
    PlanNode* select_source;     // INSERT ... SELECT source; nullptr for VALUES (mutually exclusive with value_rows)
} insert_plan;
```
92+
### UPDATE_PLAN

```cpp
struct {
    const TableInfo* table;
    const AstNode** set_columns;  // column-name AST nodes
    const AstNode** set_exprs;    // new-value expression AST nodes (parallel to set_columns)
    uint16_t set_count;           // length of both arrays
    const AstNode* where_expr;    // WHERE condition; nullptr = update all rows
} update_plan;
```
104+
### DELETE_PLAN

```cpp
struct {
    const TableInfo* table;
    const AstNode* where_expr;  // WHERE condition; nullptr = delete all rows
} delete_plan;
```
113+
---

## DmlResult

```cpp
#include <cstdint>
#include <string>

struct DmlResult {
    uint64_t affected_rows = 0;   // rows inserted, updated, or deleted
    uint64_t last_insert_id = 0;  // unused until auto-increment is supported (see Constraints)
    bool success = false;
    std::string error_message;    // set when success is false
};
```
126+
---

## DML Plan Builder

```cpp
template <Dialect D>
class DmlPlanBuilder {
public:
    DmlPlanBuilder(const Catalog& catalog, Arena& arena);

    PlanNode* build_insert(const AstNode* insert_ast);
    PlanNode* build_update(const AstNode* update_ast);
    PlanNode* build_delete(const AstNode* delete_ast);
};
```
142+
Translates INSERT, UPDATE, and DELETE AST nodes into DML plan nodes, resolving table names through the catalog.
144+
---

## DML Execution

### PlanExecutor extensions

```cpp
template <Dialect D>
class PlanExecutor {
public:
    // ... existing ...
    ResultSet execute(PlanNode* plan);

    // New
    DmlResult execute_dml(PlanNode* plan);
    void add_mutable_data_source(const char* table_name, MutableDataSource* source);
};
```
163+
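Step 1 of each execution path below (look up the target's MutableDataSource) could be backed by a small registry filled by `add_mutable_data_source()`. This sketch assumes exact, case-sensitive table-name matching and uses a hypothetical `FakeSource` in place of `MutableDataSource`; `MutableSourceRegistry` and `lookup_target` are illustrative names. Keys are copied into `std::string` so the registry does not depend on the lifetime of the `const char*` passed in, and a missing table yields a `DmlResult` with `success == false` instead of a null dereference.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Mirrors DmlResult from this spec.
struct DmlResult {
    uint64_t affected_rows = 0;
    uint64_t last_insert_id = 0;
    bool success = false;
    std::string error_message;
};

// Hypothetical stand-in for MutableDataSource; only its identity matters here.
struct FakeSource {
    std::string name;
};

// Hypothetical registry behind add_mutable_data_source(): table name -> source.
class MutableSourceRegistry {
public:
    void add(const char* table_name, FakeSource* source) {
        assert(table_name != nullptr && source != nullptr);
        sources_[table_name] = source;  // copies the name; re-adding replaces the source
    }

    FakeSource* find(const std::string& table_name) const {
        auto it = sources_.find(table_name);
        return it == sources_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, FakeSource*> sources_;  // non-owning pointers
};

// Shared step 1 of INSERT/UPDATE/DELETE: resolve the target or report an error.
DmlResult lookup_target(const MutableSourceRegistry& registry,
                        const std::string& table_name, FakeSource** out) {
    DmlResult result;
    *out = registry.find(table_name);
    if (*out == nullptr) {
        result.error_message = "no mutable data source registered for table '" + table_name + "'";
        return result;  // success stays false
    }
    result.success = true;
    return result;
}
```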
### INSERT execution

1. Look up the MutableDataSource registered for the target table.
2. For VALUES: evaluate each expression in each row, build a Row, and call `source->insert(row)`.
3. For INSERT ... SELECT: execute the SELECT sub-plan and materialize its result before inserting (so that `INSERT INTO t SELECT ... FROM t` does not read its own inserts), then insert each result row.
4. Return a DmlResult with affected_rows = number of rows inserted.
170+
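Step 2 of INSERT execution has to turn each VALUES tuple into a full-width row. A minimal sketch, assuming the tuple's expressions are already evaluated, a list of column names stands in for `TableInfo`, an empty column list represents the `columns == nullptr` case, and (as an assumption, since column defaults are not covered by this spec) unlisted columns become NULL. `build_insert_row` is a hypothetical helper; it does not reject duplicate names in the column list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins (assumption): std::nullopt models SQL NULL.
using Value = std::optional<int64_t>;
using Row = std::vector<Value>;

// Builds a full-width row from one already-evaluated VALUES tuple.
//   table_columns:  column names in table order (stand-in for TableInfo)
//   insert_columns: the INSERT column list; empty = all columns in table order
// Returns std::nullopt for an unknown column or a value-count mismatch.
std::optional<Row> build_insert_row(const std::vector<std::string>& table_columns,
                                    const std::vector<std::string>& insert_columns,
                                    const std::vector<Value>& values) {
    assert(!table_columns.empty() && "a table needs at least one column");
    if (insert_columns.empty()) {
        if (values.size() != table_columns.size()) return std::nullopt;
        return values;  // positional: value i goes to column i
    }
    if (values.size() != insert_columns.size()) return std::nullopt;
    Row row(table_columns.size());  // unlisted columns stay NULL (assumption: no DEFAULTs)
    for (size_t i = 0; i < insert_columns.size(); ++i) {
        size_t ordinal = 0;
        while (ordinal < table_columns.size() && table_columns[ordinal] != insert_columns[i]) {
            ++ordinal;
        }
        if (ordinal == table_columns.size()) return std::nullopt;  // unknown column
        row[ordinal] = values[i];
    }
    return row;
}
```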
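### UPDATE execution

1. Look up the MutableDataSource for the target table.
2. Build a predicate from the WHERE expression: `[&](const Row& row) -> bool { /* evaluate WHERE against row */ }`. Only rows for which WHERE evaluates to TRUE match; FALSE and NULL do not. Without a WHERE clause, every row matches.
3. Build an updater from the SET list: `[&](Row& row) { /* for each SET col = expr: evaluate expr, row.set(col_ordinal, result) */ }`. Assigning each column as soon as its expression is evaluated lets later expressions see earlier assignments. That matches MySQL's single-table UPDATE, but standard SQL and PostgreSQL evaluate every SET expression against the original row, so the updater should follow the dialect.
4. Call `source->update_where(predicate, updater)`.
5. Return a DmlResult with affected_rows.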
178+
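A minimal sketch contrasting the two ways an updater can evaluate the SET list (against the original row, as in standard SQL and PostgreSQL, or left to right on the row being modified, as in MySQL's single-table UPDATE), plus a WHERE predicate in which only TRUE matches. Assumptions: `Value`/`Row` are simplified stand-ins, each SET item is pre-compiled into a target ordinal plus an evaluator, and WHERE is modeled as a function returning an optional `bool` (empty models NULL). `SetItem`, `make_predicate`, `make_snapshot_updater`, and `make_sequential_updater` are hypothetical names. For `SET a = b, b = a` on the row `(1, 2)`, the snapshot updater yields `(2, 1)` and the sequential one yields `(2, 2)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Simplified stand-ins (assumption): std::nullopt models SQL NULL.
using Value = std::optional<int64_t>;
using Row = std::vector<Value>;

// Hypothetical compiled SET item: target column ordinal plus an expression evaluator.
struct SetItem {
    size_t ordinal;
    std::function<Value(const Row&)> expr;
};

// WHERE uses three-valued logic: only TRUE matches; FALSE and NULL (nullopt) do not.
// An empty `where` models a missing WHERE clause, so every row matches.
std::function<bool(const Row&)> make_predicate(
        std::function<std::optional<bool>(const Row&)> where) {
    if (!where) return [](const Row&) { return true; };
    return [where](const Row& row) { return where(row).value_or(false); };
}

// Standard SQL / PostgreSQL: evaluate every SET expression on the original row, then assign.
std::function<void(Row&)> make_snapshot_updater(std::vector<SetItem> items) {
    return [items](Row& row) {
        std::vector<Value> results;
        results.reserve(items.size());
        for (const SetItem& item : items) results.push_back(item.expr(row));
        for (size_t i = 0; i < items.size(); ++i) {
            assert(items[i].ordinal < row.size());
            row[items[i].ordinal] = results[i];
        }
    };
}

// MySQL single-table UPDATE: assign left to right; later expressions see earlier writes.
std::function<void(Row&)> make_sequential_updater(std::vector<SetItem> items) {
    return [items](Row& row) {
        for (const SetItem& item : items) {
            assert(item.ordinal < row.size());
            row[item.ordinal] = item.expr(row);
        }
    };
}
```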
### DELETE execution

1. Look up the MutableDataSource for the target table.
2. Build a predicate from the WHERE expression (a row is deleted only if WHERE evaluates to TRUE; without a WHERE clause, every row is deleted).
3. Call `source->delete_where(predicate)`.
4. Return a DmlResult with affected_rows.
185+
---

## Distributed DML

### DistributedPlanner extensions

```cpp
template <Dialect D>
class DistributedPlanner {
public:
    // ... existing distribute() for SELECT ...

    // New
    PlanNode* distribute_dml(PlanNode* dml_plan);
};
```
201+
### INSERT routing

- **Unsharded table:** a single RemoteScan carrying the INSERT SQL to the table's backend
- **Sharded table:** read the shard key value in each VALUES row, group rows by target shard, and generate one INSERT per shard containing only that shard's rows
- **INSERT ... SELECT:** execute the SELECT distributedly first, then route each result row to the correct shard
207+
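A minimal sketch of the sharded-INSERT grouping step. The spec does not define the sharding function, so `shard_for` is a placeholder (non-negative modulo of an integer key); the real routing must use exactly the function the distributed planner uses for SELECT, or reads will miss the inserted rows. Rows are returned as indices so each shard's INSERT can be built from a subset of `value_rows`, and rejecting a NULL shard key is an assumption. `group_rows_by_shard` is a hypothetical helper; the same grouping would apply to the result rows of INSERT ... SELECT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Simplified stand-ins (assumption): std::nullopt models SQL NULL.
using Value = std::optional<int64_t>;
using Row = std::vector<Value>;

// Placeholder shard function (assumption): non-negative modulo of an integer key.
size_t shard_for(int64_t key, size_t shard_count) {
    assert(shard_count > 0);
    const auto n = static_cast<int64_t>(shard_count);
    return static_cast<size_t>(((key % n) + n) % n);
}

// Groups VALUES rows by target shard, keeping input order inside each group.
// Returns row indices so each shard's INSERT can reuse a subset of value_rows.
// A NULL shard key cannot be routed, so the whole statement is rejected (assumption).
std::optional<std::map<size_t, std::vector<size_t>>> group_rows_by_shard(
        const std::vector<Row>& rows, size_t shard_key_ordinal, size_t shard_count) {
    std::map<size_t, std::vector<size_t>> groups;
    for (size_t i = 0; i < rows.size(); ++i) {
        const Value& key = rows[i].at(shard_key_ordinal);
        if (!key) return std::nullopt;
        groups[shard_for(*key, shard_count)].push_back(i);
    }
    return groups;
}
```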
### UPDATE routing

- **Unsharded:** send the UPDATE to the single backend
- **Shard key fixed by an equality in WHERE** (e.g., `WHERE user_id = 42`): route to that shard only
- **Otherwise** (no shard key in WHERE, or only ranges/OR conditions on it): scatter the UPDATE to all shards and sum affected_rows across backends
213+
### DELETE routing

- Same logic as UPDATE: shard key fixed by an equality in WHERE → that shard; otherwise → scatter to all shards and sum affected_rows
217+
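Deciding whether a WHERE clause pins the shard key is the crux of UPDATE and DELETE routing. A conservative sketch over a hypothetical, simplified expression tree (the planner itself works on `AstNode`): an equality between the shard key column and a literal counts only when reached through AND; OR, ranges, and anything else fall back to scatter. `Expr`, `eq`, `node`, and `pinned_shard_key` are illustrative names. The sketch does not address an UPDATE whose SET list changes the shard key itself, which would require moving the row to another shard.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified WHERE tree; the planner itself works on AstNode.
struct Expr {
    enum class Kind { EQ_COLUMN_LITERAL, AND, OR, OTHER };
    Kind kind = Kind::OTHER;
    std::string column;                 // EQ_COLUMN_LITERAL: column name
    int64_t literal = 0;                // EQ_COLUMN_LITERAL: literal value
    std::shared_ptr<Expr> left, right;  // AND / OR operands
};

std::shared_ptr<Expr> eq(std::string column, int64_t literal) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Kind::EQ_COLUMN_LITERAL;
    e->column = std::move(column);
    e->literal = literal;
    return e;
}

std::shared_ptr<Expr> node(Expr::Kind kind, std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
    assert(l && r);
    auto e = std::make_shared<Expr>();
    e->kind = kind;
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Returns v if every row satisfying `where` must have shard_key = v.
// Only equalities reached through AND are trusted; OR, ranges, and anything
// else return nullopt (scatter). Contradictions such as k = 1 AND k = 2 match
// no rows, so routing to either shard is still correct.
std::optional<int64_t> pinned_shard_key(const Expr* where, const std::string& shard_key) {
    if (where == nullptr) return std::nullopt;  // no WHERE: every shard is affected
    switch (where->kind) {
        case Expr::Kind::EQ_COLUMN_LITERAL:
            if (where->column == shard_key) return where->literal;
            return std::nullopt;
        case Expr::Kind::AND: {
            std::optional<int64_t> from_left = pinned_shard_key(where->left.get(), shard_key);
            if (from_left) return from_left;
            return pinned_shard_key(where->right.get(), shard_key);
        }
        default:
            return std::nullopt;  // OR / OTHER: cannot prove a single shard
    }
}
```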
### Remote DML SQL generation

```cpp
template <Dialect D>
class RemoteQueryBuilder {
public:
    // ... existing ...
    StringRef build_insert(const TableInfo* table, const AstNode** columns, uint16_t col_count,
                           const AstNode** value_rows, uint16_t row_count);
    StringRef build_update(const TableInfo* table, const AstNode** set_cols,
                           const AstNode** set_exprs, uint16_t set_count,
                           const AstNode* where_expr);
    StringRef build_delete(const TableInfo* table, const AstNode* where_expr);
};
```
232+
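The builders above return arena-backed `StringRef`s; this self-contained sketch returns `std::string` instead and takes the WHERE predicate and SET expressions as already-rendered SQL text, standing in for emitting `AstNode`s. `SqlDialect`, `quote_ident`, `build_delete_sql`, and `build_update_sql` are hypothetical names. Identifiers are quoted per dialect (backticks for MySQL, double quotes for PostgreSQL, with any embedded quote character doubled). An empty WHERE means the statement applies to every row on that backend, which is what a scattered UPDATE or DELETE without a WHERE clause needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class SqlDialect { MySQL, PostgreSQL };  // hypothetical stand-in for Dialect D

// Quote an identifier, doubling any embedded quote character.
std::string quote_ident(SqlDialect d, const std::string& name) {
    const char q = (d == SqlDialect::MySQL) ? '`' : '"';
    std::string out(1, q);
    for (char c : name) {
        if (c == q) out += q;
        out += c;
    }
    out += q;
    return out;
}

// where_sql: already-rendered predicate; empty means no WHERE clause.
std::string build_delete_sql(SqlDialect d, const std::string& table, const std::string& where_sql) {
    std::string sql = "DELETE FROM " + quote_ident(d, table);
    if (!where_sql.empty()) sql += " WHERE " + where_sql;
    return sql;
}

// set_cols / set_sql are parallel arrays, like UPDATE_PLAN's set_columns / set_exprs.
std::string build_update_sql(SqlDialect d, const std::string& table,
                             const std::vector<std::string>& set_cols,
                             const std::vector<std::string>& set_sql,
                             const std::string& where_sql) {
    assert(!set_cols.empty() && set_cols.size() == set_sql.size());
    std::string sql = "UPDATE " + quote_ident(d, table) + " SET ";
    for (size_t i = 0; i < set_cols.size(); ++i) {
        if (i > 0) sql += ", ";
        sql += quote_ident(d, set_cols[i]) + " = " + set_sql[i];
    }
    if (!where_sql.empty()) sql += " WHERE " + where_sql;
    return sql;
}
```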
### RemoteExecutor extension

```cpp
class RemoteExecutor {
public:
    virtual ResultSet execute(const char* backend_name, StringRef sql) = 0;
    virtual DmlResult execute_dml(const char* backend_name, StringRef sql) = 0;
};
```
242+
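For scattered UPDATE and DELETE, the per-backend `DmlResult`s returned by `execute_dml()` must be merged into one. A minimal sketch; the failure policy is an assumption, since the spec defers transactions: shards that succeeded stay modified, so their counts are still summed, while the merged result is marked failed and its message names each failing backend. `merge_scatter_results` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors DmlResult from this spec.
struct DmlResult {
    uint64_t affected_rows = 0;
    uint64_t last_insert_id = 0;
    bool success = false;
    std::string error_message;
};

// Combines per-backend results of one scattered UPDATE/DELETE.
// Without transactions, successful shards keep their changes, so their
// affected_rows are summed even when another shard fails (assumption).
DmlResult merge_scatter_results(const std::vector<std::string>& backends,
                                const std::vector<DmlResult>& results) {
    assert(backends.size() == results.size());
    DmlResult merged;
    merged.success = true;
    for (size_t i = 0; i < results.size(); ++i) {
        if (results[i].success) {
            merged.affected_rows += results[i].affected_rows;
        } else {
            merged.success = false;
            if (!merged.error_message.empty()) merged.error_message += "; ";
            merged.error_message += backends[i] + ": " + results[i].error_message;
        }
    }
    return merged;
}
```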
---

## File Organization

```
include/sql_engine/
  mutable_data_source.h            — MutableDataSource + InMemoryMutableDataSource
  dml_result.h                     — DmlResult struct
  dml_plan_builder.h               — AST → DML plan nodes

  (modify) plan_node.h             — INSERT_PLAN, UPDATE_PLAN, DELETE_PLAN
  (modify) plan_executor.h         — execute_dml(), add_mutable_data_source()
  (modify) remote_executor.h       — execute_dml()
  (modify) remote_query_builder.h  — build_insert/update/delete()
  (modify) distributed_planner.h   — distribute_dml()

tests/
  test_dml.cpp                     — Local DML against InMemoryMutableDataSource
  test_distributed_dml.cpp         — Distributed DML routing + correctness
```
263+
---

## Testing Strategy

### Local DML (test_dml.cpp)

Set up an InMemoryMutableDataSource with initial data, execute DML, and verify the resulting state with SELECT.

- INSERT single row → affected_rows = 1, SELECT confirms row present
- INSERT multiple rows (multi-row VALUES) → affected_rows = N
- INSERT ... SELECT → rows copied from source table
- UPDATE with WHERE → only matching rows changed, affected_rows correct
- UPDATE without WHERE → all rows changed
- UPDATE with expression (SET age = age + 1) → each matching row's age incremented by 1
- DELETE with WHERE → matching rows removed, affected_rows correct
- DELETE without WHERE → all rows gone, affected_rows = total
- INSERT then DELETE then SELECT → SELECT reflects both changes
- NULL in INSERT values → row has NULL in that column
- UPDATE SET column = NULL → column becomes NULL
283+
### Distributed DML (test_distributed_dml.cpp)

All cases use a MockRemoteExecutor with 3 backends.

- INSERT to unsharded → single backend receives INSERT
- INSERT to sharded → routed to correct shard by shard key value
- Multi-row INSERT to sharded → rows grouped by shard, one INSERT per shard
- UPDATE unsharded → single backend
- UPDATE sharded with shard key in WHERE → single shard targeted
- UPDATE sharded without shard key → scattered to all shards, affected_rows summed
- DELETE unsharded → single backend
- DELETE sharded with shard key in WHERE → single shard targeted
- DELETE sharded without shard key → scattered to all shards, affected_rows summed
- Correctness: INSERT distributedly, then SELECT confirms all rows present
298+
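One possible shape for the test double, assuming the routing tests only need the DML half of `RemoteExecutor`: it records every `(backend, sql)` pair and returns a scripted `affected_rows` per backend, so tests can assert which backends received statements. `MockDmlExecutor`, `set_affected_rows`, and `sql_sent_to` are hypothetical names; a real mock would derive from `RemoteExecutor`, take `StringRef`, and also implement `execute()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Mirrors DmlResult from this spec.
struct DmlResult {
    uint64_t affected_rows = 0;
    uint64_t last_insert_id = 0;
    bool success = false;
    std::string error_message;
};

// Hypothetical test double for the DML half of RemoteExecutor.
class MockDmlExecutor {
public:
    // Scripts the affected_rows a backend reports for every DML statement.
    void set_affected_rows(const std::string& backend, uint64_t rows) { scripted_[backend] = rows; }

    DmlResult execute_dml(const char* backend_name, const std::string& sql) {
        assert(backend_name != nullptr);
        calls_.emplace_back(backend_name, sql);  // record before answering
        DmlResult result;
        auto it = scripted_.find(backend_name);
        if (it == scripted_.end()) {
            result.error_message = std::string("unknown backend: ") + backend_name;
            return result;
        }
        result.success = true;
        result.affected_rows = it->second;
        return result;
    }

    // Statements received by one backend, in arrival order.
    std::vector<std::string> sql_sent_to(const std::string& backend) const {
        std::vector<std::string> out;
        for (const auto& call : calls_) {
            if (call.first == backend) out.push_back(call.second);
        }
        return out;
    }

    size_t call_count() const { return calls_.size(); }

private:
    std::map<std::string, uint64_t> scripted_;
    std::vector<std::pair<std::string, std::string>> calls_;
};
```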
---

## Performance Targets

| Operation | Target latency |
|---|---|
| INSERT single row (local) | < 1 µs |
| INSERT 100 rows (local) | < 50 µs |
| DELETE with WHERE (100 rows, 10 match) | < 20 µs |
| UPDATE with WHERE (100 rows, 10 match) | < 30 µs |
| Distributed INSERT routing (3 shards, excluding network) | < 10 µs |
