Skip to content

Commit 40dad5d

Browse files
committed
Add design spec for backend connections — MySQL + PostgreSQL via libmysqlclient/libpq
1 parent 648f01e commit 40dad5d

1 file changed

Lines changed: 305 additions & 0 deletions

File tree

Lines changed: 305 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,305 @@
1+
# Backend Connections — Design Specification
2+
3+
## Overview
4+
5+
Replace the MockRemoteExecutor with real MySQL and PostgreSQL backend connections. The distributed query engine can now run queries against actual databases over the network.
6+
7+
Sub-project 10. Depends on: distributed planner (sub-project 8), DML execution (sub-project 9).
8+
9+
### Goals
10+
11+
- **MySQLRemoteExecutor** — connect to MySQL backends via libmysqlclient, run queries, convert results to Row/Value
12+
- **PgSQLRemoteExecutor** — connect to PostgreSQL backends via libpq, run queries, convert results
13+
- **MultiRemoteExecutor** — unified executor that routes to MySQL or PostgreSQL by backend dialect
14+
- **Type conversion** — map MySQL/PostgreSQL wire types to our Value tags accurately
15+
- **Docker-based test infrastructure** — spin up real MySQL + PostgreSQL for integration tests
16+
- **Full end-to-end** — parse SQL, distribute, run against real backends, merged results
17+
18+
### Constraints
19+
20+
- C++17
21+
- Link against libmysqlclient and libpq (system packages)
22+
- Single persistent connection per backend (no pool — prototype)
23+
- Sequential query execution (no parallelism — correct but not optimal)
24+
- Tests skip gracefully if databases not available
25+
26+
### Non-Goals
27+
28+
- Connection pooling (future — ProxySQL already has this)
29+
- Async/parallel query execution (future)
30+
- SSL/TLS connections (future)
31+
- Prepared statements over the wire (future)
32+
- Custom wire protocol implementation (use libraries)
33+
34+
---
35+
36+
## BackendConfig
37+
38+
```cpp
39+
struct BackendConfig {
40+
std::string name; // logical name: "shard_1", "analytics_db"
41+
std::string host;
42+
uint16_t port;
43+
std::string user;
44+
std::string password;
45+
std::string database;
46+
Dialect dialect; // MySQL or PostgreSQL
47+
};
48+
```
49+
50+
---
51+
52+
## MySQLRemoteExecutor
53+
54+
```cpp
55+
class MySQLRemoteExecutor : public RemoteExecutor {
56+
public:
57+
MySQLRemoteExecutor(Arena& arena);
58+
~MySQLRemoteExecutor();
59+
60+
void add_backend(const BackendConfig& config);
61+
ResultSet execute(const char* backend_name, StringRef sql) override;
62+
DmlResult execute_dml(const char* backend_name, StringRef sql) override;
63+
void disconnect_all();
64+
65+
private:
66+
struct Connection {
67+
BackendConfig config;
68+
MYSQL* conn = nullptr;
69+
bool connected = false;
70+
};
71+
72+
std::unordered_map<std::string, Connection> backends_;
73+
Arena& arena_;
74+
75+
Connection& get_or_connect(const std::string& name);
76+
ResultSet mysql_result_to_resultset(MYSQL_RES* res);
77+
Value mysql_field_to_value(const char* data, unsigned long length,
78+
enum_field_types type, bool is_null);
79+
};
80+
```
81+
82+
### Connection lifecycle
83+
84+
- `add_backend()` stores the config. Does NOT connect yet (lazy).
85+
- `get_or_connect()` connects on first use via `mysql_real_connect()`.
86+
- If connection drops, attempts one reconnect before failing.
87+
- `disconnect_all()` closes all connections. Called in destructor.
88+
89+
### MySQL type to Value conversion
90+
91+
All values arrive as strings in text protocol mode. Conversion by field type:
92+
93+
| MySQL C API type | Value tag | Conversion |
94+
|---|---|---|
95+
| MYSQL_TYPE_TINY/SHORT/LONG/LONGLONG | TAG_INT64 | strtoll(data) |
96+
| MYSQL_TYPE_FLOAT/DOUBLE | TAG_DOUBLE | strtod(data) |
97+
| MYSQL_TYPE_DECIMAL/NEWDECIMAL | TAG_STRING | arena string copy |
98+
| MYSQL_TYPE_STRING/VAR_STRING | TAG_STRING | arena string copy |
99+
| MYSQL_TYPE_BLOB | TAG_BYTES | arena copy |
100+
| MYSQL_TYPE_DATE | TAG_DATE | parse "YYYY-MM-DD" to days since epoch |
101+
| MYSQL_TYPE_DATETIME | TAG_DATETIME | parse "YYYY-MM-DD HH:MM:SS" to microseconds |
102+
| MYSQL_TYPE_TIMESTAMP | TAG_TIMESTAMP | parse "YYYY-MM-DD HH:MM:SS" to microseconds |
103+
| MYSQL_TYPE_TIME | TAG_TIME | parse "HH:MM:SS" to microseconds |
104+
| NULL | TAG_NULL | value_null() |
105+
106+
---
107+
108+
## PgSQLRemoteExecutor
109+
110+
```cpp
111+
class PgSQLRemoteExecutor : public RemoteExecutor {
112+
public:
113+
PgSQLRemoteExecutor(Arena& arena);
114+
~PgSQLRemoteExecutor();
115+
116+
void add_backend(const BackendConfig& config);
117+
ResultSet execute(const char* backend_name, StringRef sql) override;
118+
DmlResult execute_dml(const char* backend_name, StringRef sql) override;
119+
void disconnect_all();
120+
121+
private:
122+
struct Connection {
123+
BackendConfig config;
124+
PGconn* conn = nullptr;
125+
bool connected = false;
126+
};
127+
128+
std::unordered_map<std::string, Connection> backends_;
129+
Arena& arena_;
130+
131+
Connection& get_or_connect(const std::string& name);
132+
ResultSet pg_result_to_resultset(PGresult* res);
133+
Value pg_field_to_value(const char* data, int length, Oid type, bool is_null);
134+
};
135+
```
136+
137+
### PostgreSQL type OID to Value conversion
138+
139+
| PgSQL OID | Value tag | Conversion |
140+
|---|---|---|
141+
| INT2OID (21) / INT4OID (23) / INT8OID (20) | TAG_INT64 | strtoll |
142+
| FLOAT4OID (700) / FLOAT8OID (701) | TAG_DOUBLE | strtod |
143+
| NUMERICOID (1700) | TAG_STRING | string copy |
144+
| TEXTOID (25) / VARCHAROID (1043) / BPCHAROID (1042) | TAG_STRING | arena copy |
145+
| BOOLOID (16) | TAG_BOOL | "t" = true, "f" = false |
146+
| DATEOID (1082) | TAG_DATE | parse "YYYY-MM-DD" |
147+
| TIMESTAMPOID (1114) | TAG_DATETIME | parse datetime string |
148+
| TIMESTAMPTZOID (1184) | TAG_TIMESTAMP | parse with timezone |
149+
| TIMEOID (1083) | TAG_TIME | parse "HH:MM:SS" |
150+
| JSONOID (114) / JSONBOID (3802) | TAG_JSON | string copy |
151+
| NULL | TAG_NULL | value_null() |
152+
153+
---
154+
155+
## MultiRemoteExecutor
156+
157+
```cpp
158+
class MultiRemoteExecutor : public RemoteExecutor {
159+
public:
160+
MultiRemoteExecutor(Arena& arena);
161+
~MultiRemoteExecutor();
162+
163+
void add_backend(const BackendConfig& config);
164+
ResultSet execute(const char* backend_name, StringRef sql) override;
165+
DmlResult execute_dml(const char* backend_name, StringRef sql) override;
166+
void disconnect_all();
167+
168+
private:
169+
MySQLRemoteExecutor mysql_exec_;
170+
PgSQLRemoteExecutor pgsql_exec_;
171+
std::unordered_map<std::string, Dialect> backend_dialects_;
172+
};
173+
```
174+
175+
Routes each call to the correct protocol-specific executor based on the backend's registered dialect.
176+
177+
---
178+
179+
## Date/Time Parsing Utilities
180+
181+
Shared between both executors:
182+
183+
```cpp
184+
namespace sql_engine {
185+
namespace datetime_parse {
186+
int32_t parse_date(const char* s); // "YYYY-MM-DD" -> days since epoch
187+
int64_t parse_datetime(const char* s); // "YYYY-MM-DD HH:MM:SS[.uuuuuu]" -> microseconds
188+
int64_t parse_time(const char* s); // "HH:MM:SS[.uuuuuu]" -> microseconds
189+
int32_t days_since_epoch(int year, int month, int day); // calendar math
190+
} // namespace datetime_parse
191+
} // namespace sql_engine
192+
```
193+
194+
Simple, no timezone library dependency. Timezone handling deferred.
195+
196+
---
197+
198+
## Docker Test Infrastructure
199+
200+
### scripts/start_test_backends.sh
201+
202+
Starts MySQL 8 and PostgreSQL 16 containers on non-standard ports (13306, 15432) to avoid conflicts with local installations. Waits for readiness, loads test data.
203+
204+
### scripts/stop_test_backends.sh
205+
206+
Removes test containers.
207+
208+
### scripts/test_data_mysql.sql / test_data_pgsql.sql
209+
210+
Creates `users` table (id INT, name VARCHAR, age INT, dept VARCHAR) and `orders` table (id INT, user_id INT, total DECIMAL, status VARCHAR) with 5 users and 4 orders.
211+
212+
### Test skip logic
213+
214+
```cpp
215+
#define SKIP_IF_NO_MYSQL() if (!mysql_available()) { GTEST_SKIP() << "MySQL not available"; }
216+
#define SKIP_IF_NO_PGSQL() if (!pgsql_available()) { GTEST_SKIP() << "PostgreSQL not available"; }
217+
```
218+
219+
CI runs with Docker service containers. Local dev can skip integration tests gracefully.
220+
221+
---
222+
223+
## File Organization
224+
225+
```
226+
include/sql_engine/
227+
backend_config.h -- BackendConfig struct
228+
datetime_parse.h -- Date/time string parsing
229+
mysql_remote_executor.h -- MySQLRemoteExecutor
230+
pgsql_remote_executor.h -- PgSQLRemoteExecutor
231+
multi_remote_executor.h -- MultiRemoteExecutor
232+
233+
src/sql_engine/
234+
mysql_remote_executor.cpp -- MySQL implementation
235+
pgsql_remote_executor.cpp -- PostgreSQL implementation
236+
multi_remote_executor.cpp -- Routing implementation
237+
datetime_parse.cpp -- Date/time parsing
238+
239+
scripts/
240+
start_test_backends.sh -- Docker setup
241+
stop_test_backends.sh -- Docker teardown
242+
test_data_mysql.sql -- MySQL test schema + data
243+
test_data_pgsql.sql -- PostgreSQL test schema + data
244+
245+
tests/
246+
test_mysql_executor.cpp -- MySQL integration tests
247+
test_pgsql_executor.cpp -- PostgreSQL integration tests
248+
test_distributed_real.cpp -- Full distributed pipeline
249+
250+
.github/workflows/ci.yml -- Add integration test job with Docker services
251+
```
252+
253+
---
254+
255+
## Testing Strategy
256+
257+
### MySQL executor tests
258+
259+
SKIP_IF_NO_MYSQL for all tests.
260+
261+
- Connect to backend, verify connection
262+
- SELECT * FROM users -> 5 rows, correct column count
263+
- SELECT with WHERE -> filtered results
264+
- SELECT with ORDER BY + LIMIT -> correct order and count
265+
- Type conversion: INT, VARCHAR, DECIMAL, DATE, DATETIME, NULL
266+
- INSERT -> affected_rows = 1, SELECT back confirms
267+
- UPDATE -> affected_rows matches
268+
- DELETE -> affected_rows matches
269+
- Bad SQL -> error in DmlResult, no crash
270+
- Connection to wrong port -> error, no crash
271+
272+
### PostgreSQL executor tests
273+
274+
SKIP_IF_NO_PGSQL. Same structure plus:
275+
276+
- BOOLEAN -> TAG_BOOL
277+
- JSON -> TAG_JSON
278+
- TIMESTAMP WITH TIME ZONE -> TAG_TIMESTAMP
279+
280+
### Full distributed tests
281+
282+
SKIP_IF_NO_MYSQL and SKIP_IF_NO_PGSQL.
283+
284+
- Shard map with users split across 2 MySQL backends
285+
- Parse -> plan -> optimize -> distribute -> execute against real backends
286+
- Verify: SELECT * returns all rows from all shards
287+
- Verify: Aggregate merged correctly
288+
- Verify: Cross-backend JOIN correct
289+
- Verify: INSERT routed to correct shard
290+
- Verify: UPDATE scatter correct
291+
292+
### CI workflow
293+
294+
GitHub Actions with MySQL and PostgreSQL service containers. Integration tests run as a separate job after unit tests pass.
295+
296+
---
297+
298+
## Performance Targets
299+
300+
| Operation | Target |
301+
|---|---|
302+
| Connection establishment | <50ms (network) |
303+
| Simple SELECT (5 rows, LAN) | <1ms total |
304+
| Type conversion per row (10 cols) | <1us |
305+
| Date/time string parsing | <100ns per field |

0 commit comments

Comments
 (0)