|
| 1 | +# Backend Connections — Design Specification |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +Replace the MockRemoteExecutor with real MySQL and PostgreSQL backend connections. The distributed query engine can now run queries against actual databases over the network. |
| 6 | + |
| 7 | +Sub-project 10. Depends on: distributed planner (sub-project 8), DML execution (sub-project 9). |
| 8 | + |
| 9 | +### Goals |
| 10 | + |
| 11 | +- **MySQLRemoteExecutor** — connect to MySQL backends via libmysqlclient, run queries, convert results to Row/Value |
| 12 | +- **PgSQLRemoteExecutor** — connect to PostgreSQL backends via libpq, run queries, convert results |
| 13 | +- **MultiRemoteExecutor** — unified executor that routes to MySQL or PostgreSQL by backend dialect |
| 14 | +- **Type conversion** — map MySQL/PostgreSQL wire types to our Value tags accurately |
| 15 | +- **Docker-based test infrastructure** — spin up real MySQL + PostgreSQL for integration tests |
| 16 | +- **Full end-to-end** — parse SQL, distribute, run against real backends, merged results |
| 17 | + |
| 18 | +### Constraints |
| 19 | + |
| 20 | +- C++17 |
| 21 | +- Link against libmysqlclient and libpq (system packages) |
| 22 | +- Single persistent connection per backend (no pool — prototype) |
| 23 | +- Sequential query execution (no parallelism — correct but not optimal) |
| 24 | +- Tests skip gracefully if databases not available |
| 25 | + |
| 26 | +### Non-Goals |
| 27 | + |
| 28 | +- Connection pooling (future — ProxySQL already has this) |
| 29 | +- Async/parallel query execution (future) |
| 30 | +- SSL/TLS connections (future) |
| 31 | +- Prepared statements over the wire (future) |
| 32 | +- Custom wire protocol implementation (use libraries) |
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +## BackendConfig |
| 37 | + |
| 38 | +```cpp |
| 39 | +struct BackendConfig { |
| 40 | + std::string name; // logical name: "shard_1", "analytics_db" |
| 41 | + std::string host; |
| 42 | + uint16_t port; |
| 43 | + std::string user; |
| 44 | + std::string password; |
| 45 | + std::string database; |
| 46 | + Dialect dialect; // MySQL or PostgreSQL |
| 47 | +}; |
| 48 | +``` |
| 49 | +
|
| 50 | +--- |
| 51 | +
|
| 52 | +## MySQLRemoteExecutor |
| 53 | +
|
| 54 | +```cpp |
| 55 | +class MySQLRemoteExecutor : public RemoteExecutor { |
| 56 | +public: |
| 57 | + MySQLRemoteExecutor(Arena& arena); |
| 58 | + ~MySQLRemoteExecutor(); |
| 59 | +
|
| 60 | + void add_backend(const BackendConfig& config); |
| 61 | + ResultSet execute(const char* backend_name, StringRef sql) override; |
| 62 | + DmlResult execute_dml(const char* backend_name, StringRef sql) override; |
| 63 | + void disconnect_all(); |
| 64 | +
|
| 65 | +private: |
| 66 | + struct Connection { |
| 67 | + BackendConfig config; |
| 68 | + MYSQL* conn = nullptr; |
| 69 | + bool connected = false; |
| 70 | + }; |
| 71 | +
|
| 72 | + std::unordered_map<std::string, Connection> backends_; |
| 73 | + Arena& arena_; |
| 74 | +
|
| 75 | + Connection& get_or_connect(const std::string& name); |
| 76 | + ResultSet mysql_result_to_resultset(MYSQL_RES* res); |
| 77 | + Value mysql_field_to_value(const char* data, unsigned long length, |
| 78 | + enum_field_types type, bool is_null); |
| 79 | +}; |
| 80 | +``` |
| 81 | + |
| 82 | +### Connection lifecycle |
| 83 | + |
| 84 | +- `add_backend()` stores the config. Does NOT connect yet (lazy). |
| 85 | +- `get_or_connect()` connects on first use via `mysql_real_connect()`. |
| 86 | +- If connection drops, attempts one reconnect before failing. |
| 87 | +- `disconnect_all()` closes all connections. Called in destructor. |
| 88 | + |
| 89 | +### MySQL type to Value conversion |
| 90 | + |
| 91 | +All values arrive as strings in text protocol mode. Conversion by field type: |
| 92 | + |
| 93 | +| MySQL C API type | Value tag | Conversion | |
| 94 | +|---|---|---| |
| 95 | +| MYSQL_TYPE_TINY/SHORT/LONG/LONGLONG | TAG_INT64 | strtoll(data) | |
| 96 | +| MYSQL_TYPE_FLOAT/DOUBLE | TAG_DOUBLE | strtod(data) | |
| 97 | +| MYSQL_TYPE_DECIMAL/NEWDECIMAL | TAG_STRING | arena string copy | |
| 98 | +| MYSQL_TYPE_STRING/VAR_STRING | TAG_STRING | arena string copy | |
| 99 | +| MYSQL_TYPE_BLOB | TAG_BYTES | arena copy | |
| 100 | +| MYSQL_TYPE_DATE | TAG_DATE | parse "YYYY-MM-DD" to days since epoch | |
| 101 | +| MYSQL_TYPE_DATETIME | TAG_DATETIME | parse "YYYY-MM-DD HH:MM:SS" to microseconds | |
| 102 | +| MYSQL_TYPE_TIMESTAMP | TAG_TIMESTAMP | parse "YYYY-MM-DD HH:MM:SS" to microseconds | |
| 103 | +| MYSQL_TYPE_TIME | TAG_TIME | parse "HH:MM:SS" to microseconds | |
| 104 | +| NULL | TAG_NULL | value_null() | |
| 105 | + |
| 106 | +--- |
| 107 | + |
| 108 | +## PgSQLRemoteExecutor |
| 109 | + |
| 110 | +```cpp |
| 111 | +class PgSQLRemoteExecutor : public RemoteExecutor { |
| 112 | +public: |
| 113 | + PgSQLRemoteExecutor(Arena& arena); |
| 114 | + ~PgSQLRemoteExecutor(); |
| 115 | + |
| 116 | + void add_backend(const BackendConfig& config); |
| 117 | + ResultSet execute(const char* backend_name, StringRef sql) override; |
| 118 | + DmlResult execute_dml(const char* backend_name, StringRef sql) override; |
| 119 | + void disconnect_all(); |
| 120 | + |
| 121 | +private: |
| 122 | + struct Connection { |
| 123 | + BackendConfig config; |
| 124 | + PGconn* conn = nullptr; |
| 125 | + bool connected = false; |
| 126 | + }; |
| 127 | + |
| 128 | + std::unordered_map<std::string, Connection> backends_; |
| 129 | + Arena& arena_; |
| 130 | + |
| 131 | + Connection& get_or_connect(const std::string& name); |
| 132 | + ResultSet pg_result_to_resultset(PGresult* res); |
| 133 | + Value pg_field_to_value(const char* data, int length, Oid type, bool is_null); |
| 134 | +}; |
| 135 | +``` |
| 136 | +
|
| 137 | +### PostgreSQL type OID to Value conversion |
| 138 | +
|
| 139 | +| PgSQL OID | Value tag | Conversion | |
| 140 | +|---|---|---| |
| 141 | +| INT2OID (21) / INT4OID (23) / INT8OID (20) | TAG_INT64 | strtoll | |
| 142 | +| FLOAT4OID (700) / FLOAT8OID (701) | TAG_DOUBLE | strtod | |
| 143 | +| NUMERICOID (1700) | TAG_STRING | string copy | |
| 144 | +| TEXTOID (25) / VARCHAROID (1043) / BPCHAROID (1042) | TAG_STRING | arena copy | |
| 145 | +| BOOLOID (16) | TAG_BOOL | "t" = true, "f" = false | |
| 146 | +| DATEOID (1082) | TAG_DATE | parse "YYYY-MM-DD" | |
| 147 | +| TIMESTAMPOID (1114) | TAG_DATETIME | parse datetime string | |
| 148 | +| TIMESTAMPTZOID (1184) | TAG_TIMESTAMP | parse with timezone | |
| 149 | +| TIMEOID (1083) | TAG_TIME | parse "HH:MM:SS" | |
| 150 | +| JSONOID (114) / JSONBOID (3802) | TAG_JSON | string copy | |
| 151 | +| NULL | TAG_NULL | value_null() | |
| 152 | +
|
| 153 | +--- |
| 154 | +
|
| 155 | +## MultiRemoteExecutor |
| 156 | +
|
| 157 | +```cpp |
| 158 | +class MultiRemoteExecutor : public RemoteExecutor { |
| 159 | +public: |
| 160 | + MultiRemoteExecutor(Arena& arena); |
| 161 | + ~MultiRemoteExecutor(); |
| 162 | +
|
| 163 | + void add_backend(const BackendConfig& config); |
| 164 | + ResultSet execute(const char* backend_name, StringRef sql) override; |
| 165 | + DmlResult execute_dml(const char* backend_name, StringRef sql) override; |
| 166 | + void disconnect_all(); |
| 167 | +
|
| 168 | +private: |
| 169 | + MySQLRemoteExecutor mysql_exec_; |
| 170 | + PgSQLRemoteExecutor pgsql_exec_; |
| 171 | + std::unordered_map<std::string, Dialect> backend_dialects_; |
| 172 | +}; |
| 173 | +``` |
| 174 | + |
| 175 | +Routes each call to the correct protocol-specific executor based on the backend's registered dialect. |
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +## Date/Time Parsing Utilities |
| 180 | + |
| 181 | +Shared between both executors: |
| 182 | + |
| 183 | +```cpp |
| 184 | +namespace sql_engine { |
| 185 | +namespace datetime_parse { |
| 186 | + int32_t parse_date(const char* s); // "YYYY-MM-DD" -> days since epoch |
| 187 | + int64_t parse_datetime(const char* s); // "YYYY-MM-DD HH:MM:SS[.uuuuuu]" -> microseconds |
| 188 | + int64_t parse_time(const char* s); // "HH:MM:SS[.uuuuuu]" -> microseconds |
| 189 | + int32_t days_since_epoch(int year, int month, int day); // calendar math |
| 190 | +} // namespace datetime_parse |
| 191 | +} // namespace sql_engine |
| 192 | +``` |
| 193 | +
|
| 194 | +Simple, no timezone library dependency. Timezone handling deferred. |
| 195 | +
|
| 196 | +--- |
| 197 | +
|
| 198 | +## Docker Test Infrastructure |
| 199 | +
|
| 200 | +### scripts/start_test_backends.sh |
| 201 | +
|
| 202 | +Starts MySQL 8 and PostgreSQL 16 containers on non-standard ports (13306, 15432) to avoid conflicts with local installations. Waits for readiness, loads test data. |
| 203 | +
|
| 204 | +### scripts/stop_test_backends.sh |
| 205 | +
|
| 206 | +Removes test containers. |
| 207 | +
|
| 208 | +### scripts/test_data_mysql.sql / test_data_pgsql.sql |
| 209 | +
|
| 210 | +Creates `users` table (id INT, name VARCHAR, age INT, dept VARCHAR) and `orders` table (id INT, user_id INT, total DECIMAL, status VARCHAR) with 5 users and 4 orders. |
| 211 | +
|
| 212 | +### Test skip logic |
| 213 | +
|
| 214 | +```cpp |
| 215 | +#define SKIP_IF_NO_MYSQL() if (!mysql_available()) { GTEST_SKIP() << "MySQL not available"; } |
| 216 | +#define SKIP_IF_NO_PGSQL() if (!pgsql_available()) { GTEST_SKIP() << "PostgreSQL not available"; } |
| 217 | +``` |
| 218 | + |
| 219 | +CI runs with Docker service containers. Local dev can skip integration tests gracefully. |
| 220 | + |
| 221 | +--- |
| 222 | + |
| 223 | +## File Organization |
| 224 | + |
| 225 | +``` |
| 226 | +include/sql_engine/ |
| 227 | + backend_config.h -- BackendConfig struct |
| 228 | + datetime_parse.h -- Date/time string parsing |
| 229 | + mysql_remote_executor.h -- MySQLRemoteExecutor |
| 230 | + pgsql_remote_executor.h -- PgSQLRemoteExecutor |
| 231 | + multi_remote_executor.h -- MultiRemoteExecutor |
| 232 | +
|
| 233 | +src/sql_engine/ |
| 234 | + mysql_remote_executor.cpp -- MySQL implementation |
| 235 | + pgsql_remote_executor.cpp -- PostgreSQL implementation |
| 236 | + multi_remote_executor.cpp -- Routing implementation |
| 237 | + datetime_parse.cpp -- Date/time parsing |
| 238 | +
|
| 239 | +scripts/ |
| 240 | + start_test_backends.sh -- Docker setup |
| 241 | + stop_test_backends.sh -- Docker teardown |
| 242 | + test_data_mysql.sql -- MySQL test schema + data |
| 243 | + test_data_pgsql.sql -- PostgreSQL test schema + data |
| 244 | +
|
| 245 | +tests/ |
| 246 | + test_mysql_executor.cpp -- MySQL integration tests |
| 247 | + test_pgsql_executor.cpp -- PostgreSQL integration tests |
| 248 | + test_distributed_real.cpp -- Full distributed pipeline |
| 249 | +
|
| 250 | +.github/workflows/ci.yml -- Add integration test job with Docker services |
| 251 | +``` |
| 252 | + |
| 253 | +--- |
| 254 | + |
| 255 | +## Testing Strategy |
| 256 | + |
| 257 | +### MySQL executor tests |
| 258 | + |
| 259 | +SKIP_IF_NO_MYSQL for all tests. |
| 260 | + |
| 261 | +- Connect to backend, verify connection |
| 262 | +- SELECT * FROM users -> 5 rows, correct column count |
| 263 | +- SELECT with WHERE -> filtered results |
| 264 | +- SELECT with ORDER BY + LIMIT -> correct order and count |
| 265 | +- Type conversion: INT, VARCHAR, DECIMAL, DATE, DATETIME, NULL |
| 266 | +- INSERT -> affected_rows = 1, SELECT back confirms |
| 267 | +- UPDATE -> affected_rows matches |
| 268 | +- DELETE -> affected_rows matches |
| 269 | +- Bad SQL -> error in DmlResult, no crash |
| 270 | +- Connection to wrong port -> error, no crash |
| 271 | + |
| 272 | +### PostgreSQL executor tests |
| 273 | + |
| 274 | +SKIP_IF_NO_PGSQL. Same structure plus: |
| 275 | + |
| 276 | +- BOOLEAN -> TAG_BOOL |
| 277 | +- JSON -> TAG_JSON |
| 278 | +- TIMESTAMP WITH TIME ZONE -> TAG_TIMESTAMP |
| 279 | + |
| 280 | +### Full distributed tests |
| 281 | + |
| 282 | +SKIP_IF_NO_MYSQL and SKIP_IF_NO_PGSQL. |
| 283 | + |
| 284 | +- Shard map with users split across 2 MySQL backends |
| 285 | +- Parse -> plan -> optimize -> distribute -> execute against real backends |
| 286 | +- Verify: SELECT * returns all rows from all shards |
| 287 | +- Verify: Aggregate merged correctly |
| 288 | +- Verify: Cross-backend JOIN correct |
| 289 | +- Verify: INSERT routed to correct shard |
| 290 | +- Verify: UPDATE scatter correct |
| 291 | + |
| 292 | +### CI workflow |
| 293 | + |
| 294 | +GitHub Actions with MySQL and PostgreSQL service containers. Integration tests run as a separate job after unit tests pass. |
| 295 | + |
| 296 | +--- |
| 297 | + |
| 298 | +## Performance Targets |
| 299 | + |
| 300 | +| Operation | Target | |
| 301 | +|---|---| |
| 302 | +| Connection establishment | <50ms (network) | |
| 303 | +| Simple SELECT (5 rows, LAN) | <1ms total | |
| 304 | +| Type conversion per row (10 cols) | <1us | |
| 305 | +| Date/time string parsing | <100ns per field | |
0 commit comments