Skip to content

Commit f0f2915

Browse files
committed
fix(distributed): routing strategies for sharded tables (issues 03, 09)
Resolves issue 09 (P0) and lands its prerequisite (issue 03, P1) in one cohesive change because the strategy syntax extension naturally lives in the shared tool config parser introduced by issue 03. Issue 09 (P0) — single-shard route by shard key returned wrong (empty) results for some integer keys. Root cause: ShardMap used std::hash<int64_t> which on libstdc++ is the identity function, so route(K) collapsed to K mod num_shards. The two-shard demo loaded data by range (1-5 -> shard1, 6-10 -> shard2), which is misaligned with hash routing for 6 of 10 keys. Wrong-results-without-error path. Fixes: - Replace std::hash<int64_t> with FNV-1a 64-bit. Deterministic across compilers, regression-guarded by ShardMapHashTest.NotIdentityFunction. - Add RoutingStrategy enum to TableShardConfig: HASH (default, back-compat), RANGE (sorted upper-inclusive entries; first match; values above max route to last entry's shard, matching PARTITION BY RANGE LESS THAN MAXVALUE), LIST (explicit value -> shard_index; supports int and string keys; misses fall back to shard 0). The enum is the deliberate seam where a future PROXYSQL_RULES strategy will plug in to delegate to ProxySQL's mysql_query_rules table when this engine is embedded inside ProxySQL. - Extend parse_shard_spec to accept an optional strategy qualifier: table:key:backend1,backend2 (HASH, implicit) table:key:hash:backend1,backend2 (HASH, explicit) table:key:range:5=shard1,10=shard2 (RANGE) table:key:list:1=a,2=a,6=b (LIST, int keys) table:key:list:us-east=a,us-west=b (LIST, string keys) Repeated backend names across entries are interned once. - Update scripts/run_sharding_demo.sh and scripts/test_sqlengine.sh to declare range routing matching the demo's data placement. The two previously failing assertions in test_sqlengine.sh sharded ('route to shard1 (Eve)', 'route returns one row') pass without further change. Suite now reports 16/16 sharded, 32/34 overall (the two remaining failures are issue 08, separate). Issue 03 (P1) — extract shared backend and shard configuration parsing. tool_config_parser.{h,cpp} centralises parse_backend_url and parse_shard_spec; sqlengine, mysql_server, bench_distributed and engine_stress_test all delegate to it instead of duplicating the logic. ~135 lines removed per tool. Tests: - New tests/test_shard_map.cpp — 14 cases covering HASH determinism, FNV-1a divergence from identity (regression guard), RANGE matching demo placement, RANGE above-max fallback, RANGE accepts unsorted input and sorts internally, RANGE rejects spurious string lookups, LIST int + string keys, LIST miss fallback, default-strategy is HASH, out-of-range shard_index clamping, unknown table. - 11 new cases in tests/test_ssl_config.cpp covering the new --shard syntax (HASH default, explicit hash/range/list, malformed entries, non-int range bound, empty body rejection, backend interning). - scripts/test_sqlengine.sh sharded: 16/16 with range routing. Docs: - docs/issues/09 marked Resolved with Resolution Notes section. - docs/issues/README.md status line updated. - docs/architecture-and-status.md adds a Routing Strategies subsection to section 6 and renumbers connection-pool to 6.4. - docs/sqlengine.md documents the strategy qualifier and updates the scatter/gather recipe to use range routing.
1 parent 1250196 commit f0f2915

16 files changed

Lines changed: 859 additions & 720 deletions

Makefile

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -85,7 +85,8 @@ TEST_SRCS = $(TEST_DIR)/test_main.cpp \
8585
$(TEST_DIR)/test_datetime_funcs.cpp \
8686
$(TEST_DIR)/test_result_set.cpp \
8787
$(TEST_DIR)/test_ssl_config.cpp \
88-
$(TEST_DIR)/test_star_modifiers.cpp
88+
$(TEST_DIR)/test_star_modifiers.cpp \
89+
$(TEST_DIR)/test_shard_map.cpp
8990
TEST_OBJS = $(TEST_SRCS:.cpp=.o)
9091
TEST_TARGET = $(PROJECT_ROOT)/run_tests
9192

docs/architecture-and-status.md

Lines changed: 18 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -209,7 +209,7 @@ Legend: ✅ done · 🟡 partial · 🟦 in working tree (uncommitted) · 📋 p
209209

210210
| Feature | Status | Where |
211211
| -------------------------------------------------------------- | ------ | ---------------------------------------------------------------------------- |
212-
| `ShardMap` — unsharded + hash-sharded tables (int + string keys) || `shard_map.h` |
212+
| `ShardMap` — unsharded tables + three routing strategies (HASH / RANGE / LIST) for sharded tables || `shard_map.h` (FNV-1a hash; RANGE int-keyed; LIST int + string) |
213213
| `RemoteExecutor` abstract interface || `remote_executor.h` |
214214
| `MySQLRemoteExecutor` (single connection per backend) || `mysql_remote_executor.h` |
215215
| `PgSQLRemoteExecutor` (single connection per backend) || `pgsql_remote_executor.h` |
@@ -428,19 +428,31 @@ flowchart LR
428428
CPN --> BN[(MySQL/Pg backend N)]
429429
```
430430

431-
### 6.2 Distribution patterns
431+
### 6.2 Routing strategies
432+
433+
`TableShardConfig::strategy` selects how a value of the shard key maps to a shard index.
434+
435+
| Strategy | Resolution | Key types | Out-of-range / miss |
436+
| -------- | -------------------------------------------------------------------------------- | --------- | -------------------------------- |
437+
| `HASH` | FNV-1a 64-bit of the key bytes, modulo `num_shards` | int + str | n/a (always defined) |
438+
| `RANGE` | Sorted `(upper_inclusive, shard_index)` entries; first match wins | int only | last entry's shard (MAXVALUE) |
439+
| `LIST` | Explicit `value -> shard_index` map | int + str | shard 0 |
440+
441+
The `--shard` flag accepts the strategy as an optional qualifier (back-compat: `table:key:s1,s2` defaults to `HASH`). The `RoutingStrategy` enum is the seam where a future `PROXYSQL_RULES` strategy will plug in to delegate to ProxySQL's `mysql_query_rules`.
442+
443+
### 6.3 Distribution patterns
432444

433445
| SQL shape | Rewrite |
434446
| ---------------------------------------------- | ----------------------------------------------------------------------- |
435447
| `SELECT … FROM unsharded` | Single REMOTE_SCAN to the pinned backend. |
436-
| `SELECT … FROM sharded WHERE shard_key = K` | Routed REMOTE_SCAN to the one shard. |
448+
| `SELECT … FROM sharded WHERE shard_key = K` | Routed REMOTE_SCAN to the one shard chosen by the configured strategy. |
437449
| `SELECT … FROM sharded` (no key predicate) | Scatter REMOTE_SCAN to every shard, gather locally. |
438450
| `SELECT col, AGG(x) … GROUP BY col` | Two-phase: per-shard partial aggregate + local MERGE_AGGREGATE. |
439451
| `SELECT … ORDER BY k LIMIT n` | Per-shard ORDER BY + LIMIT n; local MERGE_SORT + LIMIT. |
440452
| `SELECT … FROM sharded JOIN sharded ON …` | Materialize one side, hash-join locally (HASH_JOIN + REMOTE_SCAN inputs). |
441453
| `INSERT/UPDATE/DELETE on sharded` | Distributed DML: one REMOTE_SCAN per affected shard, in DML mode. |
442454

443-
### 6.3 Connection pool + pinning
455+
### 6.4 Connection pool + pinning
444456

445457
```mermaid
446458
flowchart TB
@@ -572,8 +584,8 @@ Source: `docs/issues/README.md`. Status reflects the working tree on 2026-04-17.
572584

573585
1. **[01] Distributed 2PC must require safe session pinning** — 🟦 in working tree.
574586
*Add `allows_unpinned_distributed_2pc()` flag. Default false. Pooled executors without pinning cannot enlist.*
575-
9. **[09] Single-shard route by shard key misdirects for some keys**📋 open. Surfaced 2026-04-18 by `scripts/test_sqlengine.sh sharded`.
576-
*`WHERE id = 5` against `users:id:shard1,shard2` returns empty rows; `id = 7` works. Wrong-results-without-error path. Either the router and the demo's data placement disagree, or `DistributedPlanner` ignores the shard key.*
587+
9. **[09] Single-shard route by shard key misdirects for some keys**✅ resolved 2026-04-18.
588+
*Root cause: `std::hash<int64_t>` on libstdc++ is identity, so route `= K mod num_shards`; demo data placement was range-based. Fix: replace with FNV-1a 64-bit hash and add a `RoutingStrategy` enum to `TableShardConfig` (HASH default, RANGE, LIST). The new `--shard "table:key:range:5=s1,10=s2"` syntax matches the demo's range placement. Engine is now ProxySQL-shaped — a future `PROXYSQL_RULES` strategy can join the same enum.*
577589

578590
### P1 — important correctness & maintainability
579591

docs/issues/09-shard-key-route-misdirection.md

Lines changed: 43 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66

77
## Status
88

9-
Open. Surfaced by `scripts/test_sqlengine.sh sharded` on 2026-04-18.
9+
Resolved 2026-04-18.
1010

1111
## Problem
1212

@@ -98,6 +98,47 @@ The "route to the wrong shard and return empty" mode must not exist.
9898
docker rm -f parsersql-shard1 parsersql-shard2 2>/dev/null
9999
./scripts/start_sharding_demo.sh
100100
make build-sqlengine
101-
./scripts/test_sqlengine.sh sharded # expect 16/16
101+
./scripts/test_sqlengine.sh sharded # 16/16
102102
make test # full unit suite
103103
```
104+
105+
## Resolution Notes
106+
107+
Both root causes addressed.
108+
109+
**Hash function.** Replaced `std::hash<int64_t>` (libstdc++ identity → `K mod num_shards`) with FNV-1a 64-bit applied to all 8 bytes of the key. Deterministic across compilers, doesn't have the small-key collision pattern, regression-guarded by `ShardMapHashTest.NotIdentityFunction`.
110+
111+
**Routing strategy.** Added `RoutingStrategy` enum to `TableShardConfig` with three values:
112+
113+
- `HASH` (default) — FNV-1a + modulo. Backward compatible; the existing `--shard "table:key:b1,b2"` syntax keeps producing this.
114+
- `RANGE` — ordered `(upper_inclusive, shard_index)` entries; first match wins; values exceeding the largest bound route to the last entry's shard (matches the MySQL `PARTITION BY RANGE LESS THAN MAXVALUE` idiom). Integer-keyed only today.
115+
- `LIST` — explicit `value -> shard_index` map. Supports both int and string keys. Misses fall back to shard 0.
116+
117+
**Demo aligned with routing.** `scripts/start_sharding_demo.sh` loads users 1-5 to shard1 and 6-10 to shard2; `scripts/run_sharding_demo.sh` and `scripts/test_sqlengine.sh` now declare `--shard "users:id:range:5=shard1,10=shard2"` (and the analogous `orders` line) so the routing matches the data placement. The two previously failing route assertions in `scripts/test_sqlengine.sh sharded` (`route to shard1 (Eve)`, `route returns one row`) pass without further change.
118+
119+
**ProxySQL alignment.** The strategy enum is the deliberate seam for a future `PROXYSQL_RULES` value that delegates to ProxySQL's `mysql_query_rules` table. The existing planner code paths in `distributed_planner.h` (`literal_to_shard_index`) call through `ShardMap` unchanged, so adding a new strategy is a one-place addition rather than a cross-cutting refactor.
120+
121+
**Shard-spec syntax extension.** `parse_shard_spec` in `src/sql_engine/tool_config_parser.cpp` accepts an optional strategy qualifier:
122+
123+
```
124+
table:key:backend1,backend2 # HASH (implicit, back-compat)
125+
table:key:hash:backend1,backend2 # HASH (explicit)
126+
table:key:range:5=shard1,10=shard2 # RANGE
127+
table:key:list:1=shard1,6=shard2 # LIST (int keys)
128+
table:key:list:us-east=a,us-west=b # LIST (string keys)
129+
```
130+
131+
A repeated backend in `range:` or `list:` is interned once into `cfg.shards`.
132+
133+
## Test Coverage
134+
135+
- `tests/test_shard_map.cpp` — 14 cases covering HASH determinism, FNV-1a divergence from identity (regression guard), RANGE matching demo placement, RANGE above-max fallback, RANGE accepts unsorted input and sorts internally, RANGE rejects spurious string lookups, LIST int + string keys, LIST miss fallback, default-strategy is HASH, out-of-range shard_index clamping, unknown table.
136+
- `tests/test_ssl_config.cpp` — 11 added cases for the new `--shard` syntax (HASH default, explicit hash/range/list, malformed entries, non-int range bound, empty body rejection, backend interning).
137+
- `scripts/test_sqlengine.sh sharded` — 16/16 with the new range routing.
138+
139+
## Future Work
140+
141+
- Add a `PROXYSQL_RULES` strategy that consults `mysql_query_rules`-shaped data when the engine is embedded inside ProxySQL.
142+
- Allow RANGE on string keys (lexicographic upper bounds) if a real use case arises.
143+
- Allow multiple key columns per table for composite sharding.
144+
- Allow a "shard not found" sentinel so unrouteable values can be made an error rather than silently routed.

docs/issues/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ This directory is the local working backlog for implementation gaps identified f
77
### P0
88

99
1. [Distributed 2PC must require safe session pinning](01-distributed-2pc-safe-session-pinning.md) — implemented in current working tree
10-
9. [Single-shard route by shard key misdirects for some keys](09-shard-key-route-misdirection.md)open; surfaced 2026-04-18 by `scripts/test_sqlengine.sh sharded`
10+
9. [Single-shard route by shard key misdirects for some keys](09-shard-key-route-misdirection.md)resolved 2026-04-18 (FNV-1a hash + RoutingStrategy enum: HASH/RANGE/LIST)
1111

1212
### P1
1313

docs/sqlengine.md

Lines changed: 16 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -55,16 +55,24 @@ pgsql://app:secret@db1:5432/orders?name=primary&ssl_mode=REQUIRED&ssl_ca=/etc/ss
5555
### Shard spec syntax
5656

5757
```text
58-
TABLE:SHARD_KEY:BACKEND1,BACKEND2,...
58+
TABLE:SHARD_KEY:[STRATEGY:]ROUTING_BODY
5959
```
6060

61-
Backend names refer to the `name=` value from the backend URLs. A table with one backend is unsharded but pinned. Two or more backends turns on hash-based sharding by `SHARD_KEY`.
61+
`STRATEGY` is optional and selects a `RoutingStrategy` from `ShardMap`. Backend names refer to the `name=` value from the backend URLs. The strategy determines what `ROUTING_BODY` looks like:
6262

63-
Example:
63+
| Strategy form | Body | Example |
64+
| ------------- | ----------------------------------------------------- | -------------------------------------------------------- |
65+
| (omitted) | comma-separated backends — defaults to `HASH` | `users:id:shard1,shard2,shard3` |
66+
| `hash:` | comma-separated backends | `users:id:hash:shard1,shard2` |
67+
| `range:` | `upper=backend,upper=backend,...` (sorted internally) | `users:id:range:5=shard1,10=shard2,MAXVALUE on last` |
68+
| `list:` | `value=backend,value=backend,...` (int or string) | `users:id:list:1=shard1,6=shard2` or `users:region:list:us-east=a,us-west=b` |
6469

65-
```text
66-
--shard "users:id:shard1,shard2,shard3"
67-
```
70+
Notes:
71+
72+
- `hash` (default) uses **FNV-1a 64-bit** modulo `num_shards`. Deterministic across compilers.
73+
- `range` is integer-keyed only. Values above the largest `upper` route to the last entry's backend (MySQL `LESS THAN MAXVALUE` idiom).
74+
- `list` accepts both int and string keys. A miss falls back to shard 0.
75+
- Repeated backends across `range:` or `list:` entries are interned once, so the order of distinct backends in the resulting shard list is the order of first appearance.
6876

6977
### Multiple flags
7078

@@ -184,13 +192,13 @@ Useful smoke test that the executor and connection pool can talk to a real backe
184192

185193
### 5.4 Sharded query with scatter/gather
186194

187-
Two backends, one sharded table:
195+
Two backends, one sharded table. Pick the routing strategy that matches how the data was loaded — for the demo set (1-5 on shard1, 6-10 on shard2) that's `range`, not `hash`:
188196

189197
```bash
190198
./sqlengine \
191199
--backend "mysql://root:test@127.0.0.1:13306/testdb?name=shard1" \
192200
--backend "mysql://root:test@127.0.0.1:13307/testdb?name=shard2" \
193-
--shard "users:id:shard1,shard2"
201+
--shard "users:id:range:5=shard1,10=shard2"
194202
```
195203

196204
```text

0 commit comments

Comments
 (0)