Commit 8082675

Add design spec for catalog (sub-project 3)
1 parent bbec36a commit 8082675

1 file changed

Lines changed: 198 additions & 0 deletions

@@ -0,0 +1,198 @@
# SQL Engine Catalog: Design Specification

## Overview

The catalog provides schema metadata (tables, columns, types) to the query engine. It is an abstract interface that embedders implement for their specific use case (in-memory, learned from traffic, loaded from config, etc.). A reference in-memory implementation is included.

This is sub-project 3 of the query engine. It depends on sub-project 1 (type system) and connects to sub-project 2 (expression evaluator) via a resolver utility.

### Goals

- **Abstract Catalog interface**: pure virtual and read-only; embedders implement it
- **InMemoryCatalog reference implementation**: programmatic schema definition for testing and simple use cases
- **Resolver utility**: bridges Catalog + row values to the expression evaluator's callback interface
- **Minimal metadata**: tables, columns, types, nullable flag. No indexes, constraints, or views yet.

16+
### Constraints
17+
18+
- C++17
19+
- Uses `SqlType` from the type system and `StringRef` from the parser
20+
- Virtual interface (not template) — embedders provide implementations at runtime
21+
- Thread-safe reads (const methods). Mutation is the implementation's concern.
22+
23+
### Non-Goals
24+
25+
- Index metadata (deferred to optimizer sub-project)
26+
- Views, constraints, foreign keys
27+
- Catalog persistence (loading/saving)
28+
- Traffic-learning catalog (ProxySQL-specific implementation, not part of the library)
29+
30+
---
31+
32+
## Data Structures
33+
34+
```cpp
35+
namespace sql_engine {
36+
37+
struct ColumnInfo {
38+
sql_parser::StringRef name;
39+
SqlType type;
40+
uint16_t ordinal; // 0-based position in table
41+
bool nullable;
42+
};
43+
44+
struct TableInfo {
45+
sql_parser::StringRef schema_name; // empty if default/no schema
46+
sql_parser::StringRef table_name;
47+
const ColumnInfo* columns;
48+
uint16_t column_count;
49+
};
50+
51+
// Convenience for building columns programmatically
52+
struct ColumnDef {
53+
const char* name;
54+
SqlType type;
55+
bool nullable = true;
56+
};
57+
58+
} // namespace sql_engine
59+
```
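Because `ColumnInfo` and `TableInfo` hold only views and raw pointers, something else has to own the strings and the column array. The sketch below shows one way an implementation might turn a list of `ColumnDef` entries into stable `ColumnInfo` records, with ordinals assigned in declaration order. `OwnedTable` is a hypothetical name, and `std::string_view` and the two-value `SqlType` enum are stand-ins for `sql_parser::StringRef` and the real type system, neither of which is defined in this spec.

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>
#include <string_view>
#include <vector>

// Stand-ins (assumptions): the real SqlType and sql_parser::StringRef live in
// other sub-projects and are not shown in this spec.
enum class SqlType { Int, Varchar };
using StringRef = std::string_view;

struct ColumnInfo { StringRef name; SqlType type; std::uint16_t ordinal; bool nullable; };
struct TableInfo {
    StringRef schema_name;
    StringRef table_name;
    const ColumnInfo* columns;
    std::uint16_t column_count;
};
struct ColumnDef { const char* name; SqlType type; bool nullable = true; };

// Hypothetical owner: keeps the strings and the column array alive so the
// views inside `info` stay valid for as long as this object exists.
struct OwnedTable {
    std::string schema;
    std::string table;
    std::vector<std::string> names;
    std::vector<ColumnInfo> columns;
    TableInfo info{};

    OwnedTable(const char* schema_name, const char* table_name,
               std::initializer_list<ColumnDef> defs)
        : schema(schema_name), table(table_name) {
        // Reserve up front: a later reallocation would move the strings and
        // invalidate the views already stored in `columns`.
        names.reserve(defs.size());
        columns.reserve(defs.size());
        std::uint16_t ordinal = 0;
        for (const ColumnDef& def : defs) {
            names.emplace_back(def.name);
            columns.push_back(ColumnInfo{names.back(), def.type, ordinal, def.nullable});
            ++ordinal;
        }
        info = TableInfo{schema, table, columns.data(),
                         static_cast<std::uint16_t>(columns.size())};
    }

    // Copying or moving would leave the views pointing at the old object.
    OwnedTable(const OwnedTable&) = delete;
    OwnedTable& operator=(const OwnedTable&) = delete;
};
```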

---

## Catalog Interface

```cpp
class Catalog {
public:
    virtual ~Catalog() = default;

    // Find a table by unqualified name. Returns nullptr if not found.
    virtual const TableInfo* get_table(sql_parser::StringRef name) const = 0;

    // Find a table by qualified name (schema.table). Returns nullptr if not found.
    virtual const TableInfo* get_table(sql_parser::StringRef schema,
                                       sql_parser::StringRef table) const = 0;

    // Find a column in a table by name. Returns nullptr if not found.
    virtual const ColumnInfo* get_column(const TableInfo* table,
                                         sql_parser::StringRef column_name) const = 0;
};
```

**Design decisions:**
- **Pure virtual**: the library does not own the schema; embedders provide it.
- **Raw const pointers**: the catalog owns the data and callers get read-only views. No ownership transfer. Same pattern as `FunctionRegistry::lookup()`.
- **Case-insensitive name matching**: `get_table` and `get_column` should match names case-insensitively. This follows the default handling of unquoted identifiers: PostgreSQL folds them to lowercase, and MySQL compares column names case-insensitively. MySQL table names are the exception: their case sensitivity depends on the `lower_case_table_names` setting and the host filesystem.
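The case-insensitive matching rule can be illustrated with a small comparison helper. This is a minimal sketch under two assumptions: identifiers are compared as ASCII only, and `std::string_view` stands in for `sql_parser::StringRef`, whose API is not shown in this spec. The name `iequals_ascii` is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of the spec: ASCII-only, case-insensitive
// equality for identifier names.
inline bool iequals_ascii(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Convert to unsigned char first: passing a negative char value to
        // std::tolower is undefined behaviour.
        const auto ca = static_cast<unsigned char>(a[i]);
        const auto cb = static_cast<unsigned char>(b[i]);
        if (std::tolower(ca) != std::tolower(cb)) return false;
    }
    return true;
}
```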

---

## InMemoryCatalog: Reference Implementation

```cpp
#include <initializer_list>

class InMemoryCatalog : public Catalog {
public:
    // Add a table with columns
    void add_table(const char* schema, const char* table,
                   std::initializer_list<ColumnDef> columns);

    // Remove a table
    void drop_table(const char* schema, const char* table);

    // Catalog interface
    const TableInfo* get_table(sql_parser::StringRef name) const override;
    const TableInfo* get_table(sql_parser::StringRef schema,
                               sql_parser::StringRef table) const override;
    const ColumnInfo* get_column(const TableInfo* table,
                                 sql_parser::StringRef column_name) const override;
};
```

**Internal storage:**
- `std::unordered_map<std::string, TableData>` keyed by lowercase table name (for case-insensitive lookup)
- For qualified names: key is `"schema.table"` (lowercase)
- `TableData` owns the `TableInfo` and a `std::vector<ColumnInfo>` for the columns
- Column strings are stored as `std::string` inside `TableData`, with `StringRef` pointing into them
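The key scheme described above could be built along the following lines. This is only a sketch: `to_lower_ascii` and `make_table_key` are hypothetical helper names, lowercasing is ASCII-only, and `std::string_view` stands in for the parser's `StringRef`. It covers key construction only; how an unqualified lookup should find a table that was registered with a schema is not specified here.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: ASCII lowercase copy of an identifier.
inline std::string to_lower_ascii(std::string_view s) {
    std::string out(s);
    for (char& c : out) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Hypothetical helper: "table" when no schema is given, else "schema.table".
inline std::string make_table_key(std::string_view schema, std::string_view table) {
    if (schema.empty()) return to_lower_ascii(table);
    return to_lower_ascii(schema) + "." + to_lower_ascii(table);
}
```

Both `add_table` and the two `get_table` overloads would need to build keys the same way, so that a table added as `App.Users` is found again under `app.users`.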

**Column lookup:** Linear scan over the table's columns. Tables rarely have more than 100 columns, so a per-table hash map is unnecessary overhead.
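A linear scan of this kind might look like the sketch below. It is not the reference implementation: `find_column` and `names_match` are hypothetical names, the comparison is ASCII-only, and `std::string_view` and the small `SqlType` enum stand in for types defined in other sub-projects.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string_view>

// Stand-ins (assumptions) for types defined outside this spec.
enum class SqlType { Int, Varchar };
using StringRef = std::string_view;

struct ColumnInfo { StringRef name; SqlType type; std::uint16_t ordinal; bool nullable; };
struct TableInfo {
    StringRef schema_name;
    StringRef table_name;
    const ColumnInfo* columns;
    std::uint16_t column_count;
};

// ASCII-only, case-insensitive name comparison.
inline bool names_match(StringRef a, StringRef b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Walks the column array front to back and returns the first match,
// or nullptr if the table is null or has no such column.
inline const ColumnInfo* find_column(const TableInfo* table, StringRef name) {
    if (table == nullptr) return nullptr;
    for (std::uint16_t i = 0; i < table->column_count; ++i) {
        if (names_match(table->columns[i].name, name)) return &table->columns[i];
    }
    return nullptr;
}
```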

---

## Catalog Resolver: Bridge to Expression Evaluator

```cpp
// Create a column resolver callback from catalog + table + row values.
// Returns a lambda that is convertible to std::function<Value(StringRef)>,
// suitable for evaluate_expression().
// The catalog is captured by reference and the table and row by pointer, so
// all three must outlive the returned resolver.
inline auto make_resolver(const Catalog& catalog,
                          const TableInfo* table,
                          const Value* row_values) {
    return [&catalog, table, row_values](sql_parser::StringRef col_name) -> Value {
        const ColumnInfo* col = catalog.get_column(table, col_name);
        if (!col) return value_null();  // unknown column resolves to NULL
        return row_values[col->ordinal];
    };
}
```

This bridges:
- **Catalog** (knows column names → ordinals)
- **Row values** (array indexed by ordinal)
- **Expression evaluator** (needs `StringRef → Value` callback)

No allocation in the hot path: the resolver is a lambda that captures one reference and two pointers. Each call makes a single virtual call, `catalog.get_column()`, to map the column name to an ordinal, then indexes the row array.
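To show how the pieces fit together, here is a self-contained sketch of wiring a resolver to a row. Everything except the shape of `make_resolver` is an assumption made for illustration: the two-field `Value`, `value_int`, the trimmed-down `Catalog` with only `get_column`, the exact-match `ScanCatalog`, and the `lookup` helper are all hypothetical, and `std::string_view` stands in for `sql_parser::StringRef`.

```cpp
#include <cstdint>
#include <functional>
#include <string_view>

// Stand-ins (assumptions) for types from the other sub-projects.
using StringRef = std::string_view;
struct Value { bool is_null; std::int64_t i; };
inline Value value_null() { return Value{true, 0}; }
inline Value value_int(std::int64_t v) { return Value{false, v}; }

struct ColumnInfo { StringRef name; std::uint16_t ordinal; };
struct TableInfo { StringRef table_name; const ColumnInfo* columns; std::uint16_t column_count; };

// Trimmed-down interface: only the method the resolver needs.
class Catalog {
public:
    virtual ~Catalog() = default;
    virtual const ColumnInfo* get_column(const TableInfo* table,
                                         StringRef column_name) const = 0;
};

// Toy catalog: exact-match linear scan over the table's columns.
class ScanCatalog final : public Catalog {
public:
    const ColumnInfo* get_column(const TableInfo* table,
                                 StringRef column_name) const override {
        if (table == nullptr) return nullptr;
        for (std::uint16_t i = 0; i < table->column_count; ++i) {
            if (table->columns[i].name == column_name) return &table->columns[i];
        }
        return nullptr;
    }
};

inline auto make_resolver(const Catalog& catalog, const TableInfo* table,
                          const Value* row_values) {
    return [&catalog, table, row_values](StringRef col_name) -> Value {
        const ColumnInfo* col = catalog.get_column(table, col_name);
        if (!col) return value_null();
        return row_values[col->ordinal];
    };
}

// Hypothetical wiring: store the resolver in a std::function, as an evaluator
// taking that callback type would. Depending on the standard library,
// std::function may heap-allocate to hold the closure; calling the lambda
// directly avoids that.
inline Value lookup(const Catalog& catalog, const TableInfo& table,
                    const Value* row, StringRef column) {
    std::function<Value(StringRef)> resolve = make_resolver(catalog, &table, row);
    return resolve(column);
}
```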

---

## File Organization

```
include/sql_engine/
    catalog.h               - Catalog interface, TableInfo, ColumnInfo, ColumnDef
    in_memory_catalog.h     - InMemoryCatalog implementation (header-only or with .cpp)
    catalog_resolver.h      - make_resolver() utility

src/sql_engine/
    in_memory_catalog.cpp   - InMemoryCatalog method implementations

tests/
    test_catalog.cpp        - All catalog tests
```

---

## Testing Strategy

### InMemoryCatalog tests
- Add table, get by name → found
- Add table with schema, get by qualified name → found
- Get table not found → nullptr
- Get column by name → correct ColumnInfo (type, ordinal, nullable)
- Get column not found → nullptr
- Drop table, get → nullptr
- Multiple tables, correct isolation
- Case-insensitive table/column lookup

### Resolver integration tests
- Create catalog with "users" table (id INT, name VARCHAR, age INT)
- Create row values: [42, "John", 30]
- Use `make_resolver` to create callback
- Call `evaluate_expression` on `age > 18` → true
- Call `evaluate_expression` on `name` → "John"
- Unknown column → NULL

### End-to-end test
- Parse `SELECT name FROM users WHERE age > 18`
- Navigate AST to WHERE clause expression
- Evaluate with catalog resolver + row values
- Verify result

---

## Performance Targets

| Operation | Target |
|---|---|
| `get_table` (hash lookup) | < 50 ns |
| `get_column` (linear scan, ~10 columns) | < 30 ns |
| `make_resolver` (lambda creation) | < 5 ns |
| Resolver callback (ordinal lookup) | < 20 ns |
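
The targets above are per-call latencies, so checking them means timing many calls and averaging. The helper below is a rough sketch of that idea using only the standard library; `average_ns_per_call` is a hypothetical name, and a real benchmark would also need warm-up runs and protection against the compiler optimizing the measured call away.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs fn `iterations` times and returns the mean
// wall-clock cost of one call in nanoseconds (0.0 when iterations is 0).
template <typename Fn>
double average_ns_per_call(Fn&& fn, std::size_t iterations) {
    if (iterations == 0) return 0.0;
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Measured numbers depend on hardware, compiler flags, and cache state, so a result from a helper like this is best read as an order-of-magnitude check against the table.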