
Commit 01fd6ec

docs: prior-session audit + contributor workflow + Makefile target fixes
Final batch of prior-session docs left in the working tree:

- docs/current-state.md: implementation audit dated 2026-04-15. Snapshot of strengths, risks, missing pieces, and the recommendation that became the issue 03 / shared config parser work. Cross-referenced from architecture-and-status.md (which supersedes it as the entry point).
- docs/superpowers/specs/2026-04-15-implementation-gap-backlog-design.md: the design that established the local docs/issues/ backlog (P0/P1/P2).
- AGENTS.md: contributor workflow, naming, and validation expectations.
- README.md: adds a Documentation Map section pointing at the four docs above + benchmarks + the historical specs/plans tree. Fixes the Makefile target names in the Tools table that had drifted (build-mysql-server -> mysql-server, build-engine-stress -> engine-stress, build-bench-distributed -> bench-distributed).
- CLAUDE.md: same Makefile target name fixes.

Also removes tmp_apply_patch_check.txt (a one-line "hello" stray file from an interrupted edit cycle).

No code changes.
1 parent 9f090e5 commit 01fd6ec

5 files changed

Lines changed: 145 additions & 4 deletions

AGENTS.md

Lines changed: 24 additions & 0 deletions
```diff
@@ -0,0 +1,24 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+Core parser headers live in `include/sql_parser/` and parser implementations in `src/sql_parser/`. SQL engine, remote execution, and transaction interfaces live in `include/sql_engine/` with implementations in `src/sql_engine/`. Tests are in `tests/`, mostly as focused `test_<area>.cpp` files plus `corpus_test.cpp` for large parser corpora. Developer tools live in `tools/`, automation scripts in `scripts/`, benchmark reports in `docs/benchmarks/`, and vendored dependencies in `third_party/`.
+
+## Build, Test, and Development Commands
+Use the `Makefile` as the source of truth:
+
+- `make all` builds `libsqlparser.a` and runs the full GoogleTest suite.
+- `make test` rebuilds `run_tests` and executes all tests locally.
+- `make lib` builds just the static library.
+- `make build-sqlengine` builds the interactive CLI as `./sqlengine`.
+- `make build-corpus-test` builds `./corpus_test` for external SQL corpus validation.
+- `make bench` runs the benchmark binary; use it for parser or executor performance changes.
+- `make clean` removes generated objects and binaries.
+
+## Coding Style & Naming Conventions
+This repository is C++17 with warnings enabled via `-Wall -Wextra`. Match the existing style: 4-space indentation, opening braces on the same line, and concise comments only where the code is not obvious. Use `PascalCase` for types, `snake_case` for functions and methods, `UPPER_SNAKE_CASE` for include guards and macros, and keep file names module-oriented such as `parser.cpp`, `distributed_txn.h`, and `test_select.cpp`. There is no repo-wide formatter config outside vendored code, so follow surrounding files closely.
+
+## Testing Guidelines
+Tests use GoogleTest through `tests/test_main.cpp`. Add coverage in the nearest existing `test_<feature>.cpp`, or create a new file with that pattern if the area is new. Prefer small, focused `TEST` or `TEST_F` cases that mirror the production module name. Run `make test` before opening a PR; for grammar or dialect work, also run `make build-corpus-test`.
+
+## Commit & Pull Request Guidelines
+Recent history uses short conventional prefixes such as `feat:`, `fix:`, `test:`, `docs:`, and `chore:`. Keep commit titles imperative and specific, for example `feat: add UTC normalization for PgSQL timestamps`. PRs should target `main`, explain parser/engine behavior changes, list the commands you ran, and link related issues. Include benchmark or corpus-test notes when performance or SQL coverage changes. Do not commit generated `.o` files, binaries, or benchmark artifacts.
```
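The naming rules in AGENTS.md are easiest to see in a small header-style snippet. Only the conventions come from the file (C++17, 4-space indentation, same-line braces, `PascalCase` types, `snake_case` functions and methods, `UPPER_SNAKE_CASE` include guards and macros); `RowCounter`, `add_rows`, `ROW_COUNTER_DEFAULT_LIMIT`, and the `SQL_ENGINE_ROW_COUNTER_H` guard are hypothetical names invented for this illustration.

```cpp
// Hypothetical example of the AGENTS.md naming rules. In the repository the
// guarded part would live in its own header (e.g. a made-up
// include/sql_engine/row_counter.h); it is inlined here to stay self-contained.
#ifndef SQL_ENGINE_ROW_COUNTER_H
#define SQL_ENGINE_ROW_COUNTER_H

#include <cstddef>

// UPPER_SNAKE_CASE for macros (the limit value itself is arbitrary).
#define ROW_COUNTER_DEFAULT_LIMIT 1000

// PascalCase for types; 4-space indentation; opening braces on the same line.
class RowCounter {
public:
    // snake_case for functions and methods.
    void add_rows(std::size_t n) {
        total_rows += n;
    }

    std::size_t row_count() const {
        return total_rows;
    }

    bool over_limit() const {
        return total_rows > ROW_COUNTER_DEFAULT_LIMIT;
    }

private:
    std::size_t total_rows = 0;
};

#endif  // SQL_ENGINE_ROW_COUNTER_H
```

The class is deliberately trivial so that only naming and layout are on display; a caller would construct a `RowCounter`, call `add_rows()`, and read `row_count()`.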

CLAUDE.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -23,7 +23,9 @@ make bench # Build + run benchmarks
 make bench-compare # Run comparison vs libpg_query (requires libpg_query built)
 make build-corpus-test # Build corpus test harness
 make build-sqlengine # Build interactive SQL engine CLI
-make build-mysql-server # Build MySQL wire-protocol server
+make mysql-server # Build MySQL wire-protocol server
+make engine-stress # Build direct-API stress harness
+make bench-distributed # Build distributed benchmark tool
 make clean # Remove all build artifacts
 ```
 
````

README.md

Lines changed: 12 additions & 3 deletions
````diff
@@ -211,6 +211,15 @@ auto report = recovery.recover();
 // report.recovered_commit, recovered_rollback, still_in_doubt, ...
 ```
 
+## Documentation Map
+
+- `README.md` — product overview, quick start, build, tools, and test entry points.
+- [`docs/current-state.md`](docs/current-state.md) — current implementation audit, strengths, risks, missing docs, and recommended next step.
+- `CLAUDE.md` — maintainer/agent architecture notes with file-level extension guidance.
+- `AGENTS.md` — contributor workflow, naming, and validation expectations.
+- `docs/benchmarks/` — latest benchmark reports plus reproduction instructions.
+- `docs/superpowers/specs/` and `docs/superpowers/plans/` — historical design and planning artifacts; useful for rationale, but not the source of truth for current behavior.
+
 ## Architecture
 
 ```
@@ -360,10 +369,10 @@ auto report = recovery.recover();
 | Tool | Build | Purpose |
 |---|---|---|
 | `sqlengine` | `make build-sqlengine` | Interactive SQL CLI; stdin, one-shot, or REPL; optional backends and sharding |
-| `mysql_server` | `make build-mysql-server` | MySQL wire-protocol server fronted by the ParserSQL engine |
+| `mysql_server` | `make mysql-server` | MySQL wire-protocol server fronted by the ParserSQL engine |
 | `corpus_test` | `make build-corpus-test` | Read SQL from stdin/files, parse each, report OK/PARTIAL/ERROR |
-| `engine_stress_test` | `make build-engine-stress` | Direct-API engine stress test |
-| `bench_distributed` | `make build-bench-distributed` | Distributed query benchmark + pipeline breakdown |
+| `engine_stress_test` | `make engine-stress` | Direct-API engine stress test |
+| `bench_distributed` | `make bench-distributed` | Distributed query benchmark + pipeline breakdown |
 | `run_bench` | `make bench` | Google-Benchmark micro-benchmarks |
 | `run_tests` | `make test` | 1,160 Google-Test unit tests |
 
````

docs/current-state.md

Lines changed: 66 additions & 0 deletions
```diff
@@ -0,0 +1,66 @@
+# Current State
+
+## Documentation Inventory
+
+The repository already has useful documentation, but it is spread across several audiences:
+
+- `README.md` is the public overview and quick-start document.
+- `CLAUDE.md` is the most detailed architecture guide today; it is accurate in broad strokes, but it is written for coding agents and maintainers rather than new contributors.
+- `AGENTS.md` covers contributor workflow and repository conventions.
+- `docs/benchmarks/` contains benchmark outputs and reproduction notes.
+- `docs/superpowers/specs/` and `docs/superpowers/plans/` preserve design intent and implementation plans from earlier work.
+
+## Implementation Snapshot
+
+As of April 15, 2026, the codebase is a real four-layer system rather than just a parser prototype:
+
+1. Parser in `include/sql_parser/` and `src/sql_parser/`
+2. Query engine in `include/sql_engine/`
+3. Distributed execution and remote backends in `include/sql_engine/` and `src/sql_engine/`
+4. Transaction management, including 2PC, durable WAL, and recovery
+
+Operational entry points exist for interactive use and experiments: `sqlengine`, `mysql_server`, `bench_distributed`, `engine_stress_test`, and `corpus_test`.
+
+Fresh verification on April 15, 2026:
+
+- `./run_tests --gtest_brief=1`
+- Result: 1,197 tests ran, 1,160 passed, 37 skipped because live MySQL/PostgreSQL backends were not available locally
+
+## Strengths
+
+- Clear subsystem boundaries: parser, engine, distributed layer, and transactions are easy to identify from the directory layout.
+- Strong unit-test signal: 1,160 passing tests plus CI across Linux and macOS.
+- Useful performance discipline: benchmark tooling, published benchmark reports, and corpus validation are already part of the repository workflow.
+- Good internal architecture notes: `CLAUDE.md` gives maintainers practical file-level guidance for extending the system.
+
+## Weaknesses and Risks
+
+- Public docs had drifted from the `Makefile`; several tool build targets were named incorrectly until this update.
+- Documentation is fragmented. The most detailed design knowledge lives in `CLAUDE.md` and historical spec/plan files, not in one current contributor-facing document.
+- Several critical components are large, concentrated files or headers, especially `include/sql_engine/distributed_planner.h`, `include/sql_engine/plan_executor.h`, `src/sql_parser/parser.cpp`, and `tools/mysql_server.cpp`.
+- Backend URL parsing and related setup logic are duplicated across `tools/sqlengine.cpp`, `tools/mysql_server.cpp`, `tools/bench_distributed.cpp`, `tools/engine_stress_test.cpp`, and mirrored again in `tests/test_ssl_config.cpp`.
+- Some remote/distributed verification paths depend on live services, so local default test runs still skip meaningful backend coverage.
+
+## What Is Missing
+
+- A contributor-oriented local setup guide for running MySQL and PostgreSQL integration paths with the existing `scripts/`.
+- One authoritative architecture/status document before this file; maintainers had to reconstruct “current truth” from README, CLAUDE, code comments, and old plans.
+- A documented list of known limitations and non-goals for the parser, executor, and distributed transaction path.
+- A prioritized roadmap tying the current implementation to the next engineering milestone.
+
+## Recommended Next Step
+
+The highest-leverage next step is to consolidate backend/tool configuration into one shared module and document one supported local integration workflow around it.
+
+Why this should go first:
+
+- It removes copy-pasted parsing/setup logic from four tools and one test helper.
+- It reduces the chance that SSL, backend naming, or shard parsing diverges between entry points.
+- It creates a stable base for stronger end-to-end tests and clearer contributor setup docs.
+
+Suggested scope for that next phase:
+
+1. Extract backend URL and shard parsing into a shared utility under `include/sql_engine/` or `tools/`.
+2. Update `sqlengine`, `mysql_server`, `bench_distributed`, `engine_stress_test`, and `tests/test_ssl_config.cpp` to use the shared code.
+3. Add a short “local backend test workflow” doc that uses the existing `scripts/start_test_backends.sh` and related helpers.
+4. Add one smoke-level verification path that exercises a live backend with the shared configuration code.
```
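The recommended next step in docs/current-state.md calls for one shared backend URL and shard parser. The sketch below shows the shape such a utility might take under stated assumptions: a `scheme://user[:password]@host[:port][/database]` URL form, `mysql` and `pgsql` as the only scheme names (defaulting to the standard ports 3306 and 5432), and comma-separated shard lists. `BackendConfig`, `parse_backend_url`, and `parse_shard_list` are hypothetical names; the repository's real URL syntax, including any SSL options, may differ.

```cpp
// Hypothetical shared parser. The URL grammar and scheme names are
// assumptions, not the repository's actual format.
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct BackendConfig {
    std::string scheme;  // "mysql" or "pgsql" (assumed names)
    std::string user;
    std::string password;
    std::string host;
    std::uint16_t port = 0;
    std::string database;
};

// Parse one backend URL; any malformed input yields std::nullopt so that
// every tool rejects the same bad strings in the same way.
inline std::optional<BackendConfig> parse_backend_url(const std::string& url) {
    BackendConfig cfg;
    const auto scheme_end = url.find("://");
    if (scheme_end == std::string::npos || scheme_end == 0) return std::nullopt;
    cfg.scheme = url.substr(0, scheme_end);
    if (cfg.scheme == "mysql") cfg.port = 3306;        // standard MySQL port
    else if (cfg.scheme == "pgsql") cfg.port = 5432;   // standard PostgreSQL port
    else return std::nullopt;

    std::string rest = url.substr(scheme_end + 3);
    const auto at = rest.rfind('@');                   // last '@' ends userinfo
    if (at == std::string::npos) return std::nullopt;
    const std::string userinfo = rest.substr(0, at);
    rest = rest.substr(at + 1);
    const auto colon = userinfo.find(':');
    cfg.user = userinfo.substr(0, colon);
    if (colon != std::string::npos) cfg.password = userinfo.substr(colon + 1);
    if (cfg.user.empty()) return std::nullopt;

    const auto slash = rest.find('/');
    const std::string hostport = rest.substr(0, slash);
    if (slash != std::string::npos) cfg.database = rest.substr(slash + 1);
    const auto port_sep = hostport.find(':');
    cfg.host = hostport.substr(0, port_sep);
    if (cfg.host.empty()) return std::nullopt;
    if (port_sep != std::string::npos) {
        const std::string port_str = hostport.substr(port_sep + 1);
        if (port_str.empty() || port_str.size() > 5) return std::nullopt;
        unsigned long value = 0;
        for (char c : port_str) {
            if (c < '0' || c > '9') return std::nullopt;
            value = value * 10 + static_cast<unsigned long>(c - '0');
        }
        if (value == 0 || value > 65535) return std::nullopt;
        cfg.port = static_cast<std::uint16_t>(value);
    }
    return cfg;
}

// Parse a comma-separated shard list; one bad entry rejects the whole list.
inline std::optional<std::vector<BackendConfig>> parse_shard_list(const std::string& spec) {
    std::vector<BackendConfig> shards;
    std::size_t start = 0;
    while (true) {
        const auto comma = spec.find(',', start);
        const auto end = (comma == std::string::npos) ? spec.size() : comma;
        auto cfg = parse_backend_url(spec.substr(start, end - start));
        if (!cfg) return std::nullopt;
        shards.push_back(*cfg);
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return shards;
}
```

Returning `std::nullopt` instead of a partially filled struct is the design choice that matters here: once `sqlengine`, `mysql_server`, `bench_distributed`, `engine_stress_test`, and `tests/test_ssl_config.cpp` share one function, they accept and reject exactly the same inputs.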

docs/superpowers/specs/2026-04-15-implementation-gap-backlog-design.md

Lines changed: 40 additions & 0 deletions
```diff
@@ -0,0 +1,40 @@
+# Implementation Gap Backlog Design
+
+**Goal:** Create a local, detailed issue backlog for the known implementation gaps and start execution from the highest-priority correctness issue.
+
+**Scope decision:** The implementation gaps are too broad to execute as one plan. They are decomposed into local issues in `docs/issues/`, with immediate execution limited to the first `P0` item.
+
+## Backlog Structure
+
+- Use `docs/issues/README.md` as the prioritized index
+- Use one Markdown file per issue for problem statement, evidence, scope, acceptance criteria, and verification
+- Keep the issue docs local-first so work can proceed without GitHub issue setup
+
+## Priority
+
+1. `P0`: distributed 2PC must require safe session pinning
+2. `P1`: deterministic 2PC phase timeouts
+3. `P1`: shared backend and shard config parsing
+4. `P1`: join execution coverage / early rejection alignment
+5. `P2`: expression and type semantic gaps
+6. `P2`: parser gaps around `SELECT ... INTO` and recursive CTE handling
+7. `P2`: CTE integration into the main `Session` path
+
+CTE work is explicitly held at `P2` for now.
+
+## First Execution Target
+
+The first implementation target is distributed 2PC safety. The current code explicitly allows an unpinned fallback path even though the same code comments state that this can silently corrupt pooled real-backend 2PC behavior. That is the highest-risk correctness issue and should fail closed.
+
+## Intended Change Shape
+
+- Extend the remote executor contract so executors can declare whether unpinned distributed 2PC fallback is safe
+- Keep pinned-session executors working as-is
+- Keep single-connection executors and selected mocks usable by explicit opt-in, not implicit fallback
+- Update distributed transaction and session tests to match the hardened contract
+
+## Non-Goals For This Pass
+
+- No attempt to solve all backlog items in one change
+- No large transaction subsystem rewrite
+- No CTE redesign in this phase
```
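The design's "Intended Change Shape" section describes the hardened executor contract only in prose. One minimal way to express it, sketched here as an assumption rather than the repository's interface, is a virtual opt-in flag that defaults to `false`: an executor that cannot pin a session then makes distributed 2PC fail closed unless it explicitly declares the unpinned path safe. Every name below (`RemoteExecutor`, `PinnedSession`, `checkout_session`, `allows_unpinned_2pc_fallback`, `plan_2pc_participant`, and the three example executors) is hypothetical.

```cpp
// Sketch of a fail-closed 2PC executor contract. All names are hypothetical.
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// A pinned session keeps every 2PC phase of one transaction on the same
// backend connection.
struct PinnedSession {
    std::string backend;
};

class RemoteExecutor {
public:
    virtual ~RemoteExecutor() = default;

    // Returns nullptr when this executor cannot pin a session.
    virtual std::unique_ptr<PinnedSession> checkout_session(const std::string& backend) = 0;

    // Unpinned fallback is opt-in: the default answer is "not safe".
    virtual bool allows_unpinned_2pc_fallback() const {
        return false;
    }
};

enum class TwoPcPath { Pinned, UnpinnedOptIn };

struct TwoPcPlan {
    TwoPcPath path;
    std::unique_ptr<PinnedSession> session;  // null on the opt-in path
};

// Prefer a pinned session; fall back only on explicit opt-in; otherwise
// refuse (fail closed) instead of silently running 2PC unpinned.
inline TwoPcPlan plan_2pc_participant(RemoteExecutor& executor, const std::string& backend) {
    auto session = executor.checkout_session(backend);
    if (session) {
        return TwoPcPlan{TwoPcPath::Pinned, std::move(session)};
    }
    if (executor.allows_unpinned_2pc_fallback()) {
        return TwoPcPlan{TwoPcPath::UnpinnedOptIn, nullptr};
    }
    throw std::runtime_error("distributed 2PC requires a pinned session on " + backend);
}

// Example executor that can pin sessions; it keeps working as-is.
class PinningExecutor : public RemoteExecutor {
public:
    std::unique_ptr<PinnedSession> checkout_session(const std::string& backend) override {
        return std::make_unique<PinnedSession>(PinnedSession{backend});
    }
};

// Example pooled executor that cannot pin and has not opted in.
class PooledExecutor : public RemoteExecutor {
public:
    std::unique_ptr<PinnedSession> checkout_session(const std::string&) override {
        return nullptr;
    }
};

// Example single-connection executor: it cannot hand out a separate pinned
// session, so it opts in to the unpinned path explicitly.
class SingleConnectionExecutor : public PooledExecutor {
public:
    bool allows_unpinned_2pc_fallback() const override {
        return true;
    }
};
```

Defaulting the flag to `false` in the base class is what makes the contract fail closed: a new pooled executor that never considers 2PC gets an error instead of silent corruption, while single-connection executors and selected mocks opt in with a one-line override.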
