The repository already has useful documentation, but it is spread across several audiences:
README.mdis the public overview and quick-start document.CLAUDE.mdis the most detailed architecture guide today; it is accurate in broad strokes, but it is written for coding agents and maintainers rather than new contributors.AGENTS.mdcovers contributor workflow and repository conventions.docs/benchmarks/contains benchmark outputs and reproduction notes.docs/superpowers/specs/anddocs/superpowers/plans/preserve design intent and implementation plans from earlier work.
As of April 15, 2026, the codebase is a real four-layer system rather than just a parser prototype:
- Parser in
include/sql_parser/andsrc/sql_parser/ - Query engine in
include/sql_engine/ - Distributed execution and remote backends in
include/sql_engine/andsrc/sql_engine/ - Transaction management, including 2PC, durable WAL, and recovery
Operational entry points exist for interactive use and experiments: sqlengine, mysql_server, bench_distributed, engine_stress_test, and corpus_test.
Fresh verification on April 15, 2026:
./run_tests --gtest_brief=1- Result: 1,197 tests ran, 1,160 passed, 37 skipped because live MySQL/PostgreSQL backends were not available locally
- Clear subsystem boundaries: parser, engine, distributed layer, and transactions are easy to identify from the directory layout.
- Strong unit-test signal: 1,160 passing tests plus CI across Linux and macOS.
- Useful performance discipline: benchmark tooling, published benchmark reports, and corpus validation are already part of the repository workflow.
- Good internal architecture notes:
CLAUDE.mdgives maintainers practical file-level guidance for extending the system.
- Public docs had drifted from the
Makefile; several tool build targets were named incorrectly until this update. - Documentation is fragmented. The most detailed design knowledge lives in
CLAUDE.mdand historical spec/plan files, not in one current contributor-facing document. - Several critical components are large, concentrated files or headers, especially
include/sql_engine/distributed_planner.h,include/sql_engine/plan_executor.h,src/sql_parser/parser.cpp, andtools/mysql_server.cpp. - Backend URL parsing and related setup logic are duplicated across
tools/sqlengine.cpp,tools/mysql_server.cpp,tools/bench_distributed.cpp,tools/engine_stress_test.cpp, and mirrored again intests/test_ssl_config.cpp. - Some remote/distributed verification paths depend on live services, so local default test runs still skip meaningful backend coverage.
- A contributor-oriented local setup guide for running MySQL and PostgreSQL integration paths with the existing
scripts/. - One authoritative architecture/status document before this file; maintainers had to reconstruct “current truth” from README, CLAUDE, code comments, and old plans.
- A documented list of known limitations and non-goals for the parser, executor, and distributed transaction path.
- A prioritized roadmap tying the current implementation to the next engineering milestone.
The highest-leverage next step is to consolidate backend/tool configuration into one shared module and document one supported local integration workflow around it.
Why this should go first:
- It removes copy-pasted parsing/setup logic from four tools and one test helper.
- It reduces the chance that SSL, backend naming, or shard parsing diverges between entry points.
- It creates a stable base for stronger end-to-end tests and clearer contributor setup docs.
Suggested scope for that next phase:
- Extract backend URL and shard parsing into a shared utility under
include/sql_engine/ortools/. - Update
sqlengine,mysql_server,bench_distributed,engine_stress_test, andtests/test_ssl_config.cppto use the shared code. - Add a short “local backend test workflow” doc that uses the existing
scripts/start_test_backends.shand related helpers. - Add one smoke-level verification path that exercises a live backend with the shared configuration code.