feat(cluster): storage root — ledger, catalog, and graphs on the StorageAdapter (RFC-006 PR 2/3) #190

Merged
aaltshuler merged 3 commits into main from feat/cluster-storage-root-v2 on Jun 11, 2026
@aaltshuler

The core of the object-storage migration, landing as the focused diff the modularization (#188) was done for. Two commits:

Commit 1 — the backend port

LocalStateBackend becomes ClusterStore over the engine's StorageAdapter (in store.rs, the module that now owns every stored byte). Byte-compatible on the file backend (layout, CAS semantics, diagnostics, lock-release timing), as shown by the entire pre-existing suite passing unchanged. The ledger CAS keeps its public sha256: vocabulary while the physical swap is token-conditioned (ETag If-Match on S3 via #186's primitives; the pre-port temp/rename semantics locally). The lock becomes a create-only put, which makes it genuinely cross-machine on object stores for the first time. Async ripple: status/force-unlock/read_serving_snapshot and the server's load_server_settings chain go async (mechanical CLI/test ripple).
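To illustrate what "create-only put" means for a lock, here is a minimal local-filesystem sketch. It is not the code in store.rs: `LockGuard` and `try_acquire` are hypothetical names, and the only real API relied on is the standard library's `OpenOptions::create_new`, which fails with `AlreadyExists` if the file is already there.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical guard: remembers the lock path and removes the file on drop.
pub struct LockGuard {
    path: PathBuf,
}

/// Take the lock by creating the file only if it does not exist yet.
/// Returns Ok(None) when another holder already owns the lock.
pub fn try_acquire(path: &Path, holder: &str) -> io::Result<Option<LockGuard>> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            // Record who holds the lock, for diagnostics.
            file.write_all(holder.as_bytes())?;
            Ok(Some(LockGuard { path: path.to_path_buf() }))
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
        Err(e) => Err(e),
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        // Best-effort release; a leftover file would need a manual unlock.
        let _ = fs::remove_file(&self.path);
    }
}
```

On an object store the same shape would use a conditional "create if absent" write instead of `create_new`; that part is assumed here rather than shown.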

Commit 2 — the storage: root

version: 1
storage: s3://omnigraph-local/clusters/intel   # optional; default = the config dir, byte-compatible

One URI decides where everything the cluster stores lives: ledger, lock, catalog, sidecars, approvals, and the derived graph roots (<storage>/graphs/<id>.omni). Declared config stays in the working tree (Terraform's config-local/state-remote split); credentials are env-only. Graph-root derivation, the delete executor (prefix delete), the sweep's existence probes, payload write/verify/read, and the serving snapshot all flow through the store — the last raw-fs holdouts are gone, and invariants.md's deny-list gains the rule keeping it that way.
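The derived graph-root layout `<storage>/graphs/<id>.omni` can be sketched as a small pure function. `graph_root` is a hypothetical helper for illustration, and the graph id `people` in the usage note is made up; the real derivation lives on ClusterStore and may differ.

```rust
/// Join a storage root URI and a graph id into `<storage>/graphs/<id>.omni`.
/// A trailing slash on the root is tolerated so the result has no `//`.
pub fn graph_root(storage: &str, graph_id: &str) -> String {
    format!("{}/graphs/{}.omni", storage.trim_end_matches('/'), graph_id)
}
```

With the root from the example above, `graph_root("s3://omnigraph-local/clusters/intel", "people")` yields `s3://omnigraph-local/clusters/intel/graphs/people.omni`.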

Tests

  • 98 in-crate (3 new: default-layout byte-compat; a file:// root relocating the entire cluster — ledger+catalog+graphs under the new root, nothing under the config dir, serving snapshot follows; invalid-root validation) + 9 failpoints
  • cargo test --workspace --locked — green, 46 suites
  • The s3:// flavor is exercised end-to-end in PR 3's gated RustFS suite (the conditional-put and schema-apply-on-S3 foundations are already pinned by the gated tests in #186, feat(storage,policy): object-store primitives for the cluster port (RFC-006 PR 1/3))

🤖 Generated with Claude Code

aaltshuler and others added 2 commits June 11, 2026 14:11
LocalStateBackend becomes ClusterStore: every stored byte — state ledger,
lock, recovery sidecars, approval artifacts — now flows through the
engine's StorageAdapter, making file:// and s3:// one code path. Behavior
on the file backend is byte-compatible (layout, CAS semantics, diagnostics,
lock release timing) and the entire pre-existing suite passes unchanged.

Mechanics: the ledger CAS keeps its public sha256 vocabulary while the
physical swap is token-conditioned (ETag If-Match on S3 via PR #186's
primitives; content-token + temp/rename locally — the pre-port semantics);
the lock is a create-only put (genuinely cross-machine on object stores)
with deterministic drop-release locally and best-effort spawned release on
S3; sidecars/approvals address by URI (SweepOutcome and the executors carry
strings); sweep row-1 retirement joins the uniform deferred post-CAS
cleanup. ClusterStore also gains the catalog-payload and graph-root
methods that commit 2 wires in.
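
A token-conditioned swap can be sketched in memory as follows. This is an illustration only: `MemStore` and `put_if_match` are hypothetical, and the token is a `DefaultHasher` value standing in for the sha256 / ETag tokens the commit describes, since the standard library has no sha256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content token (assumption: the real store uses sha256 or an ETag).
fn token(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

#[derive(Debug, PartialEq)]
pub enum CasError {
    /// The stored content no longer matches the caller's expectation.
    Conflict,
}

#[derive(Default)]
pub struct MemStore {
    objects: HashMap<String, Vec<u8>>,
}

impl MemStore {
    /// Replace `key` only if its current token equals `expected`.
    /// `None` means "the object must not exist yet" (create-only).
    pub fn put_if_match(
        &mut self,
        key: &str,
        expected: Option<u64>,
        new: &[u8],
    ) -> Result<u64, CasError> {
        let current = self.objects.get(key).map(|b| token(b));
        if current != expected {
            return Err(CasError::Conflict);
        }
        self.objects.insert(key.to_string(), new.to_vec());
        Ok(token(new))
    }
}
```

A writer holding a stale token gets `Conflict` and has to re-read before retrying, which is the property the ledger CAS depends on.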

Async ripple: status/force-unlock/serving-snapshot and the server's
settings loader chain go async (CLI dispatch and ~20 test hosts follow,
mechanically). tokio joins the cluster crate's runtime deps for the lock
guard's handle.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…locatable

cluster.yaml gains an optional storage: URI deciding where everything the
cluster STORES lives: the state ledger, lock, content-addressed catalog,
recovery sidecars, approval artifacts, and the derived graph roots
(<storage>/graphs/<id>.omni). Absent, it defaults to the config directory
itself — the original layout, byte-compatible, so pre-existing clusters and
the whole test suite are untouched. Declared configuration always stays in
the working tree (Terraform's config-local/state-remote split); credentials
are env-only, never in cluster.yaml.

Every command resolves its store from the declared root (a bad root is a
loud invalid_storage_root). Graph-root derivation, the delete executor
(prefix delete via the adapter), the sweep's existence probes, the catalog
payload write/verify/read paths, and the serving snapshot all flow through
ClusterStore — the last raw-fs holdouts for stored state are gone, and the
deny-list gains the rule that keeps it that way.
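
Resolving the store from the declared root might look roughly like the sketch below. The types and `resolve_root` are hypothetical; in particular, rejecting every scheme other than file:// and s3:// is an assumption of this sketch, not something the commit states.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub enum StorageRoot {
    File(PathBuf),
    S3 { bucket: String, prefix: String },
}

/// Hypothetical error mirroring the `invalid_storage_root` diagnostic.
#[derive(Debug, PartialEq)]
pub struct InvalidStorageRoot(pub String);

/// Absent `storage:` falls back to the config directory (the original layout).
pub fn resolve_root(
    declared: Option<&str>,
    config_dir: &Path,
) -> Result<StorageRoot, InvalidStorageRoot> {
    let Some(uri) = declared else {
        return Ok(StorageRoot::File(config_dir.to_path_buf()));
    };
    if let Some(path) = uri.strip_prefix("file://") {
        if path.is_empty() {
            return Err(InvalidStorageRoot(uri.to_string()));
        }
        return Ok(StorageRoot::File(PathBuf::from(path)));
    }
    if let Some(rest) = uri.strip_prefix("s3://") {
        // Split "bucket/prefix"; a bare bucket has an empty prefix.
        let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
        if bucket.is_empty() {
            return Err(InvalidStorageRoot(uri.to_string()));
        }
        return Ok(StorageRoot::S3 {
            bucket: bucket.to_string(),
            prefix: prefix.trim_end_matches('/').to_string(),
        });
    }
    // Assumption: any other scheme is rejected loudly.
    Err(InvalidStorageRoot(uri.to_string()))
}
```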

Tests: default-layout byte-compat, a file:// root relocating the entire
cluster (ledger+catalog+graphs under the new root, nothing under the config
dir, serving snapshot follows), invalid-root validation. 98 in-crate + 9
failpoints + full workspace gate green. The s3:// flavor lands with PR 3's
gated RustFS e2e.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Caught by the first live s3 smoke: StateLockGuard's spawned async delete
dies with the runtime when a short-lived CLI process exits right after the
command — import's lock survived into the next command as state_lock_held.
On the multi-thread runtime (the CLI, and the gated s3 tests)
block_in_place waits for the delete to complete; current-thread runtimes
keep the spawn fallback with force-unlock as the documented recovery, same
as a crash.
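
The failure mode and the fix can be illustrated with plain threads as a stand-in for the async runtime. This is an analogy, not the StateLockGuard code: `SketchLock` and `Release` are hypothetical, `thread::spawn` plays the role of the spawned async delete, and `join` plays the role of `block_in_place` waiting for it.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::thread;

/// How the guard releases on drop (hypothetical names).
pub enum Release {
    /// Wait for the delete to finish before drop returns.
    Blocking,
    /// Fire and forget; the delete is lost if the process exits first.
    Detached,
}

pub struct SketchLock {
    path: PathBuf,
    mode: Release,
}

impl SketchLock {
    /// Write the lock file and return a guard that releases it on drop.
    pub fn acquire(path: PathBuf, mode: Release) -> io::Result<Self> {
        fs::write(&path, b"held")?;
        Ok(Self { path, mode })
    }
}

impl Drop for SketchLock {
    fn drop(&mut self) {
        let path = self.path.clone();
        // The release runs in the background, like the spawned async delete.
        let worker = thread::spawn(move || {
            let _ = fs::remove_file(path);
        });
        if let Release::Blocking = self.mode {
            // Waiting here guarantees the lock is gone when drop returns.
            let _ = worker.join();
        }
        // Detached: the handle is dropped and nothing waits for the delete.
    }
}
```

With `Release::Detached`, a process that exits right after the guard drops can leave the lock file behind, which is the `state_lock_held` symptom described above.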

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@aaltshuler aaltshuler merged commit 4e526b3 into main Jun 11, 2026
7 checks passed
@aaltshuler aaltshuler deleted the feat/cluster-storage-root-v2 branch June 11, 2026 11:54