feat(cluster): storage root — ledger, catalog, and graphs on the StorageAdapter (RFC-006 PR 2/3)#190
Merged
LocalStateBackend becomes ClusterStore: every stored byte — state ledger, lock, recovery sidecars, approval artifacts — now flows through the engine's StorageAdapter, making file:// and s3:// one code path. Behavior on the file backend is byte-compatible (layout, CAS semantics, diagnostics, lock release timing) and the entire pre-existing suite passes unchanged. Mechanics: the ledger CAS keeps its public sha256 vocabulary while the physical swap is token-conditioned (ETag If-Match on S3 via PR #186's primitives; content-token + temp/rename locally — the pre-port semantics); the lock is a create-only put (genuinely cross-machine on object stores) with deterministic drop-release locally and best-effort spawned release on S3; sidecars/approvals address by URI (SweepOutcome and the executors carry strings); sweep row-1 retirement joins the uniform deferred post-CAS cleanup. ClusterStore also gains the catalog-payload and graph-root methods that commit 2 wires in. Async ripple: status/force-unlock/serving-snapshot and the server's settings loader chain go async (CLI dispatch and ~20 test hosts follow, mechanically). tokio joins the cluster crate's runtime deps for the lock guard's handle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
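To make the token-conditioned swap described above concrete, here is a minimal in-memory sketch. It is not the `ClusterStore` or `StorageAdapter` API: `MemStore`, `Token`, `put_if_match`, and the `state.json` key are all hypothetical names, and a bumping counter stands in for an S3 ETag (or the local content token). The only point is the shape of the rule: a write lands only if the token the writer last read is still current.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an ETag / content token: changes on every write.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Token(u64);

#[derive(Debug, PartialEq, Eq)]
enum CasError {
    /// The object changed (or appeared) since the caller last read it.
    TokenMismatch { current: Option<Token> },
}

/// Hypothetical in-memory object store, used only for illustration.
#[derive(Default)]
struct MemStore {
    objects: HashMap<String, (Token, Vec<u8>)>,
    next: u64,
}

impl MemStore {
    fn read(&self, key: &str) -> Option<(Token, Vec<u8>)> {
        self.objects.get(key).cloned()
    }

    /// Write only if the stored token equals `expected`.
    /// `expected == None` means "create only": the key must not exist yet.
    fn put_if_match(
        &mut self,
        key: &str,
        expected: Option<Token>,
        body: &[u8],
    ) -> Result<Token, CasError> {
        let current = self.objects.get(key).map(|(token, _)| *token);
        if current != expected {
            return Err(CasError::TokenMismatch { current });
        }
        self.next += 1;
        let token = Token(self.next);
        self.objects.insert(key.to_string(), (token, body.to_vec()));
        Ok(token)
    }
}

fn main() {
    let mut store = MemStore::default();
    store.put_if_match("state.json", None, b"v1").unwrap();

    // Two writers read the same version of the ledger.
    let (seen_by_a, _) = store.read("state.json").unwrap();
    let (seen_by_b, _) = store.read("state.json").unwrap();

    // Writer A swaps first; writer B's token is now stale and its write is refused.
    let after_a = store.put_if_match("state.json", Some(seen_by_a), b"v2-from-a").unwrap();
    match store.put_if_match("state.json", Some(seen_by_b), b"v2-from-b") {
        Err(CasError::TokenMismatch { current }) => {
            assert_eq!(current, Some(after_a));
            println!("writer B lost the race; current token is {:?}", current);
        }
        Ok(_) => unreachable!("a stale token must never win"),
    }
}
```

In this sketch the loser has to re-read and retry. How the real ledger maps a mismatch back onto its public `sha256:` vocabulary is not shown here.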
…locatable cluster.yaml gains an optional storage: URI deciding where everything the cluster STORES lives: the state ledger, lock, content-addressed catalog, recovery sidecars, approval artifacts, and the derived graph roots (<storage>/graphs/<id>.omni). Absent, it defaults to the config directory itself — the original layout, byte-compatible, so pre-existing clusters and the whole test suite are untouched. Declared configuration always stays in the working tree (Terraform's config-local/state-remote split); credentials are env-only, never in cluster.yaml. Every command resolves its store from the declared root (a bad root is a loud invalid_storage_root). Graph-root derivation, the delete executor (prefix delete via the adapter), the sweep's existence probes, the catalog payload write/verify/read paths, and the serving snapshot all flow through ClusterStore — the last raw-fs holdouts for stored state are gone, and the deny-list gains the rule that keeps it that way. Tests: default-layout byte-compat, a file:// root relocating the entire cluster (ledger+catalog+graphs under the new root, nothing under the config dir, serving snapshot follows), invalid-root validation. 98 in-crate + 9 failpoints + full workspace gate green. The s3:// flavor lands with PR 3's gated RustFS e2e. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
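The resolution rule in this commit message (an optional `storage:` URI, defaulting to the config directory, with a bad root rejected loudly) can be sketched roughly as follows. This is an illustration under assumptions, not the crate's code: `StorageRoot`, `resolve_root`, `graph_root`, and the example paths are hypothetical, and only the `file://` and `s3://` schemes and the `<storage>/graphs/<id>.omni` layout come from the text above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical representation of a resolved storage root.
#[derive(Debug, PartialEq, Eq)]
enum StorageRoot {
    File(PathBuf),
    S3 { bucket: String, prefix: String },
}

/// Hypothetical error carrying the reason behind an `invalid_storage_root`.
#[derive(Debug, PartialEq, Eq)]
struct InvalidStorageRoot(String);

/// Resolve the optional `storage:` value; absent means "the config directory".
fn resolve_root(config_dir: &Path, storage: Option<&str>) -> Result<StorageRoot, InvalidStorageRoot> {
    let Some(uri) = storage else {
        return Ok(StorageRoot::File(config_dir.to_path_buf()));
    };
    if let Some(path) = uri.strip_prefix("file://") {
        if path.is_empty() {
            return Err(InvalidStorageRoot(format!("empty path in {uri}")));
        }
        return Ok(StorageRoot::File(PathBuf::from(path)));
    }
    if let Some(rest) = uri.strip_prefix("s3://") {
        let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
        if bucket.is_empty() {
            return Err(InvalidStorageRoot(format!("missing bucket in {uri}")));
        }
        return Ok(StorageRoot::S3 {
            bucket: bucket.to_string(),
            prefix: prefix.trim_end_matches('/').to_string(),
        });
    }
    Err(InvalidStorageRoot(format!("unsupported scheme in {uri}")))
}

/// Derive `<storage>/graphs/<id>.omni` under the resolved root.
fn graph_root(root: &StorageRoot, graph_id: &str) -> String {
    match root {
        StorageRoot::File(dir) => dir
            .join("graphs")
            .join(format!("{graph_id}.omni"))
            .display()
            .to_string(),
        StorageRoot::S3 { bucket, prefix } if prefix.is_empty() => {
            format!("s3://{bucket}/graphs/{graph_id}.omni")
        }
        StorageRoot::S3 { bucket, prefix } => {
            format!("s3://{bucket}/{prefix}/graphs/{graph_id}.omni")
        }
    }
}

fn main() {
    // Example values only; none of these paths come from the PR.
    let config_dir = Path::new("/etc/mycluster");
    let candidates = [
        None,
        Some("file:///var/lib/mycluster"),
        Some("s3://my-bucket/prod"),
        Some("ftp://nope"),
    ];
    for storage in candidates {
        match resolve_root(config_dir, storage) {
            Ok(root) => println!("{storage:?} -> {}", graph_root(&root, "main")),
            Err(e) => println!("{storage:?} -> invalid_storage_root: {}", e.0),
        }
    }
}
```

Credentials are deliberately absent from the sketch, mirroring the env-only rule stated above.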
Caught by the first live s3 smoke: StateLockGuard's spawned async delete dies with the runtime when a short-lived CLI process exits right after the command — import's lock survived into the next command as state_lock_held. On the multi-thread runtime (the CLI, and the gated s3 tests) block_in_place waits for the delete to complete; current-thread runtimes keep the spawn fallback with force-unlock as the documented recovery, same as a crash. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
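The failure mode above comes down to one property: a create-only lock stays held until its delete actually runs. The following local-file analog is a sketch under assumptions, not `StateLockGuard` itself. `LockGuard`, `acquire`, `force_unlock`, and the temp-file name are hypothetical; `create_new` stands in for the create-only put, and `std::mem::forget` simulates a release that never ran because the process exited first.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind, Write};
use std::path::{Path, PathBuf};

/// Hypothetical guard: holding it means holding the lock.
struct LockGuard {
    path: PathBuf,
}

/// Create-only acquire. `Ok(None)` means another holder exists
/// (the situation the PR reports as `state_lock_held`).
fn acquire(path: &Path, owner: &str) -> io::Result<Option<LockGuard>> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            // Build the guard first so a failed write still releases the lock.
            let guard = LockGuard { path: path.to_path_buf() };
            file.write_all(owner.as_bytes())?;
            Ok(Some(guard))
        }
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(None),
        Err(e) => Err(e),
    }
}

impl Drop for LockGuard {
    /// Deterministic release: the delete completes before `drop` returns.
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// Recovery path for a lock whose release never ran.
fn force_unlock(path: &Path) -> io::Result<()> {
    match fs::remove_file(path) {
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("lock-sketch-{}.lock", std::process::id()));
    force_unlock(&path)?; // start from a clean slate

    {
        let _guard = acquire(&path, "import")?.expect("first acquire wins");
        assert!(acquire(&path, "status")?.is_none()); // held by "import"
    } // guard dropped here: lock released synchronously

    let guard = acquire(&path, "apply")?.expect("lock was released on drop");

    // Simulate a release that never ran (process gone before the delete finished).
    std::mem::forget(guard);
    assert!(acquire(&path, "status")?.is_none()); // stale lock blocks the next command

    force_unlock(&path)?; // the documented recovery, same as after a crash
    drop(acquire(&path, "status")?.expect("free again after force-unlock"));
    assert!(!path.exists());
    println!("lock lifecycle ok");
    Ok(())
}
```

The fix described in the commit message corresponds to making the release behave like the synchronous `drop` here whenever the runtime allows waiting, and falling back to force-unlock otherwise.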
This was referenced Jun 11, 2026
The core of the object-storage migration, landing as the focused diff the modularization (#188) was done for. Two commits:
Commit 1: the backend port
`LocalStateBackend` → `ClusterStore` over the engine's `StorageAdapter` (in `store.rs`, the module that now owns every stored byte). Byte-compatible on the file backend (layout, CAS semantics, diagnostics, lock-release timing), proven by the entire pre-existing suite passing unchanged. The ledger CAS keeps its public `sha256:` vocabulary while the physical swap is token-conditioned (ETag If-Match on S3 via #186's primitives; the pre-port temp/rename semantics locally). The lock becomes a create-only put, genuinely cross-machine on object stores for the first time. Async ripple: `status`/`force-unlock`/`read_serving_snapshot` and the server's `load_server_settings` chain go async (mechanical CLI/test ripple).
Commit 2: the `storage:` root
One URI decides where everything the cluster stores lives: ledger, lock, catalog, sidecars, approvals, and the derived graph roots (`<storage>/graphs/<id>.omni`). Declared config stays in the working tree (Terraform's config-local/state-remote split); credentials are env-only. Graph-root derivation, the delete executor (prefix delete), the sweep's existence probes, payload write/verify/read, and the serving snapshot all flow through the store. The last raw-fs holdouts are gone, and `invariants.md`'s deny-list gains the rule keeping it that way.
Tests
- 98 in-crate (default-layout byte-compat; a `file://` root relocating the entire cluster: ledger+catalog+graphs under the new root, nothing under the config dir, serving snapshot follows; invalid-root validation) + 9 failpoints
- `cargo test --workspace --locked`: green, 46 suites
- The `s3://` flavor is exercised end-to-end in PR 3's gated RustFS suite (the conditional-put and schema-apply-on-S3 foundations are already pinned by the gated tests of #186, feat(storage,policy): object-store primitives for the cluster port (RFC-006 PR 1/3))
🤖 Generated with Claude Code