Phase 2.5: multi-master write safety - bucket-root CAS (FM-1) + EIP-712 identity (FM-4)#41
Open
ehsan6sha wants to merge 19 commits into
Multi-stage build of the fula-gateway bin (cargo build --release -p fula-cli) on rust:1-bookworm with a slim Debian runtime. Env-driven config only; /data volume for durable state (pin queue, registry CID). Consumed by the federated-master installer in functionland/pinning-service (gateway compose profile, auto-enabled when the image exists). Closes #29 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…gateway The pin queue, registry CID, and local-retain backlog default to /var/lib/fula-gateway (config.rs); the /data volume was wrong - without a mount there, pin retries/crash recovery silently degrade to fire-and-forget. Found by the federated-master e2e (startup WARNs). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…mote-cid mapping PUTs
Gateway (flag FULA_REMOTE_CID_PUT, default OFF):
- GET /fula/capabilities (public): advertises {remoteCidPut} so clients
probe before using optional protocols - an old/flag-off master is never
sent an empty-body PUT it would misstore as a real zero-byte chunk
- put_object: empty-body PUT with x-amz-meta-fula-remote-cid records only
the key->cid mapping after validating the CID is raw(0x55)+blake3(0x1e)
and the block is PRESENT in the blockstore (absent => retryable 409 so
the client falls back to full bytes); x-amz-meta-fula-remote-size
carries the chunk's true byte count so billing/list never see 0;
both headers added to the control-header filter (never persisted)
Client (Config.ingest_endpoints, default empty = byte-identical legacy):
- supports_remote_cid_put(): one-shot capability probe, ANY failure=false
- put_object_chunked_internal: per chunk, bytes stream to the ingest node
(PUT /v0/block?cid=<precomputed> - the node verifies before storing),
then the mapping PUT (with pinning variant when creds present, keeping
per-object pin requests); ANY failure on the route falls back to the
legacy full-bytes PUT. Probe once per upload; native-only (wasm32 keeps
legacy path); ETag=cid keeps the existing self-verify unchanged
Part of #31 (cross-repo: functionland/fula-ota#74 fula-ingest)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
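The CID check described in this commit (raw codec 0x55 plus blake3 multihash 0x1e) can be sketched as follows. This is a minimal illustration rather than the gateway's code: it assumes the CID has already been decoded to binary CIDv1 bytes and that the digest is the default 32 bytes. The function and constant names are hypothetical.

```rust
/// Multicodec code for raw bytes.
pub const CODEC_RAW: u8 = 0x55;
/// Multihash code for BLAKE3.
pub const MH_BLAKE3: u8 = 0x1e;
/// Assumed digest length: the default 32-byte BLAKE3 output.
pub const DIGEST_LEN: usize = 32;

/// Returns true when `cid` is a binary CIDv1 laid out as
/// <version=1><codec=raw><mh=blake3><len=32><32-byte digest>.
/// Every field value here is below 0x80, so each varint is one byte.
/// (Hypothetical helper; the real handler may parse the CID differently.)
pub fn is_raw_blake3_cid(cid: &[u8]) -> bool {
    match cid {
        [0x01, CODEC_RAW, MH_BLAKE3, len, digest @ ..] => {
            *len as usize == DIGEST_LEN && digest.len() == DIGEST_LEN
        }
        _ => false,
    }
}
```

A CID that fails this shape check would be rejected before the blockstore presence check runs; the text form in the x-amz-meta-fula-remote-cid header would need multibase decoding first, which is omitted here.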
Four cases: old master (no capabilities) never receives the new protocol; capable master routes every chunk's bytes through the ingest node (master sees ONLY empty-body mapping PUTs; declared sizes sum to the bytes the ingest stored; ingest asserts declared cid == blake3 of body); dead ingest falls back transparently and the upload succeeds; empty ingest_endpoints is byte-identical legacy. Part of #31 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
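The four cases in this matrix reduce to one routing decision per chunk, sketched below. The names `Route` and `route_chunk` and the choice of the first endpoint are assumptions made for illustration; the two closures stand in for the block PUT to the ingest node and the empty-body mapping PUT to the master.

```rust
/// Which path a chunk ended up taking.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Bytes went to the ingest node; the master saw only the mapping PUT.
    RemoteCid,
    /// Legacy path: the full chunk body is PUT to the master.
    LegacyBytes,
}

/// Decides the route for one chunk (hypothetical helper).
/// `master_capable` is the result of the one-shot capability probe.
pub fn route_chunk<B, M>(
    master_capable: bool,
    ingest_endpoints: &[&str],
    mut send_block: B,
    mut send_mapping: M,
) -> Route
where
    B: FnMut(&str) -> Result<(), String>,
    M: FnMut() -> Result<(), String>,
{
    // Old or flag-off master: never send it the new protocol.
    if !master_capable {
        return Route::LegacyBytes;
    }
    // Empty ingest_endpoints keeps the legacy behaviour.
    let Some(endpoint) = ingest_endpoints.first().copied() else {
        return Route::LegacyBytes;
    };
    // Any failure on the remote-CID route falls back to full bytes.
    match send_block(endpoint).and_then(|()| send_mapping()) {
        Ok(()) => Route::RemoteCid,
        Err(_) => Route::LegacyBytes,
    }
}
```

Returning `LegacyBytes` on every error keeps the upload succeeding when the ingest node is dead, which is the third case in the matrix.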
…ked loop put_object_chunked (the FxFiles-path public API) duplicates the chunk loop of put_object_chunked_internal (E51 mirroring pattern) - the route existed only in the internal one, so public-API uploads silently stayed on the legacy byte path. Caught by the wiremock matrix (positive case: ingest got 0). Same probe gating + fallback semantics. Part of #31 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-trip via a real master + fula-ingest: chunked upload routed through the ingest node then read back via the master (CID on); v8-off legacy parity leg; FULA_BIG=1 adds the 1 GiB case (~4096 chunks). Consumed by fula-ota tests/e2e/phase-2/60-fidelity.sh which also asserts server-side that kubo block counts grew. Part of #31 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…this branch (PR #30 lands first)
…must 409 fast kubo block lookup for an ABSENT cid is an unbounded bitswap/DHT search; the mapping-PUT handler hung for minutes per missing block (caught by drill I10: curl timed out). Timeout => treat as absent => retryable 409, client falls back to a full-bytes PUT. 5s amply covers a bitswap pull from the ingest peer. Part of #31 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
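The "timeout means absent" rule can be sketched with a bounded wait. This illustration uses a plain thread and channel from the standard library; the gateway itself presumably uses its async runtime's timeout, and the helper names and the 200 success status are assumptions.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// The 5 s bound mentioned in the commit message.
pub const LOOKUP_TIMEOUT: Duration = Duration::from_secs(5);

/// Runs `lookup` on a worker thread and waits at most `limit` for it.
/// A timeout (or a lookup that panics) is treated as "block absent".
/// (Hypothetical helper, not the gateway's actual function.)
pub fn present_within<F>(lookup: F, limit: Duration) -> bool
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may be gone after a timeout; ignore the send error.
        let _ = tx.send(lookup());
    });
    rx.recv_timeout(limit).unwrap_or(false)
}

/// Maps presence to the mapping-PUT response status:
/// absent is the retryable 409; 200 for success is an assumption.
pub fn mapping_put_status(present: bool) -> u16 {
    if present { 200 } else { 409 }
}
```

With this shape a missing block costs the client at most `limit` before it receives the 409 and falls back to a full-bytes PUT.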
…ect_flat) get_object does not exist on EncryptedClient; the canonical pairing is put_object_flat (which dispatches >768KiB payloads into put_object_chunked_internal - the route under test) + get_object_flat, exactly what offline_e2e/FxFiles use. Part of #31 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nt masters
Phase 2.5 core (fula-api#32). Default OFF; byte-identical when dark.
fula-core (storage-agnostic):
- new root_pointer module: RootPointerStore trait (cas_root/get_root) + CasOutcome + in-memory test store; race-semantics unit tests
- Bucket::flush(): with an arbiter attached, CAS old_root -> new_root BEFORE publishing to the in-process cache; a lost race surfaces as PreconditionFailed (the retryable contract SDKs already handle for conditional writes); no-op when the root did not change
- BucketManager: OnceLock-held arbiter (wireable after Arc-ing, lock-free reads); open_bucket_for_user consults get_root so a bucket moved by ANOTHER master is opened at the shared root, not this process's stale cache (arbiter-unreachable degrades to cached root - flush CAS still arbitrates); opened buckets get the arbiter attached
fula-cli:
- root_store_pg: PgRootStore on the EXISTING pins-DB sqlx pool (zero new deps) - single-statement INSERT..ON CONFLICT..DO UPDATE..WHERE root_cid=expected upsert-CAS; bucket_roots table ships as pinning-service migration 020
- flag FULA_BUCKET_ROOT_CAS + config field; state wires the arbiter when flag AND Postgres are present (loud warning if flag-on without DB)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
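The flush ordering described above (CAS first, publish to the local cache only on success) can be sketched as follows. The names RootPointerStore, cas_root, get_root, CasOutcome and PreconditionFailed come from the commit message; every signature, the variant fields, the `String` stand-in for a CID, and the `Bucket` shape are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Result of a compare-and-swap attempt (variant shapes are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CasOutcome {
    Swapped { version: u64 },
    Conflict { current: Option<String> },
}

/// Shared arbiter for bucket roots (signatures are assumed).
pub trait RootPointerStore {
    fn get_root(&self, bucket: &str) -> Option<String>;
    fn cas_root(&self, bucket: &str, expected: Option<&str>, new_root: &str) -> CasOutcome;
}

/// In-memory store: bucket name -> (root CID as text, version).
#[derive(Default)]
pub struct MemRootStore {
    rows: Mutex<HashMap<String, (String, u64)>>,
}

impl RootPointerStore for MemRootStore {
    fn get_root(&self, bucket: &str) -> Option<String> {
        self.rows.lock().unwrap().get(bucket).map(|(cid, _)| cid.clone())
    }

    fn cas_root(&self, bucket: &str, expected: Option<&str>, new_root: &str) -> CasOutcome {
        // The lock is held for the whole compare-and-swap, making it atomic.
        let mut rows = self.rows.lock().unwrap();
        let current = rows.get(bucket).cloned();
        let stored = current.as_ref().map(|(cid, _)| cid.as_str());
        if stored != expected {
            return CasOutcome::Conflict { current: current.map(|(cid, _)| cid) };
        }
        let version = current.map_or(1, |(_, v)| v + 1);
        rows.insert(bucket.to_string(), (new_root.to_string(), version));
        CasOutcome::Swapped { version }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum FlushError {
    PreconditionFailed { current: Option<String> },
}

/// Minimal bucket: a name plus this process's cached root.
pub struct Bucket {
    pub name: String,
    pub cached_root: Option<String>,
}

impl Bucket {
    /// CAS old_root -> new_root BEFORE updating the local cache.
    pub fn flush(&mut self, store: &dyn RootPointerStore, new_root: &str) -> Result<(), FlushError> {
        if self.cached_root.as_deref() == Some(new_root) {
            return Ok(()); // root unchanged: no-op
        }
        match store.cas_root(&self.name, self.cached_root.as_deref(), new_root) {
            CasOutcome::Swapped { .. } => {
                self.cached_root = Some(new_root.to_string());
                Ok(())
            }
            CasOutcome::Conflict { current } => Err(FlushError::PreconditionFailed { current }),
        }
    }
}
```

A writer that receives `PreconditionFailed` would reload from `current` (or call `get_root`), rebuild its change on top of that root, and flush again.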
…ntity)
Phase 2.5 (fula-api#32). Self-certifying bearer: fula-eip712.<b64url
payload>.<b64url 65-byte sig> over EIP-712 typed data (domain "Fula
Gateway" v1; FulaAuth{wallet,iat,exp}). Any master verifies offline -
no shared secret, no session table - so ONE identity works on every
federated master; the session user id IS the lowercase wallet (the same
wallet the staking/MasterRegistry phases key on). Scope fixed to
storage:*; lifetime capped 24h; iat skew 300s; revocation deny-list
applies to the raw token unchanged. Additive: prefix + FULA_EIP712_AUTH
flag route it; legacy JWT path untouched. /fula/capabilities advertises
eip712Auth. Deps: k256 (ecdsa) + sha3 - minimal RustCrypto, only
exercised behind the flag.
Unit tests: valid sig -> wallet session w/ write scope; wrong-wallet
claim rejected; tampered payload rejected; expired/overlong rejected;
two independent verifications agree (the portability property).
Known v0 limitation (documented in-module): per-PUT registry pinning
forwards the bearer to the pinning service, which 401s non-session
tokens - full parity needs the pinning service accepting this scheme
(follow-up); Stage-B storage-only operators are unaffected.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
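The prefix routing and the time-window rules above (24h lifetime cap, 300s iat skew) can be sketched without any crypto dependency. Signature recovery and payload decoding are deliberately left out; the function names, the error enum, and the exact boundary conditions (for example, treating `now == exp` as expired) are assumptions.

```rust
pub const TOKEN_PREFIX: &str = "fula-eip712.";
pub const MAX_LIFETIME_SECS: u64 = 24 * 60 * 60;
pub const IAT_SKEW_SECS: u64 = 300;

#[derive(Debug, PartialEq, Eq)]
pub enum Reject {
    NotEip712,
    Malformed,
    IssuedInFuture,
    Expired,
    Overlong,
}

/// Splits `fula-eip712.<payload>.<sig>` into its two base64url segments.
/// A bearer without the prefix is left for the legacy JWT path.
pub fn split_token(bearer: &str) -> Result<(&str, &str), Reject> {
    let rest = bearer.strip_prefix(TOKEN_PREFIX).ok_or(Reject::NotEip712)?;
    match rest.split_once('.') {
        Some((payload, sig))
            if !payload.is_empty() && !sig.is_empty() && !sig.contains('.') =>
        {
            Ok((payload, sig))
        }
        _ => Err(Reject::Malformed),
    }
}

/// Checks the iat/exp claims (Unix seconds) against the current time.
pub fn check_window(iat: u64, exp: u64, now: u64) -> Result<(), Reject> {
    if exp <= iat {
        return Err(Reject::Malformed);
    }
    if exp - iat > MAX_LIFETIME_SECS {
        return Err(Reject::Overlong);
    }
    if iat > now.saturating_add(IAT_SKEW_SECS) {
        return Err(Reject::IssuedInFuture);
    }
    if now >= exp {
        return Err(Reject::Expired);
    }
    Ok(())
}
```

Because both checks depend only on the token and the local clock, two masters evaluating the same token at the same time reach the same answer, which is the portability property the unit tests assert.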
…l puts
Three current-behaviour pins quantifying the O(N^2) bulk-upload cost:
- node layer: store() re-PUTs an entire unchanged InMemory tree because persisted children are never downgraded to Stored (the only Stored constructor is the load path, Pointer::from_wire)
- forest layer: per-flush PUT count and bytes grow with bucket size for the put_object_flat pattern (upsert + flush per file, one directory): flush #10 = 1 PUT / 10.8 KB -> flush #150 = 17 PUTs / 161.7 KB; 150 x 4 KB files = 1630 node PUTs / 12.2 MB of index re-uploads
- contrast: marginal flush after 150 uploads is 17 PUTs in the long-lived session vs 3 PUTs for a fresh forest loaded from the same manifest - the amplification is session-accumulated InMemory state, not inherent tree cost
Invert these pins into regression guards when #34 is fixed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…guards supersede the local pin tests)
End-to-end CAS against the real bucket_roots table (migration 020): claim-on-first-flush, stale-root flush conflicts reporting current, loser-retry-at-shared-root wins, version increments; plus an 8-way concurrent race asserting EXACTLY ONE winner. The exact arbitration two live gateways perform on flush. Gated on POSTGRES_* (skips without). Part of #32 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
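The "exactly one winner" property of the 8-way race can be reproduced in miniature with threads and a mutex-guarded cell. This is a stand-in for the Postgres-backed test, not a copy of it: `RootCell` and `race` are hypothetical names, and a single in-process mutex plays the role of the single-statement upsert.

```rust
use std::sync::Mutex;
use std::thread;

/// Stand-in for one bucket_roots row: the root recorded for a bucket.
pub struct RootCell {
    root: Mutex<String>,
}

impl RootCell {
    pub fn new(initial: &str) -> Self {
        Self { root: Mutex::new(initial.to_string()) }
    }

    /// Swaps in `new_root` only if the stored root still equals `expected`.
    pub fn cas(&self, expected: &str, new_root: &str) -> bool {
        let mut root = self.root.lock().unwrap();
        if root.as_str() == expected {
            *root = new_root.to_string();
            true
        } else {
            false
        }
    }

    pub fn current(&self) -> String {
        self.root.lock().unwrap().clone()
    }
}

/// Starts `writers` threads that all try to move the same stale root.
/// Returns how many CAS calls succeeded and the root left behind.
pub fn race(writers: usize) -> (usize, String) {
    let cell = RootCell::new("root-0");
    let winners = thread::scope(|s| {
        let handles: Vec<_> = (0..writers)
            .map(|i| {
                let cell = &cell;
                s.spawn(move || cell.cas("root-0", &format!("root-from-{i}")))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .filter(|won| *won)
            .count()
    });
    (winners, cell.current())
}
```

Every writer expects the same starting root, so the first one to take the lock wins and the rest see a changed root and lose, whatever the scheduling order.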
…-core dep) root_pointer tests only need distinct CIDs; wrapping a seed-filled 32-byte digest as multihash 0x1e avoids adding blake3 to fula-core. Compile error E0433 from the unit gate. Part of #32 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Phase 2.5: final verification complete (all green on the test stack)
Live two-gateway drill (
Combined Phase 2.5 evidence:
Plus the storage-only Stage-B operator profile (
Phase 2.5 is feature-complete and verified. Ready for your review.
🤖 Generated with Claude Code
Phase 2.5 — multi-master write safety: bucket-root CAS (FM-1) + EIP-712 portable identity (FM-4)
Makes CONCURRENT federated masters SAFE where Stage A (#46) only gave fenced failover. Everything flag-gated, default OFF — single-master behavior byte-identical when dark.
FM-1 - bucket-root compare-and-swap
The per-bucket root pointer was guarded only by in-process locks (fula-core/src/bucket.rs), so two gateways writing one bucket could silently lose updates.
- fula-core: storage-agnostic RootPointerStore trait (cas_root/get_root + CasOutcome); Bucket::flush() CASes old_root → new_root before publishing to the in-process cache (lost race ⇒ PreconditionFailed, the retryable 412 contract SDKs already handle); open_bucket_for_user opens at the shared root when another master moved it (arbiter-unreachable degrades to cached root - flush CAS still arbitrates).
- fula-cli: PgRootStore on the existing pins-DB sqlx pool (single-statement upsert-CAS; zero new deps); flag FULA_BUCKET_ROOT_CAS, wired only when flag AND Postgres present (loud warning otherwise).
- bucket_roots ships as pinning-service migration 020 (additive).
FM-4 - EIP-712 portable identity
Self-certifying wallet-signature bearer (fula-eip712.<payload>.<sig> over EIP-712 typed data) - any master verifies offline with no shared secret, so one identity works on every federated master; session user-id IS the wallet (the same wallet staking/MasterRegistry key on). Flag FULA_EIP712_AUTH, prefix-routed; legacy JWT path untouched. Deps: k256 + sha3 (minimal RustCrypto, only behind the flag).
Tests (all green on the test stack)
PgRootStore vs REAL Postgres 2/2 - claim → stale-flush conflict (reports current) → loser-retry-at-shared-root wins → version increments; + 8-way concurrent CAS asserts exactly one winner. This is the exact arbitration two live gateways perform on flush.
Scope notes (honest)
join-as-master.sh storage-only profile (Stage-B operators) is a follow-up on this branch.
In simple words
Two backend servers can now safely accept changes to the same user's data at the same time without ever erasing each other's work — each save must win a tiny atomic "I'm building on the latest version" check in the shared ledger, and the loser is told to reload and retry. And you can prove who you are with your wallet's signature on any server, no central ticket needed.
Closes #32
🤖 Generated with Claude Code