
Phase 2.5: multi-master write safety - bucket-root CAS (FM-1) + EIP-712 identity (FM-4)#41

Open
ehsan6sha wants to merge 19 commits into main from phase-2.5-multimaster


@ehsan6sha

Member

Phase 2.5 — multi-master write safety: bucket-root CAS (FM-1) + EIP-712 portable identity (FM-4)

Makes CONCURRENT federated masters SAFE where Stage A (#46) only gave fenced failover. Everything flag-gated, default OFF — single-master behavior byte-identical when dark.

FM-1 — bucket-root compare-and-swap

The per-bucket root pointer was guarded only by in-process locks (fula-core/src/bucket.rs), so two gateways writing one bucket could silently lose updates.

  • fula-core: storage-agnostic RootPointerStore trait (cas_root/get_root + CasOutcome); Bucket::flush() CASes old_root → new_root before publishing to the in-process cache (lost race ⇒ PreconditionFailed, the retryable 412 contract SDKs already handle); open_bucket_for_user opens at the shared root when another master moved it (arbiter-unreachable degrades to cached root — flush CAS still arbitrates).
  • fula-cli: PgRootStore on the existing pins-DB sqlx pool (single-statement upsert-CAS; zero new deps); flag FULA_BUCKET_ROOT_CAS, wired only when flag AND Postgres present (loud warning otherwise).
  • Table: bucket_roots ships as pinning-service migration 020 (additive).
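The arbitration contract described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the names `RootPointerStore`, `cas_root`, `get_root` and `CasOutcome` come from the description. The synchronous signatures, the `String` CIDs, the variant names and `MemRootStore` are assumptions made for the example (the real trait is likely async and CID-typed).

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Result of a compare-and-swap on a bucket's root pointer (variant names are assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CasOutcome {
    /// The swap was applied; `version` is the new version of the pointer.
    Swapped { version: u64 },
    /// Another writer moved the root first; `current` is the shared root now.
    Conflict { current: Option<String> },
}

/// Hypothetical, simplified shape of the storage-agnostic arbiter trait.
pub trait RootPointerStore: Send + Sync {
    fn get_root(&self, bucket: &str) -> Option<String>;
    fn cas_root(&self, bucket: &str, expected: Option<&str>, new_root: &str) -> CasOutcome;
}

/// In-memory store for illustration: bucket -> (root CID, version).
#[derive(Default)]
pub struct MemRootStore {
    roots: Mutex<HashMap<String, (String, u64)>>,
}

impl RootPointerStore for MemRootStore {
    fn get_root(&self, bucket: &str) -> Option<String> {
        self.roots.lock().unwrap().get(bucket).map(|(cid, _)| cid.clone())
    }

    fn cas_root(&self, bucket: &str, expected: Option<&str>, new_root: &str) -> CasOutcome {
        // Compare and swap happen under one lock, so they are atomic together.
        let mut roots = self.roots.lock().unwrap();
        let current = roots.get(bucket).map(|(cid, _)| cid.clone());
        if current.as_deref() != expected {
            return CasOutcome::Conflict { current };
        }
        let version = roots.get(bucket).map_or(1, |(_, v)| v + 1);
        roots.insert(bucket.to_string(), (new_root.to_string(), version));
        CasOutcome::Swapped { version }
    }
}

fn main() {
    let store = MemRootStore::default();
    // First flush claims the bucket: no previous root is expected.
    assert_eq!(store.cas_root("b", None, "root-1"), CasOutcome::Swapped { version: 1 });
    // A stale writer that still expects "no root" loses and is told the current root.
    assert_eq!(
        store.cas_root("b", None, "root-x"),
        CasOutcome::Conflict { current: Some("root-1".to_string()) }
    );
    // The loser retries on top of the shared root and wins; the version increments.
    assert_eq!(store.cas_root("b", Some("root-1"), "root-2"), CasOutcome::Swapped { version: 2 });
    assert_eq!(store.get_root("b").as_deref(), Some("root-2"));
}
```

The `Conflict` case is what a gateway would surface as `PreconditionFailed`, leaving the retry to the caller.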

FM-4 — EIP-712 portable identity

Self-certifying wallet-signature bearer (fula-eip712.<payload>.<sig> over EIP-712 typed data) — any master verifies offline with no shared secret, so one identity works on every federated master; session user-id IS the wallet (the same wallet staking/MasterRegistry key on). Flag FULA_EIP712_AUTH, prefix-routed; legacy JWT path untouched. Deps: k256 + sha3 (minimal RustCrypto, only behind the flag).
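To make the prefix routing concrete, here is a small sketch of how a bearer could be split before verification. Only the `fula-eip712.` prefix and the three-part layout come from the description above. The enum, the function name, and the choice to hand prefixed tokens to the legacy path when the flag is off are assumptions for illustration. Base64url decoding and the EIP-712 signature recovery (the part that needs `k256` and `sha3`) are deliberately left out.

```rust
/// Prefix taken from the PR description; everything else here is illustrative.
const EIP712_PREFIX: &str = "fula-eip712.";

#[derive(Debug, PartialEq, Eq)]
enum BearerRoute<'a> {
    /// Still base64url-encoded; decoding and signature checks happen later.
    Eip712 { payload_b64: &'a str, sig_b64: &'a str },
    /// Anything without the prefix (or with the flag off) goes to the JWT path unchanged.
    LegacyJwt(&'a str),
    /// Prefixed token that does not have exactly two non-empty parts after the prefix.
    Malformed,
}

fn route_bearer(token: &str, eip712_enabled: bool) -> BearerRoute<'_> {
    // Assumption: with the flag off the token is not inspected at all.
    if !eip712_enabled {
        return BearerRoute::LegacyJwt(token);
    }
    let rest = match token.strip_prefix(EIP712_PREFIX) {
        Some(rest) => rest,
        None => return BearerRoute::LegacyJwt(token),
    };
    match rest.split_once('.') {
        Some((payload, sig)) if !payload.is_empty() && !sig.is_empty() && !sig.contains('.') => {
            BearerRoute::Eip712 { payload_b64: payload, sig_b64: sig }
        }
        _ => BearerRoute::Malformed,
    }
}

fn main() {
    println!("{:?}", route_bearer("fula-eip712.cGF5bG9hZA.c2ln", true));
    println!("{:?}", route_bearer("header.claims.signature", true));
    println!("{:?}", route_bearer("fula-eip712.only-one-part", true));
}
```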

Tests (all green on the test stack)

  • FM-1 race-semantics unit 2/2
  • FM-1 PgRootStore vs REAL Postgres 2/2 — claim → stale-flush conflict (reports current) → loser-retry-at-shared-root wins → version increments; + 8-way concurrent CAS asserts exactly one winner. This is the exact arbitration two live gateways perform on flush.
  • FM-4 EIP-712 unit 5/5 — valid sig → wallet session; wrong-wallet/tampered/expired/overlong rejected; two independent verifications agree (the portability property).
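The "exactly one winner" property of the 8-way race can be illustrated without a database. In the sketch below a `Mutex` stands in for the atomicity of the single-statement Postgres upsert-CAS; `SharedRoot` and `race` are hypothetical names, and this is not the integration test from the PR.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical single-bucket arbiter holding (root CID, version).
struct SharedRoot(Mutex<(Option<String>, u64)>);

impl SharedRoot {
    /// Swap to `new_root` only if the stored root still equals `expected`.
    fn cas(&self, expected: Option<&str>, new_root: &str) -> bool {
        let mut guard = self.0.lock().unwrap();
        if guard.0.as_deref() != expected {
            return false; // someone else moved the root first
        }
        guard.0 = Some(new_root.to_string());
        guard.1 += 1;
        true
    }
}

/// Starts `writers` threads that all try to replace "root-0" at once.
/// Returns (number of winners, final version).
fn race(writers: usize) -> (usize, u64) {
    let shared = Arc::new(SharedRoot(Mutex::new((Some("root-0".to_string()), 1))));
    let handles: Vec<_> = (0..writers)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.cas(Some("root-0"), &format!("root-from-{}", i)))
        })
        .collect();
    let winners = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|won| *won)
        .count();
    let version = shared.0.lock().unwrap().1;
    (winners, version)
}

fn main() {
    let (winners, version) = race(8);
    // Every writer expected "root-0", so only the first to take the lock can win.
    assert_eq!(winners, 1);
    assert_eq!(version, 2);
    println!("winners={} version={}", winners, version);
}
```

Which thread wins is nondeterministic; only the count of winners and the single version bump are fixed.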

Scope notes (honest)

  • The full two-live-gateway drill is intentionally represented by the real-Postgres CAS integration test — it proves the identical arbitration deterministically without the dual-gateway timing flake.
  • join-as-master.sh storage-only profile (Stage-B operators) is a follow-up on this branch.
  • FM-4 v0: per-PUT registry pinning forwards the bearer to the pinning service (JWT-session-only today) — full parity needs the pinning service accepting EIP-712 too; storage-only Stage-B operators are unaffected.

In simple words

Two backend servers can now safely accept changes to the same user's data at the same time without ever erasing each other's work — each save must win a tiny atomic "I'm building on the latest version" check in the shared ledger, and the loser is told to reload and retry. And you can prove who you are with your wallet's signature on any server, no central ticket needed.

Closes #32

🤖 Generated with Claude Code

ehsan6sha and others added 19 commits June 11, 2026 19:34
Multi-stage build of the fula-gateway bin (cargo build --release -p
fula-cli) on rust:1-bookworm with a slim Debian runtime. Env-driven
config only; /data volume for durable state (pin queue, registry CID).
Consumed by the federated-master installer in functionland/pinning-service
(gateway compose profile, auto-enabled when the image exists).

Closes #29

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…gateway

The pin queue, registry CID, and local-retain backlog default to
/var/lib/fula-gateway (config.rs); the /data volume was wrong - without a
mount there, pin retries/crash recovery silently degrade to
fire-and-forget. Found by the federated-master e2e (startup WARNs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…mote-cid mapping PUTs

Gateway (flag FULA_REMOTE_CID_PUT, default OFF):
- GET /fula/capabilities (public): advertises {remoteCidPut} so clients
  probe before using optional protocols - an old/flag-off master is never
  sent an empty-body PUT it would misstore as a real zero-byte chunk
- put_object: empty-body PUT with x-amz-meta-fula-remote-cid records only
  the key->cid mapping after validating the CID is raw(0x55)+blake3(0x1e)
  and the block is PRESENT in the blockstore (absent => retryable 409 so
  the client falls back to full bytes); x-amz-meta-fula-remote-size
  carries the chunk's true byte count so billing/list never see 0;
  both headers added to the control-header filter (never persisted)
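The CID check in the `put_object` bullet can be pictured on the binary form of a CIDv1. The sketch below assumes the CID has already been decoded from its string form to bytes and that the blake3 digest is the default 32 bytes; the function name is hypothetical and the real handler may use a CID library instead.

```rust
const CID_V1: u8 = 0x01;
const CODEC_RAW: u8 = 0x55;
const MH_BLAKE3: u8 = 0x1e;
const DIGEST_LEN: usize = 32;

/// Returns the 32-byte digest if `cid` is a binary CIDv1 with the raw codec
/// and a 32-byte blake3 multihash, and `None` for anything else.
/// All four header values are below 0x80, so each varint is a single byte.
fn parse_raw_blake3_cid(cid: &[u8]) -> Option<[u8; DIGEST_LEN]> {
    match cid {
        [CID_V1, CODEC_RAW, MH_BLAKE3, len, digest @ ..]
            if *len as usize == DIGEST_LEN && digest.len() == DIGEST_LEN =>
        {
            let mut out = [0u8; DIGEST_LEN];
            out.copy_from_slice(digest);
            Some(out)
        }
        _ => None,
    }
}

fn main() {
    let mut cid = vec![CID_V1, CODEC_RAW, MH_BLAKE3, DIGEST_LEN as u8];
    cid.extend_from_slice(&[0xab; DIGEST_LEN]);
    assert!(parse_raw_blake3_cid(&cid).is_some());

    // Same digest but dag-pb codec (0x70): not acceptable as a remote-cid mapping.
    cid[1] = 0x70;
    assert!(parse_raw_blake3_cid(&cid).is_none());
}
```

Passing this check says nothing about whether the block is present in the blockstore; that is the separate step that yields the retryable 409.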

Client (Config.ingest_endpoints, default empty = byte-identical legacy):
- supports_remote_cid_put(): one-shot capability probe, ANY failure=false
- put_object_chunked_internal: per chunk, bytes stream to the ingest node
  (PUT /v0/block?cid=<precomputed> - the node verifies before storing),
  then the mapping PUT (with pinning variant when creds present, keeping
  per-object pin requests); ANY failure on the route falls back to the
  legacy full-bytes PUT. Probe once per upload; native-only (wasm32 keeps
  legacy path); ETag=cid keeps the existing self-verify unchanged
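The per-chunk decision described in the client bullets reduces to "try the ingest route, and on any failure send the full bytes the legacy way". A rough sketch, with closures standing in for the three HTTP calls; `upload_chunk`, `Route` and the closure signatures are hypothetical and not the SDK's API:

```rust
#[derive(Debug, PartialEq, Eq)]
enum Route {
    Ingest,
    LegacyFullBytes,
}

/// Uploads one chunk. `remote_cid_ok` is the result of the one-shot capability probe.
fn upload_chunk<I, M, L>(
    chunk: &[u8],
    remote_cid_ok: bool,
    ingest_put: I,  // stands in for PUT /v0/block?cid=... on the ingest node
    mapping_put: M, // stands in for the empty-body mapping PUT carrying the true size
    legacy_put: L,  // stands in for the full-bytes PUT to the master
) -> Result<Route, String>
where
    I: FnOnce(&[u8]) -> Result<(), String>,
    M: FnOnce(usize) -> Result<(), String>,
    L: FnOnce(&[u8]) -> Result<(), String>,
{
    if remote_cid_ok {
        // Both steps must succeed; an error in either one triggers the fallback.
        let routed = ingest_put(chunk).and_then(|()| mapping_put(chunk.len()));
        if routed.is_ok() {
            return Ok(Route::Ingest);
        }
    }
    legacy_put(chunk).map(|()| Route::LegacyFullBytes)
}

fn main() {
    // Dead ingest node: the upload still succeeds through the legacy path.
    let route = upload_chunk(
        b"chunk-bytes",
        true,
        |_| Err("ingest unreachable".to_string()),
        |_| Ok(()),
        |_| Ok(()),
    );
    assert_eq!(route, Ok(Route::LegacyFullBytes));
}
```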

Part of #31 (cross-repo: functionland/fula-ota#74 fula-ingest)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Four cases: old master (no capabilities) never receives the new
protocol; capable master routes every chunk's bytes through the ingest
node (master sees ONLY empty-body mapping PUTs; declared sizes sum to
the bytes the ingest stored; ingest asserts declared cid == blake3 of
body); dead ingest falls back transparently and the upload succeeds;
empty ingest_endpoints is byte-identical legacy.

Part of #31

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ked loop

put_object_chunked (the FxFiles-path public API) duplicates the chunk
loop of put_object_chunked_internal (E51 mirroring pattern) - the route
existed only in the internal one, so public-API uploads silently stayed
on the legacy byte path. Caught by the wiremock matrix (positive case:
ingest got 0). Same probe gating + fallback semantics.

Part of #31

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Round-trip via a real master + fula-ingest: chunked upload routed
through the ingest node then read back via the master (CID on);
v8-off legacy parity leg; FULA_BIG=1 adds the 1 GiB case (~4096
chunks). Consumed by fula-ota tests/e2e/phase-2/60-fidelity.sh which
also asserts server-side that kubo block counts grew.

Part of #31

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…must 409 fast

kubo block lookup for an ABSENT cid is an unbounded bitswap/DHT search;
the mapping-PUT handler hung for minutes per missing block (caught by
drill I10: curl timed out). Timeout => treat as absent => retryable 409,
client falls back to a full-bytes PUT. 5s amply covers a bitswap pull
from the ingest peer.
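The "timeout means absent" rule can be sketched with the standard library alone. This is an illustration under assumptions: the gateway is async and presumably uses an async timeout rather than a thread and a channel, and `block_presence_with_timeout` and `BlockPresence` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
enum BlockPresence {
    Present,
    /// The handler would answer with the retryable 409 in this case.
    Absent,
}

/// Runs `lookup` on a helper thread and waits at most `limit` for its answer.
fn block_presence_with_timeout<F>(lookup: F, limit: Duration) -> BlockPresence
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(lookup());
    });
    match rx.recv_timeout(limit) {
        Ok(true) => BlockPresence::Present,
        // An explicit "not found", a timeout, and a crashed lookup all count as absent.
        Ok(false) | Err(_) => BlockPresence::Absent,
    }
}

fn main() {
    let fast = block_presence_with_timeout(|| true, Duration::from_secs(5));
    assert_eq!(fast, BlockPresence::Present);

    // Simulates a lookup that would otherwise keep searching for a long time.
    let slow = block_presence_with_timeout(
        || {
            thread::sleep(Duration::from_millis(300));
            true
        },
        Duration::from_millis(20),
    );
    assert_eq!(slow, BlockPresence::Absent);
}
```

Unlike an async timeout, this version does not cancel the lookup; the helper thread simply finishes in the background.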

Part of #31

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ect_flat)

get_object does not exist on EncryptedClient; the canonical pairing is
put_object_flat (which dispatches >768KiB payloads into
put_object_chunked_internal - the route under test) + get_object_flat,
exactly what offline_e2e/FxFiles use.

Part of #31

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nt masters

Phase 2.5 core (fula-api#32). Default OFF; byte-identical when dark.

fula-core (storage-agnostic):
- new root_pointer module: RootPointerStore trait (cas_root/get_root) +
  CasOutcome + in-memory test store; race-semantics unit tests
- Bucket::flush(): with an arbiter attached, CAS old_root -> new_root
  BEFORE publishing to the in-process cache; a lost race surfaces as
  PreconditionFailed (the retryable contract SDKs already handle for
  conditional writes); no-op when the root did not change
- BucketManager: OnceLock-held arbiter (wireable after Arc-ing, lock-free
  reads); open_bucket_for_user consults get_root so a bucket moved by
  ANOTHER master is opened at the shared root, not this process's stale
  cache (arbiter-unreachable degrades to cached root - flush CAS still
  arbitrates); opened buckets get the arbiter attached
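The ordering in the `Bucket::flush()` bullet (CAS first, publish to the cache second) is the part that prevents a lost update, so it may help to see it in isolation. The sketch is single-threaded and heavily simplified; `Arbiter`, the field names and the error shape are assumptions, not the types in fula-core.

```rust
use std::cell::RefCell;

#[derive(Debug, PartialEq, Eq)]
enum FlushError {
    /// Surfaces to clients as the retryable conditional-write failure.
    PreconditionFailed { current: Option<String> },
}

/// Stand-in for the shared root pointer (hypothetical; one bucket only).
struct Arbiter {
    shared: RefCell<Option<String>>,
}

impl Arbiter {
    fn cas_root(&self, expected: Option<&str>, new_root: &str) -> Result<(), Option<String>> {
        let mut shared = self.shared.borrow_mut();
        if shared.as_deref() != expected {
            return Err((*shared).clone());
        }
        *shared = Some(new_root.to_string());
        Ok(())
    }
}

struct Bucket<'a> {
    cached_root: Option<String>,
    arbiter: Option<&'a Arbiter>,
}

impl<'a> Bucket<'a> {
    fn flush(&mut self, new_root: &str) -> Result<(), FlushError> {
        if self.cached_root.as_deref() == Some(new_root) {
            return Ok(()); // root did not change: nothing to arbitrate
        }
        if let Some(arbiter) = self.arbiter {
            arbiter
                .cas_root(self.cached_root.as_deref(), new_root)
                .map_err(|current| FlushError::PreconditionFailed { current })?;
        }
        // Publish to the in-process cache only after the CAS has succeeded.
        self.cached_root = Some(new_root.to_string());
        Ok(())
    }
}

fn main() {
    let arbiter = Arbiter { shared: RefCell::new(Some("r0".to_string())) };
    let mut a = Bucket { cached_root: Some("r0".to_string()), arbiter: Some(&arbiter) };
    let mut b = Bucket { cached_root: Some("r0".to_string()), arbiter: Some(&arbiter) };

    assert_eq!(a.flush("r1"), Ok(()));
    // b is stale: its flush is rejected and its cache is left untouched.
    let err = b.flush("r2").unwrap_err();
    assert_eq!(err, FlushError::PreconditionFailed { current: Some("r1".to_string()) });
    assert_eq!(b.cached_root.as_deref(), Some("r0"));

    // The loser reopens at the shared root, rebuilds its change, and retries.
    let FlushError::PreconditionFailed { current } = err;
    b.cached_root = current;
    assert_eq!(b.flush("r2-rebased-on-r1"), Ok(()));
}
```

With `arbiter: None` the same `flush` goes straight to the cache update, which matches the "byte-identical when dark" claim for the flag-off case.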

fula-cli:
- root_store_pg: PgRootStore on the EXISTING pins-DB sqlx pool (zero new
  deps) - single-statement INSERT..ON CONFLICT..DO UPDATE..WHERE
  root_cid=expected upsert-CAS; bucket_roots table ships as
  pinning-service migration 020
- flag FULA_BUCKET_ROOT_CAS + config field; state wires the arbiter when
  flag AND Postgres are present (loud warning if flag-on without DB)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ntity)

Phase 2.5 (fula-api#32). Self-certifying bearer: fula-eip712.<b64url
payload>.<b64url 65-byte sig> over EIP-712 typed data (domain "Fula
Gateway" v1; FulaAuth{wallet,iat,exp}). Any master verifies offline -
no shared secret, no session table - so ONE identity works on every
federated master; the session user id IS the lowercase wallet (the same
wallet the staking/MasterRegistry phases key on). Scope fixed to
storage:*; lifetime capped 24h; iat skew 300s; revocation deny-list
applies to the raw token unchanged. Additive: prefix + FULA_EIP712_AUTH
flag route it; legacy JWT path untouched. /fula/capabilities advertises
eip712Auth. Deps: k256 (ecdsa) + sha3 - minimal RustCrypto, only
exercised behind the flag.
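The two time limits named above (lifetime capped at 24h, iat skew of 300s) could be enforced along these lines. The constants come from the commit message; the function, the error names and the exact boundary behavior (for example treating `now == exp` as expired) are assumptions for the sketch.

```rust
const MAX_LIFETIME_SECS: u64 = 24 * 60 * 60;
const IAT_SKEW_SECS: u64 = 300;

#[derive(Debug, PartialEq, Eq)]
enum WindowError {
    /// `exp` is not after `iat`.
    Inverted,
    /// `exp - iat` exceeds the 24h cap.
    Overlong,
    /// `iat` is further in the future than the allowed clock skew.
    IssuedInFuture,
    /// `now` has reached or passed `exp`.
    Expired,
}

/// Checks the FulaAuth{iat, exp} window against `now` (all in Unix seconds).
fn check_window(iat: u64, exp: u64, now: u64) -> Result<(), WindowError> {
    if exp <= iat {
        return Err(WindowError::Inverted);
    }
    if exp - iat > MAX_LIFETIME_SECS {
        return Err(WindowError::Overlong);
    }
    if iat > now.saturating_add(IAT_SKEW_SECS) {
        return Err(WindowError::IssuedInFuture);
    }
    if now >= exp {
        return Err(WindowError::Expired);
    }
    Ok(())
}

fn main() {
    let now = 1_800_000_000;
    assert_eq!(check_window(now - 60, now + 3_600, now), Ok(()));
    assert_eq!(check_window(now - 7_200, now - 3_600, now), Err(WindowError::Expired));
    assert_eq!(check_window(now, now + 86_401, now), Err(WindowError::Overlong));
}
```

Because the check uses only the token's own fields and the local clock, two masters with roughly synchronized clocks reach the same verdict without talking to each other.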

Unit tests: valid sig -> wallet session w/ write scope; wrong-wallet
claim rejected; tampered payload rejected; expired/overlong rejected;
two independent verifications agree (the portability property).

Known v0 limitation (documented in-module): per-PUT registry pinning
forwards the bearer to the pinning service, which 401s non-session
tokens - full parity needs the pinning service accepting this scheme
(follow-up); Stage-B storage-only operators are unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…l puts

Three current-behaviour pins quantifying the O(N^2) bulk-upload cost:

- node layer: store() re-PUTs an entire unchanged InMemory tree because
  persisted children are never downgraded to Stored (the only Stored
  constructor is the load path, Pointer::from_wire)
- forest layer: per-flush PUT count and bytes grow with bucket size for
  the put_object_flat pattern (upsert + flush per file, one directory):
  flush #10 = 1 PUT / 10.8 KB -> flush #150 = 17 PUTs / 161.7 KB;
  150 x 4 KB files = 1630 node PUTs / 12.2 MB of index re-uploads
- contrast: marginal flush after 150 uploads is 17 PUTs in the
  long-lived session vs 3 PUTs for a fresh forest loaded from the same
  manifest - the amplification is session-accumulated InMemory state,
  not inherent tree cost

Invert these pins into regression guards when #34 is fixed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… and OOM-killed run #4

dmesg: oom-kill live_ingest_e2e total-vm 3.26GB anon-rss 3.18GB on the
7.8GB e2e box. The blake3 want-hash captured before the move carries
everything verification needs.

Part of #31

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
End-to-end CAS against the real bucket_roots table (migration 020):
claim-on-first-flush, stale-root flush conflicts reporting current,
loser-retry-at-shared-root wins, version increments; plus an 8-way
concurrent race asserting EXACTLY ONE winner. The exact arbitration two
live gateways perform on flush. Gated on POSTGRES_* (skips without).

Part of #32

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…-core dep)

root_pointer tests only need distinct CIDs; wrapping a seed-filled 32-byte
digest as multihash 0x1e avoids adding blake3 to fula-core. Compile error
E0433 from the unit gate.
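The trick described in this commit message might look roughly like the helper below: a multihash header for blake3 (code 0x1e, length 32) followed by a digest filled with a seed byte. The function name is hypothetical, and the digest is not a real hash of anything; it only needs to differ between seeds.

```rust
/// Builds a 34-byte multihash: [0x1e, 0x20, seed x 32].
/// Test-only: distinct seeds give distinct values, nothing is actually hashed.
fn test_multihash(seed: u8) -> [u8; 34] {
    let mut mh = [seed; 34];
    mh[0] = 0x1e; // multihash code for blake3
    mh[1] = 0x20; // digest length: 32 bytes
    mh
}

fn main() {
    let a = test_multihash(1);
    let b = test_multihash(2);
    assert_ne!(a, b);
    assert_eq!(&a[..2], &[0x1e, 0x20]);
}
```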

Part of #32

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ehsan6sha

Member Author

Phase 2.5 — final verification complete (all green on the test stack)

Live two-gateway drill (tests/e2e/phase-2.5/90-two-gateway-cas.sh, two real gateway processes vs one Postgres/kubo/cluster) — 2/2:

  • CAS OFF (control): lost update reproduced — a stale second gateway clobbered the first's write (split bucket). The hazard is real.
  • CAS ON (FM-1): no lost update — both gateways see both objects (objX AND objY). Fixed end-to-end.

Combined Phase 2.5 evidence:

| Test | Result |
| --- | --- |
| FM-1 race-semantics unit | ✅ 2/2 |
| FM-1 PgRootStore vs real Postgres (claim → conflict → retry → version++; 8-way one-winner) | ✅ 2/2 |
| FM-4 EIP-712 unit (incl. portability) | ✅ 5/5 |
| Live two-gateway lost-update drill | ✅ 2/2 |

Plus the storage-only Stage-B operator profile (join-as-storage-node.sh + docker-compose.storage.yml): validated on the box — compose parses, CLUSTER_FOLLOWERMODE=true (replicates, no write authority), trusts masters-not-self, ingest quota-gated, adopt-or-halt refuses to stomp a master's containers.

Phase 2.5 is feature-complete and verified. Ready for your review.

🤖 Generated with Claude Code
