Phase 2.5: migration 020 - bucket_roots shared root-pointer table (FM-1) #48

Open
ehsan6sha wants to merge 12 commits into main from phase-2.5-multimaster

@ehsan6sha

Member

Phase 2.5 — migration 020: bucket_roots shared root-pointer table (FM-1)

The arbiter table for multi-master bucket-root compare-and-swap. With more than one federated master serving S3 writes, each bucket's root CID needs a store every master can CAS against; the gateways' in-process locks can't see each other.

The fula-gateway (FULA_BUCKET_ROOT_CAS, functionland/fula-api#41) performs a single-statement upsert-CAS:

INSERT .. ON CONFLICT (owner_id, bucket) DO UPDATE
  SET root_cid = $new, version = version + 1
  WHERE bucket_roots.root_cid = $expected
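
For reviewers who want to poke at the semantics, here is a minimal sketch of the same upsert-CAS shape. It runs against an in-memory SQLite database as a stand-in for Postgres, and the column types, the `version` default, and the helper names `open_store` and `cas_root` are assumptions for illustration, not the migration's actual DDL or the gateway's code.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory table shaped like bucket_roots (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE bucket_roots (
            owner_id TEXT NOT NULL,
            bucket   TEXT NOT NULL,
            root_cid TEXT NOT NULL,
            version  INTEGER NOT NULL DEFAULT 1,
            PRIMARY KEY (owner_id, bucket)
        )
        """
    )
    return conn


def cas_root(conn: sqlite3.Connection, owner_id: str, bucket: str,
             expected: Optional[str], new: str) -> bool:
    """Claim the root if no row exists, or swap it only if it equals `expected`.

    Returns True when this caller won, False when the stored root differs.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO bucket_roots (owner_id, bucket, root_cid)
        VALUES (?, ?, ?)
        ON CONFLICT (owner_id, bucket) DO UPDATE
          SET root_cid = excluded.root_cid, version = version + 1
          WHERE bucket_roots.root_cid = ?
        """,
        (owner_id, bucket, new, expected),
    )
    conn.commit()
    # Exactly one row changes on a successful claim or swap; a stale
    # `expected` makes the DO UPDATE's WHERE false, so nothing changes.
    return conn.total_changes - before == 1
```

A caller that gets `False` would re-read the current root, rebase its change on it, and retry, which matches the claim, stale-conflict, retry-wins sequence the integration test exercises.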

Purely additive: nothing reads or writes this table until the gateway flag is enabled, so flag-off masters are unaffected. The .down.sql is guarded and refuses to drop the table while the flag is in use.

Verified green: the gateway's PgRootStore integration test runs against this exact table on the live stack DB — claim → stale-conflict → retry-wins → version increment, plus an 8-way concurrent "exactly one winner" race (functionland/fula-api#41).

Part of the Phase 2.5 multi-master-safety milestone.

🤖 Generated with Claude Code

ehsan6sha and others added 12 commits June 11, 2026 19:19
…c deposit credit, cron leader-lease

Federated masters (Phase 1.5, Stage A) groundwork. All flag-gated, default
OFF - single-master behavior is byte-identical when dark:

- migration 018: partial UNIQUE index (user_id, reference_id) WHERE
  tx_type=hourly_deduction (CONCURRENTLY; pre-checks duplicates; .down.sql)
- deductionJob (BILLING_IDEMPOTENCY=true): deterministic reference_id
  hour:YYYY-MM-DDTHH (UTC) and the history INSERT becomes the dedup gate
  (ON CONFLICT DO NOTHING) BEFORE the balance update - N masters deduct
  exactly once per (user, hour)
- blockScanner (BILLING_IDEMPOTENCY=true): deposit insert + creditUserTx +
  claimed_at in ONE transaction - a crash can no longer strand a
  recorded-but-uncredited tx; creditService gains creditUserTx (caller-owned
  transaction; creditUser delegates, zero behavior change)
- leaderLease (CRON_LEADER_LEASE=true): Postgres session advisory lock on a
  dedicated client gates every cron tick; holder crash frees the lock so a
  standby master takes over on its next tick; SIGTERM releases explicitly
- tests: hour-bucket determinism/UTC/collision, fee formula unchanged,
  flag-off no-op gate (DB-free; multi-master paths covered by Phase 1.5 e2e)
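
The hour bucket and the insert-as-dedup-gate ordering described above can be sketched as follows. This is an illustration under assumptions, not the deductionJob code: it uses in-memory SQLite instead of Postgres, and the table names `history` and `balances`, the index name, and the helpers `open_billing_db` and `deduct_once` are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def hour_reference_id(ts: datetime) -> str:
    """Deterministic per-hour key in the form hour:YYYY-MM-DDTHH, always UTC."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("hour:%Y-%m-%dT%H")


def open_billing_db() -> sqlite3.Connection:
    """Assumed minimal schema with a partial UNIQUE index like migration 018."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE balances (user_id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE history (
            user_id      TEXT NOT NULL,
            tx_type      TEXT NOT NULL,
            reference_id TEXT NOT NULL,
            amount       INTEGER NOT NULL
        );
        CREATE UNIQUE INDEX history_hourly_once
            ON history (user_id, reference_id)
            WHERE tx_type = 'hourly_deduction';
        """
    )
    return conn


def deduct_once(conn: sqlite3.Connection, user_id: str,
                ts: datetime, fee: int) -> bool:
    """Insert the history row first; touch the balance only if the insert won."""
    ref = hour_reference_id(ts)
    with conn:  # one transaction: commits on success, rolls back on error
        before = conn.total_changes
        conn.execute(
            "INSERT INTO history (user_id, tx_type, reference_id, amount) "
            "VALUES (?, 'hourly_deduction', ?, ?) ON CONFLICT DO NOTHING",
            (user_id, ref, fee),
        )
        if conn.total_changes == before:
            return False  # another master already deducted this (user, hour)
        conn.execute(
            "UPDATE balances SET balance = balance - ? WHERE user_id = ?",
            (fee, user_id),
        )
    return True
```

Because the key is derived from the UTC hour rather than the local clock, two masters in different time zones compute the same `reference_id` for the same instant and collide on the index instead of double-charging.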

Part of #46

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… replication sweep

Phase 1.5 Stage A deliverables (#46):
- docker/master: compose stack (postgres-pinning, pinning OpenAPI from
  main_postgres.go, pinning-webui with cron family) - host networking +
  127.0.0.1 binds like prod, healthchecks + restart policies + label-scoped
  watchtower; optional fula-gateway profile (auto-enabled when image exists)
- update-scripts/join-as-master.sh v0: capstone installer first cut -
  detect/adopt-or-halt (adopts the Phase-1 kubo+cluster writer; halts on a
  foreign postgres-pinning), ordered migrations with halt-on-error + marker,
  idempotent re-runs, params persisted to .env (phase-common pattern)
- update-scripts/pinset-snapshot.sh: signed (ed25519) authoritative pinset
  dumps + --verify/--restore/--install-cron (early FM-3 restore path)
- update-scripts/replication-sweep.sh: below-REPL_MIN detection + recover +
  alert log + --strict for drills (closes the S4 sweep gap)
- test seams: processUserDeduction exported; cron intervals env-overridable
  (SCANNER_INTERVAL_MS/DEDUCTION_INTERVAL_MS, defaults unchanged)
- tests: fm2-billing-integration (live-Postgres; skips cleanly without DB) -
  concurrent same-hour deduction races deduct once; replayed deposit credits
  once and leaves no recorded-but-uncredited state

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ver, snapshots, sweep

D1 stack health, D2 migration-018 presence, D3 two webui masters -> one
leader + one standby + exactly one hourly_deduction per (user, hour),
D4 kill -9 leader -> standby acquires lease, STILL one row (idempotency
under failover), D5 live-Postgres vitest integration, D6 snapshot
take/verify/tamper-reject/unpin+restore, D7 sweep clean -> forced
under-replication detected (--strict) + alerted -> reconverged clean.

Part of #46

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…_PORT default)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… auto-generate + persist

Found by the live installer run (webui FATALs without them; container
crash-looped silently). join-as-master.sh now generates both once
(openssl rand) and persists to .env; compose fails fast with a clear
message if absent; drill webuis receive them too.
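
The generate-once-and-persist pattern, plus the fail-fast check, might look roughly like this. The installer itself is a shell script using openssl rand; this sketch swaps in Python's `secrets` module, and the function names and the example key name are hypothetical since the real variable names are not shown here.

```python
import secrets
from pathlib import Path
from typing import Iterable, Mapping


def ensure_secret(env_path: Path, key: str, nbytes: int = 32) -> str:
    """Return the value of `key` from a .env file, generating it once if absent."""
    text = env_path.read_text() if env_path.exists() else ""
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key and value.strip():
            return value.strip()  # already persisted: re-runs are idempotent
    value = secrets.token_hex(nbytes)
    with env_path.open("a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # avoid gluing onto an unterminated last line
        fh.write(f"{key}={value}\n")
    return value


def require_secrets(env: Mapping[str, str], keys: Iterable[str]) -> None:
    """Fail fast with a clear message instead of crash-looping later."""
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise SystemExit("missing required secrets: " + ", ".join(missing))
```

Calling `ensure_secret` twice for the same key returns the same value, so a second installer run never rotates a secret that running containers already depend on.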

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Found by the live Phase 1.5 e2e run:
- migration 019: user_wallets.wallet_address DROP NOT NULL - post-PII
  linkWallet stores hash-only (wallet_address=NULL) so EVERY fresh install
  rejected wallet links; 012 relaxed user_email but missed this column
  (guarded .down.sql). Real fresh-deploy bug, not test-only.
- compose: mount fula-gateway-state at /var/lib/fula-gateway - the gateway
  durable state paths are hardcoded there; without the volume the S2 pin
  queue silently degrades to fire-and-forget and the bucket registry resets
  on restart.
- drills: D5 runner no longer swallows vitest exit (pipefail + explicit
  2-passed check); D6 takes the FIRST cid from the snapshot stream
  (was capturing a multiline list) and polls 60s for the restore.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
join-as-master.sh now installs the pinset-snapshot (6h) and
replication-sweep (30min) cron entries and takes the first snapshot
immediately - the restore path exists from minute one, per the
safeguards invariant (S4/S6 must be scheduled, not manual).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…pass count

The tests passed (2/2 on live Postgres) but the colored output broke the
literal match - strip escapes, then assert.
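
A small sketch of the strip-then-assert fix, for illustration only: the drill runner is a shell script, and the sample summary line below is an assumed shape of colored test output rather than a captured log.

```python
import re
from typing import Optional

# Matches ANSI CSI sequences such as the color codes "\x1b[32m" and "\x1b[39m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove terminal color/formatting escapes from captured output."""
    return ANSI_ESCAPE.sub("", text)


def passed_count(output: str) -> Optional[int]:
    """Return N from the first 'N passed' in the de-colored output, if any."""
    match = re.search(r"(\d+) passed", strip_ansi(output))
    return int(match.group(1)) if match else None


# Hypothetical colored summary line: the escapes sit between "2" and "passed".
colored = " Tests  \x1b[1m\x1b[32m2\x1b[39m\x1b[22m passed (2)"
assert "2 passed" not in colored      # the literal match breaks on raw output
assert passed_count(colored) == 2     # stripping first makes the check reliable
```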

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Arbiter table for multi-master bucket-root CAS (fula-api#32; consumed by
the fula-gateway when FULA_BUCKET_ROOT_CAS is on). Purely additive -
nothing touches it until the gateway flag is enabled. Guarded .down.sql.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rofile

A vetted third party provides STORAGE + verified byte ingress with NO
cluster write authority and NO database/billing: follower kubo +
ipfs-cluster in FOLLOWERMODE trusting ONLY the master peer ids (never
itself, so it replicates but cannot mutate the pinset - mass-unpin blast
radius stays first-party until FM-3) + fula-ingest (verified blake3 CID,
quota-gated on a master storage API). Dockerized compose (healthchecks +
restart + watchtower); idempotent, adopt-or-halt, .env-persisted, auto-
resolves master identity from the pools/masters API. This is the FM-1+FM-4
Stage-B gate's operator tooling.

Part of #48 / fula-api#32

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…te test

Two gateways (separate registries) vs one Postgres/kubo/cluster. CAS OFF:
stale gateway B clobbers A => split bucket (lost update reproduced). CAS
ON: B opens at the shared root first => both objects on both gateways (no
lost update). No concurrency-timing flake - exploits the stale-registry
hazard directly. Needs fula-gateway:p25 (built from phase-2.5-multimaster).
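
The stale-registry hazard can be reproduced in a toy in-memory model. The sketch below is an assumption-laden simplification, not the drill script: a bucket root is modeled as a frozenset of object names instead of a CID, and the `SharedRoot`, `Gateway`, and `drill` names are hypothetical.

```python
class SharedRoot:
    """Stand-in for the shared root pointer that every gateway can CAS against."""

    def __init__(self) -> None:
        self.root = frozenset()
        self.version = 0

    def compare_and_swap(self, expected: frozenset, new: frozenset) -> bool:
        if self.root != expected:
            return False
        self.root = new
        self.version += 1
        return True


class Gateway:
    """A gateway with its own registry, snapshotted when the process starts."""

    def __init__(self, shared: SharedRoot, use_cas: bool) -> None:
        self.shared = shared
        self.use_cas = use_cas
        self.local_root = shared.root

    def put(self, obj: str) -> None:
        if not self.use_cas:
            # CAS OFF: build on the (possibly stale) local root, overwrite blindly.
            self.local_root = self.local_root | {obj}
            self.shared.root = self.local_root
            return
        while True:
            # CAS ON: open at the shared root first, then swap conditionally.
            base = self.shared.root
            new = base | {obj}
            if self.shared.compare_and_swap(base, new):
                self.local_root = new
                return


def drill(use_cas: bool) -> frozenset:
    """Gateway A writes objX, then stale gateway B writes objY."""
    shared = SharedRoot()
    a = Gateway(shared, use_cas)
    b = Gateway(shared, use_cas)  # B's registry predates A's write
    a.put("objX")
    b.put("objY")
    return shared.root
```

In this model `drill(False)` ends with only `objY` in the shared root (B clobbered A), while `drill(True)` ends with both objects. No timing is involved: B is stale by construction, which is why the approach is free of concurrency flakes.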

Part of #48 / fula-api#32

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@ehsan6sha

Member Author

Phase 2.5 — final verification complete (all green on the test stack)

Live two-gateway drill (tests/e2e/phase-2.5/90-two-gateway-cas.sh, two real gateway processes vs one Postgres/kubo/cluster) — 2/2:

  • CAS OFF (control): lost update reproduced — a stale second gateway clobbered the first's write (split bucket). The hazard is real.
  • CAS ON (FM-1): no lost update — both gateways see both objects (objX AND objY). Fixed end-to-end.

Combined Phase 2.5 evidence:

| Test | Result |
|---|---|
| FM-1 race-semantics unit | ✅ 2/2 |
| FM-1 PgRootStore vs real Postgres (claim → conflict → retry → version++; 8-way one-winner) | ✅ 2/2 |
| FM-4 EIP-712 unit (incl. portability) | ✅ 5/5 |
| Live two-gateway lost-update drill | ✅ 2/2 |

Plus the storage-only Stage-B operator profile (join-as-storage-node.sh + docker-compose.storage.yml): validated on the box — compose parses, CLUSTER_FOLLOWERMODE=true (replicates, no write authority), trusts masters-not-self, ingest quota-gated, adopt-or-halt refuses to stomp a master's containers.

Phase 2.5 is feature-complete and verified. Ready for your review.

🤖 Generated with Claude Code
