fix: v1->v7 migration tolerates a gc'd (410 Gone) backup source (0.6.12) #44
Merged
Large-file uploads failed on GC-damaged buckets: a large upload tips the forest past the v7 sharding threshold, triggering the v1->v7 migration, whose first step is a server-side copy_object of the v1 index to a backup key. A server-side COPY must read the source, but that index object's backing CID was garbage-collected (one-off manual `ipfs repo gc`; HEAD returns the ETag, a content read 410s) -> copy fails -> migration defers -> upload fails after all chunks already uploaded.

Fix (Option B, advisor + gemini endorsed): treat a 410/Gone backup copy as "source content already gone, nothing to back up" -> skip the backup and proceed. v7 is rebuilt faithfully from the already-loaded in-memory v1 forest (monolithic = whole-or-nothing, no entry dropped); with auto-gc off the fresh v7 nodes persist; once migrated the bucket never hits this path again. Every OTHER copy error still defers -- a transient must not masquerade as gc'd.

Safe because: the v1 source is already gc'd (no restore point could exist anyway); the backup is read only by try_v1_backup_fallback, which returns None gracefully when absent and triggers only if a future v7 manifest is unreadable.

- error.rs: ClientError::is_gone() -- narrow match on Gone/HTTP410/410 only.
- encryption.rs: migrate_v1_to_v7_internal Step 4 skips backup on is_gone(), defers on every other error.
- test: is_gone_matches_only_410_gone (narrow-boundary guard).

Workspace 0.6.11 -> 0.6.12. Migration-completes-despite-410 validated on the live gateway (real large-upload retry); the migration path is real-server-only (advisory lock + heartbeat), so it isn't wiremock-mockable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ehsan6sha added a commit that referenced this pull request Jun 18, 2026
* Streaming upload P1: plan-mode ChunkedEncoder (no-AEAD pass 1) + plan doc

Foundation for web streaming + resumable large-file uploads (see docs/web-streaming-resumable-upload-plan.md). Adds a plan-only mode to ChunkedEncoder (into_plan_only): it still generates the per-chunk nonce, feeds the plaintext to the BAO + content hashers, and advances the chunk count, but skips the AEAD encrypt and retains NO ciphertext. The chunking / BAO / nonce code is shared verbatim with the encrypting path, so plan-mode and a full encode produce identical root_hash / content_hash / num_chunks for the same input (the random per-chunk nonces differ, which is fine).

This is pass 1 of the streaming upload: commit the integrity root + nonce list without holding ciphertext, then pass 2 re-encrypts each chunk from its stored nonce (deterministic AEAD => identical ciphertext => idempotent PUT).

Tests (fula-crypto):
- test_plan_mode_matches_full_encode: root/content_hash/num_chunks parity vs a full encode; plan-mode retains no ciphertext.
- test_plan_mode_nonces_reencrypt_and_decrypt_roundtrip: commit nonces -> re-encrypt each chunk from its stored nonce -> decode byte-exact. This is the resume-safety core (deterministic re-encryption from committed nonces).

Full fula-crypto suite green (452 passed); wasm32 build green via fula-flutter. No change to the encrypting path's behavior (existing encode/decode tests pass).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* P2 (wip): streaming_put_chunk -- pass-2 per-chunk encrypt-from-stored-nonce + PUT

First verifiable piece of the streaming upload core (see docs/web-streaming-resumable-upload-plan.md + task #44 notes). Adds EncryptedClient::streaming_put_chunk: encrypts ONE caller-supplied (pushed) plaintext chunk with the nonce committed in pass 1, then PUTs it -- mirroring the native resume re-encrypt (~8520) and the chunk-PUT closure in put_object_chunked_internal (transient retry_idempotent, pinning, post-PUT CID self-verify).

Deterministic AES-GCM => identical ciphertext => idempotent content-addressed PUT, safe to retry or repeat on resume. AAD binds ciphertext to (storage_key, chunk_index). Compiles native (dead-code warning expected -- wired up by the streaming session + FRB handle in following commits). No change to existing paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* P2 (wip): fula-client streaming upload core (begin / finalize_plan / put_chunk / finish)

Push-model streaming-upload methods on EncryptedClient (pub for the FRB handle + integration tests), completing the fula-client side of the OOM fix:
- streaming_begin: prelude (ensure_forest_loaded, generate DEK, derive flat storage_key incl. the v7 shard-salt path, HPKE-wrap DEK, KEK version). Read-only vs live state; mirrors the head of put_object_flat_deferred_locked.
- streaming_finalize_plan: end of pass 1 -- finalize the plan-only encoder, build PrivateMetadata (size + content_hash from the pushed plaintext) + encrypted metadata; returns ChunkedFileMetadata (nonces + BAO root) + the metas.
- streaming_put_chunk: pass-2 per-chunk encrypt-from-stored-nonce + PUT.
- streaming_finish: index PUT (header-safe) + forest register + flush, under the per-bucket write lock. register_streaming_upload_in_forest mirrors the wasm-proven upsert in put_object_flat_deferred_locked (v7/monolithic, WAL auto-skipped on wasm, orphan cleanup of the prior upload).

Peak memory is bounded by what the caller holds in flight, not file size (pass 1 holds ~1 chunk; pass 2 only the pushed chunk). Compiles native + wasm (via fula-flutter). Not yet FRB-wired or end-to-end tested -- the stateful-mock round-trip test (P2 gate) + the FRB handle follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* P2: streaming upload round-trip test (the P2 gate) -- PASSES byte-exact

Hermetic test driving the streaming methods exactly as the FRB handle will (streaming_begin -> plan-only encoder -> streaming_finalize_plan -> streaming_put_chunk loop -> streaming_finish) against a STATEFUL wiremock that stores PUT bodies and serves them on GET, then downloads via get_object_flat and asserts BYTE-EXACT recovery. No network / no credentials -- runs in CI.

Proves the streaming path produces a normally downloadable, decryptable object: pass-1 commits nonces without ciphertext, pass-2 re-encrypts each chunk from its stored nonce, the index + forest register + flush land correctly, and the standard download reconstructs the exact bytes (incl. the 0.6.13 body-fallback recovering chunk nonces). walkable-v8 post-PUT CID self-verify also exercised.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* P2 (done): FRB push-model streaming-upload handle + finalize the core

Adds the FRB handle that exposes the streaming upload to Dart (fula-flutter forest.rs), completing P2 of the web streaming + resumable upload plan:
- StreamingUploadHandle (opaque) + StreamingPlanInfo (num_chunks, chunk_size).
- streaming_upload_begin / _plan_chunk / _finalize_plan / _upload_chunk / _finish. Dart slices the file from a Blob and drives the two passes; the handle never holds the whole file. The std::sync::Mutex is held only for brief sync critical sections (never across an .await), so concurrent _upload_chunk calls for distinct indices run their PUTs in parallel (Dart bounds concurrency). Pure Dart->Rust calls + handle state -- no Rust->Dart callback.
- streaming_put_chunk now reads walkable_v8 from config (one fewer param).
- cid added as a direct fula-flutter dep (handle stores per-chunk CID hints).

Verified: fula-flutter compiles native (the Send+Sync gate for FRB opaques) AND wasm32; the fula-client round-trip test still passes byte-exact. The handle drives the exact sequence that test proves end-to-end. Dart bindings are FRB-codegen'd at publish; the live wasm path is validated at P6 (browser e2e).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* P2: real-server streaming upload e2e -- PASSES byte-exact (50MB / 200 chunks)

Real-gateway e2e (#[ignore]; run with the Mode A creds): drives the streaming sequence (begin -> plan-only encoder -> finalize_plan -> put_chunk loop -> finish) for a 50 MB / 200-chunk file, downloads via get_object_flat, asserts byte-exact. 200 chunks @ 256 KB pushes the index metadata past the 16 KB header budget, so it also exercises header_safe_enc_metadata stripping + body/forest fallback on the real server.

Verified on the production gateway: 52,428,800 bytes round-tripped byte-exact in 108s. Complements the hermetic streaming_upload_roundtrip.rs (mock).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
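The chunk figures quoted in the real-server e2e commit (50 MB, 200 chunks at 256 KB, 52,428,800 bytes) can be checked with a little arithmetic. The following is a minimal sketch only: `num_chunks` and `chunk_range` are hypothetical helper names, not functions from fula-client, and the sketch assumes fixed-size chunks with a possibly shorter final chunk.

```rust
/// Number of chunks needed to cover `file_size` bytes (ceiling division).
/// Hypothetical helper for illustration; not part of the SDK.
fn num_chunks(file_size: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    (file_size + chunk_size - 1) / chunk_size
}

/// Half-open byte range `[start, end)` of chunk `index` within the file.
/// The last chunk is clamped to the end of the file.
fn chunk_range(index: u64, file_size: u64, chunk_size: u64) -> (u64, u64) {
    let start = index * chunk_size;
    assert!(start < file_size, "chunk index out of range");
    let end = (start + chunk_size).min(file_size);
    (start, end)
}
```

A push-model caller would slice the source by these ranges and hand one slice at a time to the upload, which is why peak memory tracks the chunks in flight rather than the file size.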
Fixes large-file uploads failing on GC-damaged buckets. Workspace 0.6.11 → 0.6.12.

Symptom
On a bucket whose forest grew past the v7 sharding threshold, a large (multi-chunk) upload uploads every chunk (200 OK) and then fails at finalize with "failed to upload large file." The trace shows:
followed by the SDK cleaning up the just-uploaded chunks.
Root cause
The large upload tips the forest over the v7 sharding threshold, triggering the one-time v1→v7 migration. Its first write (Step 4) is a server-side `copy_object` of the current forest index to a timestamped backup, and a server-side COPY must read the source. That index object's backing IPFS CID was garbage-collected (a one-off manual `ipfs repo gc`; `HEAD` still returns the ETag, but a content read 410s), so the copy fails → migration returns `DeferredTransientError` → upload fails. (Confirmed via read-only inspection of the gateway: auto-gc is off, so the damage is static.)

Fix (Option B: tolerate Gone)
In the migration's backup step, treat a 410/Gone copy failure as "the source content is already gone, so there is nothing to back up" → log + skip the backup and proceed. v7 is rebuilt faithfully from the already-loaded in-memory v1 forest (monolithic = whole-or-nothing, so no entry is dropped), and with auto-gc off the fresh content-addressed v7 nodes persist. Every other copy error (transient 5xx, throttling, auth, network) still defers — a transient must never masquerade as "gc'd".
Once migrated, the bucket is healthy v7 and never hits this path again — it stops referencing the gc'd v1 blob entirely.
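The decision described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the `ClientError` here is a simplified stand-in with hypothetical variants, `BackupOutcome` and `backup_step` are invented names, and only the rule itself (skip on Gone/HTTP410/410, defer on everything else) comes from the description.

```rust
/// Simplified stand-in for the SDK's error type (hypothetical shape;
/// the real `ClientError` is assumed to carry more variants and fields).
#[derive(Debug)]
enum ClientError {
    /// An S3-style error response identified by its error code string.
    Api { code: String },
    /// A network-level failure with no error code.
    Network(String),
}

impl ClientError {
    /// Narrow match: true only for Gone / HTTP410 / 410.
    fn is_gone(&self) -> bool {
        match self {
            ClientError::Api { code } => {
                matches!(code.as_str(), "Gone" | "HTTP410" | "410")
            }
            ClientError::Network(_) => false,
        }
    }
}

/// What the backup step decided (hypothetical enum for this sketch).
#[derive(Debug, PartialEq)]
enum BackupOutcome {
    /// The server-side copy succeeded.
    Written,
    /// The source content is already gone: nothing to back up, proceed.
    SkippedSourceGone,
    /// Any other failure: defer the migration and retry later.
    Deferred,
}

/// Maps the result of the backup copy to the migration's next move.
fn backup_step(copy_result: Result<(), ClientError>) -> BackupOutcome {
    match copy_result {
        Ok(()) => BackupOutcome::Written,
        Err(e) if e.is_gone() => BackupOutcome::SkippedSourceGone,
        Err(_) => BackupOutcome::Deferred,
    }
}
```

Keeping the Gone check as an allow-list (rather than "anything that is not retryable") is what stops a throttling or 5xx response from being treated as a gc'd source.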
- error.rs: new `ClientError::is_gone()`, a narrow match on `Gone`/`HTTP410`/`410` only.
- encryption.rs: `migrate_v1_to_v7_internal` Step 4 skips the backup on `is_gone()`, defers otherwise.

Why skipping the backup is safe
- The v1 source is already gc'd, so no restore point could exist anyway.
- The backup is read only by `try_v1_backup_fallback`, which triggers solely if a future v7 manifest becomes unreadable, and it returns `None` gracefully when no backup exists (surfaces the original error, no crash).

Both the built-in advisor and gemini-advisor independently recommended B over re-serializing a backup from memory (Option A), which adds crypto-path code + tech debt for an obsolete v1 format.
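The "returns `None` gracefully, surfaces the original error" behavior can be illustrated with a small generic sketch. The function name `load_manifest_or_fallback` and its signature are assumptions made for this example; the real `try_v1_backup_fallback` is only known from the description to return an `Option`.

```rust
/// Returns the primary load result if it succeeded. On failure, consults the
/// backup lazily; if no backup exists, the ORIGINAL error is returned unchanged.
/// Hypothetical helper: the name and signature are illustrative only.
fn load_manifest_or_fallback<M, E>(
    primary: Result<M, E>,
    try_backup: impl FnOnce() -> Option<M>,
) -> Result<M, E> {
    primary.or_else(|original| try_backup().ok_or(original))
}
```

With this shape a missing backup changes nothing for the caller: a failed v7 read reports the same error it would have reported if the backup step had never existed.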
Tests
- error.rs: `is_gone_matches_only_410_gone`: Gone/HTTP410/410 ⇒ true; NoSuchKey/PreconditionFailed/HTTP412/InternalError/HTTP500/SlowDown/NotFound ⇒ false (proves the narrow boundary: a gate against a transient being skipped).
- `fula-client --features test-fault-injection` suite green.
- `videos-v8` bucket after publish: the migration only runs against the real server (advisory lock + heartbeat), so this scenario isn't wiremock-mockable; the migration E2E tests are `#[ignore]` real-server for the same reason.

🤖 Generated with Claude Code