fix: rollback and delete GCS blob if asset DB write fails#701
Closed
jirhiker wants to merge 5 commits into
Closed
Conversation
If session.commit() (or .add/.refresh) raises after gcs_upload has already written the file, the bucket would retain an orphaned object with no Asset row pointing at it. Wrap the DB write in try/except: rollback the session and best-effort delete the blob, then re-raise. A failure to delete the blob is logged but does not mask the original exception. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0d30a1de1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens POST /asset/upload-and-record to prevent orphaned GCS objects when the database write fails after a successful upload, by attempting to roll back the DB transaction and best-effort deleting the uploaded blob.
Changes:
- Wraps the
Asset+ association DB write/commit/refresh in atry/except. - On DB failure, performs
session.rollback()and attempts a best-effort GCS blob delete (logged on failure), then re-raises the original exception.
…Asset still references it gcs_upload deduplicates by content hash: when an object with the hash-named blob already exists, it skips the upload and returns the existing blob_name. The previous cleanup-on-failure path would delete that shared object, breaking Assets owned by other Things that point at the same hash-named blob. - gcs_upload now also returns a `created` flag indicating whether this call actually wrote the blob. All callers updated to unpack three values. - In upload_and_record_asset, on DB-write failure: only delete the blob when (a) this request created it AND (b) no Asset row references storage_path == blob_name after rollback. If another Asset still references it, log and leave it alone. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
session.refresh(asset) runs AFTER session.commit() has succeeded. Keeping it inside the cleanup except block meant a transient read failure during refresh would trigger blob deletion while the Asset row was already committed, leaving the DB pointing at a missing GCS object. Move refresh outside the try so cleanup is scoped to pre-commit failures only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Wrap session.rollback() in its own try/except. A rollback failure is logged but no longer masks the original DB exception, which is re-raised at the end of the handler. - Wrap the reference-count query in its own try/except. If the SELECT itself fails after rollback, we cannot confirm the blob is unreferenced, so default to keeping it (sentinel object → not None). Better to leave a possible orphan blob than delete one another Asset still points at. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wrap the cleanup block in upload_and_record_asset in one outer try/except so any failure inside cleanup (rollback, reference query, bucket.blob(), delete) is logged but cannot mask the original commit exception that the request needs to surface. Add three direct-call tests in tests/test_asset.py that mock the session and bucket: - test_upload_and_record_cleans_blob_when_created_and_db_fails: asserts rollback runs, blob.delete is called, and the original commit exception is re-raised. - test_upload_and_record_skips_blob_delete_when_blob_was_preexisting: with gcs_upload reporting created=False (hash dedup hit), the cleanup must NOT delete the (possibly shared) blob. - test_upload_and_record_reraises_original_exception_when_rollback_fails: rollback raising must not replace the original commit exception. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment on lines
+315
to
+339
| try: | ||
| still_referenced = session.scalars( | ||
| select(Asset).where(Asset.storage_path == blob_name) | ||
| ).first() | ||
| except Exception: | ||
| logger.warning( | ||
| "Could not verify blob references; skipping cleanup for %s", | ||
| blob_name, | ||
| exc_info=True, | ||
| ) | ||
| still_referenced = object() # sentinel: assume referenced | ||
| if still_referenced is None: | ||
| try: | ||
| await run_in_threadpool(bucket.blob(blob_name).delete) | ||
| except Exception: | ||
| logger.warning( | ||
| "Failed to clean up uploaded blob after DB failure: %s", | ||
| blob_name, | ||
| exc_info=True, | ||
| ) | ||
| else: | ||
| logger.info( | ||
| "Skipping blob cleanup; another Asset still references %s", | ||
| blob_name, | ||
| ) |
Comment on lines
23
to
+27
| thing: Thing, | ||
| name: str, | ||
| **asset_args, | ||
| ) -> tuple[str, str]: | ||
| uri, blob_name = gcs_upload(ff, bucket) | ||
| uri, blob_name, _created = gcs_upload(ff, bucket) |
Comment on lines
+534
to
+535
| session_mock = MagicMock() | ||
| session_mock.get.return_value = MagicMock(id=1) # Thing exists |
Comment on lines
+617
to
+618
| session_mock = MagicMock() | ||
| session_mock.get.return_value = MagicMock(id=1) |
Comment on lines
+526
to
+529
| """ | ||
| Drive upload_and_record_asset with a mocked session whose commit() raises. | ||
| Returns (raised_exc, session_mock, bucket_mock, blob_mock). | ||
| """ |
Comment on lines
+577
to
+581
| def test_upload_and_record_cleans_blob_when_created_and_db_fails(): | ||
| """ | ||
| DB commit failure path: rollback runs, blob.delete is called, original | ||
| commit exception is re-raised (cleanup must not mask it). | ||
| """ |
Member
Author
|
Changes migrated into #702. Closing in favor of that PR. |
jirhiker
added a commit
that referenced
this pull request
Jun 8, 2026
### Why Two related groups of fixes, combined here (asset changes migrated from #701, which is now closed): **CD workflow indentation** - `steps:` blocks in `CD_staging.yml`, `CD_production.yml`, `CD_testing.yml` had list items at the same column as `steps:`. Valid YAML but unconventional; confused review tooling. **Asset upload orphan-blob safety** (`POST /asset/upload-and-record`) - File was uploaded to GCS before validating `thing_id` and before the DB write; a failure left an orphaned blob with no Asset row. ### How **Workflows** - Reindent all three CD workflows to standard 2-space style (`git diff -w` ≈ no-op on deploy logic). - Join a broken multi-line `git tag -m "..."` in CD_testing.yml. - Includes Copilot autofix commits on the workflow files. **Asset upload** - `gcs_upload` now returns `(uri, blob_name, created)`; `created=False` on hash-dedup hit. All callers updated. - On DB-write failure: rollback, then delete the blob **only** when this request created it AND no Asset row still references it (blobs are content-hash-named and can be shared across Things). - Entire cleanup path wrapped so rollback / reference-query / delete failures are logged but never mask the original commit exception. - `session.refresh()` moved outside the cleanup block (post-commit; must not trigger deletion). - 3 new tests in `tests/test_asset.py` cover the failure paths. ### Notes - **Merge with "Create a merge commit"** if it touches release flow; otherwise standard squash is fine since this is a feature branch into staging. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
POST /asset/upload-and-recordcallsgcs_uploadbeforesession.commit(). If the commit (orsession.add/session.refresh) raises, the file stays in the bucket with noAssetrow pointing at it — same orphan class as the 409 case fixed in #698, but on the DB-failure path.How
try/except(api/asset.py:283).session.rollback()+ best-effortbucket.blob(blob_name).delete()in the threadpool, then re-raise the original exception.logger.warning(..., exc_info=True)and do not mask the original error.Notes
gcs_uploadand a failing commit.