Skip to content

fix: rollback and delete GCS blob if asset DB write fails#701

Closed
jirhiker wants to merge 5 commits into
stagingfrom
fix/asset-upload-cleanup-on-db-failure
Closed

fix: rollback and delete GCS blob if asset DB write fails#701
jirhiker wants to merge 5 commits into
stagingfrom
fix/asset-upload-cleanup-on-db-failure

Conversation

@jirhiker

@jirhiker jirhiker commented Jun 8, 2026

Copy link
Copy Markdown
Member

Why

  • POST /asset/upload-and-record calls gcs_upload before session.commit(). If the commit (or session.add/session.refresh) raises, the file stays in the bucket with no Asset row pointing at it — same orphan class as the 409 case fixed in #698, but on the DB-failure path.

How

  • Wrap the DB write in try/except (api/asset.py:283).
  • On failure: session.rollback() + best-effort bucket.blob(blob_name).delete() in the threadpool, then re-raise the original exception.
  • Blob-delete failures are logged with logger.warning(..., exc_info=True) and do not mask the original error.

Notes

  • Best-effort cleanup only — if the GCS delete also fails (e.g. transient outage), the blob stays. Logged so it can be reaped later.
  • No new test added — would need to mock both gcs_upload and a failing commit.

If session.commit() (or .add/.refresh) raises after gcs_upload has
already written the file, the bucket would retain an orphaned object
with no Asset row pointing at it. Wrap the DB write in try/except:
rollback the session and best-effort delete the blob, then re-raise.
A failure to delete the blob is logged but does not mask the original
exception.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 8, 2026 17:24

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0d30a1de1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread api/asset.py Outdated
Comment thread api/asset.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens POST /asset/upload-and-record to prevent orphaned GCS objects when the database write fails after a successful upload, by attempting to roll back the DB transaction and best-effort deleting the uploaded blob.

Changes:

  • Wraps the Asset + association DB write/commit/refresh in a try/except.
  • On DB failure, performs session.rollback() and attempts a best-effort GCS blob delete (logged on failure), then re-raises the original exception.

Comment thread api/asset.py
jirhiker and others added 2 commits June 8, 2026 11:33
…Asset still references it

gcs_upload deduplicates by content hash: when an object with the
hash-named blob already exists, it skips the upload and returns the
existing blob_name. The previous cleanup-on-failure path would
delete that shared object, breaking Assets owned by other Things
that point at the same hash-named blob.

- gcs_upload now also returns a `created` flag indicating whether
  this call actually wrote the blob. All callers updated to unpack
  three values.
- In upload_and_record_asset, on DB-write failure: only delete the
  blob when (a) this request created it AND (b) no Asset row
  references storage_path == blob_name after rollback. If another
  Asset still references it, log and leave it alone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
session.refresh(asset) runs AFTER session.commit() has succeeded.
Keeping it inside the cleanup except block meant a transient read
failure during refresh would trigger blob deletion while the Asset
row was already committed, leaving the DB pointing at a missing
GCS object. Move refresh outside the try so cleanup is scoped to
pre-commit failures only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 8, 2026 17:36
- Wrap session.rollback() in its own try/except. A rollback failure
  is logged but no longer masks the original DB exception, which is
  re-raised at the end of the handler.
- Wrap the reference-count query in its own try/except. If the SELECT
  itself fails after rollback, we cannot confirm the blob is
  unreferenced, so default to keeping it (sentinel object → not None).
  Better to leave a possible orphan blob than delete one another
  Asset still points at.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread api/asset.py
Comment thread api/asset.py
Wrap the cleanup block in upload_and_record_asset in one outer
try/except so any failure inside cleanup (rollback, reference query,
bucket.blob(), delete) is logged but cannot mask the original commit
exception that the request needs to surface.

Add three direct-call tests in tests/test_asset.py that mock the
session and bucket:

- test_upload_and_record_cleans_blob_when_created_and_db_fails:
  asserts rollback runs, blob.delete is called, and the original
  commit exception is re-raised.

- test_upload_and_record_skips_blob_delete_when_blob_was_preexisting:
  with gcs_upload reporting created=False (hash dedup hit), the
  cleanup must NOT delete the (possibly shared) blob.

- test_upload_and_record_reraises_original_exception_when_rollback_fails:
  rollback raising must not replace the original commit exception.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 8, 2026 17:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Comment thread api/asset.py
Comment on lines +315 to +339
try:
still_referenced = session.scalars(
select(Asset).where(Asset.storage_path == blob_name)
).first()
except Exception:
logger.warning(
"Could not verify blob references; skipping cleanup for %s",
blob_name,
exc_info=True,
)
still_referenced = object() # sentinel: assume referenced
if still_referenced is None:
try:
await run_in_threadpool(bucket.blob(blob_name).delete)
except Exception:
logger.warning(
"Failed to clean up uploaded blob after DB failure: %s",
blob_name,
exc_info=True,
)
else:
logger.info(
"Skipping blob cleanup; another Asset still references %s",
blob_name,
)
Comment thread services/asset_helper.py
Comment on lines 23 to +27
thing: Thing,
name: str,
**asset_args,
) -> tuple[str, str]:
uri, blob_name = gcs_upload(ff, bucket)
uri, blob_name, _created = gcs_upload(ff, bucket)
Comment thread tests/test_asset.py
Comment on lines +534 to +535
session_mock = MagicMock()
session_mock.get.return_value = MagicMock(id=1) # Thing exists
Comment thread tests/test_asset.py
Comment on lines +617 to +618
session_mock = MagicMock()
session_mock.get.return_value = MagicMock(id=1)
Comment thread tests/test_asset.py
Comment on lines +526 to +529
"""
Drive upload_and_record_asset with a mocked session whose commit() raises.
Returns (raised_exc, session_mock, bucket_mock, blob_mock).
"""
Comment thread tests/test_asset.py
Comment on lines +577 to +581
def test_upload_and_record_cleans_blob_when_created_and_db_fails():
"""
DB commit failure path: rollback runs, blob.delete is called, original
commit exception is re-raised (cleanup must not mask it).
"""
@jirhiker

jirhiker commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

Changes migrated into #702. Closing in favor of that PR.

@jirhiker jirhiker closed this Jun 8, 2026
@jirhiker jirhiker deleted the fix/asset-upload-cleanup-on-db-failure branch June 8, 2026 18:05
jirhiker added a commit that referenced this pull request Jun 8, 2026
### Why
Two related groups of fixes, combined here (asset changes migrated from
#701, which is now closed):

**CD workflow indentation**
- `steps:` blocks in `CD_staging.yml`, `CD_production.yml`,
`CD_testing.yml` had list items at the same column as `steps:`. Valid
YAML but unconventional; confused review tooling.

**Asset upload orphan-blob safety** (`POST /asset/upload-and-record`)
- File was uploaded to GCS before validating `thing_id` and before the
DB write; a failure left an orphaned blob with no Asset row.

### How
**Workflows**
- Reindent all three CD workflows to standard 2-space style (`git diff
-w` ≈ no-op on deploy logic).
- Join a broken multi-line `git tag -m "..."` in CD_testing.yml.
- Includes Copilot autofix commits on the workflow files.

**Asset upload**
- `gcs_upload` now returns `(uri, blob_name, created)`; `created=False`
on hash-dedup hit. All callers updated.
- On DB-write failure: rollback, then delete the blob **only** when this
request created it AND no Asset row still references it (blobs are
content-hash-named and can be shared across Things).
- Entire cleanup path wrapped so rollback / reference-query / delete
failures are logged but never mask the original commit exception.
- `session.refresh()` moved outside the cleanup block (post-commit; must
not trigger deletion).
- 3 new tests in `tests/test_asset.py` cover the failure paths.

### Notes
- **Merge with "Create a merge commit"** if it touches release flow;
otherwise standard squash is fine since this is a feature branch into
staging.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants