feat(media): stamp upload attribution on media objects and sidecars #1507

Closed
baxen wants to merge 4 commits into main from baxen/media-upload-attribution

@baxen baxen commented Jul 3, 2026

Collaborator

Context

This PR persists upload attribution at upload time.

What changed

S3 object metadata on the blob PUT (all three upload paths — image, generic file, streaming video):

  • x-amz-meta-buzz-uploader-id — authenticated Blossom uploader pubkey (hex, matching what the relay stores)
  • x-amz-meta-buzz-community-id — host-resolved community UUID

A HEAD on any newly uploaded blob now returns attribution without touching relay internals.

Sidecar (BlobMeta): the same fields as nullable uploader_id / community_id with serde defaults, so pre-attribution sidecars still deserialize and absent values are omitted (not null).

MediaStorage: new put_with_metadata (builder-based single PUT) and put_file_with_metadata. The streaming video path attaches metadata via bucket-level extra_headers rather than the stream builder because rust-s3 only forwards builder headers on the small-file branch; extra_headers are also applied to InitiateMultipartUpload, so metadata survives files above the 8 MiB chunk threshold.

Tests

  • x-amz-meta-* header construction: prefixing, control-character value rejection, invalid key rejection
  • Sidecar backward compatibility: old sidecars (no attribution keys) parse, absent fields are omitted from output, populated fields round-trip
  • cargo test -p buzz-media (44 passed), cargo clippy -p buzz-media -p buzz-relay --all-targets clean
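
The header-construction cases listed above (prefixing, control-character rejection, invalid-key rejection) could look roughly like the following. This is a minimal std-only sketch, not the code from buzz-media: `build_meta_headers`, `MetaError`, and the key policy (lowercase ASCII letters, digits, and `-`) are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Prefix S3 uses for user-defined object metadata headers.
const META_PREFIX: &str = "x-amz-meta-";

/// Why a metadata entry was refused (hypothetical error type).
#[derive(Debug, PartialEq)]
enum MetaError {
    InvalidKey(String),
    InvalidValue(String),
}

/// Build `x-amz-meta-*` headers from bare keys such as "buzz-uploader-id".
/// Assumed policy: keys are non-empty lowercase ASCII letters, digits, or '-';
/// values must not contain control characters (CR/LF would allow header injection).
fn build_meta_headers(pairs: &[(&str, &str)]) -> Result<BTreeMap<String, String>, MetaError> {
    let mut headers = BTreeMap::new();
    for (key, value) in pairs {
        let key_ok = !key.is_empty()
            && key
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
        if !key_ok {
            return Err(MetaError::InvalidKey(key.to_string()));
        }
        if value.chars().any(|c| c.is_control()) {
            return Err(MetaError::InvalidValue(key.to_string()));
        }
        // Prefix only after validation, so callers never pass the prefix themselves.
        headers.insert(format!("{}{}", META_PREFIX, key), value.to_string());
    }
    Ok(headers)
}
```

Rejecting a bad entry outright, rather than silently dropping it, keeps a malformed uploader or community value from producing a blob that looks attributed but is not.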

Update — 2026-07-03

Addressed review feedback:

  • Hoisted the S3 attribution metadata names into shared constants and added an attribution_meta(...) helper so the uploader/community keys cannot drift between blob, video, and thumbnail writes.
  • Extended BlobHeadMeta to include the S3 user metadata map returned by head_with_metadata, with a unit test proving buzz-uploader-id / buzz-community-id surface from the rust-s3 HeadObjectResult and a MinIO integration assertion for live round-trip coverage.
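
As a rough illustration of the two changes above, the sketch below hoists the key names into constants, builds the pairs in one helper, and reads them back from a HEAD-style metadata map. Only the key strings and the `attribution_meta` name come from this PR; the constant names, the return type, `HeadSketch`, and `simulate_round_trip` are hypothetical stand-ins, and no real S3 call is made.

```rust
use std::collections::HashMap;

/// Shared key names (strings from the PR; constant names are hypothetical).
const UPLOADER_ID_KEY: &str = "buzz-uploader-id";
const COMMUNITY_ID_KEY: &str = "buzz-community-id";

/// Build the attribution pairs once so blob, video, and thumbnail writes
/// cannot drift apart. The return type is an assumption.
fn attribution_meta(uploader_id: &str, community_id: &str) -> Vec<(&'static str, String)> {
    vec![
        (UPLOADER_ID_KEY, uploader_id.to_string()),
        (COMMUNITY_ID_KEY, community_id.to_string()),
    ]
}

/// Stand-in for a HEAD result that carries size plus the user metadata map
/// (keys appear without the `x-amz-meta-` prefix, as the PR describes).
struct HeadSketch {
    size: u64,
    user_metadata: HashMap<String, String>,
}

impl HeadSketch {
    fn uploader_id(&self) -> Option<&str> {
        self.user_metadata.get(UPLOADER_ID_KEY).map(String::as_str)
    }

    fn community_id(&self) -> Option<&str> {
        self.user_metadata.get(COMMUNITY_ID_KEY).map(String::as_str)
    }
}

/// Pretend the store echoed back exactly what was written on PUT.
fn simulate_round_trip(size: u64, meta: &[(&'static str, String)]) -> HeadSketch {
    let user_metadata = meta
        .iter()
        .map(|(k, v)| (k.to_string(), v.clone()))
        .collect();
    HeadSketch { size, user_metadata }
}
```

Because both the write helper and the read accessors go through the same constants, renaming a key is a one-line change that cannot leave readers and writers out of sync.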

Moderation note: blob object metadata is an attribution hint. Because blobs are shared CAS across tenants, object metadata is last-writer-wins across tenant re-uploads, and the same-community idempotent short-circuit keeps the first uploader's stamp. The community-scoped sidecar and MediaUploaded audit log (actor_pubkey / object_id) remain authoritative.
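
The semantics in the moderation note can be modeled with a toy in-memory store. This is a simplified sketch for intuition only: `ToyStore` and `Stamp` are hypothetical, and keying the idempotent short-circuit on the existence of a community sidecar is an assumption about how the real check works.

```rust
use std::collections::HashMap;

/// Attribution stamp, simplified to two fields.
#[derive(Debug, Clone, PartialEq)]
struct Stamp {
    uploader_id: String,
    community_id: String,
}

/// Toy model: one shared blob namespace keyed by content hash,
/// plus one sidecar per (community, hash).
#[derive(Default)]
struct ToyStore {
    blob_meta: HashMap<String, Stamp>,
    sidecars: HashMap<(String, String), Stamp>,
}

impl ToyStore {
    fn upload(&mut self, sha256: &str, uploader_id: &str, community_id: &str) {
        let sidecar_key = (community_id.to_string(), sha256.to_string());
        // Assumed short-circuit: this community already has the blob,
        // so nothing is rewritten and the first uploader's stamp stays.
        if self.sidecars.contains_key(&sidecar_key) {
            return;
        }
        let stamp = Stamp {
            uploader_id: uploader_id.to_string(),
            community_id: community_id.to_string(),
        };
        // The blob is shared across tenants, so this PUT replaces whatever
        // object metadata a previous tenant wrote (last writer wins).
        self.blob_meta.insert(sha256.to_string(), stamp.clone());
        // The sidecar is scoped to the community and is never overwritten
        // by another tenant.
        self.sidecars.insert(sidecar_key, stamp);
    }
}
```

Uploading the same hash as alice and then bob in community A, then as carol in community B, leaves the blob metadata pointing at carol while community A's sidecar still names alice, which is why the blob HEAD is only a hint.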

Additional local verification:

  • cargo test -p buzz-media (45 passed, 1 ignored)
  • cargo clippy -p buzz-media --all-targets -- -D warnings

Update — 2026-07-04

Added human-readable labels for moderation consumers:

  • x-amz-meta-buzz-uploader-name / sidecar uploader_name from the uploader's configured profile display name, when known.
  • x-amz-meta-buzz-community-host / sidecar community_host from the full server-resolved tenant host (for example moderation.example.com).

These labels are sanitized/bounded header-safe readability hints only. The authoritative fields remain buzz-uploader-id, buzz-community-id, the community-scoped sidecar, and the MediaUploaded audit log.
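
The PR does not spell out the sanitization rules, so the following is only one plausible shape for a "sanitized/bounded header-safe" label. The 64-byte cap, the printable-ASCII policy, the `?` replacement character, and the `sanitize_label` name are all assumptions.

```rust
/// Assumed cap on label length in bytes; the real bound is not stated in the PR.
const MAX_LABEL_BYTES: usize = 64;

/// Reduce a free-form display name or host to a short, header-safe label.
/// Returns None when nothing remains, so the caller can omit the header.
fn sanitize_label(raw: &str) -> Option<String> {
    let mut out = String::new();
    for c in raw.trim().chars() {
        // Keep printable ASCII and spaces; replace everything else
        // (control characters, non-ASCII) with '?'.
        let safe = if c.is_ascii_graphic() || c == ' ' { c } else { '?' };
        // Every emitted character is ASCII, so one char is one byte.
        if out.len() + 1 > MAX_LABEL_BYTES {
            break;
        }
        out.push(safe);
    }
    if out.is_empty() {
        None
    } else {
        Some(out)
    }
}
```

Returning `None` for an empty result matches the "when known" wording above: a missing or blank display name simply produces no label rather than an empty header.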

Additional local verification:

  • cargo test -p buzz-media (46 passed, 1 ignored)
  • cargo check -p buzz-relay -p buzz-media
  • cargo clippy -p buzz-media -p buzz-relay --all-targets -- -D warnings

Update — 2026-07-04 (follow-up)

Changed the community readability label from host-prefix alias to the full server-resolved tenant hostname, per review:

  • x-amz-meta-buzz-community-host
  • sidecar community_host

Re-ran local verification after the change:

  • cargo test -p buzz-media (46 passed, 1 ignored)
  • cargo check -p buzz-relay -p buzz-media
  • cargo clippy -p buzz-media -p buzz-relay --all-targets -- -D warnings

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71b5ee4d7f


@block block deleted a comment from chatgpt-codex-connector Bot Jul 3, 2026

Media uploads now carry upload attribution so operators and out-of-band
consumers can attribute any stored object without relay internals:

- S3 object metadata on the blob and thumbnail PUTs:
  x-amz-meta-buzz-uploader-id (authenticated Blossom uploader pubkey,
  hex) and x-amz-meta-buzz-community-id (host-resolved community UUID),
  readable from a bare HEAD on the object.
- The same fields on the BlobMeta sidecar (uploader_id / community_id),
  nullable with serde defaults so older sidecars still parse.

The community always comes from the server-resolved TenantContext
(row-zero host binding), never from client input. All three upload
paths are covered: image, generic file, and streaming video (the video
path attaches metadata via bucket extra_headers so it survives
multipart uploads).

Note: blobs are shared content-addressed storage across communities, so
a re-upload of identical bytes under another tenant overwrites the
object metadata with the most recent uploader; the community-scoped
sidecar remains the authoritative per-tenant record.

@baxen baxen force-pushed the baxen/media-upload-attribution branch from 71b5ee4 to 783c549 July 3, 2026 22:34

@tlongwell-block tlongwell-block left a comment

Collaborator

Reviewed against the #1321 tenant-fence and #1444 Blossom-auth refactors (with Wren) — this is clean: attribution derives only from the authenticated kind:24242 pubkey and the server-resolved TenantContext, never client input; sidecars stay community-scoped; all three upload paths plus derived thumbnails are covered. Verified in rust-s3 0.37 source that bucket-level x-amz-meta-* extra headers are forwarded to InitiateMultipartUpload (request_trait.rs headers()), so the >8 MiB multipart claim holds.

Two non-blocking asks:

  1. Hoist the metadata key names. "buzz-uploader-id" / "buzz-community-id" are repeated as string literals at three call sites in upload.rs. A pair of consts + a small attribution_meta(uploader_id, community_id) helper would keep the key names from drifting and shrink the call sites.

  2. Prove the read side. The PR writes HEAD-readable metadata, but head_with_metadata / BlobHeadMeta still only surface size — nothing in-tree demonstrates x-amz-meta-buzz-uploader-id actually round-trips through our storage wrapper. Extending BlobHeadMeta with the metadata map (or a MinIO integration assertion) would bless the S3-HEAD moderation use case properly. Fine as a fast-follow if moderation tooling will HEAD S3 directly.

One semantic worth keeping in the PR description for moderation consumers: blob metadata is last-writer-wins across tenants (shared CAS), and the idempotent short-circuit means same-community re-uploads keep the first uploader's stamp. So blob HEAD is an attribution hint; the per-community sidecar and the MediaUploaded audit log (actor_pubkey/object_id) remain the authoritative records. The in-code comment already says this — 👍.

baxen added 3 commits July 3, 2026 16:38

Hoist upload attribution metadata keys and expose S3 user metadata from BlobHeadMeta so HEAD callers can verify the uploader/community stamp round-trips.

Co-authored-by: Bradley Axen <baxen@squareup.com>
Signed-off-by: Bradley Axen <baxen@squareup.com>

Stamp uploader display name and tenant host alias alongside authoritative uploader and community IDs so moderation HEAD metadata is easier to read.

Co-authored-by: Bradley Axen <baxen@squareup.com>
Signed-off-by: Bradley Axen <baxen@squareup.com>

Replace the community alias label with the full server-resolved tenant hostname in S3 metadata and sidecars.

Co-authored-by: Bradley Axen <baxen@squareup.com>
Signed-off-by: Bradley Axen <baxen@squareup.com>

baxen commented Jul 4, 2026

Collaborator Author

Actually will need another approach, CAS complicates this

@baxen baxen closed this Jul 4, 2026
2 participants