brotli-compress .socket.facts.json on upload#1291
Draft
Benjamin Barslev Nielsen (barslev) wants to merge 2 commits intov1.xfrom
Draft
brotli-compress .socket.facts.json on upload#1291Benjamin Barslev Nielsen (barslev) wants to merge 2 commits intov1.xfrom
Benjamin Barslev Nielsen (barslev) wants to merge 2 commits intov1.xfrom
Conversation
Compress `.socket.facts.json` with brotli at upload time, just before
`fetchCreateOrgFullScan` POSTs the multipart `/full-scans` body. Coana
keeps writing plain JSON; the local readers
(`extractTier1ReachabilityScanId`, `extractReachabilityErrors`) keep
reading plain JSON; only the on-the-wire bytes between socket-cli and
depscan change. depscan transparently decompresses at the api-v0
multipart ingest boundary in a coordinated change.
Why:
* Mono-repo `.socket.facts.json` files routinely exceed 60MB. On a
representative simple-npm fixture, brotli compresses 150,748 bytes
to 19,971 (~86.8% reduction); production mono-repos see a much
larger absolute saving.
* Coana has a second consumer downstream - teaching coana to write
`.br` would break it. Brotli on the wire-only path keeps coana's
contract invariant.
* `tier1ReachabilityScanId` and reachability-error reporting still
read plain JSON locally; no brotli round-trip on those paths.
* Compression is a transport detail owned by the upload site; cleanup
is one `finally` block.
Implementation:
* `src/utils/coana.mts` - new exported
`compressSocketFactsForUpload(scanPaths)` returning `{ paths, cleanup }`.
Per `.socket.facts.json` entry, `mkdtempSync` a fresh `.socket-br-XXX/`
directory inside the source's parent dir (NOT under `os.tmpdir()` -
see below), `brotliCompressSync` bytes to
`${td}/.socket.facts.json.br`, swap the path. Other paths pass through.
Missing facts paths pass through. Cleanup is idempotent with
`force: true`.
* `src/commands/scan/handle-create-new-scan.mts` - wraps the
`fetchCreateOrgFullScan` call in
`try { compress; upload; } finally { cleanup(); }`. Cleanup runs on
success, throw, or any abort path.
Why the temp lives next to the source:
The SDK computes `path.relative(cwd, brPath)` for the multipart name
field. depscan's multipart ingest (`addStreamEntry`) checks
`containsTraversal(unixified)` and bails on any `..` segment. A
tmpdir-based path resolves to `../../../var/folders/...`, gets dropped
to `unmatchedFiles`, and the SocketFacts content never lands in the
scan. Putting the temp inside `path.dirname(originalFactsPath)`
produces a relative path like `.socket-br-XXX/.socket.facts.json.br` -
traversal-free, ingests cleanly.
Tests:
* `src/utils/coana.test.mts` - 16 cases.
- `compressSocketFactsForUpload` x 5: round-trip JSON via
`brotliDecompressSync`, basename swap to `.socket.facts.json.br`,
non-facts paths pass through, missing facts paths pass through,
cleanup idempotent, mixed-entry order. Pins the contract that the
temp lives in a subdir of the source's parent (traversal-free).
- `extractTier1ReachabilityScanId` x 7: plain JSON, missing file,
missing field, null, empty/blank, trim, numeric coercion.
- `extractReachabilityErrors` x 4: extraction, missing file,
no-components, no-inner-reachability.
This change requires the matching depscan multipart decompression
patch on the receiving side; that change ships first.
Contributor
|
This is rad! |
Previous form wrapped each `.br` in a per-scan `mkdtempSync` subdir
under the source dir for concurrency isolation. That created a
directory-handling asymmetry on depscan's side: a wire path of
`dirA/.socket.facts.json.br` flattened to `.socket.facts.json` at
root via depscan's basename-strip, while plain `dirA/.socket.facts.json`
preserved the dir.
depscan dropped the basename-strip in the corresponding PR; switch
to writing `.br` as a sibling of the source so wire and storage
paths match for both branches. Net effect: a brotli upload from
`<source>/.socket.facts.json` lands at `<source>/.socket.facts.json`
in the manifest tarball - identical to plain.
Concurrent-scan note: coana already writes to a single source path,
so the source `.socket.facts.json` itself is racy under concurrent
runs against the same dir; the sibling `.br` doesn't introduce a
new race that wasn't already there.
* `src/utils/coana.mts`: `compressSocketFactsForUpload` writes
`${p}.br` next to the source; `cleanup()` does
`rm(brPath, { force: true })` per file. Drops `mkdtemp` import
that's no longer used.
* `src/utils/coana.test.mts`: directory-shape assertion replaced
with `swappedPath === ${input}.br` (sibling). First test now
also asserts the source survives `cleanup()`. 16 cases.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Brotli-compress
.socket.facts.jsonat upload time. Coana keeps writing plain JSON; only the wire bytes between socket-cli and depscan change. ~85% reduction on the simple-npm fixture; production mono-repo payloads see 60+ MB → ~10 MB.How
compressSocketFactsForUpload(scanPaths)insrc/utils/coana.mts— async + streaming (createReadStream → createBrotliCompress → createWriteStreamviapipeline). For any.socket.facts.jsonin scanPaths, writes a sibling.socket.facts.json.brnext to the source file. Returns swapped paths + cleanup callback.handle-create-new-scan.mtswrapsfetchCreateOrgFullScanintry { compress; upload; } finally { cleanup; }.Why sibling-write (not
os.tmpdir(), not a temp subdir)The SDK passes
path.relative(cwd, brPath)as the multipart name. depscan'saddStreamEntryrejects entries with..traversal — a tmpdir path resolves to../../../var/folders/...and gets dropped tounmatchedFiles. Sibling-write keeps the relative path inside cwd, and keeps directory shape symmetric with the plain.socket.facts.jsonupload (depscan strips the.brsuffix at ingest, so<dir>/.socket.facts.json.brand<dir>/.socket.facts.jsonresolve to the same storage path).Concurrent scans against the same source dir already collide on
.socket.facts.jsonitself (coana writes to a single path), so the sibling.brdoesn't introduce a new race.Coordination
Requires the matching depscan multipart-decompress patch on the receiving side; land that first.
Tests
src/utils/coana.test.mts(sibling-path round-trip, passthrough, idempotent cleanup that leaves the source intact, plus the existingextractTier1ReachabilityScanId/extractReachabilityErrorsmatrix).socket scan create --reachagainstsimple-npm:scan_type=socket_tier1,unmatchedFiles: [], root manifest entries are.socket.facts.jsonandpackage.json(storage path matches the plain upload — dir-preservation symmetry holds).