Implement Bun lockfile parser and enhance scanner output rendering #14

miccy merged 30 commits into dev/preview-v2
Conversation
…ng, and refactor scanner output to string rendering
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.
Configuration used: Path: .coderabbit.yaml · Review profile: CHILL · Plan: Pro
📝 Walkthrough

Adds OpenAI-backed advisory ingestion with Zod validation, expands a filesystem-backed threat catalog with many structured JSON records, refactors the CLI to delegate to a bundled runCli entry, and implements pnpm and Bun lockfile parsers plus scanner/formatter/type tightening across the scanner.
Sequence Diagram

```mermaid
sequenceDiagram
participant User
participant CLI as "CLI\n(runCli)"
participant Scanner as "Scanner\n(scan)"
participant Parsers as "Parsers\n(parseLockfiles)"
participant Detectors as "Detectors\n(detectInjection)"
participant ThreatDB as "Threat DB\n(threats.ts)"
participant Engine as "Engine\n(ingestAdvisory)"
participant OpenAI as "OpenAI API"
participant Validator as "Zod\nvalidateThreatObject"
participant Output as "Formatter\n(formatText/sarif/json)"
User->>CLI: run scan (e.g. scan . --format json)
CLI->>Scanner: scan(target)
Scanner->>Parsers: parseLockfiles(target)
Parsers->>Parsers: try npm, yarn, pnpm, bun, requirements
Parsers-->>Scanner: LockfilePackage[]
Scanner->>Detectors: detectInjection(packages, target)
Detectors->>ThreatDB: query malicious/phantom matches
ThreatDB-->>Detectors: matches
Detectors-->>Scanner: findings
alt optional advisory enrichment
CLI->>Engine: ingestAdvisory(advisory_text)
Engine->>OpenAI: POST /v1/chat/completions (JSON-mode)
OpenAI-->>Engine: JSON response
Engine->>Validator: validateThreatObject(candidate)
Validator-->>Engine: ThreatObject | null
Engine-->>ThreatDB: (optional) augment DB
end
Scanner->>Output: format result
Output-->>CLI: formatted output
CLI->>User: stdout / write file / exit code
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b1d41cfcb6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
Pull request overview
This PR expands wormsCTRL’s supply-chain scanning and threat knowledge-base capabilities by adding additional lockfile parsers (pnpm + Bun), improving CLI/output rendering workflows, and introducing an engine-side advisory ingestion + validation skeleton alongside new threat catalog entries and documentation updates.
Changes:
- Added pnpm + Bun lockfile parsing and wired both into scanner dispatch and detectors.
- Introduced a new scanner CLI entry (`runCli`) with `--format`/`--output`/`--threats` and updated text/SARIF rendering.
- Added engine threat-object schema validation (Zod) plus advisory ingestion/prompt scaffolding; added multiple documented threat JSON profiles and refreshed READMEs/CHANGELOG.
Reviewed changes
Copilot reviewed 31 out of 33 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/scanner/src/types.ts | Makes lockfile entries/packages more flexible via shared types. |
| packages/scanner/src/scan.ts | Adds parser dispatch for pnpm/Bun and switches CLI output to text renderer. |
| packages/scanner/src/parsers/yarn.ts | Allows parsing Yarn lockfile by path or directory. |
| packages/scanner/src/parsers/pnpm.ts | Adds pnpm-lock.yaml parser using js-yaml. |
| packages/scanner/src/parsers/npm.ts | Refactors npm lockfile resolver/parsing with path-or-dir support. |
| packages/scanner/src/parsers/js-yaml.d.ts | Adds minimal TS declarations for js-yaml. |
| packages/scanner/src/parsers/index.ts | Expands unified parseLockfiles to include pnpm and Bun. |
| packages/scanner/src/parsers/bun.ts | Adds Bun lockfile parsing (bun.lock) and bun.lockb fallback. |
| packages/scanner/src/output/text.ts | Converts text output to a string-rendering pipeline and keeps legacy printing API. |
| packages/scanner/src/output/sarif.ts | Tightens SARIF typing and return types. |
| packages/scanner/src/index.ts | Exposes runCli from scanner public exports. |
| packages/scanner/src/detectors/injection.ts | Improves typing for parsed package.json manifest. |
| packages/scanner/src/detectors/index.ts | Switches detector input to unified parseLockfiles dispatcher. |
| packages/scanner/src/cli.ts | New CLI implementation with format/output/threat catalog features. |
| packages/scanner/package.json | Adds js-yaml dependency for pnpm parsing. |
| packages/ioc/threats/xz-utils-2024.json | Adds a structured threat record for xz-utils backdoor incident. |
| packages/ioc/threats/ua-parser-js-2021.json | Adds a structured threat record for ua-parser-js compromise. |
| packages/ioc/threats/node-ipc-2022.json | Adds a structured threat record for node-ipc protestware incident. |
| packages/ioc/threats/event-stream-2018.json | Adds a structured threat record for event-stream compromise. |
| packages/ioc/threats/ctx-2022.json | Adds a structured threat record for ctx PyPI takeover incident. |
| packages/ioc/index.js | Loads threat catalog JSON dynamically and exposes it via KB APIs. |
| packages/engine/src/validate.ts | Adds Zod schema + validation helper for threat objects. |
| packages/engine/src/types.ts | Introduces ThreatObject and related enums/types. |
| packages/engine/src/prompt.ts | Adds a strict JSON-only extraction system prompt for ingestion. |
| packages/engine/src/ingest.ts | Adds advisory ingestion flow (URL/text), OpenAI extraction call, and schema validation. |
| packages/engine/src/index.ts | Exports new ingestion/prompt/validation utilities. |
| packages/engine/package.json | Adds zod dependency for schema validation. |
| cs/README.md | Adds Czech documentation mirroring architecture/usage/threat DB. |
| apps/cli/tsconfig.json | Adds Bun typing configuration to CLI TypeScript settings. |
| apps/cli/package.json | Switches dev script to Bun runtime. |
| apps/cli/bin/cli.js | Replaces old CLI implementation with a thin wrapper around scanner runCli. |
| README.md | Rewrites main README with architecture, usage, AI ingestion, and threat DB sections. |
| CHANGELOG.md | Adds an Unreleased section describing these new capabilities. |
```
export function getThreatProfile(id) {
  const map = {
    'shai-hulud-2.0': () => import('./archived/shai-hulud/threat-model.json'),
  if (id === 'shai-hulud-2.0') {
    return import('./archived/shai-hulud/threat-model.json')
  }
  return map[id]?.() ?? null
```
getThreatProfile returns a dynamic import(...) (Promise) for shai-hulud-2.0, but returns a plain object (or null) for all other IDs. This makes the API return type inconsistent and will break callers that expect a synchronous object. Consider making getThreatProfile consistently async (always returning a Promise) or returning the already-imported shaiHulud2 object synchronously for that ID.
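For illustration, a minimal sketch of the always-async shape suggested here; the loader map and `ThreatProfile` alias are placeholders, not the actual packages/ioc implementation (which is plain JavaScript):

```ts
// Hypothetical async-only catalog lookup; names and paths mirror the comment
// above, not the real packages/ioc/index.js.
type ThreatProfile = Record<string, unknown>

const loaders: Record<string, () => Promise<ThreatProfile>> = {
  'shai-hulud-2.0': () =>
    import('./archived/shai-hulud/threat-model.json').then(
      (mod) => mod.default as ThreatProfile
    ),
}

// Always returns a Promise, so callers never branch on sync vs. async results.
export async function getThreatProfile(id: string): Promise<ThreatProfile | null> {
  const load = loaders[id]
  return load ? load() : null
}
```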
```diff
 interface NpmLockfile {
   lockfileVersion?: number
-  packages?: Record<string, LockfileEntry & { version?: string }>
+  packages?: Record<string, LockfilePackage & { version?: string }>
   dependencies?: Record<
```
NpmLockfile.packages is typed as Record<string, LockfilePackage & { version?: string }>, but lockfile packages entries do not include name (and LockfilePackage requires it). This type is misleading and can cause incorrect assumptions in future edits. Prefer a dedicated type for raw package-lock entries (e.g., LockfileEntry & { version?: string; requires?: ... }).
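One way the suggested split could look; `RawNpmLockEntry` is an illustrative name, not a type from the PR:

```ts
// Normalized package shape used by the scanner (name is always present).
export interface LockfilePackage {
  name: string
  version: string
}

// Raw package-lock.json "packages" entries are keyed by path and omit name;
// this hypothetical type keeps that distinction explicit.
export interface RawNpmLockEntry {
  version?: string
  resolved?: string
  integrity?: string
  requires?: Record<string, string>
}

export interface NpmLockfile {
  lockfileVersion?: number
  // keyed by path, e.g. "node_modules/foo"; entries never carry a name
  packages?: Record<string, RawNpmLockEntry>
  dependencies?: Record<string, RawNpmLockEntry>
}
```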
```diff
   const start = Date.now()
-  const findings: Finding[] = []
+  const findings = []
```
findings is currently inferred as any[], which loses type-safety and can mask invalid finding objects being returned in ScanResult.findings. Import Finding and type this as const findings: Finding[] = [] so the returned ScanResult stays aligned with types.ts.
```ts
const result = spawnSync('bun', ['pm', 'ls', '--all'], {
  cwd: bunProjectDirectory(targetDirOrPath),
  encoding: 'utf8',
  stdio: ['ignore', 'pipe', 'ignore'],
})
```
The bun.lockb fallback runs spawnSync('bun', ['pm','ls','--all']) without a timeout and without checking result.error. If bun pm ls hangs or the binary is missing, scans can block indefinitely or behave inconsistently. Consider adding a timeout, handling result.error, and/or providing a non-exec parsing strategy for bun.lockb to keep scanning safe and predictable.
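A rough sketch of the guarded call this comment asks for; the standalone helper, the 10-second timeout, and the empty-array fallback are assumptions, not values taken from the PR:

```ts
import { spawnSync } from 'node:child_process'

// Sketch only: a guarded version of the `bun pm ls --all` fallback.
function listBunPackages(projectDir: string): string[] {
  const result = spawnSync('bun', ['pm', 'ls', '--all'], {
    cwd: projectDir,
    encoding: 'utf8',
    stdio: ['ignore', 'pipe', 'ignore'],
    timeout: 10_000, // kill the child if `bun pm ls` hangs
  })

  // result.error is set when the binary is missing or the timeout fired
  if (result.error || result.status !== 0 || typeof result.stdout !== 'string') {
    return []
  }
  return result.stdout.split('\n').filter(Boolean)
}
```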
…data and improve scanner path resolution
…egexp' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Signed-off-by: Miccy <code@miccy.dev>
@codex review
Pull request overview
Copilot reviewed 37 out of 42 changed files in this pull request and generated 4 comments.
```ts
parsedPackages.push({
  ...parsedKey,
  resolved: entry.resolution?.tarball ?? entry.resolution?.integrity,
  engines: entry.engines,
  requires: entry.dependencies,
  lockfileVersion: lock.lockfileVersion ?? 'unknown',
})
```
pnpm's resolution.integrity is being assigned into the normalized resolved field when tarball is absent. That loses the semantic distinction between a download URL (resolved) and a hash (integrity). Consider mapping resolution.tarball -> resolved and resolution.integrity -> integrity (leaving resolved undefined if no tarball is present).
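A minimal sketch of that mapping, assuming simplified shapes for the pnpm resolution object and the normalized package (the PR's actual types may differ):

```ts
// PnpmResolution and ParsedPackage are stand-ins for the PR's real types.
interface PnpmResolution {
  tarball?: string
  integrity?: string
}

interface ParsedPackage {
  name: string
  version: string
  resolved?: string
  integrity?: string
}

function withResolution(
  base: { name: string; version: string },
  resolution: PnpmResolution | undefined
): ParsedPackage {
  return {
    ...base,
    resolved: resolution?.tarball, // stays undefined when there is no tarball URL
    integrity: resolution?.integrity, // hash kept separate from the download URL
  }
}
```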
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
packages/scanner/src/detectors/injection.ts (1)
5-23: ⚠️ Potential issue | 🟠 Major: Include `optionalDependencies` as declared dependencies.

Valid packages declared in `optionalDependencies` will currently be treated as lockfile injections, creating false high-severity findings.

🛠️ Proposed fix

```diff
 interface PackageJsonManifest {
   dependencies?: Record<string, string>
   devDependencies?: Record<string, string>
+  optionalDependencies?: Record<string, string>
   peerDependencies?: Record<string, string>
 }
@@
   for (const deps of [
     pkg.dependencies ?? {},
     pkg.devDependencies ?? {},
+    pkg.optionalDependencies ?? {},
     pkg.peerDependencies ?? {},
   ]) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/detectors/injection.ts` around lines 5 - 23: the loadPackageJsonDeps function currently ignores optionalDependencies, causing valid optional packages to be flagged as lockfile injections; update PackageJsonManifest to include optionalDependencies?: Record<string, string> and include pkg.optionalDependencies (e.g. add it to the array iterated over in loadPackageJsonDeps, or merge it into declared before returning) so optionalDependencies are treated the same as dependencies/devDependencies/peerDependencies when building the declared Set.

packages/scanner/src/parsers/npm.ts (1)
63-68: ⚠️ Potential issue | 🟠 Major: Extract package names from the last `node_modules/` segment.

Nested npm lock entries like `node_modules/a/node_modules/b` currently become `a/node_modules/b`, which can trigger false high-severity injection findings downstream. Use the last segment so the parsed package name is `b`.

🐛 Proposed fix

```diff
 for (const [pkgPath, entry] of Object.entries(lock.packages)) {
   if (!pkgPath) continue
-  const name = pkgPath.replace(/^node_modules\//, '')
+  const marker = 'node_modules/'
+  const markerIndex = pkgPath.lastIndexOf(marker)
+  const name = markerIndex === -1 ? pkgPath : pkgPath.slice(markerIndex + marker.length)
   pkgs.push({
     name,
     version: entry.version ?? '',
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/parsers/npm.ts` around lines 63 - 68: the pkgPath extraction uses pkgPath.replace(/^node_modules\//, '') which leaves nested paths like "a/node_modules/b" instead of just "b"; update the name computation in the loop that iterates Object.entries(lock.packages) (variables: pkgPath, entry, and the pkgs.push call) to take the last node_modules segment (for example by splitting on "node_modules/" and taking the last element, or using a regex like /^.*node_modules\//) so nested entries yield the final package name (scoped names stay intact); keep using entry.version ?? '' when pushing to pkgs.

packages/scanner/src/output/text.ts (1)
63-128: ⚠️ Potential issue | 🟡 Minor: ANSI escape codes end up in files and non-TTY pipes 🎨

Now that `renderText` returns a string that `packages/scanner/src/cli.ts` (lines 229-235) writes verbatim to either stdout or `--output <path>`, saved reports will contain raw `\x1b[31m…` sequences, and piping into `less`, CI logs, or tooling that doesn't interpret ANSI will produce garbled output. Since this is a scanner whose whole point is to be scriptable/CI-friendly (alongside `json` and `sarif`), that's a real UX issue for the `text` format.

Two common remedies, either is fine:

- Accept a `color: boolean` option on `renderText` and let the caller decide (`cli.ts` can enable it only when `!parsedArgs.output && process.stdout.isTTY`).
- Auto-detect via `process.stdout.isTTY` / `NO_COLOR` inside `renderText`.

✨ Minimal patch: plumb a `color` flag through the renderer

```diff
-export function renderText(result: ScanResult, verbose = false): string {
+export function renderText(result: ScanResult, verbose = false, color = true): string {
+  const c = (code: string) => (color ? code : '')
+  const red = c(RED), amber = c(AMBER), green = c(GREEN), cyan = c(CYAN)
+  const white = c(WHITE), gray = c(GRAY), bold = c(BOLD), dim = c(DIM), reset = c(RESET)
   const lines: string[] = []
   ...
```

And in `cli.ts`:

```diff
-    default:
-      return formatText(result)
+    default:
+      return formatText(result, false, !parsedArgs.output && process.stdout.isTTY === true)
```

(Signature of `formatText`/`renderText` would need a matching third arg.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/output/text.ts` around lines 63 - 128: renderText currently always emits ANSI sequences; add a color boolean param to renderText (e.g., renderText(result: ScanResult, verbose = false, color = true)) and update the function to only interpolate the ANSI constants (CYAN, RED, GREEN, AMBER, DIM, BOLD, WHITE, GRAY, RESET) when color is true (otherwise use empty strings), then update the caller in the CLI to pass color: process.stdout.isTTY && !parsedArgs.output (or otherwise based on NO_COLOR) so saved files and non-TTY pipes get plain text; keep all other behavior (sorting, summary, verbose/location handling) unchanged.
🧹 Nitpick comments (7)
packages/scanner/src/output/sarif.ts (1)
43-46: Export the SARIF report type used by the public formatter API.

`formatSarif` and `toSarif` are exported with `SarifReport` as their return type, so the type should be exportable too.

🛠️ Proposed fix

```diff
-interface SarifReport {
+export interface SarifReport {
   version: '2.1.0'
   runs: SarifRun[]
 }
```

Also applies to: 86-86, 144-144

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/output/sarif.ts` around lines 43 - 46: the SarifReport interface currently isn't exported but is used as the return type of the public functions formatSarif and toSarif; update the declaration to export the SarifReport interface (i.e., change "interface SarifReport" to "export interface SarifReport") and likewise export any other SARIF types referenced by those exported functions so the public formatter API's types are available to consumers (ensure you update the SarifReport declaration and any duplicate type declarations referenced by formatSarif/toSarif).

packages/scanner/src/parsers/index.ts (1)
13-18: Deduplicate packages across lockfile parsers.

Repos often carry multiple lockfiles during migrations; concatenating every parser result can produce duplicate scanner findings for the same `name@version`.

♻️ Proposed dedupe before returning aggregated packages

```diff
 export function parseLockfiles(targetDir: string): LockfilePackage[] {
-  return [
+  const packages = [
     ...parseNpmLockfile(targetDir),
     ...parseYarnLockfile(targetDir),
     ...parsePnpmLockfile(targetDir),
     ...parseBunLockfile(targetDir),
   ]
+
+  const seen = new Set<string>()
+  return packages.filter((pkg) => {
+    const key = `${pkg.name}@${pkg.version}`
+    if (seen.has(key)) {
+      return false
+    }
+    seen.add(key)
+    return true
+  })
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/parsers/index.ts` around lines 13 - 18: the current return concatenates results from parseNpmLockfile, parseYarnLockfile, parsePnpmLockfile, and parseBunLockfile and can yield duplicate package entries; replace the direct spread return with logic that aggregates all parsed arrays, then deduplicates by a stable key (e.g., `${pkg.name}@${pkg.version}`) using a Set or Map to ensure only one entry per name@version is kept before returning the final array (operate on the combined results from parseNpmLockfile/parseYarnLockfile/parsePnpmLockfile/parseBunLockfile).

packages/scanner/src/cli.ts (2)
179-190: Nit: `threats.sort(...)` mutates the catalog array 🪞

`printThreatCatalog` sorts the caller's array in place. Right now that's harmless because `loadThreats()` returns a fresh array per call, but if anyone ever memoizes that loader (e.g. to match `packages/ioc/index.js`), in-place sorting becomes a sneaky shared-state bug. Cheap to future-proof:

```diff
-  for (const threat of threats.sort((left, right) => right.year - left.year)) {
+  for (const threat of [...threats].sort((left, right) => right.year - left.year)) {
```

(Or `threats.toSorted(...)` if your target supports it.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/cli.ts` around lines 179 - 190: printThreatCatalog currently mutates its input by calling threats.sort(...); to avoid shared-state bugs, make a non-mutating copy before sorting (e.g., use a shallow copy like [...threats] and sort that, or use threats.toSorted(...) when supported) and iterate over the sorted copy instead; update the code around function printThreatCatalog and the usage of threats.sort to use the copied/immutable-sorted array.
120-137: `--output` happily accepts values that look like flags 🚩

`--output --format=json` will treat `--format=json` as the output path, write `./--format=json` on disk, and silently swallow the `--format`. Cheap guard that matches typical CLI conventions:

```diff
   if (arg === '--output') {
     const nextArg = args[index + 1]
-    if (!nextArg) {
+    if (!nextArg || nextArg.startsWith('--')) {
       throw new Error('Missing value for --output.')
     }
     output = nextArg
     index++
     continue
   }
```

(Same consideration applies to `--format` with a missing value, though `parseFormat` already rejects unknown strings, so the blast radius is smaller there.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/cli.ts` around lines 120 - 137: the --output parsing accepts values that look like flags (e.g. "--output --format=json") and treats the next flag as the path; update the parsing in the CLI handler to validate that the nextArg or the value from the '--output=' form is not a flag (check startsWith('-') or startsWith('--')) and throw the existing "Missing value for --output." error if it is; adjust the logic around the block that sets output (the if (arg === '--output') branch that reads args[index + 1] and the if (arg.startsWith('--output=')) branch that slices the value) so both branches reject values beginning with a dash before assigning to output.

packages/ioc/index.js (1)
21-69: Friendly tip: the threat catalog gets re-read from disk on every call 📚

`loadThreatCatalog()` does a `readdirSync` + N × `readFileSync` + `JSON.parse` every single time `getArchivedThreats()` or `getThreatProfile(id)` is invoked. For a CLI that calls `getThreatProfile` in a loop (e.g., one per finding), that's O(findings × files) redundant I/O. Caching the parse result in a module-level variable is a two-line fix and keeps the hot path allocation-free.

⚡ Proposed memoized loader

```diff
-function loadThreatCatalog() {
-  try {
-    return readdirSync(THREATS_DIR)
-      .filter((entry) => entry.endsWith('.json'))
-      .map((entry) => JSON.parse(readFileSync(join(THREATS_DIR, entry), 'utf8')))
-  } catch {
-    return []
-  }
-}
+let cachedCatalog
+function loadThreatCatalog() {
+  if (cachedCatalog) return cachedCatalog
+  try {
+    cachedCatalog = readdirSync(THREATS_DIR)
+      .filter((entry) => entry.endsWith('.json'))
+      .map((entry) => JSON.parse(readFileSync(join(THREATS_DIR, entry), 'utf8')))
+  } catch {
+    cachedCatalog = []
+  }
+  return cachedCatalog
+}
```

Also minor: `getArchivedThreats()` will emit two entries if the `threats/` directory ever contains a file with `id: 'shai-hulud-2.0'`, since the static entry and the catalog entry aren't de-duplicated. Probably not an issue today, but a `filter((t) => t.id !== 'shai-hulud-2.0')` on the spread would make it future-proof.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/ioc/index.js` around lines 21 - 69: the loader currently re-reads and reparses the threats directory on every call: memoize loadThreatCatalog by storing its parsed array in a module-level variable (e.g., let cachedThreats = null) and have loadThreatCatalog return the cached value if present, otherwise read/parse and set the cache; update getArchivedThreats and getThreatProfile to use the cached loader. Additionally, when building the array in getArchivedThreats(), filter out any catalog entries with id 'shai-hulud-2.0' before spreading to avoid duplicate entries.

packages/scanner/src/parsers/bun.ts (2)
18-28: Tiny edge case in `parseDescriptor` 👀

`lastIndexOf('@')` correctly handles scoped packages (`@scope/name@1.0.0`), which is great. But Bun lockfile descriptors can also encode non-semver refs, e.g. `uWebSockets.js@github:uNetworking/uWebSockets.js#6609a88` (straight from the Bun blog example), which this will happily treat as `{ name: 'uWebSockets.js', version: 'github:uNetworking/uWebSockets.js#6609a88' }`. That's probably fine for name-based injection detection, but if the downstream detector ever assumes `version` is semver-ish you'll get surprises; consider either normalizing git/url descriptors or tagging them with a `source` field.

Not blocking, just something to keep on the radar.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/parsers/bun.ts` around lines 18 - 28: the parseDescriptor function currently splits name/version by the last '@' but treats all right-hand parts as a semver-like version; update parseDescriptor to detect non-registry refs (e.g., git/github url refs containing ":" or "#" or known prefixes like "github:") and return an augmented shape such as { name, version, source } where source is 'registry' for semver-like values and 'git'/'url' for git/URL refs (leave version as the raw ref). Ensure the function signature and all call sites that rely on parseDescriptor are updated to accept the new source field or continue using version if source is absent.
30-58: Use a proper JSONC parser instead of regex-based extraction for better robustness

The current regex-based parsing works fine with the actual bun.lock structure, but it's unnecessarily fragile. While the file currently has no comments and the `"packages"` section correctly ends at EOF, JSONC officially supports line comments (`//`), block comments (`/* */`), and trailing commas, all of which could theoretically break the regex if the file structure changes.

The entry extraction regex (`/^\s+"[^"]+":\s+\["([^"]+)",/gm`) is also brittle; it assumes entries always have exactly 4 elements with the first being a string, but Bun could shift this format anytime.

Suggested approach

```diff
 function parseBunTextLockfile(content: string): ParsedPackage[] {
-  const match = content.match(/"packages":\s*{([\s\S]*)\n}\s*$/)
-  if (!match) {
+  let parsed: { packages?: Record<string, unknown[]> }
+  try {
+    parsed = JSON.parse(content)
+  } catch {
     return []
   }
-
-  const packagesSection = match[1]
-  const packageEntries = packagesSection.matchAll(/^\s+"[^"]+":\s+\["([^"]+)",/gm)
-  const parsedPackages: ParsedPackage[] = []
-
-  for (const packageEntry of packageEntries) {
-    const descriptor = packageEntry[1]
-    if (!descriptor) {
-      continue
-    }
-
+  const packages = parsed.packages ?? {}
+  const parsedPackages: ParsedPackage[] = []
+  for (const value of Object.values(packages)) {
+    const descriptor = Array.isArray(value) ? value[0] : undefined
+    if (typeof descriptor !== 'string') continue
     const parsedDescriptor = parseDescriptor(descriptor)
-    if (!parsedDescriptor) {
-      continue
-    }
-
-    parsedPackages.push({
-      ...parsedDescriptor,
-      lockfileVersion: 1,
-    })
+    if (parsedDescriptor) parsedPackages.push({ ...parsedDescriptor, lockfileVersion: 1 })
   }
   return parsedPackages
 }
```

Since the code runs in a Bun runtime, `JSON.parse` handles the trailing commas natively. For cross-runtime support, use npm's `jsonc-parser`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/parsers/bun.ts` around lines 30 - 58: parseBunTextLockfile currently uses fragile regexes to extract the "packages" section and descriptors; replace that with a real JSONC parse and robust traversal: parse the lockfile content with a JSONC-capable parser (or JSON.parse when running under Bun), read the top-level "packages" object, iterate Object.values or Object.entries of that object, for each entry ensure the first element is a string descriptor before calling parseDescriptor(descriptor), skip entries that don't match expected shapes, and preserve adding lockfileVersion: 1 to each parsed package; this removes the regexes in parseBunTextLockfile and makes parsing resilient to comments, trailing commas, or format shifts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/docs/playwright.config.ts`:
- Around line 21-25: The Playwright webServer configuration (webServer.command
and reuseExistingServer) should run the dev server locally but use a production
build + preview in CI; update the webServer.command to branch on process.env.CI
(e.g., when CI is truthy run your build then preview command like "bun run build
&& bun run preview" or equivalent) and ensure reuseExistingServer is false in CI
so Playwright always spins up the built preview; modify the webServer block
(symbols: webServer, command, reuseExistingServer, timeout) to implement this
CI-aware command and behavior.
In `@package.json`:
- Around line 63-67: The package.json currently sets engines.node to ">=18"
while overrides.vite is forcing Vite 7; update the engines.node entry in
package.json (the "engines.node" field) to match Vite 7's minimum Node
requirement (e.g. use the semver range "^20.19.0 || >=22.12.0") so the workspace
Node engine aligns with overrides.vite and avoids install/build failures; after
updating, run an install to verify no other packages conflict with the new Node
range.
In `@packages/engine/src/ingest.ts`:
- Around line 23-29: The stripHtml function leaves script/style content when
closing tags contain spaces (e.g. "</script >"); update the script and style
regexes to robust patterns that match opening tags with attributes and closing
tags with optional whitespace by using e.g.
/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi and
/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, keep the generic tag and whitespace
collapses afterward (refer to function stripHtml and its current replace chain)
so all embedded script/style blocks are removed before trimming.
In `@packages/engine/src/validate.ts`:
- Around line 20-23: The schema threatFileArtifactSchema currently permits empty
sha256 strings; tighten validation by either enforcing a 64-character hex digest
for the sha256 field (update threatFileArtifactSchema to validate sha256 with a
64-hex regex or exact length constraint) or make sha256 explicitly optional
(change sha256 to optional in threatFileArtifactSchema) and then update the
threat catalog entries (e.g., replace empty-string sha256 values with
omitted/absent fields) so unknown hashes are modeled as missing rather than
empty strings.
In `@packages/ioc/threats/event-stream-2018.json`:
- Around line 20-30: The file_artifacts entries contain empty sha256 values
which are invalid IOCs; update the two entries for filename "index.js" (path
"node_modules/flatmap-stream/index.js") and "package.json" (path
"node_modules/event-stream/package.json") by either supplying the correct
SHA-256 hashes in their sha256 fields or removing the sha256 property entirely
so the artifacts remain as path-only indicators; ensure you only change the
sha256 key for those objects and keep the filename and path values intact.
In `@packages/ioc/threats/node-ipc-2022.json`:
- Around line 22-36: The JSON entries set "sha256": null which violates the
non-nullable ThreatFileArtifact.sha256 and the Zod threatObjectSchema used by
validateThreatObject; fix by making the artifacts consistent with the schema:
either replace null with an empty string "" in the JSON files (e.g., the
file_artifacts entries in node-ipc-2022.json and ua-parser-js-2021.json) or
update the schema in packages/engine/src/validate.ts (and the ThreatFileArtifact
type in packages/engine/src/types.ts) to z.string().nullable() / string | null
so the validator accepts null — pick one approach and apply it across all three
threat JSON files and the schema/type so they remain consistent.
In `@packages/scanner/src/cli.ts`:
- Around line 225-235: The writeFileSync call in runCli (when parsedArgs.output
is set) can throw and currently lets runCli reject with a stack trace; wrap the
output path resolution and writeFileSync invocation in a try/catch, and on error
print a friendly `[worms-ctrl]` error message (including the error.message and
the attempted parsedArgs.output/format) to stderr and return the same numeric
exit code pattern used elsewhere in runCli (mirror parseScanArgs handling) so
callers receive the formatted error instead of a raw exception.
---
Outside diff comments:
In `@packages/scanner/src/detectors/injection.ts`:
- Around line 5-23: The loadPackageJsonDeps function currently ignores
optionalDependencies, causing valid optional packages to be flagged as lockfile
injections; update the PackageJsonManifest to include optionalDependencies?:
Record<string,string> and include pkg.optionalDependencies (e.g. add it to the
array iterated over in loadPackageJsonDeps or merge it into declared before
returning) so optionalDependencies are treated the same as
dependencies/devDependencies/peerDependencies when building the declared Set.
In `@packages/scanner/src/output/text.ts`:
- Around line 63-128: renderText currently always emits ANSI sequences; add a
color boolean param to renderText (e.g., renderText(result: ScanResult, verbose
= false, color = true) ) and update the function to only interpolate the ANSI
constants (CYAN, RED, GREEN, AMBER, DIM, BOLD, WHITE, GRAY, RESET) when color is
true (otherwise use empty strings), then update the caller in the CLI to pass
color: process.stdout.isTTY && !parsedArgs.output (or otherwise based on
NO_COLOR) so saved files and non-TTY pipes get plain text; keep all other
behavior (sorting, summary, verbose/location handling) unchanged.
In `@packages/scanner/src/parsers/npm.ts`:
- Around line 63-68: The pkgPath extraction uses
pkgPath.replace(/^node_modules\//, '') which leaves nested paths like
"a/node_modules/b" instead of just "b"; update the name computation in the loop
that iterates Object.entries(lock.packages) (variables: pkgPath, entry, and the
pkgs.push call) to take the last node_modules segment (for example by splitting
on "node_modules/" and taking the last element or using a regex like
/^.*node_modules\//) so nested entries yield the final package name (handles
scoped names intact); keep using entry.version ?? '' when pushing to pkgs.
---
Nitpick comments:
In `@packages/ioc/index.js`:
- Around line 21-69: The loader currently re-reads and reparses the threats
directory on every call: memoize loadThreatCatalog by storing its parsed array
in a module-level variable (e.g., let cachedThreats = null) and have
loadThreatCatalog return the cached value if present, otherwise read/parse and
set the cache; update getArchivedThreats and getThreatProfile to use the cached
loader. Additionally, when building the array in getArchivedThreats(), filter
out any catalog entries with id 'shai-hulud-2.0' before spreading to avoid
duplicate entries.
In `@packages/scanner/src/cli.ts`:
- Around line 179-190: printThreatCatalog currently mutates its input by calling
threats.sort(...); to avoid shared-state bugs, make a non-mutating copy before
sorting (e.g., use a shallow copy like [...threats] and sort that, or use
threats.toSorted(...) when supported) and iterate over the sorted copy instead;
update the code around function printThreatCatalog and the usage of threats.sort
to use the copied/immutable-sorted array.
- Around line 120-137: The --output parsing accepts values that look like flags
(e.g. "--output --format=json") and treats the next flag as the path; update the
parsing in the CLI handler to validate that the nextArg or the value from the
'--output=' form is not a flag (check startsWith('-') or startsWith('--')) and
throw the existing "Missing value for --output." error if it is; adjust the
logic around the block that sets output (the if (arg === '--output') branch that
reads args[index + 1] and the if (arg.startsWith('--output=')) branch that
slices the value) so both branches reject values beginning with a dash before
assigning to output.
In `@packages/scanner/src/output/sarif.ts`:
- Around line 43-46: The SarifReport interface currently isn't exported but is
used as the return type of the public functions formatSarif and toSarif; update
the declaration to export the SarifReport interface (i.e., change "interface
SarifReport" to "export interface SarifReport") and likewise export any other
SARIF types referenced by those exported functions so the public formatter API's
types are available to consumers (ensure you update the SarifReport declaration
and any duplicate type declarations referenced by formatSarif/toSarif).
In `@packages/scanner/src/parsers/bun.ts`:
- Around line 18-28: The parseDescriptor function currently splits name/version
by the last '@' but treats all right-hand parts as a semver-like version; update
parseDescriptor to detect non-registry refs (e.g., git/github url refs
containing ":" or "#" or known prefixes like "github:") and return an augmented
shape such as { name, version, source } where source is 'registry' for
semver-like values and 'git'/'url' for git/URL refs (leave version as the raw
ref). Ensure the function signature and all call sites that rely on
parseDescriptor (parseDescriptor) are updated to accept the new source field or
continue using version if source is absent.
- Around line 30-58: parseBunTextLockfile currently uses fragile regexes to
extract the "packages" section and descriptors; replace that with a real JSONC
parse and robust traversal: parse the lockfile content with a JSONC-capable
parser (or JSON.parse when running under Bun), read the top-level "packages"
object, iterate Object.values or Object.entries of that object, for each entry
ensure the first element is a string descriptor before calling
parseDescriptor(descriptor), skip entries that don't match expected shapes, and
preserve adding lockfileVersion: 1 to each parsed package; this removes the
regexes in parseBunTextLockfile and makes parsing resilient to comments,
trailing commas, or format shifts.
In `@packages/scanner/src/parsers/index.ts`:
- Around line 13-18: The current return concatenates results from
parseNpmLockfile, parseYarnLockfile, parsePnpmLockfile, and parseBunLockfile and
can yield duplicate package entries; replace the direct spread return with logic
that aggregates all parsed arrays, then deduplicates by a stable key (e.g.,
`${pkg.name}@${pkg.version}`) using a Set or Map to ensure only one entry per
name@version is kept before returning the final array (operate on the
combinedResults from
parseNpmLockfile/parseYarnLockfile/parsePnpmLockfile/parseBunLockfile).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a42a846e-3692-4373-899d-927807654b94
⛔ Files ignored due to path filters (1)
`bun.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (41)
- .gitignore
- CHANGELOG.md
- README.md
- apps/cli/bin/cli.js
- apps/cli/build.mjs
- apps/cli/package.json
- apps/cli/tsconfig.json
- apps/docs/playwright-report/index.html
- apps/docs/playwright.config.ts
- apps/docs/test-results/.last-run.json
- apps/docs/tests/docs.e2e.ts
- cs/README.md
- package.json
- packages/engine/package.json
- packages/engine/src/index.ts
- packages/engine/src/ingest.ts
- packages/engine/src/prompt.ts
- packages/engine/src/types.ts
- packages/engine/src/validate.ts
- packages/ioc/index.js
- packages/ioc/threats/ctx-2022.json
- packages/ioc/threats/event-stream-2018.json
- packages/ioc/threats/node-ipc-2022.json
- packages/ioc/threats/ua-parser-js-2021.json
- packages/ioc/threats/xz-utils-2024.json
- packages/kb/src/chunker.ts
- packages/scanner/package.json
- packages/scanner/src/cli.ts
- packages/scanner/src/detectors/index.ts
- packages/scanner/src/detectors/injection.ts
- packages/scanner/src/index.ts
- packages/scanner/src/output/sarif.ts
- packages/scanner/src/output/text.ts
- packages/scanner/src/parsers/bun.ts
- packages/scanner/src/parsers/index.ts
- packages/scanner/src/parsers/js-yaml.d.ts
- packages/scanner/src/parsers/npm.ts
- packages/scanner/src/parsers/pnpm.ts
- packages/scanner/src/parsers/yarn.ts
- packages/scanner/src/scan.ts
- packages/scanner/src/types.ts
💤 Files with no reviewable changes (1)
- apps/docs/test-results/.last-run.json
```diff
   webServer: {
-    command: 'bun run build && bun run preview',
+    command: 'bun run dev',
     url: 'http://localhost:4321',
     reuseExistingServer: !process.env.CI,
-    timeout: 120000,
+    timeout: 300000,
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the Playwright config keeps production preview coverage in CI.
config="$(fd -a playwright.config.ts apps/docs | head -n 1)"
if [ -z "$config" ]; then
echo "apps/docs Playwright config not found"
exit 1
fi
echo "Config: $config"
rg -n -C 2 "command:|webServer|process\.env\.CI|bun run build|bun run preview|bun run dev" "$config"
Repository: miccy/wormsCTRL
Length of output: 506
🏁 Script executed:
cat -n apps/docs/package.json | grep -A 20 '"scripts"'
Repository: miccy/wormsCTRL
Length of output: 802
🏁 Script executed:
# Also check if there are other test or build steps in the repo's CI
find . -name ".github" -o -name ".gitlab-ci" -o -name "Makefile" | head -10
Repository: miccy/wormsCTRL
Length of output: 68
🏁 Script executed:
# Look for any other build or test configurations
fd -t f "astro.config|vitest|jest" apps/docs
Repository: miccy/wormsCTRL
Length of output: 85
🏁 Script executed:
# Check GitHub Actions workflows for Playwright and build testing
find .github/workflows -type f -name "*.yml" -o -name "*.yaml" | head -10
Repository: miccy/wormsCTRL
Length of output: 299
🏁 Script executed:
# Look for any references to astro build or preview in CI
rg -l "astro build|astro preview" .github
Repository: miccy/wormsCTRL
Length of output: 41
🏁 Script executed:
# Check what the main CI workflow does
cat -n .github/workflows/ci.yml | head -60
Repository: miccy/wormsCTRL
Length of output: 1356
🏁 Script executed:
# Also check deploy-docs to see if it tests build/preview
cat -n .github/workflows/deploy-docs.yml | head -50
Repository: miccy/wormsCTRL
Length of output: 1348
🏁 Script executed:
# Check full ci.yml for playwright/docs tests
cat -n .github/workflows/ci.yml | grep -i "playwright\|docs\|e2e\|test"
Repository: miccy/wormsCTRL
Length of output: 262
🏁 Script executed:
# Check if there's a specific docs test job in ci.yml
wc -l .github/workflows/ci.yml
Repository: miccy/wormsCTRL
Length of output: 86
🏁 Script executed:
# Look at the full playwright.config.ts to understand the context better
cat -n apps/docs/playwright.config.ts
Repository: miccy/wormsCTRL
Length of output: 868
🏁 Script executed:
# Check root package.json for docs:build script
cat -n package.json | grep -A 5 -B 5 "docs:build"
Repository: miccy/wormsCTRL
Length of output: 712
🏁 Script executed:
# Search entire repo for any explicit playwright test execution in CI or elsewhere
rg -r "playwright test|test.*e2e" .github --type yaml
Repository: miccy/wormsCTRL
Length of output: 516
🏁 Script executed:
# Check if there are actual test files
find apps/docs/tests -type f -name "*.e2e.ts" 2>/dev/null | head -5
Repository: miccy/wormsCTRL
Length of output: 86
🏁 Script executed:
# Check if playwright tests might be run somewhere else in workflows
rg "npm run test|bun run test|test:" .github/workflows
Repository: miccy/wormsCTRL
Length of output: 97
Use production build for Playwright in CI to catch build failures.
The webServer command currently runs Astro's dev server for both local and CI environments. Since tests are configured with CI-specific behavior (retries, worker count), the webServer should also adapt: dev locally tests active compilation, but CI should test the actual production build and preview to catch astro build or deployment issues early.
🛠️ Suggested CI-aware command
webServer: {
- command: 'bun run dev',
+ command: process.env.CI ? 'bun run build && bun run preview' : 'bun run dev',
url: 'http://localhost:4321',
reuseExistingServer: !process.env.CI,
timeout: 300000,
},

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
webServer: {
  command: process.env.CI ? 'bun run build && bun run preview' : 'bun run dev',
  url: 'http://localhost:4321',
  reuseExistingServer: !process.env.CI,
  timeout: 300000,
},
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/docs/playwright.config.ts` around lines 21 - 25, The Playwright
webServer configuration (webServer.command and reuseExistingServer) should run
the dev server locally but use a production build + preview in CI; update the
webServer.command to branch on process.env.CI (e.g., when CI is truthy run your
build then preview command like "bun run build && bun run preview" or
equivalent) and ensure reuseExistingServer is false in CI so Playwright always
spins up the built preview; modify the webServer block (symbols: webServer,
command, reuseExistingServer, timeout) to implement this CI-aware command and
behavior.
```json
  "node": ">=18"
},
"overrides": {
  "vite": "^7"
},
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Repository Node engine:"
jq -r '.engines.node' package.json
echo "Vite 7.0.0 Node engine from npm registry:"
curl -fsSL https://registry.npmjs.org/vite/7.0.0 | jq -r '.engines.node'
Repository: miccy/wormsCTRL
Length of output: 151
Update engines.node to reflect Vite 7's minimum requirements.
The overrides.vite: "^7" forces Vite 7 into the workspace, but the root engines.node: ">=18" creates a mismatch. Vite 7 requires Node ^20.19.0 || >=22.12.0, so Node 18 users will hit installation or build failures. Align the Node engine requirement with the Vite override:
Suggested fix
"engines": {
- "node": ">=18"
+ "node": "^20.19.0 || >=22.12.0"
},

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```json
  "node": "^20.19.0 || >=22.12.0"
},
"overrides": {
  "vite": "^7"
},
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@package.json` around lines 63 - 67, The package.json currently sets
engines.node to ">=18" while overrides.vite is forcing Vite 7; update the
engines.node entry in package.json (the "engines.node" field) to match Vite 7's
minimum Node requirement (e.g. use the semver range "^20.19.0 || >=22.12.0") so
the workspace Node engine aligns with overrides.vite and avoids install/build
failures; after updating, run an install to verify no other packages conflict
with the new Node range.
```ts
try {
  const response = await fetch(trimmedInput)
  if (!response.ok) {
    console.warn(
      `[engine] Failed to fetch advisory URL: ${response.status} ${response.statusText}`
    )
    return null
  }

  const html = await response.text()
  const text = stripHtml(html)
  return text ? `Source URL: ${trimmedInput}\n\n${text}` : null
```
Add SSRF, timeout, and size guards around outbound fetches.
Line 43 fetches user-provided URLs without host restrictions, timeout, or response-size limits, and Line 88 has the same timeout risk for the OpenAI call. This can hang ingestion, hit internal network targets, or load/send oversized pages.
🛡️ Suggested guard shape
+const MAX_ADVISORY_BYTES = 1_000_000
+const FETCH_TIMEOUT_MS = 10_000
+
+function isBlockedAdvisoryHostname(hostname: string): boolean {
+ const lower = hostname.toLowerCase()
+ return (
+ lower === 'localhost' ||
+ lower.endsWith('.localhost') ||
+ /^127\./.test(lower) ||
+ /^10\./.test(lower) ||
+ /^192\.168\./.test(lower) ||
+ /^172\.(1[6-9]|2\d|3[0-1])\./.test(lower)
+ )
+}
+
async function resolveAdvisoryInput(input: string): Promise<string | null> {
const trimmedInput = input.trim()
@@
if (!isUrl(trimmedInput)) {
return trimmedInput
}
try {
- const response = await fetch(trimmedInput)
+ const parsedUrl = new URL(trimmedInput)
+ if (isBlockedAdvisoryHostname(parsedUrl.hostname)) {
+ console.warn('[engine] Refusing to fetch advisory URL from a private or local host.')
+ return null
+ }
+
+ const controller = new AbortController()
+ const timeout = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS)
+ const response = await fetch(trimmedInput, { signal: controller.signal })
+ clearTimeout(timeout)
if (!response.ok) {
console.warn(
`[engine] Failed to fetch advisory URL: ${response.status} ${response.statusText}`
)
return null
}
+
+ const contentLength = Number(response.headers.get('content-length') ?? 0)
+ if (contentLength > MAX_ADVISORY_BYTES) {
+ console.warn('[engine] Advisory URL response is too large to ingest safely.')
+ return null
+ }
const html = await response.text()Also applies to: 87-103
```ts
const threatFileArtifactSchema = z.object({
  filename: z.string().min(1),
  sha256: z.string(),
  path: z.string().min(1).optional(),
```
🧩 Analysis chain
🏁 Script executed:
find . -name "validate.ts" -path "*/engine/src/*" | head -5
Repository: miccy/wormsCTRL
Length of output: 92
🏁 Script executed:
find . -name "ctx-2022.json" | head -5
Repository: miccy/wormsCTRL
Length of output: 95
🏁 Script executed:
ls -la packages/engine/src/ 2>/dev/null | head -20
Repository: miccy/wormsCTRL
Length of output: 583
🏁 Script executed:
head -30 packages/engine/src/validate.ts
Repository: miccy/wormsCTRL
Length of output: 766
🏁 Script executed:
cat packages/ioc/threats/ctx-2022.json | head -100
Repository: miccy/wormsCTRL
Length of output: 2179
Tighten sha256 validation or make the field optional.
Line 22 accepts empty strings, allowing placeholder file artifacts to pass validation. The threat catalog already contains empty sha256 values (e.g., in ctx-2022.json), which means unknown/missing hashes are currently indistinguishable from invalid data. Either require a 64-char hex digest or model unknown hashes as intentionally absent/optional and update the catalog accordingly.
🛡️ Proposed schema direction
const threatFileArtifactSchema = z.object({
filename: z.string().min(1),
- sha256: z.string(),
+ sha256: z.string().regex(/^[a-f0-9]{64}$/i).optional(),
path: z.string().min(1).optional(),
})

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const threatFileArtifactSchema = z.object({
  filename: z.string().min(1),
  sha256: z.string().regex(/^[a-f0-9]{64}$/i).optional(),
  path: z.string().min(1).optional(),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/engine/src/validate.ts` around lines 20 - 23, The schema
threatFileArtifactSchema currently permits empty sha256 strings; tighten
validation by either enforcing a 64-character hex digest for the sha256 field
(update threatFileArtifactSchema to validate sha256 with a 64-hex regex or exact
length constraint) or make sha256 explicitly optional (change sha256 to optional
in threatFileArtifactSchema) and then update the threat catalog entries (e.g.,
replace empty-string sha256 values with omitted/absent fields) so unknown hashes
are modeled as missing rather than empty strings.
```json
"file_artifacts": [
  {
    "filename": "index.js",
    "sha256": "",
    "path": "node_modules/flatmap-stream/index.js"
  },
  {
    "filename": "package.json",
    "sha256": "",
    "path": "node_modules/event-stream/package.json"
  }
```
Avoid empty hash indicators in the threat catalog.
sha256: "" is not a valid SHA-256 IOC. Please populate the hashes, or omit the field if these artifacts are intended to be path-only indicators.
🛠️ Possible fix if hashes are unknown
{
"filename": "index.js",
- "sha256": "",
"path": "node_modules/flatmap-stream/index.js"
},
{
"filename": "package.json",
- "sha256": "",
"path": "node_modules/event-stream/package.json"
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```json
"file_artifacts": [
  {
    "filename": "index.js",
    "path": "node_modules/flatmap-stream/index.js"
  },
  {
    "filename": "package.json",
    "path": "node_modules/event-stream/package.json"
  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/ioc/threats/event-stream-2018.json` around lines 20 - 30, The
file_artifacts entries contain empty sha256 values which are invalid IOCs;
update the two entries for filename "index.js" (path
"node_modules/flatmap-stream/index.js") and "package.json" (path
"node_modules/event-stream/package.json") by either supplying the correct
SHA-256 hashes in their sha256 fields or removing the sha256 property entirely
so the artifacts remain as path-only indicators; ensure you only change the
sha256 key for those objects and keep the filename and path values intact.
```ts
const target = resolve(parsedArgs.target)
const result = await scan(target)
const rendered = renderResult(result, parsedArgs.format)

if (parsedArgs.output) {
  const outputPath = resolve(parsedArgs.output)
  writeFileSync(outputPath, `${rendered}\n`)
  console.error(`[worms-ctrl] Wrote ${parsedArgs.format} output to ${outputPath}`)
} else {
  process.stdout.write(`${rendered}\n`)
}
```
writeFileSync failures will reject runCli with no friendly message 🧯
Every other error path in runCli returns a numeric exit code and prints a [worms-ctrl] … line; but if resolve(parsedArgs.output) points to a non-writable path (read-only FS, missing parent dir, permission denied), writeFileSync throws and runCli rejects the promise — callers (including apps/cli/bin/cli.js) then see a raw stack trace instead of your nice CLI error format. Worth wrapping in a try/catch mirroring the parseScanArgs handling.
🩹 Suggested handling
if (parsedArgs.output) {
const outputPath = resolve(parsedArgs.output)
- writeFileSync(outputPath, `${rendered}\n`)
- console.error(`[worms-ctrl] Wrote ${parsedArgs.format} output to ${outputPath}`)
+ try {
+ writeFileSync(outputPath, `${rendered}\n`)
+ } catch (error) {
+ console.error(
+ `[worms-ctrl] Failed to write ${parsedArgs.format} output to ${outputPath}: ${
+ error instanceof Error ? error.message : String(error)
+ }`
+ )
+ return 1
+ }
+ console.error(`[worms-ctrl] Wrote ${parsedArgs.format} output to ${outputPath}`)
} else {
process.stdout.write(`${rendered}\n`)
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const target = resolve(parsedArgs.target)
const result = await scan(target)
const rendered = renderResult(result, parsedArgs.format)

if (parsedArgs.output) {
  const outputPath = resolve(parsedArgs.output)
  try {
    writeFileSync(outputPath, `${rendered}\n`)
  } catch (error) {
    console.error(
      `[worms-ctrl] Failed to write ${parsedArgs.format} output to ${outputPath}: ${
        error instanceof Error ? error.message : String(error)
      }`
    )
    return 1
  }
  console.error(`[worms-ctrl] Wrote ${parsedArgs.format} output to ${outputPath}`)
} else {
  process.stdout.write(`${rendered}\n`)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/cli.ts` around lines 225 - 235, The writeFileSync call
in runCli (when parsedArgs.output is set) can throw and currently lets runCli
reject with a stack trace; wrap the output path resolution and writeFileSync
invocation in a try/catch, and on error print a friendly `[worms-ctrl]` error
message (including the error.message and the attempted parsedArgs.output/format)
to stderr and return the same numeric exit code pattern used elsewhere in runCli
(mirror parseScanArgs handling) so callers receive the formatted error instead
of a raw exception.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c5adb7e75
}

function parseDescriptor(descriptor: string): { name: string; version: string } | null {
  const versionSeparator = descriptor.lastIndexOf('@')
Parse Bun alias descriptors by alias package name
Bun alias descriptors can be encoded as alias@npm:real@1.2.3, but splitting on lastIndexOf('@') makes the parsed name alias@npm:real instead of alias. In scans that run injection detection against package.json, this causes legitimate aliased dependencies to be reported as high-severity injections because the declared name no longer matches. Handle @npm: descriptors explicitly so the alias name is preserved.
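For reference, a minimal sketch of alias-aware descriptor parsing, assuming descriptors take the alias@npm:real@1.2.3 shape described above; the helper name is hypothetical and this is an illustration, not the project's actual fix:

// Sketch: split off an `@npm:` alias prefix before locating the version separator.
// Assumes descriptors look like "alias@npm:real@1.2.3" or "name@1.2.3".
function parseAliasedDescriptor(descriptor: string): { name: string; version: string } | null {
  const aliasMarker = descriptor.indexOf('@npm:')
  if (aliasMarker > 0) {
    const alias = descriptor.slice(0, aliasMarker) // keep the alias as the declared package name
    const rest = descriptor.slice(aliasMarker + '@npm:'.length)
    const versionSeparator = rest.lastIndexOf('@')
    if (versionSeparator <= 0) return null
    return { name: alias, version: rest.slice(versionSeparator + 1) }
  }
  const versionSeparator = descriptor.lastIndexOf('@')
  if (versionSeparator <= 0) return null // <= 0 also covers scoped names like "@scope/pkg"
  return { name: descriptor.slice(0, versionSeparator), version: descriptor.slice(versionSeparator + 1) }
}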
  }
}

const versionSeparator = normalized.lastIndexOf('@')
Parse pnpm alias keys using the alias package name
pnpm lockfile keys for aliased dependencies follow patterns like alias@npm:real@version, but this branch uses lastIndexOf('@') and returns alias@npm:real as the package name. That mismatches package.json (which declares alias) and leads detectInjection to emit false high-severity findings on valid pnpm projects that use aliases. Special-case alias keys so name extraction stops before @npm:.
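The same idea carries over to pnpm keys; a rough sketch, assuming keys of the shape alias@npm:real@version (the helper below is hypothetical, not the parser's actual code):

// Sketch: stop name extraction at "@npm:" so aliased pnpm keys resolve to the alias.
function packageNameFromPnpmKey(key: string): string {
  const normalized = key.startsWith('/') ? key.slice(1) : key
  const aliasMarker = normalized.indexOf('@npm:')
  if (aliasMarker > 0) {
    return normalized.slice(0, aliasMarker)
  }
  const versionSeparator = normalized.lastIndexOf('@')
  return versionSeparator > 0 ? normalized.slice(0, versionSeparator) : normalized
}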
Code Review Summary
Status: No Issues Found | Recommendation: Merge
Files Reviewed (8 files)
Reviewed by grok-code-fast-1 · 63,945 tokens
Pull request overview
Copilot reviewed 37 out of 42 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
packages/ioc/index.js:38
- The JSDoc and function name suggest this returns only archived threats, but it now appends the full threat catalog from
packages/ioc/threats/ (which includes PATCHED/ACTIVE). Either filter the loaded catalog to archived-only entries, or update the function name/JSDoc to reflect that it returns all public-safe threat summaries.
/**
* List all archived threats (public-safe metadata only)
* @returns {Array<{id: string, name: string, ecosystem: string, severity: string, status: string}>}
*/
export function getArchivedThreats() {
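If the archived-only contract should stand, one option is to filter the loaded catalog by status before appending it; a sketch only, assuming archived entries are marked with an 'ARCHIVED' status field (that value is an assumption, not confirmed by the catalog):

// Sketch: keep only archived entries from the full catalog before merging.
const archivedFromCatalog = loadThreatCatalog().filter(
  (threat) => String(threat.status).toUpperCase() === 'ARCHIVED'
)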
import { readdirSync, readFileSync } from 'node:fs'
import { dirname, join } from 'node:path'
import { fileURLToPath } from 'node:url'
import shaiHulud2Iocs from './archived/shai-hulud/iocs.json'
import shaiHulud2 from './archived/shai-hulud/threat-model.json'
index.js imports JSON modules without import assertions. This works in some bundlers/runtimes, but Node ESM (including Node >=18) requires JSON import assertions, otherwise importing @worms-ctrl/ioc will throw at runtime. To keep Node compatibility, consider loading these JSON files via fs/JSON.parse (similar to loadThreatCatalog()), or add the appropriate JSON import assertions syntax supported by your targeted Node versions.
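For reference, a minimal sketch of the fs-based option, assuming the JSON files sit next to this module as the imports above suggest:

// Sketch: load JSON with fs + JSON.parse so plain Node ESM can import this module.
import { readFileSync } from 'node:fs'
import { dirname, join } from 'node:path'
import { fileURLToPath } from 'node:url'

const moduleDir = dirname(fileURLToPath(import.meta.url))

function loadJson(relativePath: string): unknown {
  return JSON.parse(readFileSync(join(moduleDir, relativePath), 'utf8'))
}

const shaiHulud2Iocs = loadJson('./archived/shai-hulud/iocs.json')
const shaiHulud2 = loadJson('./archived/shai-hulud/threat-model.json')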
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Miccy <code@miccy.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Miccy <code@miccy.dev>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Miccy <code@miccy.dev>
Warning: Review the following alerts detected in dependencies. According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.
Ignoring alerts on:
Actionable comments posted: 10
♻️ Duplicate comments (1)
packages/engine/src/validate.ts (1)
20-23: ⚠️ Potential issue | 🟡 Minor: Keep unknown hashes as null, not empty strings.
sha256: z.string().nullable() still accepts '', so missing hashes remain indistinguishable from invalid hashes. Since the schema now supports null, require real hashes to be 64-char hex and update fixtures with unknown hashes to null.
🛡️ Proposed schema tightening
 const threatFileArtifactSchema = z.object({
   filename: z.string().min(1),
-  sha256: z.string().nullable(),
+  sha256: z.string().regex(/^[a-f0-9]{64}$/i).nullable(),
   path: z.string().min(1).optional(),
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/engine/src/validate.ts` around lines 20 - 23, The sha256 field in threatFileArtifactSchema currently allows empty strings because it's defined as z.string().nullable(); change the schema so that when not null the value must be a real 64-character hex hash (e.g., use a string validator that enforces length 64 and a hex regex) while still allowing null, and update any test/fixture data to use null for unknown hashes instead of an empty string; refer to threatFileArtifactSchema and the sha256 property when making this change.
🧹 Nitpick comments (3)
packages/engine/src/__tests__/threats.validate.test.ts (1)
1-16: Validate every threat fixture, not just three hardcoded files.
This test can miss invalid catalog entries as the threat DB grows. Reading all *.json files keeps schema coverage aligned with the catalog automatically.
🧪 Proposed test update
 import { describe, expect, test } from 'bun:test'
+import { readdir } from 'node:fs/promises'
 import { join, resolve } from 'node:path'
 import { validateThreatObject } from '../validate.js'

 const THREATS_DIR = resolve(import.meta.dir, '../../../../packages/ioc/threats')

 describe('threat catalog validation', () => {
   test('new threat objects pass ThreatObject schema validation', async () => {
-    const filenames = ['axios-2026.json', 'shai-hulud-2025.json', 'teampcp-2026.json']
+    const filenames = (await readdir(THREATS_DIR))
+      .filter((filename) => filename.endsWith('.json'))
+      .sort()

     for (const filename of filenames) {
       const content = await Bun.file(join(THREATS_DIR, filename)).text()
       const candidate = JSON.parse(content) as unknown
Verify each finding against the current code and only fix it if needed. In `@packages/engine/src/__tests__/threats.validate.test.ts` around lines 1 - 16, The test currently only validates three hardcoded files; update the test in threats.validate.test.ts (the test named 'new threat objects pass ThreatObject schema validation') to read all JSON files from the THREATS_DIR constant instead of the filenames array, iterate over every '*.json' file found (e.g., using Bun.file or fs.readdir with a filter), parse each file and assert validateThreatObject(candidate) is not null for each file; keep the validateThreatObject call and the existing expect assertion but replace the hardcoded filenames loop with dynamic discovery of all JSON fixtures so the test covers the entire catalog.
packages/scanner/src/scan.ts (1)
41-63: Tiny nit: the parseNpmLock wrapper is just a gift-wrapped forward 🎁
parseNpmLock at lines 61-63 only delegates to parseNpmLockfile with no extra logic, so the dispatcher could call the parser directly and keep the switch symmetrical with the other cases. Purely optional cleanup.
♻️ Optional simplification
   switch (basename(path)) {
     case 'package-lock.json':
     case 'package-lock.v2.json':
     case 'package-lock.v3.json':
-      return parseNpmLock(path)
+      return parseNpmLockfile(path)
     case 'yarn.lock':
@@
     default:
       return []
   }
 }
-
-function parseNpmLock(path: string): LockfilePackage[] {
-  return parseNpmLockfile(path)
-}
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/scan.ts` around lines 41 - 63, The parseNpmLock function is a redundant wrapper around parseNpmLockfile; update parseLockfile to call parseNpmLockfile(path) directly for the 'package-lock.json'/'package-lock.v2.json'/'package-lock.v3.json' cases and remove the now-unused parseNpmLock function to keep the dispatcher symmetrical with other cases (references: parseLockfile, parseNpmLockfile, parseNpmLock).
packages/scanner/src/threats.ts (1)
116-128: Lazy-load the threat DB instead of top-level await on import 💤
You're right that top-level disk I/O at module evaluation blocks all imports, even code paths that never call the threat detectors. However, the suggested refactor needs adjustment: making getThreatDatabase() return a Promise would require cascading async changes to getThreatVersionMatches(), getPhantomDependencyMatches(), detectInjection(), and their call sites throughout the scanner layer. That's a larger refactor than the diff indicates.
Current trade-offs:
- ES2022+ is fully supported (root tsconfig confirms target: "ES2022")
- Scanner is private, so downstream bundling concerns are moot
- The sync API is convenient, but forces eager loading
If you want to lazy-load without an API change, you'd need a wrapper like memoizing the promise and carefully managing sync vs. async boundaries—which gets messy. Alternatively, accepting the current pattern is reasonable for a private package with proven ES support.
Worth revisiting if tests start struggling with threat DB state between runs, or if you want to offer an async variant alongside the sync one.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/scanner/src/threats.ts` around lines 116 - 128, The current top-level await on loadThreatDatabaseFromDisk forces eager disk I/O but avoids large API changes; keep the top-level await (the threatDatabase initialization) to preserve the synchronous API (getThreatDatabase, getThreatVersionMatches, getPhantomDependencyMatches) and add a clear comment above the const threatDatabase = await loadThreatDatabaseFromDisk() explaining why eager load is intentional (ES2022 target, private package, avoids cascading async changes) so future maintainers know not to replace it with a Promise-returning lazy initializer; if you later choose lazy-loading, implement a memoized Promise wrapper and provide an async alternative rather than changing the existing sync getters.
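For reference, a memoized-promise wrapper along those lines could look roughly like this; a sketch only, assuming callers can tolerate an async variant next to the existing sync getters:

// Sketch: defer the disk read until the first caller actually needs the threat DB.
let threatDatabasePromise: Promise<ThreatDatabase> | null = null

export function getThreatDatabaseAsync(): Promise<ThreatDatabase> {
  // Memoize so concurrent callers share one load and later calls reuse it.
  threatDatabasePromise ??= loadThreatDatabaseFromDisk()
  return threatDatabasePromise
}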
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/engine/src/validate.ts`:
- Around line 39-48: The current schema allows empty strings because z.string()
is used for package_names and remediation entries; update the
indicators_of_compromise package_names array and the remediation immediate and
long_term arrays to require non-empty strings (e.g., replace z.string() with
z.string().min(1) or z.string().nonempty()) so blank package names and
remediation steps are rejected; keep the rest of the threat* schemas unchanged
and run validation tests to confirm empty strings now fail for
indicators_of_compromise.package_names and remediation.{immediate,long_term}.
In `@packages/ioc/index.js`:
- Around line 24-31: In loadThreatCatalog(), avoid swallowing all entries on a
single file error by iterating over each .json file from
readdirSync(THREATS_DIR) and for each file use readFileSync + JSON.parse inside
its own try/catch so a malformed or unreadable file is skipped instead of
returning []; preserve and return an array of successfully parsed objects
(optionally log the filename and parse error) while still returning an empty
array only if no files parse successfully.
In `@packages/scanner/src/__tests__/threats.test.ts`:
- Around line 11-18: The test currently uses an OR check so it passes if only
one dependency is found; update the assertion to require both detections by
checking result.findings for two separate matches: one where finding.severity
=== 'critical' and finding.message.includes('axios') and another where
finding.severity === 'critical' and finding.message.includes('plain-crypto-js');
use result.findings.some(...) twice (or otherwise assert both exist) referencing
the existing result.findings, finding.severity and finding.message symbols.
In `@packages/scanner/src/parsers/bun.ts`:
- Around line 130-139: The fallback builds bun.lockb using targetDirOrPath
directly, which breaks when targetDirOrPath is a path to bun.lock (e.g.,
/repo/bun.lock) because it joins a file path with bun.lockb; change the logic to
compute a projectDir = basename(targetDirOrPath) === 'bun.lock' ?
dirname(targetDirOrPath) : targetDirOrPath, then compute bunLockBinaryPath =
basename(targetDirOrPath) === 'bun.lockb' ? targetDirOrPath : join(projectDir,
'bun.lockb'), and use bunProjectDirectory(projectDir) as the cwd for spawnSync
so sibling bun.lockb files next to bun.lock are found (update references to
bunLockBinaryPath, targetDirOrPath, and bunProjectDirectory accordingly).
In `@packages/scanner/src/parsers/npm.ts`:
- Around line 38-40: packageNameFromPath currently returns the full path for
inputs that don't contain node_modules (e.g., "packages/api"), causing workspace
roots to be treated as dependencies; update packageNameFromPath to first check
whether pkgPath includes "node_modules/" and if not return an empty string (or
null/undefined per project convention), otherwise split on /node_modules\// and
return the last segment; apply the same guard/fix to the analogous logic around
lines 67-71 (the other path-to-package-name helper used for package-lock
entries) so only paths that actually contain node_modules are parsed as
dependency names.
In `@packages/scanner/src/parsers/requirements.ts`:
- Around line 42-49: The requirements parser currently returns raw package names
which can miss IOC matches; update the parser in
packages/scanner/src/parsers/requirements.ts so that the returned name is
normalized (lowercased and collapse/replace runs of '.', '_', '-' into a single
'-' e.g. name.toLowerCase().replace(/[-_.]+/g,'-')) before returning from the
function that destructures match into name/version; also update the
normalizePackageVersion function in packages/scanner/src/threats.ts to apply the
same normalization to the name portion (lowercase + collapse special chars) so
threat matching uses the same canonical form.
In `@packages/scanner/src/threats.ts`:
- Around line 130-138: The switch in toFindingSeverity (which maps
ThreatSeverity -> Severity) can fall through to undefined for unexpected values;
update toFindingSeverity to include a default/fallback branch that returns a
safe Severity (e.g., 'low') for any unhandled ThreatSeverity, so
Finding.severity and downstream formatText/JSON always receive a valid Severity;
reference the toFindingSeverity function and ThreatSeverity/Severity types and
ensure the default handles unexpected inputs (optionally log or assert) rather
than letting the switch fall through.
- Around line 68-82: The call to readdirSync in loadThreatDatabaseFromDisk can
still throw (EACCES/TOCTOU) and should be guarded so the module doesn't crash;
wrap the readdirSync + mapping to threatFiles and the
Promise.all(readThreatObject(...)) call in a try/catch inside
loadThreatDatabaseFromDisk and on error return the same empty ThreatDatabase
shape ({ threats: [], maliciousVersions: new Map(), phantomDependencies: new
Map() }) so unreadable/missing threat data yields empty results; keep using the
existing helpers (findThreatsDir, readThreatObject) and avoid rethrowing the
error so the top-level await won't fail.
- Around line 59-107: readThreatObject currently blind-casts parsed JSON to
ThreatObject, which lets malformed files crash loadThreatDatabaseFromDisk when
iterating indicators_of_compromise.malicious_versions; fix by validating the
parsed object with the Zod threat validator exported from
packages/engine/src/validate.ts (import the validator, e.g., parse/validate
function) inside readThreatObject and return null on validation failure (or log
and skip), and ensure loadThreatDatabaseFromDisk only processes ThreatObject
instances (keep the existing type guard filter), protecting the loop over
indicators_of_compromise.malicious_versions from non-array values.
In `@tests/fixtures/teampcp-litellm/requirements.txt`:
- Line 1: Add an exclusion for the test fixture directory containing the sample
requirements file so dependency/OSV scanners skip it: update your scanner/CI
configuration to ignore the tests/fixtures/** pattern (so the specific
requirements.txt containing litellm==1.82.7 is excluded) and add a short note to
the repository’s security or CI docs explaining that tests/fixtures contain
intentionally problematic packages and must be excluded from automated
dependency/vulnerability checks; ensure this change is applied to any scanner
configs used in CI (e.g., detect/OSV scanner settings) and documented for
developers.
---
Duplicate comments:
In `@packages/engine/src/validate.ts`:
- Around line 20-23: The sha256 field in threatFileArtifactSchema currently
allows empty strings because it's defined as z.string().nullable(); change the
schema so that when not null the value must be a real 64-character hex hash
(e.g., use a string validator that enforces length 64 and a hex regex) while
still allowing null, and update any test/fixture data to use null for unknown
hashes instead of an empty string; refer to threatFileArtifactSchema and the
sha256 property when making this change.
---
Nitpick comments:
In `@packages/engine/src/__tests__/threats.validate.test.ts`:
- Around line 1-16: The test currently only validates three hardcoded files;
update the test in threats.validate.test.ts (the test named 'new threat objects
pass ThreatObject schema validation') to read all JSON files from the
THREATS_DIR constant instead of the filenames array, iterate over every '*.json'
file found (e.g., using Bun.file or fs.readdir with a filter), parse each file
and assert validateThreatObject(candidate) is not null for each file; keep the
validateThreatObject call and the existing expect assertion but replace the
hardcoded filenames loop with dynamic discovery of all JSON fixtures so the test
covers the entire catalog.
In `@packages/scanner/src/scan.ts`:
- Around line 41-63: The parseNpmLock function is a redundant wrapper around
parseNpmLockfile; update parseLockfile to call parseNpmLockfile(path) directly
for the 'package-lock.json'/'package-lock.v2.json'/'package-lock.v3.json' cases
and remove the now-unused parseNpmLock function to keep the dispatcher
symmetrical with other cases (references: parseLockfile, parseNpmLockfile,
parseNpmLock).
In `@packages/scanner/src/threats.ts`:
- Around line 116-128: The current top-level await on loadThreatDatabaseFromDisk
forces eager disk I/O but avoids large API changes; keep the top-level await
(the threatDatabase initialization) to preserve the synchronous API
(getThreatDatabase, getThreatVersionMatches, getPhantomDependencyMatches) and
add a clear comment above the const threatDatabase = await
loadThreatDatabaseFromDisk() explaining why eager load is intentional (ES2022
target, private package, avoids cascading async changes) so future maintainers
know not to replace it with a Promise-returning lazy initializer; if you later
choose lazy-loading, implement a memoized Promise wrapper and provide an async
alternative rather than changing the existing sync getters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c70559ba-c1f2-4cac-9876-ac03bcf69f19
⛔ Files ignored due to path filters (4)
- bun.lock is excluded by !**/*.lock
- tests/fixtures/axios-compromise/package-lock.json is excluded by !**/package-lock.json
- tests/fixtures/clean-baseline/package-lock.json is excluded by !**/package-lock.json
- tests/fixtures/shai-hulud-worm/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (33)
- .gitignore
- CHANGELOG.md
- README.md
- cs/README.md
- examples/axios-compromise.sarif
- package.json
- packages/engine/src/__tests__/threats.validate.test.ts
- packages/engine/src/__tests__/validate.test.ts
- packages/engine/src/types.ts
- packages/engine/src/validate.ts
- packages/ioc/__tests__/index.test.ts
- packages/ioc/index.js
- packages/ioc/threats/axios-2026.json
- packages/ioc/threats/shai-hulud-2025.json
- packages/ioc/threats/teampcp-2026.json
- packages/scanner/src/__tests__/cli-output.test.ts
- packages/scanner/src/__tests__/parsers.test.ts
- packages/scanner/src/__tests__/scan.test.ts
- packages/scanner/src/__tests__/threats.test.ts
- packages/scanner/src/detectors/injection.ts
- packages/scanner/src/io.ts
- packages/scanner/src/output/sarif.ts
- packages/scanner/src/parsers/bun.ts
- packages/scanner/src/parsers/index.ts
- packages/scanner/src/parsers/npm.ts
- packages/scanner/src/parsers/requirements.ts
- packages/scanner/src/scan.ts
- packages/scanner/src/threats.ts
- packages/scanner/src/types.ts
- tests/fixtures/axios-compromise/package.json
- tests/fixtures/clean-baseline/package.json
- tests/fixtures/shai-hulud-worm/package.json
- tests/fixtures/teampcp-litellm/requirements.txt
✅ Files skipped from review due to trivial changes (10)
- tests/fixtures/axios-compromise/package.json
- .gitignore
- tests/fixtures/clean-baseline/package.json
- packages/scanner/src/types.ts
- packages/ioc/threats/axios-2026.json
- packages/ioc/threats/teampcp-2026.json
- examples/axios-compromise.sarif
- tests/fixtures/shai-hulud-worm/package.json
- packages/ioc/threats/shai-hulud-2025.json
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (5)
- package.json
- packages/scanner/src/parsers/index.ts
- packages/scanner/src/output/sarif.ts
- packages/scanner/src/detectors/injection.ts
- packages/engine/src/types.ts
  indicators_of_compromise: z.object({
    package_names: z.array(z.string()),
    malicious_versions: z.array(threatMaliciousVersionSchema),
    network_iocs: z.array(threatNetworkIocSchema),
    file_artifacts: z.array(threatFileArtifactSchema),
  }),
  remediation: z.object({
    immediate: z.array(z.string()).min(3).max(5),
    long_term: z.array(z.string()).min(1),
  }),
Reject empty IOC and remediation strings.
The arrays enforce shape/length, but z.string() allows ''; that lets blank package names and blank remediation steps pass validation.
🧹 Proposed schema tightening
indicators_of_compromise: z.object({
- package_names: z.array(z.string()),
+ package_names: z.array(z.string().min(1)),
malicious_versions: z.array(threatMaliciousVersionSchema),
network_iocs: z.array(threatNetworkIocSchema),
file_artifacts: z.array(threatFileArtifactSchema),
}),
remediation: z.object({
- immediate: z.array(z.string()).min(3).max(5),
- long_term: z.array(z.string()).min(1),
+ immediate: z.array(z.string().min(1)).min(3).max(5),
+ long_term: z.array(z.string().min(1)).min(1),
  }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/engine/src/validate.ts` around lines 39 - 48, The current schema
allows empty strings because z.string() is used for package_names and
remediation entries; update the indicators_of_compromise package_names array and
the remediation immediate and long_term arrays to require non-empty strings
(e.g., replace z.string() with z.string().min(1) or z.string().nonempty()) so
blank package names and remediation steps are rejected; keep the rest of the
threat* schemas unchanged and run validation tests to confirm empty strings now
fail for indicators_of_compromise.package_names and
remediation.{immediate,long_term}.
function loadThreatCatalog() {
  try {
    return readdirSync(THREATS_DIR)
      .filter((entry) => entry.endsWith('.json'))
      .map((entry) => JSON.parse(readFileSync(join(THREATS_DIR, entry), 'utf8')))
  } catch {
    return []
  }
Don’t drop the whole threat catalog because one JSON file is bad.
Right now, one malformed/unreadable file in packages/ioc/threats makes loadThreatCatalog() return [], hiding every valid catalog entry too. Please isolate parse failures per file.
🛡️ Proposed fix
function loadThreatCatalog() {
try {
return readdirSync(THREATS_DIR)
.filter((entry) => entry.endsWith('.json'))
- .map((entry) => JSON.parse(readFileSync(join(THREATS_DIR, entry), 'utf8')))
+ .flatMap((entry) => {
+ try {
+ return [JSON.parse(readFileSync(join(THREATS_DIR, entry), 'utf8'))]
+ } catch (error) {
+ console.warn(
+ `[ioc] Skipping invalid threat catalog entry ${entry}: ${
+ error instanceof Error ? error.message : String(error)
+ }`,
+ )
+ return []
+ }
+ })
} catch {
return []
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/ioc/index.js` around lines 24 - 31, In loadThreatCatalog(), avoid
swallowing all entries on a single file error by iterating over each .json file
from readdirSync(THREATS_DIR) and for each file use readFileSync + JSON.parse
inside its own try/catch so a malformed or unreadable file is skipped instead of
returning []; preserve and return an array of successfully parsed objects
(optionally log the filename and parse error) while still returning an empty
array only if no files parse successfully.
expect(result.findings.length).toBeGreaterThanOrEqual(1)
expect(
  result.findings.some(
    (finding) =>
      finding.severity === 'critical' &&
      (finding.message.includes('axios') || finding.message.includes('plain-crypto-js'))
  )
).toBe(true)
Assert both axios and phantom dependency findings.
Right now the || means this passes if only one side works, even though the test name promises both detections.
🧪 Proposed test tightening
expect(result.findings.length).toBeGreaterThanOrEqual(1)
expect(
result.findings.some(
(finding) =>
finding.severity === 'critical' &&
- (finding.message.includes('axios') || finding.message.includes('plain-crypto-js'))
+ finding.message.includes('axios')
+ )
+ ).toBe(true)
+ expect(
+ result.findings.some(
+ (finding) =>
+ finding.severity === 'critical' &&
+ finding.message.includes('plain-crypto-js')
)
).toBe(true)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/__tests__/threats.test.ts` around lines 11 - 18, The
test currently uses an OR check so it passes if only one dependency is found;
update the assertion to require both detections by checking result.findings for
two separate matches: one where finding.severity === 'critical' and
finding.message.includes('axios') and another where finding.severity ===
'critical' and finding.message.includes('plain-crypto-js'); use
result.findings.some(...) twice (or otherwise assert both exist) referencing the
existing result.findings, finding.severity and finding.message symbols.
const bunLockBinaryPath =
  basename(targetDirOrPath) === 'bun.lockb' ? targetDirOrPath : join(targetDirOrPath, 'bun.lockb')

if (!existsSync(bunLockBinaryPath)) {
  return []
}

try {
  const result = spawnSync('bun', ['pm', 'ls', '--all'], {
    cwd: bunProjectDirectory(targetDirOrPath),
Use the project directory when falling back from an explicit bun.lock path.
Friendly heads-up: if targetDirOrPath is /repo/bun.lock, the fallback currently checks /repo/bun.lock/bun.lockb, so sibling bun.lockb files are missed.
🐛 Proposed fix
+ const projectDir = bunProjectDirectory(targetDirOrPath)
const bunLockBinaryPath =
- basename(targetDirOrPath) === 'bun.lockb' ? targetDirOrPath : join(targetDirOrPath, 'bun.lockb')
+ basename(targetDirOrPath) === 'bun.lockb' ? targetDirOrPath : join(projectDir, 'bun.lockb')
if (!existsSync(bunLockBinaryPath)) {
return []
}
@@
const result = spawnSync('bun', ['pm', 'ls', '--all'], {
- cwd: bunProjectDirectory(targetDirOrPath),
+ cwd: projectDir,
encoding: 'utf8',
stdio: ['ignore', 'pipe', 'ignore'],
timeout: BUN_PM_LS_TIMEOUT_MS,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/parsers/bun.ts` around lines 130 - 139, The fallback
builds bun.lockb using targetDirOrPath directly, which breaks when
targetDirOrPath is a path to bun.lock (e.g., /repo/bun.lock) because it joins a
file path with bun.lockb; change the logic to compute a projectDir =
basename(targetDirOrPath) === 'bun.lock' ? dirname(targetDirOrPath) :
targetDirOrPath, then compute bunLockBinaryPath = basename(targetDirOrPath) ===
'bun.lockb' ? targetDirOrPath : join(projectDir, 'bun.lockb'), and use
bunProjectDirectory(projectDir) as the cwd for spawnSync so sibling bun.lockb
files next to bun.lock are found (update references to bunLockBinaryPath,
targetDirOrPath, and bunProjectDirectory accordingly).
function packageNameFromPath(pkgPath: string): string {
  const segments = pkgPath.split(/node_modules\//).filter(Boolean)
  return segments.at(-1) ?? ''
Skip non-dependency package-lock entries.
packageNameFromPath('packages/api') currently returns packages/api, so workspace/package-root entries can be reported as dependencies and matched by detectors.
🐛 Proposed fix
-function packageNameFromPath(pkgPath: string): string {
+function packageNameFromPath(pkgPath: string): string | null {
+ if (!pkgPath.includes('node_modules/')) {
+ return null
+ }
+
const segments = pkgPath.split(/node_modules\//).filter(Boolean)
- return segments.at(-1) ?? ''
+ return segments.at(-1) ?? null
}
Also applies to: 67-71
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/parsers/npm.ts` around lines 38 - 40,
packageNameFromPath currently returns the full path for inputs that don't
contain node_modules (e.g., "packages/api"), causing workspace roots to be
treated as dependencies; update packageNameFromPath to first check whether
pkgPath includes "node_modules/" and if not return an empty string (or
null/undefined per project convention), otherwise split on /node_modules\// and
return the last segment; apply the same guard/fix to the analogous logic around
lines 67-71 (the other path-to-package-name helper used for package-lock
entries) so only paths that actually contain node_modules are parsed as
dependency names.
const [, name, version] = match
if (!name || !version) {
  return null
}

return {
  name,
  version,
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Inspect threat matching and requirements parsing for package-name normalization.
# Expected: either requirements parsing or threat matching should normalize Python package names consistently.
rg -n -C3 'getThreatVersionMatches|packageName|pkg\.name|toLowerCase|normalize|replace\(/\[-_\.\\]' \
--glob '!**/node_modules/**' \
--glob '!**/dist/**' \
--glob '!**/build/**'
Repository: miccy/wormsCTRL
Length of output: 21408
Normalize Python package names before threat matching.
PyPI names normalize case-insensitively and collapse ., _, and - to a canonical form. The requirements parser returns raw package names, which can miss IOC matches for pins like LiteLLM==1.82.7 or underscore/dash variants. Additionally, the threat matching function normalizePackageVersion in packages/scanner/src/threats.ts (line 45) only lowercases names but does not normalize special characters, making it incomplete.
Normalize package names in the requirements parser before returning them, and update normalizePackageVersion to also normalize special characters.
🐍 Proposed fix
+function normalizeRequirementName(name: string): string {
+ return name.toLowerCase().replace(/[-_.]+/g, '-')
+}
+
function parsePinnedRequirement(line: string): LockfilePackage | null {
@@
return {
- name,
+ name: normalizeRequirementName(name),
version,
}
}
Additionally, update normalizePackageVersion in packages/scanner/src/threats.ts (line 45) to:
function normalizePackageVersion(name: string, version: string): string {
return `${name.toLowerCase().replace(/[-_.]+/g, '-')}@${version}`
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/parsers/requirements.ts` around lines 42 - 49, The
requirements parser currently returns raw package names which can miss IOC
matches; update the parser in packages/scanner/src/parsers/requirements.ts so
that the returned name is normalized (lowercased and collapse/replace runs of
'.', '_', '-' into a single '-' e.g. name.toLowerCase().replace(/[-_.]+/g,'-'))
before returning from the function that destructures match into name/version;
also update the normalizePackageVersion function in
packages/scanner/src/threats.ts to apply the same normalization to the name
portion (lowercase + collapse special chars) so threat matching uses the same
canonical form.
async function readThreatObject(path: string): Promise<ThreatObject | null> {
  try {
    const content = await readTextFile(path)
    return JSON.parse(content) as ThreatObject
  } catch {
    return null
  }
}

async function loadThreatDatabaseFromDisk(): Promise<ThreatDatabase> {
  const threatsDir = findThreatsDir()
  if (!threatsDir) {
    return {
      threats: [],
      maliciousVersions: new Map(),
      phantomDependencies: new Map(),
    }
  }

  const threatFiles = readdirSync(threatsDir)
    .filter((entry) => extname(entry) === '.json')
    .map((entry) => resolve(threatsDir, entry))

  const threatObjects = await Promise.all(threatFiles.map((path) => readThreatObject(path)))
  const threats = threatObjects.filter((threat): threat is ThreatObject => threat !== null)
  const maliciousVersions = new Map<string, ThreatVersionMatch[]>()
  const phantomDependencies = new Map<string, ThreatVersionMatch[]>()

  for (const threat of threats) {
    for (const version of threat.indicators_of_compromise.malicious_versions) {
      const key = normalizePackageVersion(version.package, version.version)
      const match: ThreatVersionMatch = {
        threat,
        package: version.package,
        version: version.version,
        reason: version.reason,
      }

      const currentMatches = maliciousVersions.get(key) ?? []
      currentMatches.push(match)
      maliciousVersions.set(key, currentMatches)

      if (version.reason?.toLowerCase().includes('phantom dependency')) {
        const currentPhantoms = phantomDependencies.get(key) ?? []
        currentPhantoms.push(match)
        phantomDependencies.set(key, currentPhantoms)
      }
    }
  }
Malformed threat JSON will crash the loader — please validate the shape 🛡️
readThreatObject at Line 62 only guards against parse errors and then blind-casts the result to ThreatObject. If any file under threats/ is missing indicators_of_compromise.malicious_versions (or has it as a non-array), the loop at Line 88 throws TypeError: ... is not iterable, which propagates out of the top-level await on Line 116 and makes the entire scanner module fail to import — not just one finding lost, but the whole CLI unusable.
Since this PR already adds Zod-based threat validation in packages/engine/src/validate.ts, the friendliest fix is to reuse it here (or at minimum defensively guard the nested access):
🛡️ Suggested hardening
async function readThreatObject(path: string): Promise<ThreatObject | null> {
try {
const content = await readTextFile(path)
- return JSON.parse(content) as ThreatObject
+ const parsed = JSON.parse(content)
+ // Minimal structural check; prefer validateThreatObject from `@engine` if available.
+ if (
+ !parsed ||
+ typeof parsed.id !== 'string' ||
+ !parsed.indicators_of_compromise ||
+ !Array.isArray(parsed.indicators_of_compromise.malicious_versions)
+ ) {
+ return null
+ }
+ return parsed as ThreatObject
} catch {
return null
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/threats.ts` around lines 59 - 107, readThreatObject
currently blind-casts parsed JSON to ThreatObject, which lets malformed files
crash loadThreatDatabaseFromDisk when iterating
indicators_of_compromise.malicious_versions; fix by validating the parsed object
with the Zod threat validator exported from packages/engine/src/validate.ts
(import the validator, e.g., parse/validate function) inside readThreatObject
and return null on validation failure (or log and skip), and ensure
loadThreatDatabaseFromDisk only processes ThreatObject instances (keep the
existing type guard filter), protecting the loop over
indicators_of_compromise.malicious_versions from non-array values.
async function loadThreatDatabaseFromDisk(): Promise<ThreatDatabase> {
  const threatsDir = findThreatsDir()
  if (!threatsDir) {
    return {
      threats: [],
      maliciousVersions: new Map(),
      phantomDependencies: new Map(),
    }
  }

  const threatFiles = readdirSync(threatsDir)
    .filter((entry) => extname(entry) === '.json')
    .map((entry) => resolve(threatsDir, entry))

  const threatObjects = await Promise.all(threatFiles.map((path) => readThreatObject(path)))
readdirSync can still throw after the existsSync check
existsSync doesn't protect against EACCES or TOCTOU races, and any throw here escapes the top-level await on Line 116 and prevents the module from loading at all. Since the file's overall philosophy is "missing/unreadable threats produce empty results," a small try/catch around the directory read would stay true to that promise.
🛡️ Suggested guard
- const threatFiles = readdirSync(threatsDir)
- .filter((entry) => extname(entry) === '.json')
- .map((entry) => resolve(threatsDir, entry))
+ let threatFiles: string[]
+ try {
+ threatFiles = readdirSync(threatsDir)
+ .filter((entry) => extname(entry) === '.json')
+ .map((entry) => resolve(threatsDir, entry))
+ } catch {
+ return { threats: [], maliciousVersions: new Map(), phantomDependencies: new Map() }
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/threats.ts` around lines 68 - 82, The call to
readdirSync in loadThreatDatabaseFromDisk can still throw (EACCES/TOCTOU) and
should be guarded so the module doesn't crash; wrap the readdirSync + mapping to
threatFiles and the Promise.all(readThreatObject(...)) call in a try/catch
inside loadThreatDatabaseFromDisk and on error return the same empty
ThreatDatabase shape ({ threats: [], maliciousVersions: new Map(),
phantomDependencies: new Map() }) so unreadable/missing threat data yields empty
results; keep using the existing helpers (findThreatsDir, readThreatObject) and
avoid rethrowing the error so the top-level await won't fail.
export function toFindingSeverity(severity: ThreatSeverity): Severity {
  switch (severity) {
    case 'CRITICAL':
      return 'critical'
    case 'HIGH':
      return 'high'
    case 'MEDIUM':
      return 'medium'
  }
Switch falls through to undefined for any unexpected severity value
Because ThreatObject is cast from raw JSON (Line 62) without validation, a fixture with severity: "LOW" / "INFO" / a typo passes the type system and then this switch returns undefined at runtime — violating the declared Severity return type and potentially propagating as an invalid Finding.severity into formatText and the JSON output. A safe default keeps the contract honest:
🛡️ Suggested fallback
export function toFindingSeverity(severity: ThreatSeverity): Severity {
switch (severity) {
case 'CRITICAL':
return 'critical'
case 'HIGH':
return 'high'
case 'MEDIUM':
return 'medium'
+ default:
+ return 'medium'
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/scanner/src/threats.ts` around lines 130 - 138, The switch in
toFindingSeverity (which maps ThreatSeverity -> Severity) can fall through to
undefined for unexpected values; update toFindingSeverity to include a
default/fallback branch that returns a safe Severity (e.g., 'low') for any
unhandled ThreatSeverity, so Finding.severity and downstream formatText/JSON
always receive a valid Severity; reference the toFindingSeverity function and
ThreatSeverity/Severity types and ensure the default handles unexpected inputs
(optionally log or assert) rather than letting the switch fall through.
@@ -0,0 +1 @@
litellm==1.82.7
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check whether CI or helper scripts recursively scan/install requirements.txt fixtures.
# Expected: any OSV/dependency scanner config should exclude tests/fixtures/**, and pip install commands should not target this fixture tree.
rg -n -C3 'osv|pip\s+install|requirements\.txt|tests/fixtures' \
--glob '!**/node_modules/**' \
--glob '!**/dist/**' \
--glob '!**/build/**'
Repository: miccy/wormsCTRL
Length of output: 12319
🏁 Script executed:
# Search for CI/CD configuration files
fd -e yaml -e yml -e json | grep -E '(\.github|\.gitlab|\.circleci|bitbucket|travis|jenkins)' | head -20
Repository: miccy/wormsCTRL
Length of output: 41
🏁 Script executed:
# Check .gitignore for fixture exclusions
find . -name '.gitignore' -type f | head -10
Repository: miccy/wormsCTRL
Length of output: 71
🏁 Script executed:
# Look for scanner exclude patterns and test isolation
rg -n 'exclude|skip|ignore' --glob '*.ts' --glob '*.js' packages/scanner/src/ | head -30
Repository: miccy/wormsCTRL
Length of output: 141
🏁 Script executed:
# Check test file to see how fixture is isolated
cat -n packages/scanner/src/__tests__/threats.test.ts | head -50
Repository: miccy/wormsCTRL
Length of output: 1992
🏁 Script executed:
# Read .gitignore to see if tests/fixtures is excluded
cat -n ./.gitignore
Repository: miccy/wormsCTRL
Length of output: 2230
🏁 Script executed:
# Search more carefully for CI config files (check root and common locations)
ls -la .github/ 2>/dev/null || echo "No .github directory"
ls -la .gitlab-ci.yml 2>/dev/null || echo "No .gitlab-ci.yml"
ls -la .circleci/ 2>/dev/null || echo "No .circleci directory"
Repository: miccy/wormsCTRL
Length of output: 468
🏁 Script executed:
# Check if scanner.ts has any built-in path exclusions
cat -n packages/scanner/src/scan.ts | head -80
Repository: miccy/wormsCTRL
Length of output: 3059
🏁 Script executed:
# Check if there are any npm scripts or build configs that might run scanner
cat -n package.json | grep -A 5 -B 5 'test\|scan\|build'
Repository: miccy/wormsCTRL
Length of output: 1351
🏁 Script executed:
# Check the detect.sh script to see if it has exclusions
cat -n ./packages/scripts/detect.sh
Repository: miccy/wormsCTRL
Length of output: 36544
🏁 Script executed:
# Check GitHub workflows to see what they scan
ls -la .github/workflows/
Repository: miccy/wormsCTRL
Length of output: 628
🏁 Script executed:
# Look at the test setup more carefully - is it isolated?
rg -n 'clean-baseline|scan' packages/scanner/src/__tests__/threats.test.ts
Repository: miccy/wormsCTRL
Length of output: 436
🏁 Script executed:
# Check if there's any documentation about scanner usage and test fixtures
rg -n 'fixture|test|exclude' README.md | head -20
Repository: miccy/wormsCTRL
Length of output: 222
🏁 Script executed:
# Check the supply-chain-security workflow to see if it uses scanner
cat -n .github/workflows/supply-chain-security.yml
Repository: miccy/wormsCTRL
Length of output: 6429
🏁 Script executed:
# Check the CI workflow to see what it runs
cat -n .github/workflows/ci.yml
Repository: miccy/wormsCTRL
Length of output: 1356
🏁 Script executed:
# Check if the scanner package is used in any build scripts or CI commands
rg -n 'scanner|scan\(' .github/workflows/ | head -20
Repository: miccy/wormsCTRL
Length of output: 41
🏁 Script executed:
# Look for the fixture directory structure to understand what's there
find tests/fixtures -type f -name "*.txt" -o -name "*.lock*" | head -20
Repository: miccy/wormsCTRL
Length of output: 106
🏁 Script executed:
# Check what the audit scripts do
cat -n packages/scripts/quick-audit.sh | head -50
Repository: miccy/wormsCTRL
Length of output: 1297
🏁 Script executed:
# Check the full audit script
cat -n packages/scripts/full-audit.sh | head -100
Repository: miccy/wormsCTRL
Length of output: 3121
🏁 Script executed:
# Search for where the scanner package is actually used/imported
rg -n 'from.*scanner|import.*scanner' --glob '*.ts' --glob '*.js'
Repository: miccy/wormsCTRL
Length of output: 41
🏁 Script executed:
# Check if there are any pip install or requirements.txt parsing in CI beyond the test
rg -n 'pip|requirements' .github/workflows/
Repository: miccy/wormsCTRL
Length of output: 223
Consider adding tests/fixtures/ exclusion to scanner and CI tooling.
The litellm==1.82.7 fixture is useful test data, but since it's a real requirements.txt, it's accessible to any scanner that recurses the repo. While the current CI's detect.sh focuses on IOC file detection rather than vulnerability scanning, this fixture could be flagged if an OSV-style tool is later added or if someone runs the scanner directly on tests/fixtures/. Recommend: (1) add tests/fixtures/** exclusion to any OSV/dependency scanner config, (2) document that test fixtures contain intentionally malicious packages and should be excluded from automated security gates.
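One way to express that exclusion inside the scanner itself; a sketch only, where the isFixturePath helper and the pattern list are hypothetical rather than existing configuration:

// Sketch: skip known fixture trees before handing paths to the lockfile parsers.
const FIXTURE_PATTERNS = [/(^|\/)tests\/fixtures\//]

function isFixturePath(path: string): boolean {
  return FIXTURE_PATTERNS.some((pattern) => pattern.test(path))
}

// e.g. lockfilePaths.filter((path) => !isFixturePath(path))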
🧰 Tools
🪛 OSV Scanner (2.3.5)
[CRITICAL] 1-1: litellm 1.82.7: Malicious code in litellm (PyPI)
(MAL-2026-2144)
[CRITICAL] 1-1: litellm 1.82.7: Two litellm versions published containing credential harvesting malware
(PYSEC-2026-2)
[CRITICAL] 1-1: litellm 1.82.7: LiteLLM: Privilege escalation via unrestricted proxy configuration endpoint
[CRITICAL] 1-1: litellm 1.82.7: Two LiteLLM versions published containing credential harvesting malware
[CRITICAL] 1-1: litellm 1.82.7: LiteLLM: Password hash exposure and pass-the-hash authentication bypass
[CRITICAL] 1-1: litellm 1.82.7: LiteLLM: Authentication bypass via OIDC userinfo cache key collision
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/fixtures/teampcp-litellm/requirements.txt` at line 1, Add an exclusion
for the test fixture directory containing the sample requirements file so
dependency/OSV scanners skip it: update your scanner/CI configuration to ignore
the tests/fixtures/** pattern (so the specific requirements.txt containing
litellm==1.82.7 is excluded) and add a short note to the repository’s security
or CI docs explaining that tests/fixtures contain intentionally problematic
packages and must be excluded from automated dependency/vulnerability checks;
ensure this change is applied to any scanner configs used in CI (e.g.,
detect/OSV scanner settings) and documented for developers.
@SocketSecurity ignore npm/axios@1.14.1

@SocketSecurity ignore npm/axios@1.14.1
Updated the search component layout to use CSS Grid, allowing the cancel button to have a flexible width based on its content. This accommodates different word lengths in localized translations.
- Replaced absolute positioning with CSS Grid on .dialog-frame.
- Used display: contents for intermediate search containers to allow nested elements to participate in the grid.
- Removed hardcoded --sl-search-cancel-space and width calculations.
- Added a Playwright test to verify layout flexibility.
Removed debug console.log statements from apps/docs/test-loader.ts. Since the file only contained these logs and their supporting boilerplate, and it was not referenced anywhere in the codebase, the entire file was deleted to improve code health.
STIX 2.1 bundle IDs should be unique and ideally non-predictable. Replacing Date.now() with crypto.randomUUID() ensures uniqueness and adheres to security best practices for identifier generation.
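The replacement presumably reduces to something like the following; a sketch, assuming the usual STIX 2.1 bundle--<uuid> identifier convention:

import { randomUUID } from "node:crypto";

// Sketch: STIX 2.1 bundle identifiers are "bundle--" followed by a UUIDv4.
const bundleId = `bundle--${randomUUID()}`;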
Removed the unused underscore from the _dryRun parameter and implemented conditional logging to respect the dryRun flag.
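A rough sketch of what respecting the flag can look like; the function and messages are illustrative, not the actual code:

// Sketch: perform side effects only when not in dry-run mode, and say so either way.
async function applyChange(change: () => Promise<void>, dryRun: boolean): Promise<void> {
  if (dryRun) {
    console.log('[dry-run] skipping change')
    return
  }
  await change()
  console.log('change applied')
}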
- Added validatePath utility to prevent null byte injection and empty paths
- Applied validation at all scanner entry points: scan(), findLockfiles(), parseNpmLockfile(), parseYarnLockfile(), detectInjection()
- Added security tests for path validation
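A minimal sketch of the validation described, assuming the checks stop at empty paths and embedded null bytes; the exported name matches the description above, but the body is illustrative:

// Sketch: reject obviously unsafe paths before any filesystem access.
export function validatePath(path: string): string {
  if (path.length === 0) {
    throw new Error('Invalid path: empty path')
  }
  if (path.includes('\0')) {
    throw new Error('Invalid path: null byte detected')
  }
  return path
}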
Pull request overview
Copilot reviewed 64 out of 72 changed files in this pull request and generated 6 comments.
Files not reviewed (3)
- tests/fixtures/axios-compromise/package-lock.json: Language not supported
- tests/fixtures/clean-baseline/package-lock.json: Language not supported
- tests/fixtures/shai-hulud-worm/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)
packages/scanner/src/scan.ts:35
scan() validates target with validatePath(), but downstream it uses resolve(target, name) which produces absolute lockfile paths; those absolute paths are then passed into parsers that also call validatePath(). With the current validatePath behavior, scanning a normal directory will fail once any parser is invoked. Either stop validating parser inputs as "non-absolute", or normalize the contract so parsers accept absolute lockfile paths.
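One way to reconcile that contract; a sketch under the assumption that only the raw user input needs the strict check, while parsers receive already-resolved absolute paths (the callbacks stand in for the scanner's own helpers):

import { resolve } from 'node:path'

// Sketch: validate once at the boundary, then trust paths derived from the validated target.
function scanTarget(
  rawTarget: string,
  validate: (path: string) => string,
  parse: (lockfilePath: string) => void,
): void {
  const targetDir = resolve(validate(rawTarget))
  for (const name of ['package-lock.json', 'yarn.lock', 'pnpm-lock.yaml', 'bun.lock']) {
    parse(resolve(targetDir, name)) // parsers accept absolute lockfile paths
  }
}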
import { expect, test } from "bun:test";
import { extractStixBundle, type RawIOC } from "./ioc";

test("extractStixBundle should generate a valid UUIDv4 for the bundle ID", () => {
 * STIX 2.1 spec: https://docs.oasis-open.org/cti/stix/v2.1/
 */

import { randomUUID } from "node:crypto";
// Optionally reject absolute paths if only relative paths are expected
if (path.startsWith('/') || /^[a-zA-Z]:/.test(path)) {
  throw new Error('Invalid path: absolute paths not allowed');
}
export function parseLockfiles(targetDir: string): LockfilePackage[] {
  return [
    ...parseNpmLockfile(targetDir),
    ...parseYarnLockfile(targetDir),
    ...parsePnpmLockfile(targetDir),
    ...parseBunLockfile(targetDir),
  ]
const target = resolve(parsedArgs.target)
const result = await scan(target)
const rendered = renderResult(result, parsedArgs.format)
import { describe, expect, test } from "bun:test";
import { osvToThreatProfile, type OsvPackage } from "../src/ingestion/osv";

describe("osvToThreatProfile", () => {
  const baseOsv: OsvPackage = {
    id: "GHSA-xxxx-yyyy-zzzz",
    modified: "2023-01-01T00:00:00Z",
    published: "2023-01-01T00:00:00Z",
    summary: "A vulnerability summary",
    affected: [
      {
        package: {
          name: "test-package",
          ecosystem: "npm",
        },
      },
    ],
  };

  test("converts basic OSV record correctly", () => {
    const result = osvToThreatProfile(baseOsv) as any;
    expect(result.id).toBe(baseOsv.id);
    expect(result.name).toBe("test-package");
    expect(result.ecosystem).toBe("npm");
    expect(result.severity).toBe("LOW"); // Default score 0 -> LOW
    expect(result.status).toBe("UNDER_REVIEW");
  });

  test("maps CVSS_V3 scores to severity correctly", () => {
    const testCases = [
      { score: "9.5", expected: "CRITICAL" },
      { score: "9.0", expected: "CRITICAL" },
      { score: "8.9", expected: "HIGH" },
      { score: "7.0", expected: "HIGH" },
      { score: "6.9", expected: "MEDIUM" },
      { score: "4.0", expected: "MEDIUM" },
      { score: "3.9", expected: "LOW" },
      { score: "0.0", expected: "LOW" },
    ];

    for (const { score, expected } of testCases) {
      const osv = {
        ...baseOsv,
        severity: [{ type: "CVSS_V3", score }],
      };
      const result = osvToThreatProfile(osv) as any;
      expect(result.severity).toBe(expected);
    }
  });

  test("handles non-CVSS_V3 severity types by defaulting to score 0", () => {
    const osv = {
      ...baseOsv,
      severity: [{ type: "CVSS_V2", score: "10.0" }],
    };
    const result = osvToThreatProfile(osv) as any;
    expect(result.severity).toBe("LOW");
  });

  test("handles missing severity array", () => {
    const osv = {
      ...baseOsv,
      severity: undefined,
    };
    const result = osvToThreatProfile(osv) as any;
    expect(result.severity).toBe("LOW");
  });

  test("uses name and ecosystem from first affected package", () => {
    const osv: OsvPackage = {
      ...baseOsv,
      affected: [
        { package: { name: "pkg1", ecosystem: "pypi" } },
        { package: { name: "pkg2", ecosystem: "npm" } },
      ],
    };
    const result = osvToThreatProfile(osv) as any;
    expect(result.name).toBe("pkg1");
    expect(result.ecosystem).toBe("pypi");
  });
BREAKING CHANGES:
- sha256 field in file_artifacts now requires 64-char hex or null (see the Zod sketch below)
- pnpm parser separates resolved and integrity fields
- IOC/remediation arrays reject empty strings

Security:
- SSRF protection in ingest.ts (private URL blocking, timeouts, size limits)
- Path traversal validation in scanner (null byte, empty path)

Fixed:
- Bun/pnpm alias descriptor parsing (@npm: patterns)
- Bun lockfile deduplication (prefer bun.lock over bun.lockb)
- Per-entry threat catalog loading (one bad file no longer drops all)
- Threat shape validation before indexing
- readdirSync TOCTOU race in threat loader
- toFindingSeverity default case
- CLI writeFileSync error handling
- npm parser: skip root/link entries in v3 lockfiles
- Node ESM: replaced bare JSON imports with fs.readFileSync
- Empty sha256 indicators replaced with null
- Removed redundant parseNpmLock wrapper
- Dynamic test discovery for threat catalog validation
- Playwright config: pipe stdout/stderr for CI debugging
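A minimal sketch of what the stricter sha256 contract could look like as a Zod schema; the field and schema names here are assumptions, not copied from the engine's actual validate.ts:

```ts
import { z } from "zod";

// Hypothetical sketch: sha256 must be a 64-character lowercase hex digest,
// or null when the hash is unknown (instead of an empty string).
const fileArtifactSchema = z.object({
  path: z.string().min(1),
  sha256: z.string().regex(/^[a-f0-9]{64}$/).nullable(),
});

// Example: an unknown hash is represented as null and still validates
fileArtifactSchema.parse({ path: "dist/index.js", sha256: null });
```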
Dependencies updated:
- ora: 9.3.0 → 9.4.0
- @astrojs/check: 0.9.8 → 0.9.9
- @astrojs/starlight: 0.38.3 → 0.38.4
- astro: 6.1.8 → 6.2.1
- marked: 18.0.2 → 18.0.3
- vite: 7.0.0 → 8.0.10 (major)
- zod: 4.3.6 → 4.4.2
- @biomejs/biome: 2.4.12 → 2.4.14
- turbo: 2.9.6 → 2.9.8
- @types/bun: 1.3.12 → 1.3.13 (all workspaces)

Renamed:
- Replaced all 'dont-be-shy-hulud' references with 'wormsCTRL' across 37 files (docs, scripts, configs, AGENTS.md, SECURITY.md)
- Fixed package.json repository URLs to match GitHub repo name
- Fixed curl/script URLs in installation.md and quickstart.md to use correct path packages/scripts/detect.sh (not scripts/detect.sh)
- Expanded isPrivateUrl loopback check from exact 127.0.0.1 to full 127.0.0.0/8 range via regex (sketch below)
- Added TODO for IPv6 private range coverage (fc00::/7, fe80::/10)

Rejected CodeRabbit findings (5 of 8):
- dont-be-shy-hulud rename suggestions: that is the OLD name, wormsCTRL is the current GitHub repo
- worms-ctrl npm name: correct published name per package.json bin
- event-stream sha256 null: original values were empty strings, null correctly represents unknown hashes per updated schema
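A minimal sketch of the broadened loopback check, assuming a helper that inspects the parsed hostname; the real isPrivateUrl in ingest.ts covers more private ranges and is not reproduced here:

```ts
// Hypothetical sketch: treat any 127.0.0.0/8 address (not just 127.0.0.1) as loopback.
// IPv6 private ranges (fc00::/7, fe80::/10) remain a TODO, as noted above.
function isLoopbackHost(hostname: string): boolean {
  if (hostname === "localhost") return true;
  return /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(hostname);
}

// Example usage with a parsed URL
const url = new URL("http://127.0.0.53:8080/advisory");
console.log(isLoopbackHost(url.hostname)); // true
```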
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3316176667
try {
  parsedContent = JSON.parse(content)
} catch {
  return []

Parse bun.lock with Bun's text grammar
This parser assumes bun.lock is strict JSON, but Bun's text lockfile format is JSONC-like (for example this repo's bun.lock has trailing commas), so JSON.parse returns [] for valid Bun projects. When a project has only bun.lock (no bun.lockb), scans silently skip all Bun dependencies and miss malicious-package/phantom IOC detections entirely.
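A minimal sketch of a more tolerant approach, assuming a helper that strips trailing commas before parsing; this is not the parser's current code, and a dedicated JSONC parser would be a more robust alternative:

```ts
// Hypothetical sketch: accept Bun's JSONC-like bun.lock by removing trailing commas
// before a closing } or ]. This regex is coarse (it would mangle strings that happen
// to contain ",}" or ",]"), so a real JSONC parser is the safer long-term choice.
function parseBunLockText(content: string): unknown {
  const withoutTrailingCommas = content.replace(/,\s*([}\]])/g, "$1");
  try {
    return JSON.parse(withoutTrailingCommas);
  } catch {
    return null; // caller can emit a warning instead of silently skipping dependencies
  }
}
```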
const versionAtSeparator = normalized.indexOf('@', scopeSeparator + 1)
if (versionAtSeparator !== -1) {
  return {
    name: normalized.slice(0, versionAtSeparator),
    version: normalized.slice(versionAtSeparator + 1).split('_')[0] ?? '',

Parse scoped pnpm aliases before scoped-key split
Scoped alias keys like /@scope/alias@npm:real@1.2.3 hit this scoped-package branch first, so the returned version becomes npm:real@1.2.3 instead of 1.2.3. Even though alias handling exists later in the function, this early return prevents it from running, which causes version-based threat matching to miss IOC hits for scoped aliases.
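A minimal sketch of handling the alias descriptor before the scoped-package split, assuming a normalized key shaped like @scope/alias@npm:real@1.2.3; the function and variable names are illustrative, not the parser's actual code:

```ts
// Hypothetical sketch: detect an "@npm:" alias descriptor first, so
// "@scope/alias@npm:real@1.2.3" yields { name: "@scope/alias", version: "1.2.3" }.
function parseScopedKey(normalized: string): { name: string; version: string } {
  const aliasMarker = normalized.indexOf("@npm:");
  if (aliasMarker !== -1) {
    const rest = normalized.slice(aliasMarker + "@npm:".length);
    const realVersionAt = rest.lastIndexOf("@");
    return {
      name: normalized.slice(0, aliasMarker),
      version: realVersionAt !== -1 ? rest.slice(realVersionAt + 1).split("_")[0] ?? "" : "",
    };
  }
  // Non-alias scoped key: split on the "@" after the scope separator, as before
  const scopeSeparator = normalized.indexOf("/", 1);
  const versionAt = normalized.indexOf("@", scopeSeparator + 1);
  return {
    name: normalized.slice(0, versionAt),
    version: normalized.slice(versionAt + 1).split("_")[0] ?? "",
  };
}
```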
This pull request introduces major enhancements to the wormsCTRL project, focusing on AI-powered threat ingestion, improved threat cataloging, expanded lockfile support, and better documentation. It adds a new AI ingestion skeleton for extracting structured threat objects from advisories, integrates Zod validation, and documents several high-profile supply chain attacks. The CLI and scanner now support more formats and workflows, and documentation is updated for both English and Czech audiences.
AI-powered threat ingestion and validation:
- Added `ingestAdvisory` in `packages/engine/src/ingest.ts` for extracting structured threat objects from advisories or URLs using OpenAI's API, with graceful fallback if the API key is missing (see the sketch after this list).
- Added a prompt definition in `packages/engine/src/prompt.ts` to ensure JSON output matches the expected schema.
- Added Zod validation in `packages/engine/src/validate.ts`, and exposed validation utilities in the engine's public API. [1] [2] [3]
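A minimal sketch of how such an ingestion step could be wired, assuming an OpenAI chat-completions call in JSON mode and a stand-in Zod schema named threatObjectSchema; the function name matches the summary above, but the body is illustrative, not the actual implementation in ingest.ts:

```ts
import OpenAI from "openai";
import { z } from "zod";

// Hypothetical schema stand-in; the real one lives in packages/engine/src/validate.ts
const threatObjectSchema = z.object({
  id: z.string(),
  name: z.string(),
  ecosystem: z.string(),
  severity: z.enum(["LOW", "MEDIUM", "HIGH", "CRITICAL"]),
});

export async function ingestAdvisory(advisoryText: string) {
  // Graceful fallback when no API key is configured
  if (!process.env.OPENAI_API_KEY) return null;

  const client = new OpenAI();
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: "Extract a structured threat object as JSON." },
      { role: "user", content: advisoryText },
    ],
  });

  const raw = completion.choices[0]?.message?.content ?? "{}";
  const parsed = threatObjectSchema.safeParse(JSON.parse(raw));
  return parsed.success ? parsed.data : null;
}
```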
Threat catalog and knowledge base:
- Added structured threat records for `event-stream`, `node-ipc`, `ua-parser-js`, `ctx`, and `xz-utils` under `packages/ioc/threats/`, and updated the knowledge base API to load and expose them (a loading sketch follows this list). [1] [2] [3] [4]
- Updated `README.md` and added a Czech-language `cs/README.md` with grant context, AI architecture, and threat database details. [1] [2] [3]
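A minimal sketch of a filesystem-backed loader for these records, assuming one JSON file per threat and per-entry error handling so a single malformed file does not drop the whole catalog; the function and directory names are assumptions:

```ts
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch: load every *.json threat record, skipping unreadable entries
export function loadThreatCatalog(dir = "packages/ioc/threats"): unknown[] {
  const records: unknown[] = [];
  for (const entry of readdirSync(dir)) {
    if (!entry.endsWith(".json")) continue;
    try {
      records.push(JSON.parse(readFileSync(join(dir, entry), "utf8")));
    } catch {
      // One bad file should not invalidate the rest of the catalog
      console.warn(`Skipping malformed threat record: ${entry}`);
    }
  }
  return records;
}
```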
Scanner, CLI, and lockfile coverage:
- Reworked the CLI to support `--format`, `--output`, and `--threats` options, and to run the scanner package end-to-end (an output-handling sketch follows this list). [1] [2] [3]
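A minimal sketch of how the `--output` flag could be honored with explicit write-error handling (the writeFileSync fix noted earlier); the argument parsing and function name here are assumptions:

```ts
import { writeFileSync } from "node:fs";

// Hypothetical sketch: write rendered output to --output if given, otherwise stdout,
// and report filesystem errors instead of crashing with an unhandled exception.
function emitResult(rendered: string, outputPath?: string): number {
  if (!outputPath) {
    console.log(rendered);
    return 0;
  }
  try {
    writeFileSync(outputPath, rendered);
    return 0;
  } catch (error) {
    console.error(`Failed to write ${outputPath}:`, error);
    return 1; // non-zero exit code so CI pipelines notice the failure
  }
}
```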
Testing and validation:
- Added `bun:test` coverage for npm lock parsing, injection findings, and engine schema validation.
Dependency and environment updates:
- Added `zod` as a dependency in `packages/engine/package.json` for schema validation.

These changes make wormsCTRL more robust, extensible, and ready for AI-assisted supply chain threat ingestion and automation.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores