Conversation

Contributor

@github-actions github-actions bot commented Dec 2, 2025

This is an automated pull request to release the candidate branch into production, which will trigger a deployment.
It was created by the [Production PR] action.

…onents (#1849)

Co-authored-by: Daniel Fu <itsnotaka@gmail.com>

vercel bot commented Dec 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| app (staging) | Ready | Preview | Comment | Dec 4, 2025 3:48pm |
| portal (staging) | Ready | Preview | Comment | Dec 4, 2025 3:48pm |


comp-ai-code-review bot commented Dec 2, 2025

Comp AI - Code Vulnerability Scan

Analysis in progress...

Reviewing 30 file(s). This may take a few moments.


Powered by Comp AI - AI that handles compliance for you | Reviewed Dec 4, 2025, 02:42 PM


CLAassistant commented Dec 2, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ Itsnotaka
✅ Marfuen
❌ github-actions[bot]
You have signed the CLA already but the status is still pending? Let us recheck it.

* feat(tasks): add screenshot reminder dialog for file uploads

* feat(comments): implement screenshot reminder dialog for file uploads

* refactor(comments): remove unused interfaces and clean up code

---------

Co-authored-by: Daniel Fu <itsnotaka@gmail.com>
Co-authored-by: Mariano Fuentes <marfuen98@gmail.com>
vercel bot temporarily deployed to staging – portal December 4, 2025 14:22 (Inactive)
vercel bot temporarily deployed to staging – app December 4, 2025 14:22 (Inactive)
* refactor(api): move logic from SSE to API

* chore(api): add knowledge base document management endpoints and refactor document actions

* refactor(soa): move SOA feature to the API

* feat(trust-portal): add compliance resource management endpoints and update documentation

* refactor(questionnaire): remove unused actions for answering questions

* refactor(questionnaire): clear questionnaire module

* refactor(soa): enhance SOA service with new utility methods and improve answer processing

* refactor(knowledge-base): clear components

* refactor(vector-store-sync): restructure sync logic for policies, contexts, and knowledge base documents

* refactor(knowledge-base): remove unused components and update document formats

* refactor(api): remove duplicate DevicesModule import

* refactor(api): rename compliance framework and update related logic

* refactor(ci): remove Vercel credentials from deployment workflows

* refactor(api): update compliance framework references to use TrustFramework

* refactor(api): enhance SSE handling and add sanitization utilities

* refactor(api): update SSE utilities to enhance security and sanitization

* chore(api): add mammoth and @types/multer dependencies

* feat(trust-portal): add drag-and-drop file upload functionality for certificates

---------

Co-authored-by: Tofik Hasanov <annexcies@gmail.com>
Co-authored-by: Mariano Fuentes <marfuen98@gmail.com>
comp-ai-code-review bot commented Dec 4, 2025

Comp AI - Code Vulnerability Scan

Analysis in progress...

Reviewing 30 file(s). This may take a few moments.


Powered by Comp AI - AI that handles compliance for you | Reviewed Dec 4, 2025, 02:22 PM

* chore(package): lock packageManager to bun@1.3.3

* refactor(policy): update policy details and AI assistant components for improved functionality

* feat(docs): add ai-policy-editor page to documentation

---------

Co-authored-by: Daniel Fu <itsnotaka@gmail.com>
Co-authored-by: Mariano Fuentes <marfuen98@gmail.com>

comp-ai-code-review bot commented Dec 4, 2025

🔒 Comp AI - Security Review

🔴 Risk Level: HIGH

OSV: 2 HIGH CVEs in xlsx@0.18.5 and 1 LOW CVE in ai@5.0.0. Repo contains plaintext DB credentials in .env.example and SELF_HOSTING.md. Code shows shell/header injection risks in customPrismaExtension.ts and s3-operations.ts.


📦 Dependency Vulnerabilities

🟠 NPM Packages (HIGH)

Risk Score: 8/10 | Summary: 2 high, 1 low CVEs found

| Package | Version | CVE | Severity | CVSS | Summary | Fixed In |
| --- | --- | --- | --- | --- | --- | --- |
| xlsx | 0.18.5 | GHSA-4r6h-8v6p-xvw6 | HIGH | N/A | Prototype Pollution in sheetJS | No fix yet |
| xlsx | 0.18.5 | GHSA-5pgg-2g8v-p4x9 | HIGH | N/A | SheetJS Regular Expression Denial of Service (ReDoS) | No fix yet |
| ai | 5.0.0 | GHSA-rwvc-j5jr-mgvh | LOW | N/A | Vercel AI SDK's filetype whitelists can be bypassed when uploading files | 5.0.52 |

🛡️ Code Security Analysis

View 20 file(s) with issues

🟡 .env.example (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Example DB URL contains plaintext password 'postgres:pass' in comment | MEDIUM |
| 2 | Potentially overly-permissive AUTH_TRUSTED_ORIGINS (wildcard + localhost) | MEDIUM |
| 3 | NEXT_PUBLIC_PORTAL_URL is public and exposes portal URL client-side | MEDIUM |
| 4 | Duplicate TRIGGER_SECRET_KEY variable can cause overrides/confusion | MEDIUM |
| 5 | Env file may be committed and leak populated secrets | MEDIUM |
| 6 | Multiple secret keys present as placeholders risk accidental exposure if saved | MEDIUM |

Recommendations:

  1. Remove or sanitize credentials in comments (remove 'postgres:pass' from the DATABASE_URL example). Provide a template without real-looking credentials, e.g. postgresql://<user>:<password>@<host>:<port>/<db>
  2. Restrict AUTH_TRUSTED_ORIGINS to explicit, necessary origins only. Avoid wildcard subdomain patterns where possible and avoid broadly trusting localhost entries in production configurations.
  3. Avoid exposing anything sensitive via NEXT_PUBLIC_* variables. It's acceptable to expose non-sensitive public URLs, but ensure no secrets (API keys, secrets, tokens) use NEXT_PUBLIC_. Consider serving env-based URLs from server-side configs when secrecy is required.
  4. Remove duplicate environment variable keys (TRIGGER_SECRET_KEY) to prevent accidental overrides; validate env loading order in deployment scripts and CI.
  5. Do not commit populated .env files to version control. Add .env to .gitignore and use a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager, etc.) or CI/CD secret injection for deployments.
  6. Treat placeholders for secret keys as sensitive during development: use per-environment secrets, rotate credentials before sharing, and educate developers to never paste real secrets into example files.

🟡 .github/workflows/trigger-api-tasks-deploy-main.yml (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Debug log level may expose secrets in CI logs | MEDIUM |
| 2 | Secret passed to third-party CLI (bunx trigger.dev) can leak | MEDIUM |
| 3 | Runtime installation of remote package (bunx) is supply-chain risk | MEDIUM |
| 4 | Allowing install scripts on dependency install enables malicious hooks | MEDIUM |
| 5 | Unpinned GitHub Actions/external actions increase supply-chain risk | MEDIUM |
| 6 | Possibly custom/unknown runner may expose secrets if self-hosted | MEDIUM |

Recommendations:

  1. Remove or avoid --log-level debug when running commands that have access to secrets (replace with info/warn). If debug is necessary, ensure the CLI is verified not to log secrets and restrict logs to trusted storage.
  2. Avoid passing long-lived secrets directly to third-party CLIs. Use GitHub OIDC where possible or short-lived tokens scoped to the minimum required permissions. Restrict the TRIGGER_ACCESS_TOKEN scope and rotate regularly.
  3. Avoid runtime fetching/executing unpinned remote packages. Pin the third-party CLI to an immutable release (commit SHA or pinned artifact) and verify its integrity before executing. Consider vendoring or installing from a verified lockfile/artifact.
  4. Run dependency installs with install-scripts disabled (e.g., --ignore-scripts) when possible and perform an audit of package.json post-install. If scripts are required, ensure lockfiles are strict (frozen-lockfile) and CI runs in an isolated/trusted environment.
  5. Pin external GitHub Actions/releases to immutable commit SHAs rather than floating tags (e.g., actions/checkout@<commit-sha>) so updates cannot be injected silently.
  6. If using self-hosted or nonstandard runners (runs-on: warp-ubuntu-latest-arm64-4x), ensure the runners are hardened, up-to-date, and under your control. Prefer GitHub-hosted runners or tightly controlled self-hosted runners with minimal privileges and network access for workflows that handle secrets.

🔴 .github/workflows/trigger-api-tasks-deploy-release.yml (HIGH Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Custom runner 'warp-ubuntu-latest-arm64-4x' may be untrusted/self-hosted | HIGH |
| 2 | bunx runs packages remotely, allowing supply-chain code execution | HIGH |
| 3 | Dependency installs run lifecycle scripts (postinstall) enabling remote code | HIGH |
| 4 | Third-party action oven-sh/setup-bun@v2 not pinned to a commit SHA | HIGH |
| 5 | TRIGGER_ACCESS_TOKEN exposed to runner env; can be exfiltrated if compromised | HIGH |
| 6 | Floating Node version '20.x' may introduce unexpected changes | HIGH |

Recommendations:

  1. Runners: Prefer GitHub-hosted runners for CI or ensure self-hosted runners are hardened and limited. If using self-hosted, restrict network egress, run jobs as unprivileged users, apply OS/hardening patches, regularly rotate runner credentials, and limit which repos/orgs can use the runner.
  2. bunx / remote exec: Avoid executing remote packages at runtime. Instead install pinned CLI packages as devDependencies (with lockfile), run the local binary (node_modules/.bin or equivalent), or vendor the CLI. If bunx must be used, ensure strict version pinning and verify package provenance.
  3. Dependency install scripts: Minimize executing lifecycle scripts. Use frozen lockfiles and consider running installs with --ignore-scripts in build phases where scripts aren't required. Audit dependencies for postinstall scripts, pin dependency versions, and use SBOM/auditing tools to detect risky postinstall behavior.
  4. Pin Actions: Pin third-party actions to full commit SHAs (not floating tags) to prevent supply-chain tampering (e.g., oven-sh/setup-bun@<commit-sha>). Optionally add action verification steps or use an actions policy for allowed actions.
  5. Secrets exposure: Limit scope and lifetime of TRIGGER_ACCESS_TOKEN, restrict which environments and branches the secret is available to, enable environment protection and required reviewers for environments, prefer short-lived credentials or OIDC where possible, and rotate tokens if an incident is suspected. Avoid echoing environment variables and restrict runner network access to prevent exfiltration.
  6. Tooling versions: Pin Node (and other critical tooling) to exact versions for reproducible builds (e.g., node-version: "20.14.0"). Use lockfiles and CI images with reproducible toolchains.

🔴 SELF_HOSTING.md (HIGH Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Plaintext DATABASE_URL in .env contains credentials | HIGH |
| 2 | .env is used via env_file; may be accidentally committed | HIGH |
| 3 | NEXT_PUBLIC_* env vars expose values to client-side | HIGH |
| 4 | AUTH_SECRET and BETTER_AUTH_SECRET lack rotation/storage guidance | HIGH |
| 5 | AWS credentials stored in env can grant broad S3 access | HIGH |
| 6 | RESEND_API_KEY in env can be abused to send emails if leaked | HIGH |
| 7 | TRIGGER_SECRET_KEY in env can allow workflow tampering if leaked | HIGH |
| 8 | UPSTASH/OPENAI API keys in env expose third-party accounts | HIGH |
| 9 | Ports 3000 & 3002 mapped to host increase attack surface | HIGH |
| 10 | Docker healthchecks use plain HTTP endpoints | HIGH |
| 11 | No mention of TLS/HTTPS enforcement for app/portal URLs | HIGH |
| 12 | No mention of least-privileged IAM roles for cloud keys | HIGH |
| 13 | No instructions to secure .env file permissions | HIGH |
| 14 | No secrets management (vault/docker secrets) recommended | HIGH |
| 15 | NEXT_PUBLIC_API_URL may expose internal API base URL | HIGH |

Recommendations:

  1. Do not store secrets in plaintext in the repo. Move DATABASE_URL, API keys, and other secrets into a secrets manager (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, GCP Secret Manager) or platform-native secret mechanisms (Docker secrets, Kubernetes Secrets backed by KMS).
  2. Add .env to .gitignore and ensure no committed .env files or example files contain real secrets. Scan the repo history for leaked secrets and rotate if any were committed.
  3. Avoid using NEXT_PUBLIC_* env variables for secrets or sensitive endpoints. Only expose truly public configuration to the client. Keep DB URLs, API keys, and internal URLs server-side.
  4. Use platform secret injection (Docker secrets, Kubernetes) or environment variable encryption at rest, and restrict access to only processes that require them. Implement automatic rotation procedures for AUTH_SECRET/BETTER_AUTH_SECRET and all API keys; store rotation steps in ops runbooks.
  5. Grant least-privilege IAM permissions for cloud credentials (S3, Upstash, OpenAI). Create limited-scope roles/buckets and rotate keys regularly. Use scoped service accounts where supported.
  6. Treat third-party API keys (Resend, Trigger.dev, OpenAI, Upstash) as high-value secrets: restrict usage by IP/hostname where possible, monitor usage, enable alerts, and rotate on suspicion of compromise.
  7. Limit host port mappings to only what's necessary. Place apps behind a reverse proxy/load balancer and firewall; avoid exposing application ports directly unless required for local dev.
  8. Use HTTPS/TLS for all external-facing URLs. Enforce HSTS and redirect HTTP->HTTPS. Ensure DATABASE_URL uses sslmode=require (or equivalent) and that Postgres enforces TLS.
  9. Make healthchecks internal-only or use HTTPS health endpoints. Restrict access to health endpoints via network controls or authentication if they reveal sensitive information.
  10. Document and enforce file permissions for .env and secrets files (e.g., chmod 600). Limit who on the host can read these files.
  11. Recommend and document using secret scanning in CI (git-secrets, truffleHog, GitHub secret scanning) and monitoring for leaked credentials. Add incident response steps for leaked secrets.
  12. Avoid exposing internal API base URLs in client config (NEXT_PUBLIC_API_URL) if they can reveal internal topology; use public-facing, authenticated proxy endpoints for client calls.
  13. Add guidance to use least-privilege, scoped credentials for Trigger.dev/workflows and audit logs for workflow actions. Consider isolating workflow runners from production data where possible.

🟡 apps/api/customPrismaExtension.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Shell command injection via manifest.runtime in layer command | MEDIUM |
| 2 | Arbitrary dependency version from dbPackageVersion may pull malicious package | MEDIUM |
| 3 | Sensitive env vars (DATABASE_URL/DIRECT_URL) logged and added to layer | MEDIUM |
| 4 | Executing prismaBinary from workingDir may run attacker-controlled binary | MEDIUM |
| 5 | TOCTOU: existsSync then generate allows race to replace client files | MEDIUM |

Recommendations:

  1. Avoid interpolating manifest.runtime into a shell command string. Instead, whitelist allowed runtimes and invoke the binary using spawn/execFile with args (no shell) to avoid shell injection; see the sketch after this list. Example: const bin = binaryForRuntime(manifest.runtime) after validating manifest.runtime is an expected value; then spawn(bin, ['node_modules/prisma/build/index.js', 'generate', '--schema=./prisma/schema.prisma']).
  2. Do not accept arbitrary dbPackageVersion values from untrusted sources. Pin or validate allowed versions (e.g., maintain an allowlist), or require exact semver ranges from a trusted configuration store to reduce supply-chain risk.
  3. Avoid logging full environment objects containing secrets. Remove DATABASE_URL/DIRECT_URL/DIRECT_DATABASE_URL from debug payloads or mask them before logging. When adding env to layers, ensure only expected variables are passed and document that secrets will be injected at deploy-time rather than baked into logs/artifacts.
  4. Validate the prismaBinary path before executing. Prefer execFile/spawn with absolute path and verify it's owned/trusted (e.g., check package.json dependencies or compute binary checksum). Consider generating the client in a controlled environment rather than executing an unchecked .bin binary from workingDir.
  5. Mitigate TOCTOU by performing atomic operations where possible and avoiding separate existsSync checks followed by actions. Instead attempt the operation and handle EEXIST/EACCES errors, or use file-system primitives that replace files atomically. If checking is necessary, reduce the time between check and use and lock the resource if feasible.
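
A minimal sketch of recommendations 1 and 4, assuming a small runtime allowlist; the allowlist values, function name, and schema path are illustrative stand-ins, not the extension's actual API:

```ts
import { execFile } from 'node:child_process';

// Hypothetical allowlist; adjust to the runtimes the manifest can legitimately name.
const ALLOWED_RUNTIMES = new Set(['node', 'bun']);

function generatePrismaClient(runtime: string, workingDir: string): Promise<void> {
  if (!ALLOWED_RUNTIMES.has(runtime)) {
    throw new Error(`Unsupported runtime: ${runtime}`);
  }
  return new Promise((resolve, reject) => {
    // execFile passes arguments directly to the binary; no shell is spawned,
    // so metacharacters in manifest values are never interpreted as commands.
    execFile(
      runtime,
      ['node_modules/prisma/build/index.js', 'generate', '--schema=./prisma/schema.prisma'],
      { cwd: workingDir },
      (err) => (err ? reject(err) : resolve()),
    );
  });
}
```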

🟡 apps/api/src/app/s3.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | decodeURIComponent can throw on malformed input leading to DoS | MEDIUM |
| 2 | Non-URL path branch doesn't decode percent-encoding; encoded traversal may bypass checks | MEDIUM |
| 3 | S3 client uses static env creds instead of IAM role or credential provider | MEDIUM |
| 4 | s3Client null fallback allows app to run without S3 and may cause runtime errors | MEDIUM |
| 5 | getFleetAgent lacks runtime validation of 'os' leading to object key injection | MEDIUM |

Recommendations:

  1. Wrap decodeURIComponent in try/catch and validate percent-encoding before decoding. e.g., validate pathname for %XX sequences of two hex digits or use a safe decoding routine; return a clear error rather than letting the exception propagate.
  2. For the non-URL branch, normalize and percent-decode the input (safely, with validation) before checking for path traversal. After decoding, reject inputs containing '../' or '..' and disallow domain-like patterns.
  3. Avoid hardcoding credentials into S3Client configuration. Prefer AWS SDK default provider chain / IAM roles (EC2/ECS/EKS task role or credential provider) or a secure secrets manager. If env creds must be used, rotate and scope them appropriately and consider using the SDK's credential provider instead of manually passing accessKeyId/secretAccessKey.
  4. Fail fast when S3 cannot be initialized. Instead of exporting a null/dummy s3Client, throw during startup or disable S3-dependent features explicitly so callers cannot silently receive nulls and cause runtime errors. Add health checks that surface misconfiguration.
  5. Whitelist and validate the 'os' parameter at runtime before constructing the S3 Key (e.g., allow only 'macos', 'windows', 'linux'). Also validate/escape the resulting key and/or enforce a Key prefix on S3 to limit access to allowed objects.
  6. Add unit/integration tests for extractS3KeyFromUrl to cover malformed percent-encodings, domain-like inputs, and encoded traversal attempts. Add tests for getFleetAgent to verify invalid 'os' values are rejected.
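
A sketch of recommendations 1 and 2; the helper name is hypothetical, and the real extractS3KeyFromUrl signature may differ:

```ts
// Safely decode a candidate S3 key, then reject traversal after decoding so
// encoded sequences such as '%2e%2e%2f' cannot slip past the check.
function safeDecodeS3Key(raw: string): string {
  let decoded: string;
  try {
    decoded = decodeURIComponent(raw); // throws URIError on malformed %-sequences
  } catch {
    throw new Error('Malformed percent-encoding in S3 key');
  }
  if (decoded.split('/').some((segment) => segment === '..' || segment === '.')) {
    throw new Error('Path traversal rejected');
  }
  return decoded;
}
```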

🔴 apps/api/src/attachments/attachments.service.ts (HIGH Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Trusting client MIME type and file extension | HIGH |
| 2 | Blacklist approach can be bypassed (missing .html/.svg checks) | HIGH |
| 3 | No file content validation or malware scanning | HIGH |
| 4 | Files served inline (no Content-Disposition) can enable XSS | HIGH |
| 5 | User-supplied metadata not fully sanitized | HIGH |
| 6 | Generating signed URLs for all attachments may leak files | HIGH |

Recommendations:

  1. Use a whitelist (allow-list) of permitted file extensions and MIME types rather than a blacklist. Validate and canonicalize the extension server-side.
  2. Detect actual file type from file content (magic bytes) using a library (e.g., file-type) and map to a safe server-side Content-Type. Do not trust uploadDto.fileType as authoritative.
  3. Run malware/AV scanning (e.g., ClamAV, third-party service) on uploaded files before making them available or storing them permanently.
  4. Force downloads instead of inline rendering: set Content-Disposition: attachment (or include ResponseContentDisposition on presigned GETs) to prevent browsers rendering HTML/SVG/JS from S3. Prefer setting this server-side when generating signed URLs.
  5. Sanitize all metadata values stored in S3 (not just originalFileName). Remove control characters, enforce ASCII or a safe encoding, and validate IDs/strings at the API boundary.
  6. Prefer generating signed URLs on-demand rather than creating/wrapping them for every attachment list call. Limit expiry to the minimum necessary, and apply rate-limits/authorization checks to signed URL generation endpoints.
  7. Ensure S3 bucket policy and object ACLs keep objects private by default. Explicitly set ACLs and bucket policy to prevent accidental public access.
  8. Validate and enforce max file size (already present) but also enforce content-type/extension checks together with content inspection to reduce bypass risk.
  9. Log and alert on anomalous uploads (large volumes, unusual content types) and consider quarantining files until scanned/approved.
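
A compact sketch of recommendations 2, 4, and 6, assuming the file-type package and the AWS SDK v3 presigner; the MIME allowlist and expiry are placeholders:

```ts
import { fileTypeFromBuffer } from 'file-type';
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const ALLOWED_MIME = new Set(['application/pdf', 'image/png', 'image/jpeg']);

// Rec. 2: derive the content type from magic bytes, not from uploadDto.fileType.
async function detectMime(buffer: Buffer): Promise<string> {
  const detected = await fileTypeFromBuffer(buffer);
  if (!detected || !ALLOWED_MIME.has(detected.mime)) {
    throw new Error('Unsupported or unrecognized file type');
  }
  return detected.mime;
}

// Recs. 4 and 6: presign on demand, force download, keep the expiry short.
function presignDownload(s3: S3Client, bucket: string, key: string): Promise<string> {
  return getSignedUrl(
    s3,
    new GetObjectCommand({
      Bucket: bucket,
      Key: key,
      ResponseContentDisposition: 'attachment', // never render HTML/SVG inline
    }),
    { expiresIn: 300 }, // five minutes; tune to the minimum the UI needs
  );
}
```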

🟡 apps/api/src/config/load-env.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Auto-loading .env from process.cwd() may load attacker-controlled file | MEDIUM |
| 2 | config(..., override: true) lets .env overwrite existing env vars | MEDIUM |
| 3 | Library-side auto-load can inject or leak secrets in host applications | MEDIUM |
| 4 | existsSync then load introduces TOCTOU race to swap .env file | MEDIUM |
| 5 | envLoaded set true even if no .env found may hide missing config | MEDIUM |

Recommendations:

  1. Do not auto-load .env implicitly on module import. Require explicit invocation (e.g., call ensureEnvLoaded() from application bootstrap) or expose an API to opt-in to loading.
  2. Remove override: true (use the default behavior) or make overriding opt-in via a flag. Avoid letting .env silently overwrite process.env values that may be set by the host environment.
  3. Avoid searching process.cwd() as a fallback. Restrict search to known, application-controlled paths (packaged config directory or a configured absolute path) to reduce the chance of attacker-controlled files being loaded.
  4. Avoid existsSync + later read: open/read the file atomically (fs.readFile or fs.open with appropriate flags) and handle errors. Alternatively, attempt to load and catch failures rather than checking existence first to reduce TOCTOU window.
  5. Verify file ownership/permissions before loading (e.g., compare fs.stat uid/gid to process owner where applicable) and reject files that are world-writable or not owned by the expected user. On platforms without uid/gid, consider stricter path restrictions.
  6. Only mark envLoaded = true after a successful load/parse. If dotenv.parse/config fails or no file was found, keep envLoaded false so callers can detect/load again or fail fast.
  7. If this code is distributed as a library, remove global side effects entirely (no automatic env loading on import) to avoid surprising host applications and leaking/injecting secrets.
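
A sketch of recommendations 1, 2, 4, and 6 combined, assuming dotenv's parse API; the function name and flag are illustrative:

```ts
import { readFileSync } from 'node:fs';
import { parse } from 'dotenv';

let envLoaded = false;

// Opt-in loader: called explicitly from application bootstrap, never as a
// side effect of module import. envPath should be an application-controlled
// absolute path rather than a process.cwd() fallback.
export function ensureEnvLoaded(envPath: string): boolean {
  if (envLoaded) return true;
  let raw: string;
  try {
    raw = readFileSync(envPath, 'utf8'); // read directly; no existsSync TOCTOU window
  } catch {
    return false; // envLoaded stays false so callers can detect the gap and fail fast
  }
  for (const [key, value] of Object.entries(parse(raw))) {
    if (!(key in process.env)) process.env[key] = value; // never override host env
  }
  envLoaded = true;
  return true;
}
```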

🟡 apps/api/src/knowledge-base/dto/delete-document.dto.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Missing non-empty validation for organizationId and documentId | MEDIUM |
| 2 | No format validation (e.g., UUID) for IDs; allows malformed input | MEDIUM |

Recommendations:

  1. Add @IsNotEmpty() to organizationId and documentId to prevent empty strings.
  2. Enforce expected ID format: use @IsUUID() for UUIDs or @Matches()/@IsAlphanumeric() with @Length() for other ID schemes.
  3. Apply DTO validation at the controller or globally (e.g., NestJS ValidationPipe with whitelist and forbidNonWhitelisted enabled) so invalid requests are rejected before business logic/DB access.
  4. Sanitize/whitelist inputs prior to using them in database queries; always use parameterized queries/ORM query builders to prevent injection even if format validation is present.
  5. Restrict maximum length (e.g., @MaxLength()) to prevent oversized payloads that could lead to DoS or unexpected behavior.
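
A sketch of recommendations 1, 2, and 5 together; the ID pattern is a stand-in, so swap in @IsUUID() if the IDs really are UUIDs:

```ts
import { IsNotEmpty, IsString, Matches, MaxLength } from 'class-validator';

// Illustrative charset/length rule for the app's ID scheme.
const ID_PATTERN = /^[A-Za-z0-9_-]+$/;

export class DeleteDocumentDto {
  @IsString()
  @IsNotEmpty()
  @MaxLength(64)
  @Matches(ID_PATTERN)
  organizationId!: string;

  @IsString()
  @IsNotEmpty()
  @MaxLength(64)
  @Matches(ID_PATTERN)
  documentId!: string;
}
```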

🟡 apps/api/src/knowledge-base/dto/process-documents.dto.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | organizationId lacks format or max-length validation | MEDIUM |
| 2 | documentIds accept any non-empty string; no item format or max-length | MEDIUM |
| 3 | No max array size for documentIds (DoS via huge payload) | MEDIUM |

Recommendations:

  1. Add explicit format and length constraints for organizationId, e.g. @IsUUID() if GUIDs are expected and/or @MaxLength(64)
  2. Validate documentIds items more strictly: use @IsUUID({ each: true }) or @Matches(/regex/, { each: true }) and add @MaxLength(length, { each: true }) to limit item size
  3. Add an upper bound on array size with @ArrayMaxSize(n) (choose n according to application limits, e.g. 100 or 1000) to mitigate DoS via huge payloads
  4. Reject whitespace-only strings and trim inputs (use class-transformer @Transform to trim and @IsNotEmpty()/@MinLength(1) to ensure non-empty values)
  5. When these DTO values are used in DB/command contexts, always use parameterized queries/ORM binding and avoid string interpolation; perform server-side sanitization as a defense-in-depth measure
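
The same ideas for this DTO, sketched with per-item and array-size bounds; the caps are placeholders to tune:

```ts
import {
  ArrayMaxSize,
  ArrayNotEmpty,
  IsArray,
  IsNotEmpty,
  IsString,
  Matches,
  MaxLength,
} from 'class-validator';

export class ProcessDocumentsDto {
  @IsString()
  @IsNotEmpty()
  @MaxLength(64)
  organizationId!: string;

  @IsArray()
  @ArrayNotEmpty()
  @ArrayMaxSize(100) // upper bound on batch size to blunt huge-payload DoS
  @IsString({ each: true })
  @MaxLength(64, { each: true }) // per-item length cap
  @Matches(/^[A-Za-z0-9_-]+$/, { each: true }) // per-item charset check
  documentIds!: string[];
}
```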

🟡 apps/api/src/knowledge-base/dto/upload-document.dto.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | fileData has only IsString; no validation of base64 content or size | MEDIUM |
| 2 | fileName not sanitized — may allow path traversal or filename injection | MEDIUM |
| 3 | fileType only checked as string; no allowlist or MIME validation | MEDIUM |
| 4 | organizationId only string; no format validation or auth checks | MEDIUM |
| 5 | No file upload size limits or streaming protections; OOM/DoS risk | MEDIUM |

Recommendations:

  1. Validate fileData: enforce it is valid base64 (or better, accept binary/multipart streams). Add a size limit both on the DTO (e.g., max length) and at the request level. Prefer streaming/multipart upload endpoints to avoid loading large base64 payloads into memory.
  2. Sanitize and restrict fileName: strip path separators, disallow control characters, enforce a safe filename pattern or generate server-side filenames/IDs. Never use the raw filename for filesystem paths; join with a safe directory and use path normalization checks.
  3. Validate fileType: use an allowlist (enum) of permitted MIME types and verify actual content matches the claimed MIME type (content sniffing) before processing or storing.
  4. Validate organizationId: enforce a strict format (e.g., UUID via @IsUUID or regex) and ensure authorization checks (guards/middleware) verify the requesting user is allowed to upload for that org.
  5. Protect against DoS/OOM: enforce request body size limits at server/proxy, stream uploads, use timeouts, rate limiting, and scan uploaded files for malware. Reject base64 uploads above a sane byte limit and consider returning an error advising streaming upload instead.
  6. Add DTO-level validators (class-validator custom decorators or @Matches/@MaxLength) or implement validation middleware to enforce the above checks before business logic runs.
  7. Store files safely: keep files outside the web root, use generated storage keys, and validate any path/names before accessing storage. Log and monitor upload failures and size/volume anomalies.
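
A sketch of the size-bounding part of recommendations 1 and 5; the byte limit is a placeholder:

```ts
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024; // hypothetical 10 MiB cap

// Bound the payload before decoding so an oversized base64 string is rejected
// cheaply instead of being buffered into memory.
function decodeBase64Bounded(fileData: string): Buffer {
  // Four base64 characters encode three bytes, so size is checkable pre-decode.
  const approxBytes = Math.floor((fileData.length * 3) / 4);
  if (approxBytes > MAX_FILE_SIZE_BYTES) {
    throw new Error('File too large; use a streaming/multipart upload instead');
  }
  const buf = Buffer.from(fileData, 'base64');
  if (buf.length > MAX_FILE_SIZE_BYTES) {
    throw new Error('File too large');
  }
  return buf;
}
```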

🟡 apps/api/src/knowledge-base/knowledge-base.controller.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No authentication or authorization guards on controller endpoints | MEDIUM |
| 2 | Missing input validation/sanitization for @Param/@Query/@Body across endpoints | MEDIUM |
| 3 | IDOR risk: documentId/runId/manualAnswerId used directly without authorization | MEDIUM |
| 4 | Signed URL endpoints may expose documents if access control is absent | MEDIUM |
| 5 | createRunToken returns tokens in responses, risk of token leakage | MEDIUM |
| 6 | Destructive endpoints (delete/process) lack checks allowing unauthorized abuse | MEDIUM |
| 7 | Unsanitized inputs could lead to SQL/command injection in downstream services | MEDIUM |

Recommendations:

  1. Add authentication and per-resource authorization guards
  2. Apply validation pipes and DTO validation for all inputs
  3. Restrict signed URL/token issuance to authorized users and short TTLs
  4. Avoid returning raw secrets; rotate tokens and audit issuance
  5. Sanitize inputs and use parameterized queries in service layer
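
A minimal sketch of recommendation 1, assuming an upstream auth layer has already attached a user (with organization memberships) to the request; the guard, route shape, and user fields are hypothetical:

```ts
import {
  CanActivate,
  Controller,
  Delete,
  ExecutionContext,
  ForbiddenException,
  Injectable,
  Param,
  UseGuards,
} from '@nestjs/common';

// Per-resource guard: the handler runs only if the authenticated user belongs
// to the organization named in the route, closing the IDOR gap.
@Injectable()
class OrgMemberGuard implements CanActivate {
  canActivate(ctx: ExecutionContext): boolean {
    const req = ctx.switchToHttp().getRequest();
    const user = req.user; // assumed to be populated by authentication middleware
    if (!user?.organizationIds?.includes(req.params.organizationId)) {
      throw new ForbiddenException('Not a member of this organization');
    }
    return true;
  }
}

@Controller('organizations/:organizationId/knowledge-base')
@UseGuards(OrgMemberGuard)
export class KnowledgeBaseDocumentsController {
  @Delete('documents/:documentId')
  deleteDocument(@Param('documentId') documentId: string) {
    // Reached only after the guard has authorized the caller for this org.
    return { deleted: documentId };
  }
}
```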

🟡 apps/api/src/knowledge-base/knowledge-base.service.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No validation of DTO inputs before using in DB queries | MEDIUM |
| 2 | Uploaded files not validated for type/size/content | MEDIUM |
| 3 | Public run tokens created without additional authorization checks | MEDIUM |
| 4 | Signed URLs and public tokens can expose run/file access | MEDIUM |
| 5 | Logging manual-answer IDs may leak sensitive identifiers | MEDIUM |
| 6 | No rate/size limits on processing or uploads (abuse risk) | MEDIUM |

Recommendations:

  1. Validate and sanitize all DTO inputs at the boundary (controllers) using a robust validation library (e.g., class-validator with NestJS ValidationPipe or a schema validator like Zod). Ensure DTOs are enforced and reject malformed or unexpected values before they reach services/DB calls.
  2. Enforce authorization checks before creating any public tokens. Verify the caller has permissions to request a read token for the specified run, and ensure token creation is audited. Consider adding an additional server-side guard that ties runs to organization/user context.
  3. Limit scope and lifetime of public tokens to the minimum necessary. Consider much shorter expiration (e.g., minutes) for public tokens, bind tokens to specific IPs or origins if possible, and log/audit token issuance and usage.
  4. Harden signed URL issuance: ensure signed URLs are short-lived and consider additional checks such as validating requestor identity before returning a signed URL. Continue to use safe S3 response headers (ResponseContentDisposition/Type) as implemented.
  5. Improve file validation beyond size: enforce a whitelist of allowed MIME types, verify MIME type by inspecting file headers (magic bytes) rather than relying solely on client-provided type, and add AV/malware scanning for uploaded files. The code includes a MAX_FILE_SIZE_BYTES check and filename sanitization (confirmed in apps/api/src/knowledge-base/utils/s3-operations.ts and constants.ts), but file type/content validation and scanning are missing.
  6. Avoid logging raw resource identifiers. Redact or hash IDs in logs (or only log counts) when identifiers are sensitive. For debug use, provide an opt-in trace-level logging that is protected and not enabled in production.
  7. Add rate limiting and quotas on upload and processing endpoints (per-user and per-organization). Throttle task orchestration requests and limit the number/size of documents that can be processed concurrently to mitigate abuse and DoS.
  8. Ensure deletion flows are secure: when creating public tokens for deletion runs, apply the same authorization checks and consider additional safeguards (confirmation steps, admin-only operations) to prevent unauthorized deletion triggers.
  9. Audit and monitor: implement logging/alerting around signed URL creation, public token issuance, and deletion operations. Retain logs in a secure, access-controlled store and monitor anomalous usage.
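
For the rate-limiting portion of recommendation 7, a sketch using @nestjs/throttler (assuming the v5+ API, where forRoot takes an array of throttle definitions); the window and limit are placeholders:

```ts
import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerGuard, ThrottlerModule } from '@nestjs/throttler';

// Global default: at most 30 requests per 60-second window per client.
// Upload/processing routes can tighten this further with @Throttle().
@Module({
  imports: [ThrottlerModule.forRoot([{ ttl: 60_000, limit: 30 }])],
  providers: [{ provide: APP_GUARD, useClass: ThrottlerGuard }],
})
export class AppModule {}
```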

🟡 apps/api/src/knowledge-base/utils/constants.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Allows text/html as viewable MIME — risk of XSS when serving user files | MEDIUM |
| 2 | Allows text/markdown and text/plain inline — may enable XSS or content injection | MEDIUM |
| 3 | No content-type sniffing protection or validation before inline display | MEDIUM |
| 4 | S3 key inputs (orgId, fileId, fileName) not validated — can include slashes/control chars | MEDIUM |
| 5 | sanitizeFileName preserves leading dots and long names — hidden files & path issues | MEDIUM |
| 6 | generateS3Key doesn't enforce S3 key byte-length limit; can exceed 1024 bytes | MEDIUM |
| 7 | sanitizeMetadataFileName converts non-ASCII to '?' then '_' risking filename collisions | MEDIUM |

Recommendations:

  1. Remove 'text/html' from VIEWABLE_MIME_TYPES or never serve HTML user uploads inline. If HTML must be supported, sanitize HTML with a robust HTML sanitizer (e.g., DOMPurify server-side or an allowlist sanitizer) before serving and use Content-Security-Policy to limit script execution.
  2. Treat text/markdown and any user-supplied text that may be rendered as HTML with care: either render them to sanitized HTML on the server (with a safe markdown-to-HTML pipeline + sanitizer) or force downloads / serve as text/plain with Content-Disposition: attachment to avoid browser rendering.
  3. When serving files inline, ensure response headers include X-Content-Type-Options: nosniff and set an explicit Content-Type from a validated whitelist; do not rely solely on the uploaded MIME type. Validate the content-type against allowed list and inspect file contents if needed.
  4. Whitelist and validate organizationId and fileId (e.g., /^[A-Za-z0-9_-]{1,64}$/), strip or reject path separators and control characters. Never embed raw user-controlled strings into S3 keys without validation/normalization.
  5. Harden sanitizeFileName: normalize Unicode (NFC), remove/control leading dots (reject or prepend a safe prefix), enforce max length in bytes/characters, collapse repeated separators, and escape or remove any remaining unsafe characters. Consider hashing long filenames into a fixed-length safe token appended to a short sanitized basename.
  6. Enforce S3 key byte-length limits: compute Buffer.byteLength(fullKey, 'utf8') and if >1024, deterministically shorten components (e.g., hash fileId or filename) so the resulting key is ≤1024 bytes while preserving uniqueness.
  7. Avoid converting to ASCII via Buffer.toString('ascii') which maps non-ASCII to '?'. Use a safe transliteration library (e.g., unidecode) or percent-encode/URL-safe base64 the filename for metadata keys, or maintain the original UTF-8 but ensure metadata storage supports it. If you must replace characters, choose a replacement strategy that avoids high collision risk and ensure final length limits.
  8. Make SIGNED_URL_EXPIRATION_SECONDS configurable per use case and consider shorter default lifetimes for public or sensitive files; rotate credentials and audit signed URL generation usage.
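
A sketch combining recommendations 4 through 6; the patterns, limits, and hashing strategy are illustrative:

```ts
import { createHash } from 'node:crypto';

const ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;
const MAX_S3_KEY_BYTES = 1024;

function generateS3Key(orgId: string, fileId: string, fileName: string): string {
  // Rec. 4: validate id components before they reach the key.
  if (!ID_PATTERN.test(orgId) || !ID_PATTERN.test(fileId)) {
    throw new Error('Invalid id component');
  }
  // Rec. 5: normalize Unicode, strip separators/control chars, no leading dots.
  const safe = fileName
    .normalize('NFC')
    .replace(/[^\w.-]+/g, '_')
    .replace(/^\.+/, '');
  let key = `${orgId}/${fileId}/${safe}`;
  // Rec. 6: enforce S3's 1024-byte key limit, hashing over-long names so the
  // shortened key stays deterministic and unique.
  if (Buffer.byteLength(key, 'utf8') > MAX_S3_KEY_BYTES) {
    const digest = createHash('sha256').update(fileName).digest('hex').slice(0, 16);
    key = `${orgId}/${fileId}/${safe.slice(0, 64)}-${digest}`;
  }
  return key;
}
```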

🔴 apps/api/src/knowledge-base/utils/s3-operations.ts (HIGH Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No authorization checks for download/view/delete operations | HIGH |
| 2 | Client-supplied fileType used as ContentType/ResponseContentType | HIGH |
| 3 | Unvalidated fileName used in Content-Disposition header | HIGH |
| 4 | Silent error handling in deleteFromS3 hides failures | HIGH |
| 5 | No malware/content scanning of uploaded files | HIGH |

Recommendations:

  1. Enforce authorization before generating signed URLs or deleting objects. Either perform org/user checks inside these functions (pass a user/org context) or ensure the calling layer enforces and validates that the caller is allowed to access the given s3Key/organizationId. Consider mapping s3Key -> organizationId and verifying the requester matches.
  2. Do not trust client-supplied MIME types. Maintain a whitelist of allowed content-types and map/override the value based on file extension or, better, inspect file magic bytes (content sniffing) on upload. At minimum, fall back to 'application/octet-stream' for unknown/unsupported types.
  3. Sanitize filenames used in Content-Disposition or omit supplying a filename. Currently encodeURIComponent(fileName) is used, but explicitly sanitize/remove problematic characters, truncate length, and avoid embedding untrusted input directly in headers. Alternatively return a server-controlled safe filename or content-disposition without filename to avoid header/content-injection vectors.
  4. Do not swallow errors silently in deleteFromS3. Log failures with context (s3Key, org) and return an error to the caller or throw a controlled exception so callers can decide how to respond. At minimum log the caught exception before returning false.
  5. Scan uploads for malware and validate file content. Implement antivirus scanning (AV bridge, Lambda/scan service) and validate file type via magic bytes before storing in S3. Also enforce file size limits (already present), shorten signed URL expirations where feasible, and ensure S3 objects are stored private with least privilege.
  6. Additional hardening: restrict bucket policies and IAM roles to least privilege, enable server-side encryption, set appropriate object ACLs, and consider putting signed URL generation behind an authorization check that verifies the requester’s permissions to the specific object.
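
A sketch of recommendation 4, assuming AWS SDK v3; console.error stands in for whatever logger the service already uses:

```ts
import { DeleteObjectCommand, S3Client } from '@aws-sdk/client-s3';

async function deleteFromS3(s3: S3Client, bucket: string, s3Key: string): Promise<void> {
  try {
    await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: s3Key }));
  } catch (err) {
    // Log with context, then rethrow so the caller can decide how to respond,
    // rather than silently returning false and hiding the failure.
    console.error(`Failed to delete s3://${bucket}/${s3Key}`, err);
    throw err;
  }
}
```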

🟡 apps/api/src/main.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | CORS allows any origin with credentials (origin: true, credentials: true) | MEDIUM |
| 2 | Public Swagger UI at /api/docs exposes API surface | MEDIUM |
| 3 | Swagger persistAuthorization can store tokens in browser | MEDIUM |
| 4 | OpenAPI written to packages/docs/openapi.json in non-prod (possible leakage) | MEDIUM |
| 5 | OpenAPI includes apiKey scheme enabling interactive auth via docs | MEDIUM |
| 6 | enableImplicitConversion may coerce types unexpectedly | MEDIUM |

Recommendations:

  1. Restrict CORS origin to a whitelist of trusted domains. If cross-site cookies are not required, set credentials: false. If credentials must be true, ensure origin is not '*' and validate origins strictly (e.g., using a runtime whitelist).
  2. Protect Swagger UI: serve it only in non-production environments or require authentication/IP allowlist when in production. Gate access with middleware that checks an admin token or environment flag.
  3. Disable persistAuthorization or document and warn users. Prefer not to persist tokens in the browser; require re-authentication in the UI or clear persisted tokens on logout.
  4. Avoid writing OpenAPI output into repository paths or ensure the file/directory is gitignored and access controlled. If generated artifacts must be stored, ensure CI/storage has strict access controls and sensitive endpoints/definitions are omitted from public docs.
  5. Consider not exposing an interactive apiKey scheme in public docs. If interactive auth is needed, require docs UI authentication or remove apiKey from public Swagger in production builds.
  6. Disable enableImplicitConversion unless you deliberately rely on silent conversions. Prefer explicit DTO types and strict validation rules; add unit/integration tests to ensure numeric/boolean coercions behave as expected.
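
A runnable sketch of recommendation 1; the origin list is a placeholder and AppModule stands in for the application's real root module:

```ts
import { Module } from '@nestjs/common';
import { NestFactory } from '@nestjs/core';

@Module({})
class AppModule {} // stand-in for the real root module

const ALLOWED_ORIGINS = new Set([
  'https://app.example.com', // substitute the actual trusted app/portal domains
  'https://portal.example.com',
]);

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.enableCors({
    // A callback whitelist instead of `origin: true`: credentials stay enabled
    // only for origins explicitly trusted at runtime.
    origin: (origin, cb) =>
      !origin || ALLOWED_ORIGINS.has(origin)
        ? cb(null, true)
        : cb(new Error('Origin not allowed'), false),
    credentials: true,
  });
  await app.listen(3000);
}
bootstrap();
```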

🟡 apps/api/src/policies/dto/ai-suggest-policy.dto.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Missing nested validation for chatHistory items | MEDIUM |
| 2 | chatHistory items lack role/content type checks | MEDIUM |
| 3 | No length or size constraints on instructions or chatHistory | MEDIUM |

Recommendations:

  1. Create a Message DTO and validate nested items:
  • export class MessageDto {
      @IsIn(['user', 'assistant'])
      role: 'user' | 'assistant';

      @IsString()
      @MaxLength(2000)
      content: string;
    }

  • In AISuggestPolicyRequestDto, use @ValidateNested({ each: true }) and @Type(() => MessageDto) on chatHistory, and declare chatHistory: MessageDto[];

  2. Add explicit length/size constraints (e.g., @MaxLength on instructions and @ArrayMaxSize on chatHistory).
  3. Enable and configure a global ValidationPipe in main (or module):
  • app.useGlobalPipes(new ValidationPipe({ whitelist: true, forbidNonWhitelisted: true, transform: true }));
  4. Apply input size limits at the HTTP/body-parser level (e.g., express.json({ limit: '1mb' })) or at the API gateway to avoid large-payload DoS.
  5. Sanitize or encode user-provided content before storing, rendering, or passing to other systems. Treat content as untrusted: escape HTML when rendering, use parameterized queries for DB access, avoid injecting into shell commands, and never pass raw user content into eval()/exec().
  6. Consider additional protections: rate limiting, request validation at the API gateway, and logging/monitoring of unusual payload sizes or roles.

🟢 apps/api/src/questionnaire/dto/answer-single-question.dto.ts (LOW Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No strict format validation for organizationId/questionnaireId (e.g., UUID) | LOW |
| 2 | No bounds check ensuring questionIndex < totalQuestions | LOW |
| 3 | IsInt may fail for string input without class-transformer type casting | LOW |

Recommendations:

  1. Add format validation for IDs: use @IsUUID() where the IDs are UUIDs, or @Matches(/regex/) for other required formats. This prevents malformed IDs being accepted.
  2. Add a relational/bounds check that questionIndex < totalQuestions. Implement as a class-level custom validator (or runtime check) to ensure the index is within range before processing.
  3. Use class-transformer to ensure numeric inputs are converted: add @Type(() => Number) to numeric fields and enable transform in your validation pipeline (e.g., NestJS ValidationPipe with transform: true).
  4. Add further content rules where appropriate (string length limits with @Length, @Min/@Max for numeric ranges) and sanitize inputs upstream if values are used in downstream operations.
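
A sketch of recommendations 2 and 3 together, using a class-level custom constraint; the field names follow the DTO under review, but the constraint itself is illustrative:

```ts
import { Type } from 'class-transformer';
import {
  IsInt,
  Min,
  Validate,
  ValidationArguments,
  ValidatorConstraint,
  ValidatorConstraintInterface,
} from 'class-validator';

// Relational check: questionIndex must stay below totalQuestions.
@ValidatorConstraint({ name: 'indexWithinTotal', async: false })
class IndexWithinTotal implements ValidatorConstraintInterface {
  validate(value: number, args: ValidationArguments): boolean {
    const dto = args.object as AnswerSingleQuestionDto;
    return Number.isInteger(value) && value < dto.totalQuestions;
  }
  defaultMessage(): string {
    return 'questionIndex must be less than totalQuestions';
  }
}

export class AnswerSingleQuestionDto {
  @Type(() => Number) // cast "3" -> 3 before @IsInt runs (requires transform: true)
  @IsInt()
  @Min(0)
  @Validate(IndexWithinTotal)
  questionIndex!: number;

  @Type(() => Number)
  @IsInt()
  @Min(1)
  totalQuestions!: number;
}
```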

🟡 apps/api/src/questionnaire/dto/export-by-id.dto.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | questionnaireId missing UUID/format/length validation | MEDIUM |
| 2 | organizationId missing UUID/format/length validation | MEDIUM |

Recommendations:

  1. Add explicit ID validation: use @IsUUID() for questionnaireId and organizationId (or @Matches() with a validated UUID regex) to enforce format.
  2. Optionally add MinLength/MaxLength if IDs follow a non-UUID scheme.
  3. Sanitize and parameterize IDs before using them in DB queries or command invocations; never concatenate raw values into queries/commands.
  4. Continue to enforce runtime DTO validation (the project already enables a global NestJS ValidationPipe in apps/api/src/main.ts with whitelist/forbidNonWhitelisted/transform — verify these settings are retained in all deployments).
  5. Reject/whitelist inputs early and enforce strict typing at controller/service boundaries.

🟡 apps/api/src/questionnaire/dto/export-questionnaire.dto.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | organizationId limited to IsString (no format/length/UUID validation) | MEDIUM |

Recommendations:

  1. Enforce a strict format for organizationId using class-validator (e.g., @IsUUID() or @Matches(/regex/) if it's not a UUID).
  2. Add length/charset limits (e.g., @Length(min, max) or @Matches to restrict allowed characters).
  3. Apply global validation (e.g., NestJS ValidationPipe with whitelist and forbidNonWhitelisted) so DTOs are enforced at the boundary.
  4. Sanitize/normalize DTO values before use (trim, canonicalize) and validate again where they are consumed if needed.
  5. Ensure all uses of organizationId in DB queries use parameterized queries/ORM bindings (no string concatenation).
  6. Never interpolate DTO values into shell/exec calls; if external commands are required, validate thoroughly and use safe APIs.

💡 Recommendations

View 3 recommendation(s)
  1. Remediate CVEs: upgrade xlsx@0.18.5 to a patched release that addresses GHSA-4r6h-8v6p-xvw6 and GHSA-5pgg-2g8v-p4x9; update ai to >= 5.0.52 to address GHSA-rwvc-j5jr-mgvh. Verify via OSV/advisory metadata that the chosen versions include the fixes before merging.
  2. Remove/sanitize hardcoded credentials: delete or replace the example plaintext DATABASE_URL ("postgres:pass") in .env.example and the plaintext DATABASE_URL in SELF_HOSTING.md with non-credential placeholders (e.g., postgresql://<user>:<password>@<host>:<port>/<db>). Search the repo for any other committed literal credentials and remove them from tracked files.
  3. Fix injection points in code: in apps/api/customPrismaExtension.ts do not interpolate manifest.runtime into a shell command—whitelist acceptable runtimes and invoke binaries via execFile/spawn (no shell). In apps/api/src/knowledge-base/utils/s3-operations.ts and related code, never insert unvalidated client-supplied filenames into Content-Disposition or headers—sanitize/normalize or generate server-side safe filenames and percent-encode/validate header values; also validate/whitelist MIME types before using them in responses.

Powered by Comp AI - AI that handles compliance for you. Reviewed Dec 4, 2025

* chore(package): lock packageManager to bun@1.3.3

* refactor(policy): update policy details and AI assistant components for improved functionality

* feat(docs): add ai-policy-editor page to documentation

* refactor(policy): enhance layout and styling of policy details and AI assistant components

* refactor(ui): update conversation component styles for consistency
Marfuen merged commit 99ab84a into release on Dec 4, 2025
12 of 13 checks passed
@claudfuen
Contributor

🎉 This PR is included in version 1.67.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
