Comprehensive review, hardening, and a platform-wide design-system rework by enyineer · Pull Request #354 · enyineer/checkstack

enyineer · 2026-06-20T09:19:23Z

A large, verified body of work spanning six areas - correctness, security, testing, UX, docs, and a premium, consistent UI design language - plus a low-noise retune of anomaly-detection defaults.

Design system (premium UI rework)

New @checkstack/ui foundation: surface elevation tokens, an aurora gradient signature, a colorblind-safe status triad, a comfortable/compact density model (provider + user toggle), polished skeleton/empty/error states, and honest token-driven chart primitives (time series, sparkline, radial gauge, request waterfall, uptime ribbon).
A signature aurora page-header (icon-stroke gradient) + deeper cards, an elevated app shell, and reskinned dashboard / health-check / SLO views.
Every plugin frontend adopts the tokens; the highest-impact surfaces are then redesigned to a premium bar (depth, number-led hierarchy, multi-encoded status). Pure tone/format logic extracted into unit-tested modules.
The system-health dashboard widget was reworked (actionable headline + proportional composition bar + a legend that doubles as filters, with an empty state when a filter has no matches).
All alerts unified onto one premium Alert.

Warning

Breaking change: the duplicate InfoBanner component (and its sub-components) is removed - use Alert, a drop-in replacement with the same variants and composable parts.

Anomaly-detection defaults (low-noise problem detection)

Reviewed all 264 metrics across every health-check strategy + the hardware collector: 94 noisy/un-baselineable ones default-disabled (raw counts/identifiers, config echoes, payload sizes, deterministic values like certificate days-remaining that a static health threshold already governs), 80 kept-and-hardened with confirmation windows + practical-significance floors, the rest already correct. Disabled metrics stay chartable and opt-in. No engine/schema changes.

Security hardening

At-rest encryption with key rotation + fail-loud decryption, brute-force/token-timing fixes, HTTP-collector SSRF guard, fail-closed plugin supply-chain integrity pinning, SQL plugin-schema identifier hardening, notification email HTML sanitization, per-assignment satellite result authorization, and a first-run onboarding TOCTOU guard.

Testing

A real end-to-end suite (Playwright + Testcontainers Postgres) covering the authenticated app, made a blocking CI job, plus extracted pure-logic unit tests throughout.

UX & accessibility

Form quality across editors, mobile responsiveness and touch targets, accessible overlays/forms, list/loading/empty/error state consistency, onboarding + point-of-use coaching, wider command-palette coverage, and sidebar IA.

Refactors & docs

Typed router-factory args + structured logging, typed Drizzle JSON columns, shared formatting helpers, removal of boundary casts, and same-PR docs/AI updates.

Notes

Changesets included; platform is in BETA so all bumps are minor/patch (no major). Design-system and anomaly changesets are consolidated.
Verified throughout: typecheck, lint, unit tests, and the full e2e suite green at each stage.

🤖 Generated with Claude Code

https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

changeset-bot · 2026-06-20T09:19:27Z

🦋 Changeset detected

Latest commit: a02d2bd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 98 packages

Name	Type
@checkstack/status-page-frontend	Minor
@checkstack/auth-frontend	Minor
@checkstack/ai-backend	Minor
@checkstack/ai-common	Minor
@checkstack/healthcheck-http-backend	Minor
@checkstack/healthcheck-dns-backend	Patch
@checkstack/healthcheck-grpc-backend	Minor
@checkstack/healthcheck-ping-backend	Patch
@checkstack/healthcheck-tcp-backend	Patch
@checkstack/healthcheck-tls-backend	Patch
@checkstack/healthcheck-redis-backend	Patch
@checkstack/healthcheck-postgres-backend	Patch
@checkstack/healthcheck-mysql-backend	Patch
@checkstack/healthcheck-ssh-backend	Patch
@checkstack/healthcheck-script-backend	Minor
@checkstack/healthcheck-jenkins-backend	Minor
@checkstack/healthcheck-rcon-backend	Patch
@checkstack/collector-hardware-backend	Patch
@checkstack/api-docs-frontend	Minor
@checkstack/auth-backend	Minor
@checkstack/automation-backend	Patch
@checkstack/dependency-backend	Patch
@checkstack/status-page-backend	Patch
@checkstack/satellite-backend	Patch
@checkstack/gitops-backend	Patch
@checkstack/secrets-backend	Patch
@checkstack/notification-backend	Patch
@checkstack/script-packages-backend	Patch
@checkstack/ui	Minor
@checkstack/pluginmanager-frontend	Patch
@checkstack/command-frontend	Patch
@checkstack/backend-api	Minor
@checkstack/dependency-frontend	Patch
@checkstack/about-frontend	Patch
@checkstack/ai-frontend	Patch
@checkstack/announcement-frontend	Patch
@checkstack/anomaly-frontend	Patch
@checkstack/automation-frontend	Minor
@checkstack/cache-frontend	Patch
@checkstack/catalog-frontend	Patch
@checkstack/dashboard-frontend	Minor
@checkstack/frontend	Minor
@checkstack/gitops-frontend	Patch
@checkstack/healthcheck-frontend	Minor
@checkstack/incident-frontend	Minor
@checkstack/infrastructure-frontend	Patch
@checkstack/integration-frontend	Patch
@checkstack/maintenance-frontend	Minor
@checkstack/notification-frontend	Patch
@checkstack/queue-frontend	Patch
@checkstack/satellite-frontend	Patch
@checkstack/script-packages-frontend	Patch
@checkstack/secrets-frontend	Patch
@checkstack/slo-frontend	Minor
@checkstack/theme-frontend	Minor
@checkstack/tips-frontend	Patch
@checkstack/backend	Minor
@checkstack/healthcheck-backend	Minor
@checkstack/anomaly-backend	Patch
@checkstack/catalog-backend	Patch
@checkstack/incident-backend	Patch
@checkstack/maintenance-backend	Patch
@checkstack/slo-backend	Patch
@checkstack/announcement-backend	Patch
@checkstack/theme-backend	Patch
@checkstack/tips-backend	Patch
@checkstack/auth-credential-backend	Patch
@checkstack/auth-github-backend	Patch
@checkstack/auth-ldap-backend	Patch
@checkstack/auth-saml-backend	Patch
@checkstack/integration-jira-backend	Patch
@checkstack/integration-script-backend	Patch
@checkstack/integration-teams-backend	Patch
@checkstack/integration-webex-backend	Patch
@checkstack/integration-webhook-backend	Patch
@checkstack/integration-backend	Patch
@checkstack/secrets-backend-local	Patch
@checkstack/secrets-backend-vault	Patch
@checkstack/notification-backstage-backend	Patch
@checkstack/notification-discord-backend	Patch
@checkstack/notification-gotify-backend	Patch
@checkstack/notification-pushover-backend	Patch
@checkstack/notification-slack-backend	Patch
@checkstack/notification-smtp-backend	Patch
@checkstack/notification-teams-backend	Patch
@checkstack/notification-telegram-backend	Patch
@checkstack/notification-webex-backend	Patch
@checkstack/satellite	Patch
@checkstack/script-packages-store-postgres	Patch
@checkstack/script-packages-store-s3	Patch
@checkstack/cache-backend	Patch
@checkstack/command-backend	Patch
@checkstack/queue-backend	Patch
@checkstack/signal-backend	Patch
@checkstack/test-utils-backend	Patch
@checkstack/cache-memory-backend	Patch
@checkstack/queue-bullmq-backend	Patch
@checkstack/queue-memory-backend	Patch

github-actions · 2026-06-20T09:19:48Z

📦 Changeset Coverage Incomplete

The following packages have code changes but are not included in any changeset:

@checkstack/test-utils-backend
@checkstack/queue-memory-backend

⚠️ Please add a changeset entry for each of the listed packages before merging.

…tem rework

A large, verified body of work bringing Checkstack up across six areas: correctness, security, testing, UX, docs, and AI - capped by a premium, consistent UI design language.

Design system (premium UI rework)
- New `@checkstack/ui` foundation: surface elevation tokens, aurora gradient, colorblind-safe status triad, density model (comfortable/compact) + provider and user toggle, polished skeleton/empty/error states, and honest token-driven chart primitives (time series, sparkline, radial gauge, request waterfall, uptime ribbon).
- A signature aurora page-header + deeper cards, an elevated app shell, and reskinned dashboard / health-check / SLO views.
- Every plugin frontend adopts the tokens, then its highest-impact surfaces are redesigned to a premium bar (depth, number-led hierarchy, multi-encoded status). Pure tone/format logic extracted into unit-tested modules.
- All alerts unified onto one premium `Alert`; the duplicate `InfoBanner` is removed (BREAKING: use `Alert`).

Security hardening
- At-rest encryption with key rotation + fail-loud decryption, brute-force / token-timing fixes, HTTP-collector SSRF guard, fail-closed plugin supply-chain integrity pinning, SQL plugin-schema identifier hardening, notification email HTML sanitization, per-assignment satellite result authorization, and a first-run onboarding TOCTOU guard.

Testing
- A real end-to-end suite (Playwright + Testcontainers Postgres) covering the authenticated app, made a blocking CI job, plus extracted pure-logic unit tests throughout.

UX & accessibility
- All review-surfaced UX improvements: form quality across editors, mobile responsiveness and touch targets, accessible overlays/forms, list/loading/empty/error state consistency, onboarding + point-of-use coaching, wider command-palette coverage, and sidebar IA.

Refactors & docs
- Typed router-factory args + structured logging, typed Drizzle JSON columns, shared formatting helpers, removal of boundary casts, and same-PR docs/AI updates.

Reliability
- Retune anomaly-detection defaults across every health-check strategy and the hardware collector for a low-noise posture: noisy or un-baselineable metrics (raw counts, config echoes, payload sizes, deterministic values) default to off, while latency, availability, and saturation-percent signals are kept and hardened with confirmation windows and practical-significance floors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

…ssertable results

A collector must fail only when the transport could not complete (DNS/connect/TLS failure, timeout, aborted, unspawnable process). A successfully-received result that is simply "not what you hoped" - an HTTP 4xx/5xx, a gRPC NOT_SERVING, offline Jenkins nodes, a non-zero script exit - is an ASSERTABLE METRIC, not a collector failure; the user's assertions (or the no-assertion default) decide health.

Fixes the HTTP collector hard-failing on 404 (now a successful collection with statusCode exposed and assertable), plus gRPC, Jenkins node-health, and the script execute collector. Audited every other strategy: they already failed only on genuine transport failures.

Adds regression tests, docs, a new project rule (.claude/rules/healthcheck-collectors.md), and a changeset (BREAKING: affected checks now need an explicit assertion to fail on a non-OK result).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
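The rule in this commit can be illustrated with a small sketch. The types and function names below (`TransportOutcome`, `CollectResult`, `toCollectResult`, `statusEquals`) are hypothetical and are not Checkstack's actual collector API; the sketch only shows the split between a transport failure and an assertable result.

```typescript
// Hypothetical minimal view of one HTTP attempt (not the real Checkstack types).
type TransportOutcome =
  | { kind: "response"; statusCode: number; durationMs: number }
  | { kind: "transport-error"; reason: "dns" | "connect" | "tls" | "timeout" | "aborted" };

// Hypothetical collector result: either metrics to assert on, or a failure.
type CollectResult =
  | { ok: true; metrics: { statusCode: number; durationMs: number } }
  | { ok: false; error: string };

// The collector fails only when the transport could not complete.
function toCollectResult(outcome: TransportOutcome): CollectResult {
  if (outcome.kind === "transport-error") {
    return { ok: false, error: `transport failed: ${outcome.reason}` };
  }
  // Any received response, including 4xx/5xx, is a successful collection
  // whose status code is exposed as a metric.
  return {
    ok: true,
    metrics: { statusCode: outcome.statusCode, durationMs: outcome.durationMs },
  };
}

// Health is decided by an assertion over the collected metrics.
function statusEquals(result: CollectResult, expected: number): boolean {
  return result.ok && result.metrics.statusCode === expected;
}
```

Under this model a 404 is collected successfully, and a check only goes unhealthy on it if the operator asserts on the status code, which matches the breaking-change note above.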

…unchanged) Records an optional structured metadata.timings per run (DNS, connect, TLS, wait/TTFB, transfer, and a processing catch-all for non-HTTP operation time); the run-detail view renders the present phases in transport order and falls back to the old Connection+Processing split for older runs. HTTP: the request is byte-for-byte the same fetch path (IP-pinned, original Host + SNI) - request behavior is unchanged. Timing is measured around it: fetch resolves at the response headers so wait (TTFB) and transfer (body) are exact on the request, DNS is timed at the resolve step, and connect/TLS come from a short-lived best-effort raw net/tls probe to the same validated IP (Bun's fetch socket emits no connect/handshake events; raw sockets do). The probe is timing-only and never fails the check. Other transports surface the connect and operation times they already measure. Also fixes the run-detail "slowest" badge colliding with the bar, and shows a genuinely sub-millisecond phase as "<1 ms". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
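As a rough sketch of the rendering side described above: the phase names come from the commit message, but the record shape and the helper names (`RunTimings`, `formatPhaseMs`, `presentPhases`) are assumptions made for illustration, not the actual run-detail code.

```typescript
// Transport order as listed in the commit message.
const PHASE_ORDER = ["dns", "connect", "tls", "wait", "transfer", "processing"] as const;
type TimingPhase = (typeof PHASE_ORDER)[number];

// Assumed shape of metadata.timings: every phase is optional, values in ms.
type RunTimings = Partial<Record<TimingPhase, number>>;

// A genuinely sub-millisecond phase is shown as "<1 ms" instead of "0 ms".
function formatPhaseMs(ms: number): string {
  if (ms > 0 && ms < 1) return "<1 ms";
  return `${Math.round(ms)} ms`;
}

// Keep only the phases that were recorded, in transport order.
function presentPhases(timings: RunTimings): Array<{ phase: TimingPhase; label: string }> {
  return PHASE_ORDER.filter((phase) => timings[phase] !== undefined).map((phase) => ({
    phase,
    label: formatPhaseMs(timings[phase] as number),
  }));
}
```

Older runs without `metadata.timings` would not go through this path at all; per the commit they fall back to the previous Connection+Processing split.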

…fixes HTTP/2 404s) The SSRF hardening added in this branch pinned the request to the resolved IP: it rewrote the URL host to the IP literal and moved the hostname into the `Host` header. That breaks HTTP/2 origins, whose authority comes from the URL's `:authority` pseudo-header rather than `Host`, so real hosts such as google.com started answering 404/429 instead of 200. Keep the SSRF guard as a pre-flight validation (still rejects cloud-metadata / link-local and operator-denied ranges, and direct denied IP literals) but drop the pin and `fetch` the original URL verbatim, byte-identical to a plain fetch. The resolved IP is reused only for the best-effort timing probe. The only thing lost is DNS-rebind TOCTOU protection - a narrow window whose price was breaking every legitimate HTTP/2 request. Verified: example.com and cloudflare.com (both HTTP/2) return 200 with the full timing breakdown intact; SSRF guard tests still pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
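A pre-flight check of this kind might look roughly like the sketch below. It is IPv4-only, and the names (`preflightAllowed`, `inCidr`, `ALWAYS_DENIED`) are hypothetical rather than taken from the PR. The only hard-coded range is the IPv4 link-local block, which contains the common cloud-metadata address 169.254.169.254; operator-denied ranges are passed in.

```typescript
// Parse a dotted-quad IPv4 address into an integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside the IPv4 CIDR block `cidr` (e.g. "10.0.0.0/8").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) return false;
  const bitsText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (ipInt === null || baseInt === null || bits > 32) return false;
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Link-local range; includes the cloud-metadata address 169.254.169.254.
const ALWAYS_DENIED = ["169.254.0.0/16"];

// Validate the resolved addresses BEFORE fetching the original URL unchanged.
// Fails closed: no addresses, or anything that is not plain IPv4, is rejected
// (IPv6 handling is deliberately left out of this sketch).
function preflightAllowed(resolvedIps: string[], operatorDenied: string[] = []): boolean {
  const denied = [...ALWAYS_DENIED, ...operatorDenied];
  return (
    resolvedIps.length > 0 &&
    resolvedIps.every(
      (ip) => ipv4ToInt(ip) !== null && !denied.some((cidr) => inCidr(ip, cidr)),
    )
  );
}
```

Because the request itself is no longer pinned to the validated IP, a check like this cannot stop a DNS rebind between validation and fetch, which is the trade-off the commit message calls out.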

The e2e suite hung indefinitely at "stopping ephemeral Postgres...". Bisected locally: container.stop({ timeout: 0 }) resolves in ~180ms, but the process never exits. Testcontainers' Ryuk reaper keeps a persistent socket open to its sidecar for the process lifetime and relies on socket.unref() so it does not block exit - the Bun runtime does not honor that unref, so the socket keeps the event loop alive forever after the suite finishes. In CI the step pipes through `tee`, which only ends when our stdout closes, hence the indefinite hang. Disable Ryuk in the harness: the wrapper already stops and removes the container deterministically in `finally` on every exit path (verified the container is gone after stop() with Ryuk off), and CI runners are ephemeral, so the reaper is unnecessary. The process now exits naturally - no force-exit workaround. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
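The harness pattern described here could be sketched as follows. `withEphemeralContainer` and `StoppableContainer` are hypothetical names, and the env object is passed in so the sketch stays self-contained (a real harness would pass `process.env`). `TESTCONTAINERS_RYUK_DISABLED` is the environment variable Testcontainers documents for turning the reaper off; whether the PR sets it this way is an assumption.

```typescript
// Hypothetical minimal interface; a started Testcontainers container has stop().
interface StoppableContainer {
  stop(): Promise<unknown>;
}

// Start a container, run the suite, and always stop the container afterwards.
async function withEphemeralContainer<C extends StoppableContainer, T>(
  env: Record<string, string | undefined>,
  start: () => Promise<C>,
  run: (container: C) => Promise<T>,
): Promise<T> {
  // Must be set before the first container starts so no Ryuk sidecar
  // (and no long-lived socket to it) is ever created.
  env.TESTCONTAINERS_RYUK_DISABLED = "true";
  const container = await start();
  try {
    return await run(container);
  } finally {
    // With Ryuk off, this is the only cleanup path, so it runs on success
    // and on failure alike.
    await container.stop();
  }
}
```

The design choice is that deterministic cleanup in `finally` replaces the reaper, which is acceptable here because CI runners are ephemeral anyway.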

The assistant had healthcheck.status (every check globally) but no way to map a check to a system, so it had to GUESS which check monitored a given system (e.g. "google.com expecting 201" vs "expecting 200").

Project the existing getSystemConfigurations query as the read-only AI tool healthcheck.listSystemChecks: given a systemId (resolved from a name via catalog.listSystems), it returns the checks assigned to that system - id, name, strategy, interval, collectors/assertions, and paused state. The tool inherits the source procedure's system-scoped configuration.read gate (parentScope on catalog.system), so it stays team-scoped and needs no new permission.

Adds a projection test mirroring the sibling tools, documents the system-scoped read pattern in registering-tools.md, and regenerates the docs index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

Closing a downtime window depended entirely on catching the system's transient health-recovery edge (onEntityChanged), which is emitted only by a check RUN. Fixing, pausing, deleting, or unassigning the offending check just invalidates the read cache and emits no edge, and even a plain edit can lose the single recovery delivery - leaving the open window orphaned until the once-daily reconcile. Result: the SLO read 100% availability (live health is authoritative for the budget) while "Recent Downtime Events" still showed a 25-day "ongoing" window. The two views disagreed. The user-facing SLO reads now reconcile against live health before reporting: getDowntimeEvents and the status reads void an orphaned open window when the system is currently healthy, reusing the same voidOrphanedDowntime the daily job runs. The dashboard self-heals the moment it is viewed instead of waiting for midnight. The reactive entity read / computeStatus stays side-effect-free; the reconcile is a cheap no-op when there are no open events. The void primitive is already unit-tested; the router change is thin glue over it (no router harness exists in this package). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

The chat loop replays earlier tool results verbatim with no age annotation, and the system prompt injected "current time" but never how long the thread had been idle. So resuming an old conversation, the model answered from stale captured data (a check's old name, a "failing" status) instead of current state. streamTurn now measures the idle gap before the message (the conversation's last-activity timestamp, captured before the new user message bumps updatedAt) and, once it exceeds 10 minutes, folds a "Data freshness" directive into the system prompt telling the model to re-call the relevant read tools for current state rather than trust earlier-turn results. The directive sits at the volatile end of the prompt (next to the time line) so the cache-friendly stable prefix is unaffected; an active back-and-forth never sees it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
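The idle-gap logic can be sketched as two small pure functions. The names (`freshnessDirective`, `buildSystemPrompt`) and the directive wording are illustrative assumptions; only the 10-minute threshold and the placement at the volatile end of the prompt come from the commit message.

```typescript
// Threshold from the commit message: the directive appears once idle time exceeds 10 minutes.
const STALE_AFTER_MS = 10 * 60 * 1000;

// `lastActivityAt` must be captured BEFORE the new user message bumps updatedAt.
function freshnessDirective(lastActivityAt: Date, now: Date): string | null {
  const idleMs = now.getTime() - lastActivityAt.getTime();
  if (idleMs <= STALE_AFTER_MS) return null;
  const idleMinutes = Math.floor(idleMs / 60_000);
  return (
    `Data freshness: this conversation was idle for about ${idleMinutes} minutes. ` +
    "Re-call the relevant read tools for current state instead of trusting earlier tool results."
  );
}

// The stable prefix stays first so prompt caching is unaffected; the time line
// and the optional directive sit at the volatile end.
function buildSystemPrompt(stablePrefix: string, timeLine: string, directive: string | null): string {
  return [stablePrefix, timeLine, directive]
    .filter((part): part is string => part !== null)
    .join("\n\n");
}
```

An active back-and-forth never crosses the threshold, so `freshnessDirective` returns `null` and the prompt is identical to what it was before this change.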

The reconcile-on-read fix DELETED orphaned downtime windows (when the system is healthy but the recovery edge was missed). For a genuine multi-day outage whose recovery edge was lost, that erased real downtime - so the SLO read a false 100% availability with full error budget, even though the system had been down for ~25 days.

Reconcile now PRESERVES the downtime: it resolves the system's actual recovery time (the first healthy run on/after the window opened, via healthcheck getHistory) and CLOSES the window at that instant, so the real outage is counted against availability and the error budget. It only DELETES as a fallback when the recovery time can't be determined (e.g. run history pruned), where the unprovable downtime must not be counted.

- closeDowntimeEvent gains an optional explicit endTime (clamped >= startTime).
- SloEngine gains a recovery-time resolver, wired in afterPluginsReady from the healthcheck run history; voidOrphanedDowntime -> reconcileOrphanedDowntime.
- Forward-only: already-written daily snapshots are not retroactively corrected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
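The close-or-delete decision can be sketched as a pure function over run history. `RunRecord`, `ReconcileDecision`, and `reconcileOrphanedWindow` are hypothetical names for illustration; the real `reconcileOrphanedDowntime` works against the SLO engine and healthcheck history rather than a plain array.

```typescript
// Assumed minimal run record: timestamp in epoch milliseconds plus health.
type RunRecord = { at: number; healthy: boolean };

type ReconcileDecision =
  | { action: "close"; endTime: number } // real downtime is preserved and counted
  | { action: "delete" }; // recovery unprovable, downtime must not be counted

function reconcileOrphanedWindow(windowStart: number, runs: RunRecord[]): ReconcileDecision {
  // Recovery time = the first healthy run on or after the window opened.
  const recovery = runs
    .filter((run) => run.healthy && run.at >= windowStart)
    .sort((a, b) => a.at - b.at)[0];

  // Fallback: history pruned or no healthy run found.
  if (recovery === undefined) return { action: "delete" };

  // Mirrors the "clamped >= startTime" rule for an explicit endTime.
  return { action: "close", endTime: Math.max(recovery.at, windowStart) };
}
```

This sketch would only be called for a window that is still open while the system is currently healthy, which is the orphaned case the commit describes.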

A chat Skill (e.g. "write like a redneck") held during tool-calling steps but normalized back to professional tone in the synthesized reply. The multi-step loop's forced final-answer step replaces the whole system prompt with a tool-less "answer now, be concise" instruction, dropping the skill preamble on the exact step that writes the user-visible answer. prepareFinalAnswerStep now accepts persistent guidance (the skill preamble) and appends it after the base final-answer instruction, so the skill's voice governs the synthesized reply too. The headless runner passes none, so it is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

Asked "how do I add a system to the catalog?", the assistant answered with the internal tool name (catalog.createSystem) and its input JSON schema - but the operator cannot call tools and never sees them; that is the assistant's own mechanism, not a workflow. The chat system prompt now states tools are the assistant's own (not a public API), and a how-to must be answered in product terms (the UI, grounded in docs) and/or by offering to do it for the operator - never by presenting tool names, tool input JSON, or parameter schemas as steps to follow. Chat-only; the headless runner is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

CodeQL flagged the connect-probe's `rejectUnauthorized: false` ("disabling certificate validation is strongly discouraged"). The probe is timing-only, but disabling validation is unnecessary: it dials the validated IP with the original hostname as SNI, so a valid cert verifies against `servername`, and the real `fetch` already validates strictly (a bad cert fails the check regardless). Drop the override; if the handshake can't complete (invalid/self-signed cert) the existing error handler resolves with just the TCP `connectMs` and no `tlsMs` - timing stays best-effort, never fatal. Verified: valid hosts still report tlsMs; a cert/servername mismatch degrades to connectMs only with no crash. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

github-actions · 2026-06-20T14:14:52Z

❌ PR Checks Failed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	❌ Failed

❌ E2E Failures

... (truncated 494 lines)
[1/5] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/5] [chromium] › tests/queue.spec.ts:33:3 › Queue admin area › opens the Queue runtime panel from the infrastructure shell
[3/5] [chromium] › tests/queue.spec.ts:65:3 › Queue admin area › the runtime panel shows the instance-scope banner and count tiles
[4/5] [chromium] › tests/queue.spec.ts:93:3 › Queue admin area › the job-state sub-tabs render their listings
[5/5] [chromium] › tests/queue.spec.ts:131:3 › Queue admin area › the Queue tab also exposes the configuration sub-section
  5 passed (16.2s)

========== satellite.spec.ts ==========

Running 4 tests using 1 worker

[1/4] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/4] [chromium] › tests/satellite.spec.ts:14:3 › satellites › renders the page chrome with title, subtitle and create action
[3/4] [chromium] › tests/satellite.spec.ts:43:3 › satellites › shows the onboarding empty state when no satellites are registered
[4/4] [chromium] › tests/satellite.spec.ts:77:3 › satellites › the create affordance opens the registration dialog
  4 passed (17.3s)

========== script-packages.spec.ts ==========

Running 7 tests using 1 worker

[1/7] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/7] [chromium] › tests/script-packages.spec.ts:32:3 › Script Packages settings › renders with the install-state card and an empty allowlist
[3/7] [chromium] › tests/script-packages.spec.ts:68:3 › Script Packages settings › Add is disabled until a name and a valid pinned version are present
[4/7] [chromium] › tests/script-packages.spec.ts:99:3 › Script Packages settings › the Advanced section exposes the registry-URL config
[5/7] [chromium] › tests/script-packages.spec.ts:126:3 › Script Sandbox settings › renders the global policy editor with its key controls
[6/7] [chromium] › tests/script-packages.spec.ts:159:3 › Script Sandbox settings › switching network mode to allowlist reveals the destinations field
[7/7] [chromium] › tests/script-packages.spec.ts:180:3 › Script Sandbox settings › saving the policy surfaces the success confirmation
  7 passed (18.8s)

========== secrets.spec.ts ==========

Running 7 tests using 1 worker

[1/7] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/7] [chromium] › tests/secrets.spec.ts:44:3 › admin secrets › shows the empty state with no secrets
[3/7] [chromium] › tests/secrets.spec.ts:62:3 › admin secrets › disables the create button until name and value are provided
[4/7] [chromium] › tests/secrets.spec.ts:86:3 › admin secrets › creates a secret and lists it without ever exposing the value
[5/7] [chromium] › tests/secrets.spec.ts:113:3 › admin secrets › creating with an existing name rotates it rather than duplicating
[6/7] [chromium] › tests/secrets.spec.ts:141:3 › admin secrets › rotates a secret via the dialog without revealing the value
[7/7] [chromium] › tests/secrets.spec.ts:182:3 › admin secrets › deletes a secret only after confirming
  7 passed (20.2s)

========== slo.spec.ts ==========

Running 5 tests using 1 worker

[1/5] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/5] [chromium] › tests/slo.spec.ts:23:3 › SLOs › overview renders its empty state when no SLOs exist
[3/5] [chromium] › tests/slo.spec.ts:48:3 › SLOs › config page renders its empty state when no objectives exist
[4/5] [chromium] › tests/slo.spec.ts:65:3 › SLOs › create flow validates required system and target range
[5/5] [chromium] › tests/slo.spec.ts:165:3 › SLOs › overview lists the created SLO and links to its detail page
  5 passed (18.4s)

========== smoke.spec.ts ==========

Running 2 tests using 1 worker

[1/2] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/2] [chromium] › tests/smoke.spec.ts:11:3 › app shell › boots and renders the dashboard chrome
  2 passed (12.5s)

========== status-page.spec.ts ==========

Running 4 tests using 1 worker

[1/4] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/4] [chromium] › tests/status-page.spec.ts:31:3 › Status pages › list renders its empty state when no pages exist
[3/4] [chromium] › tests/status-page.spec.ts:43:3 › Status pages › an unpublished page is NOT served on the public route
[4/4] [chromium] › tests/status-page.spec.ts:68:3 › Status pages › operator builds, publishes, and the public page serves the content
  4 passed (19.4s)

========== theme.spec.ts ==========

Running 5 tests using 1 worker

[1/5] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/5] [chromium] › tests/theme.spec.ts:46:3 › theme / dark-mode switcher › the dark-mode switch lives in the user menu and reflects the applied theme
[3/5] [chromium] › tests/theme.spec.ts:68:3 › theme / dark-mode switcher › toggling to dark applies the `dark` class on <html>
[4/5] [chromium] › tests/theme.spec.ts:98:3 › theme / dark-mode switcher › toggling back to light reverts the `dark` class
[5/5] [chromium] › tests/theme.spec.ts:126:3 › theme / dark-mode switcher › the chosen theme persists across reload (localStorage + backend)
  5 passed (17.7s)

========== user-guide.spec.ts ==========

Running 5 tests using 1 worker

[1/5] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/5] [chromium] › tests/user-guide.spec.ts:10:3 › in-app user guide › serves the Starlight docs at /checkstack/user-guide/ (not a 404)
[3/5] [chromium] › tests/user-guide.spec.ts:24:3 › in-app user guide › a deep-linked docs page resolves in-app
[4/5] [chromium] › tests/user-guide.spec.ts:33:3 › in-app user guide › the sidebar Docs link targets the user guide
[5/5] [chromium] › tests/user-guide.spec.ts:45:3 › in-app user guide › an unknown /checkstack/ path returns the docs 404, not the SPA shell
  5 passed (13.0s)

================ summary ================
passed: 30/31
FAILED: catalog.spec.ts
[e2e] stopping ephemeral Postgres...
[e2e] teardown complete.

How to fix: These are the Playwright end-to-end tests. Reproduce locally with bun run --filter @checkstack/e2e test:e2e (it provisions an ephemeral Postgres via Testcontainers, so Docker must be running). Read the failing assertions and uploaded traces, then fix the implementation or the selectors so the flows pass. Do not weaken or skip the tests.

@enyineer The above code quality issues were found in this PR. Please fix them before merging.

github-actions · 2026-06-20T14:29:50Z

✅ All PR Checks Passed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	✅ Passed

@enyineer All quality checks have passed. This PR is ready for your review.

The catalog spec runs as a serial group with retries:2, but the e2e DB is reset only per file boot, NOT per retry, and a serial group retries from the top. So a flake in any later test (e.g. "edits a system name") re-ran the WHOLE group against an already-populated catalog: the global empty-state assertions then hard-failed ("No systems in the catalog yet" is gone), and creates would collide on the fixed name suffix. That turned a single transient into a red E2E job.

Two changes make it retry-safe:

- Split the two read-only empty-state tests into catalog-empty.spec.ts. run-all boots a fresh, migration-empty DB per file and this file creates nothing, so the empty state holds on every attempt (a retry re-asserts against the same empty DB; the mutating spec runs in a separate invocation and can't pollute it).
- Key the mutating chain's created names to the retry attempt (`-r<n>`, via test.info().retry) so a group retry runs in its own namespace and never collides with the previous attempt's leftover rows. Drop the delete test's global "No systems yet" assertion (can't hold against retry leftovers; the empty-state file owns that check).

Verified structurally: playwright --list discovers all tests in both files; lint and typecheck pass. Full behavioral verification is via the e2e CI run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
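The retry-keyed naming can be reduced to a one-line helper. `retryScopedName` is a hypothetical name; the `-r<n>` suffix and the use of Playwright's `test.info().retry` come from the commit message.

```typescript
// Build a name that is unique per retry attempt, so rows left behind by a
// failed attempt never collide with the next attempt's creates.
function retryScopedName(base: string, retry: number): string {
  return `${base}-r${retry}`;
}

// Inside a Playwright test the attempt number would come from the test info
// object (0 on the first run, 1 on the first retry, and so on), e.g.:
//   const systemName = retryScopedName("e2e-system", test.info().retry);
```

This only isolates the names a group creates; any assertion about the catalog being globally empty still has to live in a spec file that never writes, which is why the empty-state tests were split out.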

github-actions · 2026-06-20T16:23:09Z

✅ All PR Checks Passed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	✅ Passed

@enyineer All quality checks have passed. This PR is ready for your review.

The E2E job ran all ~32 spec files serially on one runner (~12 min). run-all.ts already supports round-robin sharding (CHECKSTACK_E2E_SHARD_INDEX/TOTAL + selectShard); this wires it into the workflow as a 3-way matrix.

Each shard is an independent runner with its OWN ephemeral Postgres (Testcontainers), booting one backend at a time - the proven single-Postgres / one-backend-per-runner model is unchanged, so there's no new cross-test contention; we only split the FILE list across runners, cutting test wall-clock ~linearly (verified 32 specs split 11/11/10, each spec exactly once).

- matrix shard [1,2,3], fail-fast:false; CHECKSTACK_E2E_SHARD_TOTAL uses ${{ strategy.job-total }} so the matrix size is the single source of truth.
- Per-shard artifact names (e2e-output-<n>, e2e-traces-<n>): v4+ artifacts reject duplicate names across parallel legs.
- report job: download e2e-output-* with merge-multiple; readOutput now concatenates all .txt in a job's artifact dir (single-output jobs unchanged, sharded E2E gets every shard's tail). needs.e2e.result already aggregates the matrix legs, so the pass/fail gate is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
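Round-robin sharding of a file list can be sketched as below. This is not the actual `selectShard` from run-all.ts; `selectShardFiles` is a hypothetical stand-in, and the 1-based shard index is an assumption based on the `[1,2,3]` matrix.

```typescript
// Pick the files for one shard by dealing the sorted list out round-robin.
// shardIndex is assumed to be 1-based, matching a matrix of [1, 2, 3].
function selectShardFiles(files: string[], shardIndex: number, shardTotal: number): string[] {
  if (!Number.isInteger(shardTotal) || shardTotal < 1) {
    throw new Error(`invalid shard total: ${shardTotal}`);
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardTotal) {
    throw new Error(`invalid shard index: ${shardIndex}`);
  }
  // Sort first so every runner sees the same order regardless of how the
  // file system lists the directory.
  const sorted = [...files].sort();
  return sorted.filter((_, position) => position % shardTotal === shardIndex - 1);
}
```

With 32 files and 3 shards this deals out 11, 11, and 10 files, and every file lands in exactly one shard, which is consistent with the split reported in the commit message.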

…hards

Follow-up to the e2e sharding so nothing is hand-maintained and CI minutes aren't wasted:

- Shard COUNT is now derived from the actual spec files. A tiny e2e_matrix job counts core/e2e/tests/*.spec.ts and emits a 1-based JSON shard array (~11 files/shard, capped at 5 runners); the e2e job consumes it via fromJSON(needs.e2e_matrix.outputs.shards). Adding/removing a spec needs no workflow edit. (The file LIST was already auto-discovered by run-all.ts; this removes the last hand-maintained literal, the shard count.) Portable array build via `seq | paste -sd,` to avoid the BSD `seq -s` trailing-comma quirk.
- Build the frontend + docs ONCE in a new e2e_build job and upload the two dist dirs the backend serves (core/frontend/dist, docs/dist) as an artifact; each shard DOWNLOADS it instead of rebuilding. Removes the per-shard build (the dominant cost) and drops git-LFS from the shards (images are baked into the built docs/dist). e2e now `needs: [e2e_matrix, e2e_build]`.
- report job: add e2e_matrix + e2e_build to needs and require both success/skipped, so a generator/build failure (which skips e2e) can't read as a false green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
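The shard-count derivation is done in shell in the workflow; the arithmetic behind it can be sketched in TypeScript as follows. `shardArray` is a hypothetical name, and rounding up with a minimum of one shard is an assumption, since the commit only says "~11 files/shard, capped at 5 runners".

```typescript
// Derive the 1-based shard array from the number of spec files.
// Assumptions: round up, never fewer than 1 shard, never more than maxShards.
function shardArray(specCount: number, filesPerShard = 11, maxShards = 5): number[] {
  const wanted = Math.ceil(specCount / filesPerShard);
  const count = Math.min(maxShards, Math.max(1, wanted));
  return Array.from({ length: count }, (_, i) => i + 1);
}

// JSON.stringify(shardArray(n)) is the kind of value a matrix job could emit
// as an output and the e2e job could read back with fromJSON(...).
```

Under these assumptions the current ~32 specs still yield three shards, and the cap keeps the runner count bounded as the suite grows.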

github-actions · 2026-06-20T16:48:07Z

❌ PR Checks Failed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	❌ Failed

❌ E2E Failures

... (truncated 527 lines)
[29/11] (retries) [chromium] › tests/catalog.spec.ts:298:3 › Systems & Catalog › creates an environment and attaches a system to it (retry #2)
[30/11] (retries) [chromium] › tests/catalog.spec.ts:340:3 › Systems & Catalog › filtered browse shows a no-matches state with clear-filters (retry #2)
[31/11] (retries) [chromium] › tests/catalog.spec.ts:377:3 › Systems & Catalog › deletes a system with confirmation (retry #2)
  1 failed
    [chromium] › tests/catalog.spec.ts:224:3 › Systems & Catalog › edits a system name ─────────────
  1 flaky
    [chromium] › tests/catalog.spec.ts:115:3 › Systems & Catalog › creating a system adds it to management and browse 
  5 did not run
  4 passed (36.2s)

========== dependency.spec.ts ==========

Running 4 tests using 1 worker

[1/4] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/4] [chromium] › tests/dependency.spec.ts:64:3 › dependency map › renders the map page with its instructional header and graph toolbar
[3/4] [chromium] › tests/dependency.spec.ts:94:3 › dependency map › shows an empty graph when there are no systems
[4/4] [chromium] › tests/dependency.spec.ts:111:3 › dependency map › reflects a dependency created between two systems
  4 passed (17.6s)

========== incident.spec.ts ==========

Running 8 tests using 1 worker

[1/8] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/8] [chromium] › tests/incident.spec.ts:100:3 › incidents › shows the empty incidents state on a fresh database
[3/8] [chromium] › tests/incident.spec.ts:117:3 › incidents › validates that an incident requires at least one system
[4/8] [chromium] › tests/incident.spec.ts:159:3 › incidents › creates a system via the catalog so incidents can target it
[5/8] [chromium] › tests/incident.spec.ts:193:3 › incidents › creates an incident against the system
[6/8] [chromium] › tests/incident.spec.ts:220:3 › incidents › opens the incident detail page via the system history
[7/8] [chromium] › tests/incident.spec.ts:247:3 › incidents › resolves the incident from the detail page and reflects the new status
[8/8] [chromium] › tests/incident.spec.ts:269:3 › incidents › resolved incident is hidden by default and visible via 'Show resolved'
  8 passed (25.1s)

========== maintenance.spec.ts ==========

Running 9 tests using 1 worker

[1/9] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/9] [chromium] › tests/maintenance.spec.ts:59:3 › maintenance windows › creates the prerequisite system via the catalog UI
[3/9] [chromium] › tests/maintenance.spec.ts:84:3 › maintenance windows › resolves the created system's id from the catalog browse row
[4/9] [chromium] › tests/maintenance.spec.ts:109:3 › maintenance windows › shows the empty state before any maintenance exists
[5/9] [chromium] › tests/maintenance.spec.ts:124:3 › maintenance windows › validates required fields and end-before-start in the editor
[6/9] [chromium] › tests/maintenance.spec.ts:186:3 › maintenance windows › creates a maintenance window and lists it
[7/9] [chromium] › tests/maintenance.spec.ts:224:3 › maintenance windows › opens the detail page from the system history
[8/9] [chromium] › tests/maintenance.spec.ts:254:3 › maintenance windows › edits an existing maintenance window
[9/9] [chromium] › tests/maintenance.spec.ts:283:3 › maintenance windows › deletes a maintenance window with confirmation
  9 passed (21.5s)

========== permissions.spec.ts ==========

Running 5 tests using 1 worker

[1/5] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/5] [setup-member] › tests/member.setup.ts:14:1 › register a non-admin member
[3/5] [member] › tests/permissions.spec.ts:21:3 › UI permissions (non-admin member) › the member is signed in as themselves, not the admin
[4/5] [member] › tests/permissions.spec.ts:33:3 › UI permissions (non-admin member) › an admin-only route renders the Access Denied gate
[5/5] [member] › tests/permissions.spec.ts:48:3 › UI permissions (non-admin member) › admin-only navigation is not rendered for the member
  5 passed (15.6s)

========== queue.spec.ts ==========

Running 5 tests using 1 worker

[1/5] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/5] [chromium] › tests/queue.spec.ts:33:3 › Queue admin area › opens the Queue runtime panel from the infrastructure shell
[3/5] [chromium] › tests/queue.spec.ts:65:3 › Queue admin area › the runtime panel shows the instance-scope banner and count tiles
[4/5] [chromium] › tests/queue.spec.ts:93:3 › Queue admin area › the job-state sub-tabs render their listings
[5/5] [chromium] › tests/queue.spec.ts:131:3 › Queue admin area › the Queue tab also exposes the configuration sub-section
  5 passed (15.2s)

========== secrets.spec.ts ==========

Running 7 tests using 1 worker

[1/7] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/7] [chromium] › tests/secrets.spec.ts:44:3 › admin secrets › shows the empty state with no secrets
[3/7] [chromium] › tests/secrets.spec.ts:62:3 › admin secrets › disables the create button until name and value are provided
[4/7] [chromium] › tests/secrets.spec.ts:86:3 › admin secrets › creates a secret and lists it without ever exposing the value
[5/7] [chromium] › tests/secrets.spec.ts:113:3 › admin secrets › creating with an existing name rotates it rather than duplicating
[6/7] [chromium] › tests/secrets.spec.ts:141:3 › admin secrets › rotates a secret via the dialog without revealing the value
[7/7] [chromium] › tests/secrets.spec.ts:182:3 › admin secrets › deletes a secret only after confirming
  7 passed (19.2s)

========== status-page.spec.ts ==========

Running 4 tests using 1 worker

[1/4] [setup-admin] › tests/auth.setup.ts:15:1 › authenticate (onboard first admin)
[2/4] [chromium] › tests/status-page.spec.ts:31:3 › Status pages › list renders its empty state when no pages exist
[3/4] [chromium] › tests/status-page.spec.ts:43:3 › Status pages › an unpublished page is NOT served on the public route
[4/4] [chromium] › tests/status-page.spec.ts:68:3 › Status pages › operator builds, publishes, and the public page serves the content
  4 passed (17.2s)

================ summary ================
passed: 9/10
FAILED: catalog.spec.ts
[e2e] stopping ephemeral Postgres...
[e2e] teardown complete.

How to fix: These are the Playwright end-to-end tests. Reproduce locally with bun run --filter @checkstack/e2e test:e2e (it provisions an ephemeral Postgres via Testcontainers, so Docker must be running). Read the failing assertions and uploaded traces, then fix the implementation or the selectors so the flows pass. Do not weaken or skip the tests.

@enyineer The above code quality issues were found in this PR. Please fix them before merging.

…ety)

The catalog retry-safety fix keyed system/group/env NAMES to the retry attempt but left SYSTEM_DESCRIPTION a constant. On a serial-group retry the management table lists every system - including the previous attempt's leftover row - so a shared description matched two rows: `getByText(SYSTEM_DESCRIPTION)` tripped strict mode ("resolved to 2 elements") and failed E2E shard 3. The local run never retried, so it didn't surface.

Make the description per-attempt (`-r<n>`) like the names, so every value an assertion matches on is unique to the attempt and a retry's leftover rows can't collide. Audited all getByText/name assertions in the spec: the description was the only remaining fixed data value.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
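The collision and its fix can be shown with a small, self-contained sketch. The helper name `perAttempt`, the sample description, and the in-memory "table" are hypothetical; only the `-r<n>` suffix comes from the commit.

```typescript
// Hypothetical helper: suffix a fixed test value with the retry index (-r<n>).
function perAttempt(base: string, retryIndex: number): string {
  return `${base}-r${retryIndex}`;
}

// Simulated management table after a retry: the first attempt's row is still there.
function rowsAfterRetry(describe: (retry: number) => string): string[] {
  return [describe(0), describe(1)];
}

// Count how many rows the second attempt's assertion text would match.
function matchesOnRetry(describe: (retry: number) => string): number {
  const wanted = describe(1);
  return rowsAfterRetry(describe).filter((row) => row === wanted).length;
}

// Illustrative value only; the spec's real SYSTEM_DESCRIPTION is not shown in the PR.
const FIXED = "E2E system description";
console.log(matchesOnRetry(() => FIXED));                 // prints 2 (strict-mode violation)
console.log(matchesOnRetry((r) => perAttempt(FIXED, r))); // prints 1 (unique per attempt)
```

In a real Playwright spec the retry index would typically come from `test.info().retry`, which is 0 on the first attempt.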

github-actions · 2026-06-20T17:00:14Z

✅ All PR Checks Passed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	✅ Passed

@enyineer All quality checks have passed. This PR is ready for your review.

…tempt

Makes every spec retry-safe by construction and ends the per-spec whack-a-mole (empty-state-first + serial-mutate specs - catalog, incident, maintenance, secrets, status-page - all shared the same latent fragility).

Root cause: the e2e DB is reset per FILE boot, not per Playwright retry, and a serial group retries from the top. So in-process retries re-ran against the previous attempt's polluted DB - global empty-state assertions failed, and fixed names/descriptions collided with leftover rows.

Move retries from Playwright (same DB) to run-all at the FILE level: set Playwright retries:0, and on a spec failure re-run the whole `playwright test <file>` invocation (up to 3 attempts in CI). Each invocation re-boots the backend (webServer reuseExistingServer:false), which DROP/CREATEs the e2e DB, so every attempt starts from a fresh, empty, migration-reset database - the serial chain simply re-runs clean. A trace is captured only on a retry, so the happy path keeps no per-test tracing overhead.

Verified locally with an induced flake: attempt 1 fails (no in-process retry), run-all re-boots, attempt 2 passes -> "all spec files green".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
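The file-level retry loop described in this commit might look roughly like the sketch below. It is not the code in run-all.ts: the names `SpecRunner` and `runSpecWithFileRetries` are hypothetical, and the runner is injected so the sketch stays free of process spawning. In the real flow each call would be a full `playwright test <file>` invocation that re-boots the backend and resets the DB.

```typescript
// Hypothetical runner: returns true when one whole-file invocation passes.
type SpecRunner = (file: string, opts: { attempt: number; trace: boolean }) => boolean;

interface RetryResult {
  passed: boolean;
  attempts: number;
}

function runSpecWithFileRetries(
  file: string,
  maxAttempts: number,
  runSpec: SpecRunner,
): RetryResult {
  let attempts = 0;
  while (attempts < maxAttempts) {
    attempts += 1;
    // Tracing is requested only on retries, so the first attempt has no overhead.
    const passed = runSpec(file, { attempt: attempts, trace: attempts > 1 });
    if (passed) return { passed: true, attempts };
  }
  return { passed: false, attempts };
}

// Induced flake, as in the commit's local verification: fail once, then pass.
const flaky: SpecRunner = (_file, { attempt }) => attempt >= 2;
console.log(runSpecWithFileRetries("catalog.spec.ts", 3, flaky));
// prints { passed: true, attempts: 2 }
```

The key design point from the commit is that the retry unit is the whole file invocation rather than a single test, so the existing per-file DB reset applies to every attempt.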

github-actions · 2026-06-20T17:12:14Z

✅ All PR Checks Passed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	✅ Passed

@enyineer All quality checks have passed. This PR is ready for your review.

…B per attempt"

This reverts commit b549ae7.

The file-level retry was unnecessary machinery: the suite was already green on the previous commit (the empty-state split + per-attempt naming) using Playwright's built-in retries. Retrying harder masks transient flakiness rather than fixing it - the real fix is the per-spec robustness (test isolation + idempotent assertions), which removes the DETERMINISTIC fragility. Keep Playwright's standard CI retries as the thin safety net for genuine transient browser races; do not re-run whole files on a fresh DB.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

github-actions · 2026-06-20T17:21:40Z

✅ All PR Checks Passed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	✅ Passed

@enyineer All quality checks have passed. This PR is ready for your review.

Audit showed the retry-fragility is suite-wide (~15 mutating serial specs), not a handful: a serial group retries from the top against a DB reset only per file boot, so empty-state assertions and fixed-value matches collide with the prior attempt's leftover rows. Hardening each spec by hand (extract empty-state file + per-attempt naming) is whack-a-mole that every future spec would also need.

Reinstate the structural fix instead (reverts the earlier revert 9592c2b): set Playwright retries:0 and retry a failed spec at the FILE level in run-all (3 attempts in CI). Each invocation re-boots the backend, which DROP/CREATEs the e2e DB - so every retry starts from a fresh, empty, migration-reset database. This makes the retries we already keep honor the suite's own per-file-fresh-DB design, fixing all specs (and future ones) uniformly. It is not "retry harder" - it gives each retry a clean slate, which is the actual root-cause fix.

Verified locally with an induced flake: attempt 1 fails, run-all re-boots, attempt 2 passes. (catalog.spec keeps its per-test idempotency from the earlier commits as harmless defense-in-depth; no other spec needs per-spec changes now.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

… it)

With the file-level retry giving every attempt a fresh DB, the per-spec robustness added to catalog earlier (empty-state split into catalog-empty.spec + per-attempt naming) is no longer needed. Restore catalog.spec.ts to its original inline form and remove catalog-empty.spec.ts, so the suite has ONE retry-safety mechanism (fresh DB per attempt) instead of a mix - nothing for future specs to cargo-cult.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

github-actions · 2026-06-20T18:05:32Z

✅ All PR Checks Passed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	✅ Passed

@enyineer All quality checks have passed. This PR is ready for your review.

Each spec file boots the backend, which re-ran ALL migrations (~100+ across ~25 plugin schemas) on every boot because the reset created an EMPTY database. Build the migrated schema ONCE per run and clone it instead.

- template-db.ts: build the template by booting the REAL backend once against an empty DB (the exact production migration path + idempotent role/access-rule seeding), wait for readiness, stop it, and drain its connections so the template can be a CREATE DATABASE ... TEMPLATE source. Built from current migrations every run -> drift-proof, no checked-in dump. No admin user is seeded, so per-file onboarding is unchanged.
- with-e2e-postgres.ts: build the template once after Postgres is up, before the spec loop (inside the try, so a build failure still tears the container down and fails loudly).
- start-e2e-server.ts: reset by `CREATE DATABASE ... TEMPLATE` when the template exists (file copy -> boot-time migrations no-op), falling back to empty-create + migrate when it doesn't (direct test:e2e:file runs).

Verified locally: template builds in ~3s, catalog spec passes through the clone path. Green CI proves the path is active (build failure would fail loudly).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
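As an illustration of the reset path in the start-e2e-server.ts bullet, the statements it would issue can be sketched as below. This is not the file's actual code: the function names and the database names in the example are hypothetical, and only the `DROP DATABASE` / `CREATE DATABASE ... TEMPLATE` shape comes from the commit messages and standard PostgreSQL syntax.

```typescript
// Quote a PostgreSQL identifier by doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Hypothetical sketch: build the reset statements for one spec-file boot.
// With a template, the new DB is a copy of the already-migrated schema;
// without one, it is created empty and the backend migrates at boot.
function buildResetStatements(dbName: string, templateName?: string): string[] {
  const drop = `DROP DATABASE IF EXISTS ${quoteIdent(dbName)}`;
  const create = templateName
    ? `CREATE DATABASE ${quoteIdent(dbName)} TEMPLATE ${quoteIdent(templateName)}`
    : `CREATE DATABASE ${quoteIdent(dbName)}`;
  return [drop, create];
}

// Database names here are placeholders, not the ones used by the suite.
console.log(buildResetStatements("e2e_db", "e2e_template").join(";\n"));
```

PostgreSQL refuses to copy a template database while other sessions are connected to it, which is why the commit drains the template's connections before the spec loop starts.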

github-actions · 2026-06-20T18:57:25Z

❌ PR Checks Failed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	❌ Failed
Security	✅ Passed
E2E	✅ Passed

❌ Integration Test Failures

... (truncated 127 lines)
(skip) MCP Streamable-HTTP conformance > tools/list never lists an out-of-scope tool
(skip) MCP Streamable-HTTP conformance > tools/call for an out-of-scope tool is REFUSED 403 (not merely hidden)
(skip) MCP Streamable-HTTP conformance > tools/call for a mutating tool is refused by the structural effect-gate

::endgroup::

::group::core/backend/src/public-host/routing.e2e.it.test.ts:
(pass) custom-domain host (status.fake.test) — locked down > 404s an admin data endpoint [5.97ms]
(pass) custom-domain host (status.fake.test) — locked down > 404s REST and platform endpoints [0.58ms]
(pass) custom-domain host (status.fake.test) — locked down > allows the single public read [0.42ms]
(pass) custom-domain host (status.fake.test) — locked down > /api/config returns the custom origin + publicHost (never the admin origin) [0.41ms]
(pass) custom-domain host (status.fake.test) — locked down > serves the PUBLIC bundle for navigational routes [0.57ms]
(pass) admin host (admin.fake.test) — unaffected > admin data endpoint is reachable [0.26ms]
(pass) admin host (admin.fake.test) — unaffected > serves the ADMIN bundle and admin config [0.73ms]
(pass) unknown host — no regression (admin behavior) > is not locked down [0.57ms]

::endgroup::

::group::core/catalog-backend/src/services/entity-service.it.test.ts:
(pass) EntityService (real Postgres) > getSystemByName (case-insensitive uniqueness lookup) > matches regardless of case so 'Api' collides with 'api' [3.60ms]
(pass) EntityService (real Postgres) > getSystemByName (case-insensitive uniqueness lookup) > returns undefined when the name is free [1.07ms]
(pass) EntityService (real Postgres) > removeContact (compound id + systemId scoping) > does not delete a contact when the systemId does not match [7.30ms]
(pass) EntityService (real Postgres) > removeLink (compound id + systemId scoping) > does not delete a link when the systemId does not match [6.67ms]

::endgroup::

::group::core/automation-backend/src/dispatch/stage1.it.test.ts:
(pass) Stage-1 routing exactly-once (real Redis) > one ENTITY_CHANGED-style job runs the routing handler exactly once across two workers [1525.45ms]

::endgroup::

::group::core/automation-backend/src/dispatch/dwell.it.test.ts:
(pass) dwell-store atomic claim (real Postgres) > two concurrent delete(id) calls → exactly one returns a row [20.78ms]

::endgroup::

::group::core/automation-backend/src/dispatch/stage2-stalled.it.test.ts:
(pass) Stage-2 stalled redelivery (real Redis) > a dead worker's job is redelivered to another worker and completed once [2074.41ms]

::endgroup::

::group::core/automation-backend/src/entity/wake-index.it.test.ts:
(pass) wake-index arm race + intersection lookup (real Postgres) > intersection lookup returns the owning until-lock for a concrete ref [20.89ms]
(pass) wake-index arm race + intersection lookup (real Postgres) > matches a kind-level wildcard wait [4.38ms]
(pass) wake-index arm race + intersection lookup (real Postgres) > concurrent same-(lock, ref) inserts leave exactly one row [26.99ms]

::endgroup::

::group::core/automation-backend/src/entity/cross-pod-read-consistency.it.test.ts:
(pass) cross-pod reactive-entity read consistency (real Postgres) > durable kind: a write on pod A is visible to pod B's read + getMany [22.87ms]
(pass) cross-pod reactive-entity read consistency (real Postgres) > NEGATIVE CONTROL: a pod-local read does NOT see another pod's write (proves teeth) [5.78ms]

::endgroup::

::group::core/backend-api/src/script-sandbox/rootless-egress.it.test.ts:
(skip) rootless egress (real slirp4netns) > delivers filtered egress: blocks a non-allowlisted destination
(skip) rootless egress (real slirp4netns) > the network decision picks the rootless path on this host

::endgroup::

::group::core/backend-api/src/script-sandbox/forkbomb.it.test.ts:
(skip) per-run fork-bomb containment (real bwrap) > caps a shell fork bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > caps an ESM spawn-loop bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > still runs a benign script to success under the same fail-closed default

::endgroup::

::group::core/backend/src/services/plugin-installers/install-from-tarball.it.test.ts:
(pass) installBundleFromArtifacts (real bun install) > resolves an intra-bundle sibling dependency without a registry [34.63ms]

::endgroup::

16 tests skipped:
(skip) external plugin install (real instance + UI) > (unnamed)
(skip) external plugin install (real instance + UI) > installs the packaged plugin via the UI; frontend + backend + core plugins load
(skip) external plugin install (real instance + UI) > (unnamed)
(skip) MCP Streamable-HTTP conformance > initialize advertises a protocol version and a session id
(skip) MCP Streamable-HTTP conformance > initialize echoes a negotiated protocol version
(skip) MCP Streamable-HTTP conformance > tools/list WITHOUT a session id is refused (session enforced, not cosmetic)
(skip) MCP Streamable-HTTP conformance > tools/list returns the read-only tool surface
(skip) MCP Streamable-HTTP conformance > tools/call returns a non-error content block
(skip) MCP Streamable-HTTP conformance > tools/list never lists an out-of-scope tool
(skip) MCP Streamable-HTTP conformance > tools/call for an out-of-scope tool is REFUSED 403 (not merely hidden)
(skip) MCP Streamable-HTTP conformance > tools/call for a mutating tool is refused by the structural effect-gate
(skip) rootless egress (real slirp4netns) > delivers filtered egress: blocks a non-allowlisted destination
(skip) rootless egress (real slirp4netns) > the network decision picks the rootless path on this host
(skip) per-run fork-bomb containment (real bwrap) > caps a shell fork bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > caps an ESM spawn-loop bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > still runs a benign script to success under the same fail-closed default


1 tests failed:
(fail) external plugin lifecycle (published tarballs) > installs from the local registry [35553.00ms]

 62 pass
 16 skip
 1 fail
 209 expect() calls
Ran 79 tests across 24 files. [70.10s]

How to fix: These are the real-services integration tests (*.it.test.ts). To reproduce locally, start the dev services with docker compose -f docker-compose-dev.yml up -d postgres redis, then run CHECKSTACK_IT=1 bun test it.test. Read the failing assertions and fix the implementation so the tests pass against real Postgres/Redis. Do not weaken or skip the tests.

@enyineer The above code quality issues were found in this PR. Please fix them before merging.

github-actions · 2026-06-20T19:02:45Z

❌ PR Checks Failed

⚠️ Escalation: Automated fixes have not resolved the issues after 3 attempts. Manual intervention is required.

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	❌ Failed
Security	✅ Passed
E2E	✅ Passed

❌ Integration Test Failures

... (truncated 167 lines)
(skip) MCP Streamable-HTTP conformance > tools/call for a mutating tool is refused by the structural effect-gate

::endgroup::

::group::core/backend/src/public-host/routing.e2e.it.test.ts:
(pass) custom-domain host (status.fake.test) — locked down > 404s an admin data endpoint [3.97ms]
(pass) custom-domain host (status.fake.test) — locked down > 404s REST and platform endpoints [0.54ms]
(pass) custom-domain host (status.fake.test) — locked down > allows the single public read [0.41ms]
(pass) custom-domain host (status.fake.test) — locked down > /api/config returns the custom origin + publicHost (never the admin origin) [0.51ms]
(pass) custom-domain host (status.fake.test) — locked down > serves the PUBLIC bundle for navigational routes [0.53ms]
(pass) admin host (admin.fake.test) — unaffected > admin data endpoint is reachable [0.32ms]
(pass) admin host (admin.fake.test) — unaffected > serves the ADMIN bundle and admin config [1.83ms]
(pass) unknown host — no regression (admin behavior) > is not locked down [0.58ms]

::endgroup::

::group::core/catalog-backend/src/services/entity-service.it.test.ts:
(pass) EntityService (real Postgres) > getSystemByName (case-insensitive uniqueness lookup) > matches regardless of case so 'Api' collides with 'api' [4.75ms]
(pass) EntityService (real Postgres) > getSystemByName (case-insensitive uniqueness lookup) > returns undefined when the name is free [1.11ms]
(pass) EntityService (real Postgres) > removeContact (compound id + systemId scoping) > does not delete a contact when the systemId does not match [7.91ms]
(pass) EntityService (real Postgres) > removeLink (compound id + systemId scoping) > does not delete a link when the systemId does not match [7.65ms]

::endgroup::

::group::core/automation-backend/src/dispatch/stage1.it.test.ts:
(pass) Stage-1 routing exactly-once (real Redis) > one ENTITY_CHANGED-style job runs the routing handler exactly once across two workers [1528.55ms]

::endgroup::

::group::core/automation-backend/src/dispatch/dwell.it.test.ts:
(pass) dwell-store atomic claim (real Postgres) > two concurrent delete(id) calls → exactly one returns a row [19.63ms]

::endgroup::

::group::core/automation-backend/src/dispatch/stage2-stalled.it.test.ts:
(pass) Stage-2 stalled redelivery (real Redis) > a dead worker's job is redelivered to another worker and completed once [2068.65ms]

::endgroup::

::group::core/automation-backend/src/entity/wake-index.it.test.ts:
(pass) wake-index arm race + intersection lookup (real Postgres) > intersection lookup returns the owning until-lock for a concrete ref [20.01ms]
(pass) wake-index arm race + intersection lookup (real Postgres) > matches a kind-level wildcard wait [5.00ms]
(pass) wake-index arm race + intersection lookup (real Postgres) > concurrent same-(lock, ref) inserts leave exactly one row [24.02ms]

::endgroup::

::group::core/automation-backend/src/entity/cross-pod-read-consistency.it.test.ts:
(pass) cross-pod reactive-entity read consistency (real Postgres) > durable kind: a write on pod A is visible to pod B's read + getMany [20.88ms]
(pass) cross-pod reactive-entity read consistency (real Postgres) > NEGATIVE CONTROL: a pod-local read does NOT see another pod's write (proves teeth) [3.48ms]

::endgroup::

::group::core/backend-api/src/script-sandbox/rootless-egress.it.test.ts:
(skip) rootless egress (real slirp4netns) > delivers filtered egress: blocks a non-allowlisted destination
(skip) rootless egress (real slirp4netns) > the network decision picks the rootless path on this host

::endgroup::

::group::core/backend-api/src/script-sandbox/forkbomb.it.test.ts:
(skip) per-run fork-bomb containment (real bwrap) > caps a shell fork bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > caps an ESM spawn-loop bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > still runs a benign script to success under the same fail-closed default

::endgroup::

::group::core/backend/src/services/plugin-installers/install-from-tarball.it.test.ts:
(pass) installBundleFromArtifacts (real bun install) > resolves an intra-bundle sibling dependency without a registry [33.50ms]

::endgroup::

16 tests skipped:
(skip) external plugin install (real instance + UI) > (unnamed)
(skip) external plugin install (real instance + UI) > installs the packaged plugin via the UI; frontend + backend + core plugins load
(skip) external plugin install (real instance + UI) > (unnamed)
(skip) MCP Streamable-HTTP conformance > initialize advertises a protocol version and a session id
(skip) MCP Streamable-HTTP conformance > initialize echoes a negotiated protocol version
(skip) MCP Streamable-HTTP conformance > tools/list WITHOUT a session id is refused (session enforced, not cosmetic)
(skip) MCP Streamable-HTTP conformance > tools/list returns the read-only tool surface
(skip) MCP Streamable-HTTP conformance > tools/call returns a non-error content block
(skip) MCP Streamable-HTTP conformance > tools/list never lists an out-of-scope tool
(skip) MCP Streamable-HTTP conformance > tools/call for an out-of-scope tool is REFUSED 403 (not merely hidden)
(skip) MCP Streamable-HTTP conformance > tools/call for a mutating tool is refused by the structural effect-gate
(skip) rootless egress (real slirp4netns) > delivers filtered egress: blocks a non-allowlisted destination
(skip) rootless egress (real slirp4netns) > the network decision picks the rootless path on this host
(skip) per-run fork-bomb containment (real bwrap) > caps a shell fork bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > caps an ESM spawn-loop bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > still runs a benign script to success under the same fail-closed default


3 tests failed:
(fail) external plugin lifecycle (published tarballs) > installs from the local registry [33404.69ms]
(fail) external plugin lifecycle (published tarballs) > validates, packs, and bundles (workspace rewrite is a no-op for @checkstack deps) [5.36ms]
(fail) external plugin lifecycle (published tarballs) > boots the dev server and serves POST /api/<pluginId>/* == 200 with a JSON array [1006.34ms]

 60 pass
 16 skip
 3 fail
 200 expect() calls
Ran 79 tests across 24 files. [47.37s]

How to fix: These are the real-services integration tests (*.it.test.ts). To reproduce locally, start the dev services with docker compose -f docker-compose-dev.yml up -d postgres redis, then run CHECKSTACK_IT=1 bun test it.test. Read the failing assertions and fix the implementation so the tests pass against real Postgres/Redis. Do not weaken or skip the tests.

@enyineer The above code quality issues were found in this PR. Automated fixes have not resolved them after 3 attempts. Manual intervention is required.

…le resets"

This reverts commit 4e3202c.

Measured against the no-template baseline, the template clone gave no reliable CI speedup (within run-to-run noise): migrations were never the bottleneck - the per-file FULL backend boot (initializing ~50 plugins to readiness) + onboarding dominate the ~24s/file, and the template only removes the small migration slice. Per the decision, drop the template-DB complexity and pursue boot-once (boot the backend once + isolate test data per worker) as the real lever instead. Also corrects a stale changeset entry that still described catalog's reverted per-spec retry-safety.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE

github-actions · 2026-06-20T20:02:08Z

❌ PR Checks Failed

⚠️ Escalation: Automated fixes have not resolved the issues after 3 attempts. Manual intervention is required.

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	❌ Failed
Security	✅ Passed
E2E	✅ Passed

❌ Integration Test Failures

... (truncated 164 lines)
(skip) MCP Streamable-HTTP conformance > tools/call for a mutating tool is refused by the structural effect-gate

::endgroup::

::group::core/backend/src/public-host/routing.e2e.it.test.ts:
(pass) custom-domain host (status.fake.test) — locked down > 404s an admin data endpoint [3.52ms]
(pass) custom-domain host (status.fake.test) — locked down > 404s REST and platform endpoints [0.43ms]
(pass) custom-domain host (status.fake.test) — locked down > allows the single public read [0.32ms]
(pass) custom-domain host (status.fake.test) — locked down > /api/config returns the custom origin + publicHost (never the admin origin) [0.40ms]
(pass) custom-domain host (status.fake.test) — locked down > serves the PUBLIC bundle for navigational routes [0.42ms]
(pass) admin host (admin.fake.test) — unaffected > admin data endpoint is reachable [0.23ms]
(pass) admin host (admin.fake.test) — unaffected > serves the ADMIN bundle and admin config [0.41ms]
(pass) unknown host — no regression (admin behavior) > is not locked down [0.38ms]

::endgroup::

::group::core/catalog-backend/src/services/entity-service.it.test.ts:
(pass) EntityService (real Postgres) > getSystemByName (case-insensitive uniqueness lookup) > matches regardless of case so 'Api' collides with 'api' [3.32ms]
(pass) EntityService (real Postgres) > getSystemByName (case-insensitive uniqueness lookup) > returns undefined when the name is free [0.88ms]
(pass) EntityService (real Postgres) > removeContact (compound id + systemId scoping) > does not delete a contact when the systemId does not match [7.42ms]
(pass) EntityService (real Postgres) > removeLink (compound id + systemId scoping) > does not delete a link when the systemId does not match [4.61ms]

::endgroup::

::group::core/automation-backend/src/dispatch/stage1.it.test.ts:
(pass) Stage-1 routing exactly-once (real Redis) > one ENTITY_CHANGED-style job runs the routing handler exactly once across two workers [1522.09ms]

::endgroup::

::group::core/automation-backend/src/dispatch/dwell.it.test.ts:
(pass) dwell-store atomic claim (real Postgres) > two concurrent delete(id) calls → exactly one returns a row [24.23ms]

::endgroup::

::group::core/automation-backend/src/dispatch/stage2-stalled.it.test.ts:
(pass) Stage-2 stalled redelivery (real Redis) > a dead worker's job is redelivered to another worker and completed once [2065.75ms]

::endgroup::

::group::core/automation-backend/src/entity/wake-index.it.test.ts:
(pass) wake-index arm race + intersection lookup (real Postgres) > intersection lookup returns the owning until-lock for a concrete ref [16.57ms]
(pass) wake-index arm race + intersection lookup (real Postgres) > matches a kind-level wildcard wait [3.82ms]
(pass) wake-index arm race + intersection lookup (real Postgres) > concurrent same-(lock, ref) inserts leave exactly one row [18.91ms]

::endgroup::

::group::core/automation-backend/src/entity/cross-pod-read-consistency.it.test.ts:
(pass) cross-pod reactive-entity read consistency (real Postgres) > durable kind: a write on pod A is visible to pod B's read + getMany [20.26ms]
(pass) cross-pod reactive-entity read consistency (real Postgres) > NEGATIVE CONTROL: a pod-local read does NOT see another pod's write (proves teeth) [2.76ms]

::endgroup::

::group::core/backend-api/src/script-sandbox/rootless-egress.it.test.ts:
(skip) rootless egress (real slirp4netns) > delivers filtered egress: blocks a non-allowlisted destination
(skip) rootless egress (real slirp4netns) > the network decision picks the rootless path on this host

::endgroup::

::group::core/backend-api/src/script-sandbox/forkbomb.it.test.ts:
(skip) per-run fork-bomb containment (real bwrap) > caps a shell fork bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > caps an ESM spawn-loop bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > still runs a benign script to success under the same fail-closed default

::endgroup::

::group::core/backend/src/services/plugin-installers/install-from-tarball.it.test.ts:
(pass) installBundleFromArtifacts (real bun install) > resolves an intra-bundle sibling dependency without a registry [27.33ms]

::endgroup::

16 tests skipped:
(skip) external plugin install (real instance + UI) > (unnamed)
(skip) external plugin install (real instance + UI) > installs the packaged plugin via the UI; frontend + backend + core plugins load
(skip) external plugin install (real instance + UI) > (unnamed)
(skip) MCP Streamable-HTTP conformance > initialize advertises a protocol version and a session id
(skip) MCP Streamable-HTTP conformance > initialize echoes a negotiated protocol version
(skip) MCP Streamable-HTTP conformance > tools/list WITHOUT a session id is refused (session enforced, not cosmetic)
(skip) MCP Streamable-HTTP conformance > tools/list returns the read-only tool surface
(skip) MCP Streamable-HTTP conformance > tools/call returns a non-error content block
(skip) MCP Streamable-HTTP conformance > tools/list never lists an out-of-scope tool
(skip) MCP Streamable-HTTP conformance > tools/call for an out-of-scope tool is REFUSED 403 (not merely hidden)
(skip) MCP Streamable-HTTP conformance > tools/call for a mutating tool is refused by the structural effect-gate
(skip) rootless egress (real slirp4netns) > delivers filtered egress: blocks a non-allowlisted destination
(skip) rootless egress (real slirp4netns) > the network decision picks the rootless path on this host
(skip) per-run fork-bomb containment (real bwrap) > caps a shell fork bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > caps an ESM spawn-loop bomb and keeps the supervisor alive
(skip) per-run fork-bomb containment (real bwrap) > still runs a benign script to success under the same fail-closed default


3 tests failed:
(fail) external plugin lifecycle (published tarballs) > installs from the local registry [29471.67ms]
(fail) external plugin lifecycle (published tarballs) > validates, packs, and bundles (workspace rewrite is a no-op for @checkstack deps) [4.21ms]
(fail) external plugin lifecycle (published tarballs) > boots the dev server and serves POST /api/<pluginId>/* == 200 with a JSON array [1004.03ms]

 60 pass
 16 skip
 3 fail
 200 expect() calls
Ran 79 tests across 24 files. [43.39s]

How to fix: These are the real-services integration tests (`*.it.test.ts`). To reproduce locally, start the dev services with `docker compose -f docker-compose-dev.yml up -d postgres redis`, then run `CHECKSTACK_IT=1 bun test it.test`. Read the failing assertions and fix the implementation so the tests pass against real Postgres/Redis. Do not weaken or skip the tests.

@enyineer The above code quality issues were found in this PR. Automated fixes have not resolved them after 3 attempts. Manual intervention is required.

The old harness (run-all.ts) rebooted the backend and reset the DB once PER SPEC FILE and ran files serially - ~24s/file of pure reboot overhead. A measured PoC showed the per-file reboot, not migrations, was the bottleneck (the template-DB approach moved nothing).

So boot the backend ONCE per run/shard and run every spec in PARALLEL against one shared DB. This works because every spec is now DATA-ISOLATED:

- Each namespaces the entities it creates (`const NS = ...`; unique suffix), so parallel specs sharing the DB never collide.
- No spec asserts global / whole-DB state (empty lists, global counts).
- Onboarding / "fresh install" empty-state assertions moved to a dedicated PRISTINE phase: `*.empty.spec.ts` in an `empty-state` Playwright project that the data specs depend on, so it runs first on the clean DB. dashboard, ai, notification, infrastructure, queue, gitops became `.empty` specs; the per-domain empties deleted during isolation are reconstructed in onboarding.empty.spec.ts.

Harness:

- playwright.config.ts: setup-admin -> empty-state -> chromium (parallel) -> member, with fullyParallel + workers.
- with-e2e-postgres.ts: runs `playwright test` once; forwards `--shard=i/N`.
- CI e2e job shards with Playwright's NATIVE --shard (matrix size = job-total).
- Because data-isolated specs make in-process retries safe again, the file-level retry runner is retired: run-all.ts, shard.ts, shard.test.ts, and the PoC scaffolding are removed.

Verified: full suite (168 tests, 34 files) green locally boot-once at workers=4 in ~80s (single machine), vs minutes per shard before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
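The data-isolation rule in this commit message (a per-spec `NS` with a unique suffix, and no assertions on whole-DB state) could look roughly like the sketch below. The helper names `makeNamespace`, `scoped`, and `ownedBy` are hypothetical and chosen for illustration; they are not taken from the repository, and the actual specs may build their namespace differently.

```typescript
// Hypothetical helpers, not the repo's code: a minimal sketch of per-spec
// namespacing so parallel specs can share one database without colliding.

// Per-process counter: guarantees two namespaces created in the same worker differ.
let counter = 0;

// Builds a namespace such as "e2e-catalog-<time><random><counter>".
// The timestamp and random parts keep namespaces from different worker
// processes (which each start their counter at zero) apart in practice.
function makeNamespace(spec: string): string {
  const time = Date.now().toString(36);
  const rand = Math.floor(Math.random() * 1e9).toString(36);
  counter += 1;
  return `e2e-${spec}-${time}${rand}${counter.toString(36)}`;
}

// Prefixes an entity name with the spec's namespace.
function scoped(ns: string, name: string): string {
  return `${ns}-${name}`;
}

// Narrows a listing to the rows this spec created, so assertions never
// depend on what other specs have written to the shared database.
function ownedBy<T extends { name: string }>(ns: string, rows: T[]): T[] {
  return rows.filter((row) => row.name.startsWith(`${ns}-`));
}

// Usage inside a spec file (assumed shape):
const NS = makeNamespace("catalog");
const systemName = scoped(NS, "payments-api");
```

A spec written this way would assert on `ownedBy(NS, rows)` rather than on the full list length, which is what keeps a global count or an empty-list check out of the parallel phase.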

github-actions · 2026-06-20T22:08:24Z

✅ All PR Checks Passed

Check	Status
Typecheck	✅ Passed
Lint	✅ Passed
Deps	✅ Passed
Test	✅ Passed
Integration	✅ Passed
Security	✅ Passed
E2E	✅ Passed

@enyineer All quality checks have passed. This PR is ready for your review.

enyineer force-pushed the chore/comprehensive-review-and-improvements branch from 7a0cbeb to 831ad49 on June 20, 2026 09:43

enyineer and others added 10 commits June 20, 2026 12:10

github-advanced-security AI found potential problems Jun 20, 2026

Comment thread plugins/healthcheck-http-backend/src/connect-probe.ts Fixed

enyineer and others added 2 commits June 20, 2026 16:02

Merge branch 'main' into chore/comprehensive-review-and-improvements

cc7e13e

enyineer and others added 2 commits June 20, 2026 18:25

enyineer and others added 2 commits June 20, 2026 19:52

enyineer merged commit 8cad340 into main Jun 21, 2026
17 checks passed


