Comprehensive review, hardening, and a platform-wide design-system rework #354
🦋 Changeset detected
Latest commit: a02d2bd
The changes in this PR will be included in the next version bump. This PR includes changesets to release 98 packages.
📦 Changeset Coverage Incomplete
The following packages have code changes but are not included in any changeset:
…tem rework

A large, verified body of work bringing Checkstack up across six areas: correctness, security, testing, UX, docs, and AI - capped by a premium, consistent UI design language.

Design system (premium UI rework)
- New `@checkstack/ui` foundation: surface elevation tokens, aurora gradient, colorblind-safe status triad, density model (comfortable/compact) + provider and user toggle, polished skeleton/empty/error states, and honest token-driven chart primitives (time series, sparkline, radial gauge, request waterfall, uptime ribbon).
- A signature aurora page-header + deeper cards, an elevated app shell, and reskinned dashboard / health-check / SLO views.
- Every plugin frontend adopts the tokens, then its highest-impact surfaces are redesigned to a premium bar (depth, number-led hierarchy, multi-encoded status). Pure tone/format logic extracted into unit-tested modules.
- All alerts unified onto one premium `Alert`; the duplicate `InfoBanner` is removed (BREAKING: use `Alert`).

Security hardening
- At-rest encryption with key rotation + fail-loud decryption, brute-force / token-timing fixes, HTTP-collector SSRF guard, fail-closed plugin supply-chain integrity pinning, SQL plugin-schema identifier hardening, notification email HTML sanitization, per-assignment satellite result authorization, and a first-run onboarding TOCTOU guard.

Testing
- A real end-to-end suite (Playwright + Testcontainers Postgres) covering the authenticated app, made a blocking CI job, plus extracted pure-logic unit tests throughout.

UX & accessibility
- All review-surfaced UX improvements: form quality across editors, mobile responsiveness and touch targets, accessible overlays/forms, list/loading/empty/error state consistency, onboarding + point-of-use coaching, wider command-palette coverage, and sidebar IA.

Refactors & docs
- Typed router-factory args + structured logging, typed Drizzle JSON columns, shared formatting helpers, removal of boundary casts, and same-PR docs/AI updates.

Reliability
- Retune anomaly-detection defaults across every health-check strategy and the hardware collector for a low-noise posture: noisy or un-baselineable metrics (raw counts, config echoes, payload sizes, deterministic values) default to off, while latency, availability, and saturation-percent signals are kept and hardened with confirmation windows and practical-significance floors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
7a0cbeb to 831ad49
…ssertable results

A collector must fail only when the transport could not complete (DNS/connect/TLS failure, timeout, aborted, unspawnable process). A successfully-received result that is simply "not what you hoped" - an HTTP 4xx/5xx, a gRPC NOT_SERVING, offline Jenkins nodes, a non-zero script exit - is an ASSERTABLE METRIC, not a collector failure; the user's assertions (or the no-assertion default) decide health.

Fixes the HTTP collector hard-failing on 404 (now a successful collection with statusCode exposed and assertable), plus gRPC, Jenkins node-health, and the script execute collector. Audited every other strategy: they already failed only on genuine transport failures.

Adds regression tests, docs, a new project rule (.claude/rules/healthcheck-collectors.md), and a changeset (BREAKING: affected checks now need an explicit assertion to fail on a non-OK result).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
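The rule in this commit can be sketched as a small classification step. This is a minimal illustration, not the project's collector API: the `CollectorOutcome` and `Assertion` types and both function names are hypothetical, and the "healthy when no assertions are configured" default is inferred from the BREAKING note above.

```typescript
// Hypothetical types for illustration; not Checkstack's actual collector API.
type TransportFailure = { kind: "transport-failure"; reason: string };
type Collected = { kind: "collected"; statusCode: number };
type CollectorOutcome = TransportFailure | Collected;

// Only one assertion shape is modelled here: "statusCode equals value".
type Assertion = { field: "statusCode"; operator: "equals"; value: number };

// Any response that was actually received is a successful collection,
// whatever its status code. Only a missing response is a failure.
function toOutcome(attempt: { error?: string; statusCode?: number }): CollectorOutcome {
  if (attempt.statusCode !== undefined) {
    return { kind: "collected", statusCode: attempt.statusCode };
  }
  return { kind: "transport-failure", reason: attempt.error ?? "unknown transport error" };
}

// Health is decided by the user's assertions. With none configured,
// a collected result counts as healthy (assumed default).
function isHealthy(outcome: CollectorOutcome, assertions: Assertion[]): boolean {
  if (outcome.kind === "transport-failure") return false;
  const { statusCode } = outcome;
  return assertions.every((a) => statusCode === a.value);
}
```

Under this sketch a 404 is collected successfully and only becomes unhealthy once an explicit `statusCode equals 200` assertion is attached, which matches the behavior change the changeset flags as breaking.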
…unchanged)

Records an optional structured metadata.timings per run (DNS, connect, TLS, wait/TTFB, transfer, and a processing catch-all for non-HTTP operation time); the run-detail view renders the present phases in transport order and falls back to the old Connection+Processing split for older runs.

HTTP: the request is byte-for-byte the same fetch path (IP-pinned, original Host + SNI) - request behavior is unchanged. Timing is measured around it: fetch resolves at the response headers so wait (TTFB) and transfer (body) are exact on the request, DNS is timed at the resolve step, and connect/TLS come from a short-lived best-effort raw net/tls probe to the same validated IP (Bun's fetch socket emits no connect/handshake events; raw sockets do). The probe is timing-only and never fails the check. Other transports surface the connect and operation times they already measure.

Also fixes the run-detail "slowest" badge colliding with the bar, and shows a genuinely sub-millisecond phase as "<1 ms".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
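The rendering rules described in this commit (present phases only, transport order, "<1 ms" for sub-millisecond values) might look roughly like the sketch below. The phase names come from the commit text; the `Timings` shape, the function names, and the rounding of larger values are assumptions.

```typescript
// Phase names follow the commit text; the exact field shape is an assumption.
type Phase = "dns" | "connect" | "tls" | "wait" | "transfer" | "processing";
type Timings = Partial<Record<Phase, number>>; // values in milliseconds

const TRANSPORT_ORDER: Phase[] = ["dns", "connect", "tls", "wait", "transfer", "processing"];

// A genuinely sub-millisecond phase is shown as "<1 ms" instead of "0 ms".
function formatMs(ms: number): string {
  if (ms > 0 && ms < 1) return "<1 ms";
  return `${Math.round(ms)} ms`;
}

// Render only the phases that were recorded, in transport order,
// regardless of the order the keys were written in.
function renderPhases(timings: Timings): string[] {
  return TRANSPORT_ORDER.filter((p) => timings[p] !== undefined).map(
    (p) => `${p}: ${formatMs(timings[p] as number)}`,
  );
}
```

For example, `renderPhases({ transfer: 0.4, dns: 12.2, wait: 80 })` yields `["dns: 12 ms", "wait: 80 ms", "transfer: <1 ms"]`.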
…fixes HTTP/2 404s)

The SSRF hardening added in this branch pinned the request to the resolved IP: it rewrote the URL host to the IP literal and moved the hostname into the `Host` header. That breaks HTTP/2 origins, whose authority comes from the URL's `:authority` pseudo-header rather than `Host`, so real hosts such as google.com started answering 404/429 instead of 200.

Keep the SSRF guard as a pre-flight validation (still rejects cloud-metadata / link-local and operator-denied ranges, and direct denied IP literals) but drop the pin and `fetch` the original URL verbatim, byte-identical to a plain fetch. The resolved IP is reused only for the best-effort timing probe. The only thing lost is DNS-rebind TOCTOU protection - a narrow window whose price was breaking every legitimate HTTP/2 request.

Verified: example.com and cloudflare.com (both HTTP/2) return 200 with the full timing breakdown intact; SSRF guard tests still pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
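A pre-flight check of the kind described here can be sketched as a pure function over the resolved address. This is an IPv4-only illustration under stated assumptions: the function names are hypothetical, only the link-local range (which contains the common 169.254.169.254 metadata address) is built in, and the real guard's range list and IPv6 handling are not shown in this PR text.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned integer, or null if invalid.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr` (for example "169.254.0.0/16").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash < 0) return false;
  const bitsText = cidr.slice(slash + 1);
  if (!/^\d+$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (ipInt === null || baseInt === null || bits > 32) return false;
  const size = 2 ** (32 - bits); // number of addresses in the block
  return Math.floor(ipInt / size) === Math.floor(baseInt / size);
}

const LINK_LOCAL = "169.254.0.0/16"; // includes the 169.254.169.254 metadata address

// Validate the resolved IP before the request; the request itself then uses
// the original URL unchanged. Unparseable input is rejected (fail closed).
function preflight(resolvedIp: string, operatorDenied: string[]): { allowed: boolean; reason?: string } {
  if (ipv4ToInt(resolvedIp) === null) return { allowed: false, reason: "not an IPv4 address" };
  for (const cidr of [LINK_LOCAL, ...operatorDenied]) {
    if (inCidr(resolvedIp, cidr)) return { allowed: false, reason: `denied by ${cidr}` };
  }
  return { allowed: true };
}
```

Because the check runs before `fetch` and the request is not pinned to the validated address, a hostname that re-resolves between the check and the request is not caught; that is the DNS-rebind window the commit accepts.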
The e2e suite hung indefinitely at "stopping ephemeral Postgres...". Bisected
locally: container.stop({ timeout: 0 }) resolves in ~180ms, but the process
never exits. Testcontainers' Ryuk reaper keeps a persistent socket open to its
sidecar for the process lifetime and relies on socket.unref() so it does not
block exit - the Bun runtime does not honor that unref, so the socket keeps the
event loop alive forever after the suite finishes. In CI the step pipes through
`tee`, which only ends when our stdout closes, hence the indefinite hang.
Disable Ryuk in the harness: the wrapper already stops and removes the container
deterministically in `finally` on every exit path (verified the container is
gone after stop() with Ryuk off), and CI runners are ephemeral, so the reaper is
unnecessary. The process now exits naturally - no force-exit workaround.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
The assistant had healthcheck.status (every check globally) but no way to map a check to a system, so it had to GUESS which check monitored a given system (e.g. "google.com expecting 201" vs "expecting 200").

Project the existing getSystemConfigurations query as the read-only AI tool healthcheck.listSystemChecks: given a systemId (resolved from a name via catalog.listSystems), it returns the checks assigned to that system - id, name, strategy, interval, collectors/assertions, and paused state. The tool inherits the source procedure's system-scoped configuration.read gate (parentScope on catalog.system), so it stays team-scoped and needs no new permission.

Adds a projection test mirroring the sibling tools, documents the system-scoped read pattern in registering-tools.md, and regenerates the docs index.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
Closing a downtime window depended entirely on catching the system's transient health-recovery edge (onEntityChanged), which is emitted only by a check RUN. Fixing, pausing, deleting, or unassigning the offending check just invalidates the read cache and emits no edge, and even a plain edit can lose the single recovery delivery - leaving the open window orphaned until the once-daily reconcile. Result: the SLO read 100% availability (live health is authoritative for the budget) while "Recent Downtime Events" still showed a 25-day "ongoing" window. The two views disagreed.

The user-facing SLO reads now reconcile against live health before reporting: getDowntimeEvents and the status reads void an orphaned open window when the system is currently healthy, reusing the same voidOrphanedDowntime the daily job runs. The dashboard self-heals the moment it is viewed instead of waiting for midnight. The reactive entity read / computeStatus stays side-effect-free; the reconcile is a cheap no-op when there are no open events.

The void primitive is already unit-tested; the router change is thin glue over it (no router harness exists in this package).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
The chat loop replays earlier tool results verbatim with no age annotation, and the system prompt injected "current time" but never how long the thread had been idle. So resuming an old conversation, the model answered from stale captured data (a check's old name, a "failing" status) instead of current state.

streamTurn now measures the idle gap before the message (the conversation's last-activity timestamp, captured before the new user message bumps updatedAt) and, once it exceeds 10 minutes, folds a "Data freshness" directive into the system prompt telling the model to re-call the relevant read tools for current state rather than trust earlier-turn results. The directive sits at the volatile end of the prompt (next to the time line) so the cache-friendly stable prefix is unaffected; an active back-and-forth never sees it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
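The idle-gap rule can be sketched as a prompt builder that keeps the stable prefix first and appends volatile lines last. The 10-minute threshold is from the commit; the function name, the directive wording, and the prompt layout are illustrative assumptions, not the shipped `streamTurn` code.

```typescript
const IDLE_THRESHOLD_MS = 10 * 60 * 1000; // 10 minutes, per the commit text

// Wording is illustrative; the shipped directive text is not shown in this PR.
const FRESHNESS_DIRECTIVE =
  "Data freshness: this conversation was idle for a while. Re-call the relevant read tools before relying on earlier results.";

// `lastActivity` must be captured BEFORE the new user message updates the
// conversation, otherwise the measured gap would always be close to zero.
function buildSystemPrompt(stablePrefix: string, now: Date, lastActivity: Date | null): string {
  const volatileLines = [`Current time: ${now.toISOString()}`];
  const idleMs = lastActivity ? now.getTime() - lastActivity.getTime() : 0;
  if (idleMs > IDLE_THRESHOLD_MS) volatileLines.push(FRESHNESS_DIRECTIVE);
  // Stable text first, volatile text last, so a cached prefix stays identical.
  return [stablePrefix, ...volatileLines].join("\n");
}
```

An active conversation (gap under the threshold) produces a prompt without the directive, while the stable prefix is the same string in both cases.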
The reconcile-on-read fix DELETED orphaned downtime windows (when the system is healthy but the recovery edge was missed). For a genuine multi-day outage whose recovery edge was lost, that erased real downtime - so the SLO read a false 100% availability with full error budget, even though the system had been down for ~25 days.

Reconcile now PRESERVES the downtime: it resolves the system's actual recovery time (the first healthy run on/after the window opened, via healthcheck getHistory) and CLOSES the window at that instant, so the real outage is counted against availability and the error budget. It only DELETES as a fallback when the recovery time can't be determined (e.g. run history pruned), where the unprovable downtime must not be counted.

- closeDowntimeEvent gains an optional explicit endTime (clamped >= startTime).
- SloEngine gains a recovery-time resolver, wired in afterPluginsReady from the healthcheck run history; voidOrphanedDowntime -> reconcileOrphanedDowntime.
- Forward-only: already-written daily snapshots are not retroactively corrected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
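The close-or-delete decision described above can be sketched as a pure function over the run history. The types and the name `decideReconcile` are hypothetical; the real code resolves history through healthcheck getHistory and the `SloEngine` resolver, which are not reproduced here.

```typescript
// Hypothetical shapes for illustration; timestamps are epoch milliseconds.
type Run = { at: number; healthy: boolean };
type OpenDowntime = { startTime: number };
type Decision = { action: "close"; endTime: number } | { action: "delete" };

// Close the orphaned window at the first healthy run on/after it opened.
// If no such run is known (for example, history was pruned), fall back to
// deleting, because downtime that cannot be proven must not be counted.
function decideReconcile(open: OpenDowntime, history: Run[]): Decision {
  const recovery = history
    .filter((run) => run.healthy && run.at >= open.startTime)
    .sort((a, b) => a.at - b.at)[0];
  if (recovery === undefined) return { action: "delete" };
  // Clamp so a window can never end before it started.
  return { action: "close", endTime: Math.max(recovery.at, open.startTime) };
}
```

The sketch assumes the caller has already established that the system is currently healthy, as the reconcile in this PR does before touching an open window.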
A chat Skill (e.g. "write like a redneck") held during tool-calling steps but normalized back to professional tone in the synthesized reply. The multi-step loop's forced final-answer step replaces the whole system prompt with a tool-less "answer now, be concise" instruction, dropping the skill preamble on the exact step that writes the user-visible answer.

prepareFinalAnswerStep now accepts persistent guidance (the skill preamble) and appends it after the base final-answer instruction, so the skill's voice governs the synthesized reply too. The headless runner passes none, so it is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
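The append-after-base behavior can be sketched in a few lines. The function name `finalAnswerSystemPrompt` and the instruction wording are illustrative assumptions; the real `prepareFinalAnswerStep` signature is not shown in this PR text.

```typescript
// Illustrative wording; the shipped final-answer instruction is not shown here.
const FINAL_ANSWER_INSTRUCTION = "Answer now using the results gathered so far. Be concise.";

// The final step replaces the whole system prompt, so any guidance that must
// survive (such as a skill preamble) has to be appended explicitly.
function finalAnswerSystemPrompt(persistentGuidance?: string): string {
  const guidance = persistentGuidance?.trim() ?? "";
  if (guidance === "") return FINAL_ANSWER_INSTRUCTION; // headless runner: unchanged
  return `${FINAL_ANSWER_INSTRUCTION}\n\n${guidance}`;
}
```

Placing the guidance after the base instruction keeps the base text identical for callers that pass nothing.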
Asked "how do I add a system to the catalog?", the assistant answered with the internal tool name (catalog.createSystem) and its input JSON schema - but the operator cannot call tools and never sees them; that is the assistant's own mechanism, not a workflow.

The chat system prompt now states tools are the assistant's own (not a public API), and a how-to must be answered in product terms (the UI, grounded in docs) and/or by offering to do it for the operator - never by presenting tool names, tool input JSON, or parameter schemas as steps to follow. Chat-only; the headless runner is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
CodeQL flagged the connect-probe's `rejectUnauthorized: false` ("disabling
certificate validation is strongly discouraged"). The probe is timing-only, but
disabling validation is unnecessary: it dials the validated IP with the original
hostname as SNI, so a valid cert verifies against `servername`, and the real
`fetch` already validates strictly (a bad cert fails the check regardless). Drop
the override; if the handshake can't complete (invalid/self-signed cert) the
existing error handler resolves with just the TCP `connectMs` and no `tlsMs` -
timing stays best-effort, never fatal.
Verified: valid hosts still report tlsMs; a cert/servername mismatch degrades to
connectMs only with no crash.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
❌ PR Checks Failed
❌ E2E Failures
How to fix: These are the Playwright end-to-end tests. Reproduce locally with
@enyineer The above code quality issues were found in this PR. Please fix them before merging.
✅ All PR Checks Passed
@enyineer All quality checks have passed. This PR is ready for your review.
The catalog spec runs as a serial group with retries:2, but the e2e DB is reset
only per file boot, NOT per retry, and a serial group retries from the top. So a
flake in any later test (e.g. "edits a system name") re-ran the WHOLE group
against an already-populated catalog: the global empty-state assertions then
hard-failed ("No systems in the catalog yet" is gone), and creates would collide
on the fixed name suffix. That turned a single transient into a red E2E job.
Two changes make it retry-safe:
- Split the two read-only empty-state tests into catalog-empty.spec.ts. run-all
boots a fresh, migration-empty DB per file and this file creates nothing, so
the empty state holds on every attempt (a retry re-asserts against the same
empty DB; the mutating spec runs in a separate invocation and can't pollute
it).
- Key the mutating chain's created names to the retry attempt (`-r<n>`, via
test.info().retry) so a group retry runs in its own namespace and never
collides with the previous attempt's leftover rows. Drop the delete test's
global "No systems yet" assertion (can't hold against retry leftovers; the
empty-state file owns that check).
Verified structurally: playwright --list discovers all tests in both files; lint
and typecheck pass. Full behavioral verification is via the e2e CI run.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
✅ All PR Checks Passed
@enyineer All quality checks have passed. This PR is ready for your review.
The E2E job ran all ~32 spec files serially on one runner (~12 min). run-all.ts
already supports round-robin sharding (CHECKSTACK_E2E_SHARD_INDEX/TOTAL +
selectShard); this wires it into the workflow as a 3-way matrix.
Each shard is an independent runner with its OWN ephemeral Postgres
(Testcontainers), booting one backend at a time - the proven single-Postgres /
one-backend-per-runner model is unchanged, so there's no new cross-test
contention; we only split the FILE list across runners, cutting test wall-clock
~linearly (verified 32 specs split 11/11/10, each spec exactly once).
- matrix shard [1,2,3], fail-fast:false; CHECKSTACK_E2E_SHARD_TOTAL uses
${{ strategy.job-total }} so the matrix size is the single source of truth.
- Per-shard artifact names (e2e-output-<n>, e2e-traces-<n>): v4+ artifacts
reject duplicate names across parallel legs.
- report job: download e2e-output-* with merge-multiple; readOutput now
concatenates all .txt in a job's artifact dir (single-output jobs unchanged,
sharded E2E gets every shard's tail). needs.e2e.result already aggregates the
matrix legs, so the pass/fail gate is unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
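The round-robin split mentioned in this commit can be sketched as below. The name `selectShard` appears in the commit, but this signature, the 1-based index, and the validation are assumptions rather than the code in run-all.ts.

```typescript
// Round-robin: file i goes to shard (i mod shardTotal) + 1.
// `shardIndex` is 1-based here, matching the [1,2,3] matrix in the workflow.
function selectShard<T>(files: T[], shardIndex: number, shardTotal: number): T[] {
  if (!Number.isInteger(shardTotal) || shardTotal < 1) {
    throw new RangeError(`shardTotal must be a positive integer, got ${shardTotal}`);
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardTotal) {
    throw new RangeError(`shardIndex must be in 1..${shardTotal}, got ${shardIndex}`);
  }
  return files.filter((_, i) => i % shardTotal === shardIndex - 1);
}
```

With 32 files and 3 shards this gives 11, 11, and 10 files, and every file lands in exactly one shard, provided each runner sees the file list in the same order.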
…hards

Follow-up to the e2e sharding so nothing is hand-maintained and CI minutes aren't wasted:

- Shard COUNT is now derived from the actual spec files. A tiny e2e_matrix job counts core/e2e/tests/*.spec.ts and emits a 1-based JSON shard array (~11 files/shard, capped at 5 runners); the e2e job consumes it via fromJSON(needs.e2e_matrix.outputs.shards). Adding/removing a spec needs no workflow edit. (The file LIST was already auto-discovered by run-all.ts; this removes the last hand-maintained literal, the shard count.) Portable array build via `seq | paste -sd,` to avoid the BSD `seq -s` trailing-comma quirk.
- Build the frontend + docs ONCE in a new e2e_build job and upload the two dist dirs the backend serves (core/frontend/dist, docs/dist) as an artifact; each shard DOWNLOADS it instead of rebuilding. Removes the per-shard build (the dominant cost) and drops git-LFS from the shards (images are baked into the built docs/dist). e2e now `needs: [e2e_matrix, e2e_build]`.
- report job: add e2e_matrix + e2e_build to needs and require both success/skipped, so a generator/build failure (which skips e2e) can't read as a false green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
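The shard-count rule (about 11 files per shard, capped at 5 runners, 1-based array) can be expressed as a small function. The workflow itself builds the array in shell; this TypeScript version is only an illustration, and rounding up with at least one shard is an assumption about what "~11 files/shard" means.

```typescript
// Derive the 1-based shard array from the number of spec files.
// Assumes rounding up, a minimum of one shard, and a hard cap on runners.
function shardArray(specCount: number, filesPerShard = 11, maxShards = 5): number[] {
  const wanted = Math.ceil(specCount / filesPerShard);
  const count = Math.min(maxShards, Math.max(1, wanted));
  return Array.from({ length: count }, (_, i) => i + 1);
}

// The matrix consumes the array as JSON, for example "[1,2,3]".
function shardMatrixJson(specCount: number): string {
  return JSON.stringify(shardArray(specCount));
}
```

For 32 spec files this yields `[1,2,3]`, consistent with the 3-way matrix used before the count was derived automatically.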
❌ PR Checks Failed
❌ E2E Failures
How to fix: These are the Playwright end-to-end tests. Reproduce locally with
@enyineer The above code quality issues were found in this PR. Please fix them before merging.
…ety)
The catalog retry-safety fix keyed system/group/env NAMES to the retry attempt
but left SYSTEM_DESCRIPTION a constant. On a serial-group retry the management
table lists every system - including the previous attempt's leftover row - so a
shared description matched two rows: `getByText(SYSTEM_DESCRIPTION)` tripped
strict mode ("resolved to 2 elements") and failed E2E shard 3. The local run
never retried, so it didn't surface.
Make the description per-attempt (`-r<n>`) like the names, so every value an
assertion matches on is unique to the attempt and a retry's leftover rows can't
collide. Audited all getByText/name assertions in the spec: the description was
the only remaining fixed data value.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
✅ All PR Checks Passed
@enyineer All quality checks have passed. This PR is ready for your review.
…tempt

Makes every spec retry-safe by construction and ends the per-spec whack-a-mole (empty-state-first + serial-mutate specs - catalog, incident, maintenance, secrets, status-page - all shared the same latent fragility).

Root cause: the e2e DB is reset per FILE boot, not per Playwright retry, and a serial group retries from the top. So in-process retries re-ran against the previous attempt's polluted DB - global empty-state assertions failed, and fixed names/descriptions collided with leftover rows.

Move retries from Playwright (same DB) to run-all at the FILE level: set Playwright retries:0, and on a spec failure re-run the whole `playwright test <file>` invocation (up to 3 attempts in CI). Each invocation re-boots the backend (webServer reuseExistingServer:false), which DROP/CREATEs the e2e DB, so every attempt starts from a fresh, empty, migration-reset database - the serial chain simply re-runs clean. A trace is captured only on a retry, so the happy path keeps no per-test tracing overhead.

Verified locally with an induced flake: attempt 1 fails (no in-process retry), run-all re-boots, attempt 2 passes -> "all spec files green".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
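The file-level retry loop can be sketched with the per-file invocation injected as a callback. In the real harness that callback would spawn `playwright test <file>` (which re-boots the backend and resets the DB); here `RunFile` and `runWithFileRetries` are hypothetical names, and the callback is synchronous purely to keep the sketch short.

```typescript
// Returns true when the whole invocation for `file` passed on this attempt.
type RunFile = (file: string, attempt: number) => boolean;

type FileResult = { file: string; attempts: number; passed: boolean };

// Re-run a failed spec FILE as a whole, up to `maxAttempts` times.
// Each call to `runFile` stands for a fresh invocation with a fresh DB.
function runWithFileRetries(files: string[], runFile: RunFile, maxAttempts = 3): FileResult[] {
  return files.map((file) => {
    let attempts = 0;
    let passed = false;
    while (!passed && attempts < maxAttempts) {
      attempts += 1;
      passed = runFile(file, attempts);
    }
    return { file, attempts, passed };
  });
}
```

The attempt number is passed to the callback so it can, for instance, enable tracing only when `attempt > 1`, mirroring the "trace only on a retry" behavior described above.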
✅ All PR Checks Passed
@enyineer All quality checks have passed. This PR is ready for your review.
…B per attempt"

This reverts commit b549ae7.

The file-level retry was unnecessary machinery: the suite was already green on the previous commit (the empty-state split + per-attempt naming) using Playwright's built-in retries. Retrying harder masks transient flakiness rather than fixing it - the real fix is the per-spec robustness (test isolation + idempotent assertions), which removes the DETERMINISTIC fragility. Keep Playwright's standard CI retries as the thin safety net for genuine transient browser races; do not re-run whole files on a fresh DB.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
✅ All PR Checks Passed
@enyineer All quality checks have passed. This PR is ready for your review.
Audit showed the retry-fragility is suite-wide (~15 mutating serial specs), not a handful: a serial group retries from the top against a DB reset only per file boot, so empty-state assertions and fixed-value matches collide with the prior attempt's leftover rows. Hardening each spec by hand (extract empty-state file + per-attempt naming) is whack-a-mole that every future spec would also need.

Reinstate the structural fix instead (reverts the earlier revert 9592c2b): set Playwright retries:0 and retry a failed spec at the FILE level in run-all (3 attempts in CI). Each invocation re-boots the backend, which DROP/CREATEs the e2e DB - so every retry starts from a fresh, empty, migration-reset database. This makes the retries we already keep honor the suite's own per-file-fresh-DB design, fixing all specs (and future ones) uniformly. It is not "retry harder" - it gives each retry a clean slate, which is the actual root-cause fix.

Verified locally with an induced flake: attempt 1 fails, run-all re-boots, attempt 2 passes. (catalog.spec keeps its per-test idempotency from the earlier commits as harmless defense-in-depth; no other spec needs per-spec changes now.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
… it)

With the file-level retry giving every attempt a fresh DB, the per-spec robustness added to catalog earlier (empty-state split into catalog-empty.spec + per-attempt naming) is no longer needed. Restore catalog.spec.ts to its original inline form and remove catalog-empty.spec.ts, so the suite has ONE retry-safety mechanism (fresh DB per attempt) instead of a mix - nothing for future specs to cargo-cult.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
✅ All PR Checks Passed
@enyineer All quality checks have passed. This PR is ready for your review.
Each spec file boots the backend, which re-ran ALL migrations (~100+ across ~25 plugin schemas) on every boot because the reset created an EMPTY database. Build the migrated schema ONCE per run and clone it instead.

- template-db.ts: build the template by booting the REAL backend once against an empty DB (the exact production migration path + idempotent role/access-rule seeding), wait for readiness, stop it, and drain its connections so the template can be a CREATE DATABASE ... TEMPLATE source. Built from current migrations every run -> drift-proof, no checked-in dump. No admin user is seeded, so per-file onboarding is unchanged.
- with-e2e-postgres.ts: build the template once after Postgres is up, before the spec loop (inside the try, so a build failure still tears the container down and fails loudly).
- start-e2e-server.ts: reset by `CREATE DATABASE ... TEMPLATE` when the template exists (file copy -> boot-time migrations no-op), falling back to empty-create + migrate when it doesn't (direct test:e2e:file runs).

Verified locally: template builds in ~3s, catalog spec passes through the clone path. Green CI proves the path is active (build failure would fail loudly).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
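The reset-by-clone step with its fallback can be sketched as a function that produces the SQL to run. The database names, `quoteIdent`, and `resetStatements` are hypothetical; only the `CREATE DATABASE ... TEMPLATE` form and the empty-create fallback come from the commit text.

```typescript
// Quote a PostgreSQL identifier: wrap in double quotes and double any
// embedded double quote.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build the statements that reset the e2e database. With a template, the new
// database is a copy of the already-migrated schema; without one, it is
// created empty and the backend migrates it on boot.
function resetStatements(database: string, template: string | null): string[] {
  const drop = `DROP DATABASE IF EXISTS ${quoteIdent(database)}`;
  const create =
    template === null
      ? `CREATE DATABASE ${quoteIdent(database)}`
      : `CREATE DATABASE ${quoteIdent(database)} TEMPLATE ${quoteIdent(template)}`;
  return [drop, create];
}
```

PostgreSQL refuses to copy a template database while other sessions are connected to it, which is why the commit drains the backend's connections before the template is used as a source.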
❌ PR Checks Failed
❌ Integration Test Failures
How to fix: These are the real-services integration tests (
@enyineer The above code quality issues were found in this PR. Please fix them before merging.
❌ PR Checks Failed
❌ Integration Test Failures
How to fix: These are the real-services integration tests (
@enyineer The above code quality issues were found in this PR. Automated fixes have not resolved them after 3 attempts. Manual intervention is required.
…le resets"

This reverts commit 4e3202c.

Measured against the no-template baseline, the template clone gave no reliable CI speedup (within run-to-run noise): migrations were never the bottleneck - the per-file FULL backend boot (initializing ~50 plugins to readiness) + onboarding dominate the ~24s/file, and the template only removes the small migration slice. Per the decision, drop the template-DB complexity and pursue boot-once (boot the backend once + isolate test data per worker) as the real lever instead.

Also corrects a stale changeset entry that still described catalog's reverted per-spec retry-safety.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
❌ PR Checks Failed
❌ Integration Test Failures
How to fix: These are the real-services integration tests (
@enyineer The above code quality issues were found in this PR. Automated fixes have not resolved them after 3 attempts. Manual intervention is required.
The old harness (run-all.ts) rebooted the backend and reset the DB once PER SPEC FILE and ran files serially - ~24s/file of pure reboot overhead. A measured PoC showed the per-file reboot, not migrations, was the bottleneck (the template-DB approach moved nothing). So boot the backend ONCE per run/shard and run every spec in PARALLEL against one shared DB.

This works because every spec is now DATA-ISOLATED:
- Each namespaces the entities it creates (`const NS = ...`; unique suffix), so parallel specs sharing the DB never collide.
- No spec asserts global / whole-DB state (empty lists, global counts).
- Onboarding / "fresh install" empty-state assertions moved to a dedicated PRISTINE phase: `*.empty.spec.ts` in an `empty-state` Playwright project that the data specs depend on, so it runs first on the clean DB. dashboard, ai, notification, infrastructure, queue, gitops became `.empty` specs; the per-domain empties deleted during isolation are reconstructed in onboarding.empty.spec.ts.

Harness:
- playwright.config.ts: setup-admin -> empty-state -> chromium (parallel) -> member, with fullyParallel + workers.
- with-e2e-postgres.ts: runs `playwright test` once; forwards `--shard=i/N`.
- CI e2e job shards with Playwright's NATIVE --shard (matrix size = job-total).
- Because data-isolated specs make in-process retries safe again, the file-level retry runner is retired: run-all.ts, shard.ts, shard.test.ts, and the PoC scaffolding are removed.

Verified: full suite (168 tests, 34 files) green locally boot-once at workers=4 in ~80s (single machine), vs minutes per shard before.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE
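The per-spec namespacing idea can be sketched as two small helpers. The commit only shows `const NS = ...` with a unique suffix, so how the real specs derive that suffix is not known; `makeNamespace`, `named`, and the suffix format below are hypothetical.

```typescript
// Build a namespace from the spec's name plus a caller-supplied unique token
// (for example a worker index combined with a random or time-based value).
function makeNamespace(specName: string, unique: string): string {
  const slug = specName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${slug}-${unique}`;
}

// Tag every created entity with the namespace so assertions can match on a
// value that no other spec (or earlier attempt) could have produced.
function named(ns: string, label: string): string {
  return `${label} [${ns}]`;
}
```

A spec would then assert on `named(NS, "Payments API")` rather than on a fixed name or on whole-list state, so rows created by other parallel specs in the shared DB cannot affect the match.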
✅ All PR Checks Passed
@enyineer All quality checks have passed. This PR is ready for your review.
A large, verified body of work spanning six areas - correctness, security, testing, UX, docs, and a premium, consistent UI design language - plus a low-noise retune of anomaly-detection defaults.
Design system (premium UI rework)
`@checkstack/ui` foundation: surface elevation tokens, an aurora gradient signature, a colorblind-safe status triad, a comfortable/compact density model (provider + user toggle), polished skeleton/empty/error states, and honest token-driven chart primitives (time series, sparkline, radial gauge, request waterfall, uptime ribbon).
`Alert`.
Warning
Breaking change: the duplicate `InfoBanner` component (and its sub-components) is removed - use `Alert`, a drop-in replacement with the same variants and composable parts.
Anomaly-detection defaults (low-noise problem detection)
Security hardening
Testing
UX & accessibility
Refactors & docs
Notes
🤖 Generated with Claude Code
https://claude.ai/code/session_01XVkYp7R1AtNoBziSNUpTQE