Merged
24 commits
831ad49
feat: comprehensive review, hardening, and a platform-wide design-sys…
enyineer Jun 20, 2026
32c2b52
fix(healthcheck): collectors fail only on transport failure, not on a…
enyineer Jun 20, 2026
d7400cf
feat(healthcheck): finer per-run transport timing breakdown (request …
enyineer Jun 20, 2026
2dac41f
fix(healthcheck-http): issue request verbatim instead of IP-pinning (…
enyineer Jun 20, 2026
1d3e244
fix(e2e): disable Ryuk reaper to fix teardown hang on Bun
enyineer Jun 20, 2026
6e92343
feat(healthcheck): AI tool to list health checks assigned to a system
enyineer Jun 20, 2026
f6e5363
fix(slo): self-heal orphaned downtime windows on read
enyineer Jun 20, 2026
9951d79
fix(ai): re-fetch instead of trusting stale context on a resumed chat
enyineer Jun 20, 2026
b7c0871
fix(slo): close orphaned downtime at actual recovery time, not delete
enyineer Jun 20, 2026
29a1dc7
fix(ai): keep an active skill's voice through the final-answer step
enyineer Jun 20, 2026
b9e68ec
fix(ai): don't expose internal tools as a user how-to
enyineer Jun 20, 2026
fb067fa
fix(healthcheck-http): validate TLS cert in the timing probe
enyineer Jun 20, 2026
cc7e13e
Merge branch 'main' into chore/comprehensive-review-and-improvements
enyineer Jun 20, 2026
4462d8c
test(e2e): make the catalog spec retry-safe (fix the CI flake)
enyineer Jun 20, 2026
c685b15
ci(e2e): shard the E2E suite across 3 runners
enyineer Jun 20, 2026
a19e9c0
ci(e2e): derive shard matrix from spec count + build once, share to s…
enyineer Jun 20, 2026
1f6d9d0
test(e2e): make catalog system description per-attempt too (retry-saf…
enyineer Jun 20, 2026
b549ae7
test(e2e): retry failed specs at the file level for a fresh DB per at…
enyineer Jun 20, 2026
9592c2b
Revert "test(e2e): retry failed specs at the file level for a fresh D…
enyineer Jun 20, 2026
0af9edc
test(e2e): reinstate file-level retry as the structural retry-safety fix
enyineer Jun 20, 2026
6be250e
test(e2e): drop catalog's per-test idempotency (structural fix covers…
enyineer Jun 20, 2026
4e3202c
test(e2e): clone a migrated TEMPLATE database for fast per-file resets
enyineer Jun 20, 2026
a5778e5
Revert "test(e2e): clone a migrated TEMPLATE database for fast per-fi…
enyineer Jun 20, 2026
a02d2bd
test(e2e): migrate the suite to a boot-once + parallel model
enyineer Jun 20, 2026
21 changes: 21 additions & 0 deletions .changeset/a11y-select-and-label-associations.md
@@ -0,0 +1,21 @@
---
"@checkstack/status-page-frontend": patch
"@checkstack/auth-frontend": patch
---

Fix accessibility labeling defects on status-page and auth forms.

Radix `SelectTrigger` renders a `combobox` whose accessible name comes from
`aria-label`/`aria-labelledby`, not from its `SelectValue` placeholder child, so
screen readers previously announced several comboboxes as unnamed. Every such
trigger in the status-page builder (system, heading level, group, visibility) and
in the auth team/scope/ownership/resource-grant pickers now carries an
`aria-label` matching its visible intent.

Form labels that were rendered as detached `<label>`/`<Label>` elements (no
`htmlFor`/`id` pairing) are now associated with their inputs, so clicking a label
focuses its field and assistive tech announces the field name. This covers the
"Create Application" dialog (Name, Description) in auth, and the status-page
builder fields (Title, Slug, Brand color, Logo URL, uptime Days, event-feed max
updates / max age). No visual or behavioral change beyond the added accessible
names and label associations.
14 changes: 14 additions & 0 deletions .changeset/ai-chat-no-tool-leak-in-howto.md
@@ -0,0 +1,14 @@
---
"@checkstack/ai-backend": minor
---

Stop the assistant from exposing its internal tools as a user how-to.

Asked "how do I add a system to the catalog?", the chat assistant answered with
the internal tool name (`catalog.createSystem`) and its input JSON schema - but
the operator cannot call tools and never sees them; that is the assistant's own
mechanism, not a workflow. The chat system prompt now instructs the model that
tools are its own (not a public API), and that a how-to must be answered in
product terms (the UI, grounded in docs) and/or by offering to do it for the
operator - never by presenting tool names, tool input JSON, or parameter schemas
as steps to follow. Chat-only; the headless runner is unchanged.
18 changes: 18 additions & 0 deletions .changeset/ai-chat-refetch-stale-context.md
@@ -0,0 +1,18 @@
---
"@checkstack/ai-backend": minor
---

Tell the assistant to re-fetch when resuming an idle conversation.

The chat loop replays earlier tool results verbatim with no age annotation, and
the system prompt injects "current time" but never how long the thread has been
idle. So when an old chat was resumed, the model answered from stale captured
data (a check's old name, a "failing" status) instead of the current state.

The turn now measures the idle gap before the message (the conversation's
last-activity timestamp, captured before the new message is appended) and, once
it exceeds 10 minutes, folds a "Data freshness" directive into the system prompt
instructing the model to re-call the relevant read tools for current state
rather than trust results from earlier in the thread. The directive sits at the
volatile end of the prompt (next to the time line), so the cache-friendly stable
prefix is unaffected; an active back-and-forth never sees it.
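
A minimal sketch of the idle-gap check described above, not the actual implementation. The function names, the directive wording, and the `Current time:` line format are assumptions made for illustration; only the 10-minute threshold and the "volatile end of the prompt" placement come from the changeset.

```typescript
// Threshold from the changeset: the directive appears once the idle gap exceeds 10 minutes.
const IDLE_THRESHOLD_MS = 10 * 60 * 1000;

// Hypothetical wording; the real directive text is not shown in the changeset.
const DATA_FRESHNESS_DIRECTIVE =
  "## Data freshness\nThis conversation was idle for a while. Re-call the relevant read tools for current state instead of trusting earlier results.";

/** Returns the directive when the idle gap exceeds the threshold, otherwise undefined. */
function dataFreshnessDirective(
  lastActivityAt: Date, // must be captured BEFORE the new message is appended
  now: Date,
): string | undefined {
  const idleMs = now.getTime() - lastActivityAt.getTime();
  return idleMs > IDLE_THRESHOLD_MS ? DATA_FRESHNESS_DIRECTIVE : undefined;
}

/** Stable base prompt first; volatile parts (time line, freshness directive) last. */
function buildSystemPrompt(stableBase: string, lastActivityAt: Date, now: Date): string {
  const volatileParts = [
    `Current time: ${now.toISOString()}`,
    dataFreshnessDirective(lastActivityAt, now),
  ].filter((part): part is string => part !== undefined);
  return [stableBase, ...volatileParts].join("\n\n");
}
```

Because the stable base always comes first and is never modified, a gateway that caches prompt prefixes would still see an identical prefix whether or not the directive is present.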
15 changes: 15 additions & 0 deletions .changeset/ai-features-steering-and-conformance.md
@@ -0,0 +1,15 @@
---
"@checkstack/ai-backend": minor
"@checkstack/ai-common": minor
---

Improve AI chat/agent steering, MCP conformance, doc grounding, and provider seams.

- Tool feedback self-correction: a validation failure or duplicate tool call now surfaces as a thrown tool error (a distinct AI-SDK `tool-error` result part) instead of an ordinary success value, so the model is told the call failed and retries. Confirm cards remain success results and carry a structured `status: "awaiting_operator"`. The headless agent runner surfaces tool failures the same way instead of returning `{ error }` as data.
- System prompts are now sectioned (clear `##` headings, blank-line separation) with the safety-critical access-scope and investigation rules near the top. The ~600-token automation-building playbook is no longer always-on: it loads only when an automation tool is in scope (or via the `automation-author` skill). Headless author overrides are wrapped in an `<author_instructions>` delimiter.
- Model-family seam: connections may declare `modelFamily` (`anthropic` | `openai` | `generic`, default `generic`). The transport stays `@ai-sdk/openai-compatible` for every value; capable families get a lighter-touch prompt-calibration note. Per-turn volatile preambles (memory/skill/summary) now follow the stable base prompt for prompt-cache friendliness on caching-capable gateways.
- MCP Streamable-HTTP conformance (spec `2025-06-18`): `tools/list` advertises `outputSchema` and `tools/call` returns `structuredContent` for tools that declare an output; `Mcp-Session-Id` is required and validated on post-initialize requests; the negotiated `protocolVersion` is echoed; cross-site `Origin` requests are refused.
- Doc grounding relevance is now a corpus-size-stable relative signal (top-hit gap to the runner-up) instead of an absolute BM25 threshold. The per-read result clamp budget derives from the connection's `contextWindowTokens` instead of a hardcoded constant.
- The topical pre-classifier round-trip can be disabled per connection (`disableTopicalClassifier`); the in-prompt off-topic decline then carries it.
- Steering de-duplication: the "when to call this / pass a UUID, not a name" trigger guidance now lives only in the tool `description` (where it travels with the tool), and the chat system prompt's investigation section keeps only cross-tool strategy and the universal id-discipline rule, so the two can no longer drift.
- Tool descriptions are now stable across permission modes: the per-mode note ("(auto-applied...)", "(requires human confirmation...)") is no longer appended to a tool's `description` at wire time. The conversation's mode is conveyed once by the system prompt's permission-mode line, keeping tool identity decoupled from conversation state.
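
To illustrate the doc-grounding bullet above, here is one way a "top-hit gap to the runner-up" signal could be computed. This is a sketch under assumptions: the changeset does not give the formula, so the normalization by the top score, the function names, and the `0.2` cut-off are all hypothetical.

```typescript
/**
 * Relative relevance: how far the best hit stands above the runner-up,
 * as a fraction of the best score. Scaling every score by the same factor
 * (as happens when a corpus grows) leaves the result unchanged.
 */
function topHitGap(scores: readonly number[]): number {
  const sorted = [...scores].sort((a, b) => b - a);
  const [top = 0, runnerUp = 0] = sorted;
  if (top <= 0) return 0; // no hits, or nothing scored positively
  return (top - Math.max(runnerUp, 0)) / top;
}

// Hypothetical cut-off, chosen only for this example.
const MIN_RELATIVE_GAP = 0.2;

/** True when the top hit clearly dominates the rest of the result list. */
function isGroundingRelevant(scores: readonly number[]): boolean {
  return topHitGap(scores) >= MIN_RELATIVE_GAP;
}
```

An absolute BM25 threshold would treat `[10, 5]` and `[100, 50]` differently even though the ranking is equally decisive; the relative gap is `0.5` in both cases.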
17 changes: 17 additions & 0 deletions .changeset/ai-skill-style-survives-final-step.md
@@ -0,0 +1,17 @@
---
"@checkstack/ai-backend": minor
---

Keep an active chat Skill's voice in force through the final answer.

A user Skill (e.g. "write like a redneck") held during tool-calling steps but was
normalized back to a professional tone in the synthesized reply. Cause: the
multi-step loop's forced final-answer step (`prepareFinalAnswerStep`) REPLACES
the whole system prompt with a tool-less "write your final answer now, be
concise" instruction - dropping the skill preamble on the exact step that writes
the user-visible answer.

The final-answer step now carries the active skill guidance through (appended
after the base final-answer instruction, so the style is the last thing the model
reads), so the skill's voice governs the synthesized reply too instead of being
silently dropped after tool calls.
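
The fix can be pictured with the sketch below. It is illustrative only: the real `prepareFinalAnswerStep` signature and instruction text are not shown in this changeset, so the `FinalAnswerStep` shape, the function name, and the base instruction string are assumptions.

```typescript
// Hypothetical base instruction; the real text lives in `prepareFinalAnswerStep`.
const FINAL_ANSWER_INSTRUCTION =
  "Write your final answer now. Be concise. Do not call tools.";

interface FinalAnswerStep {
  system: string; // replaces the whole system prompt for this step
  toolsEnabled: false; // the forced final step is tool-less
}

/** Builds the final-answer step, appending skill guidance last when a skill is active. */
function prepareFinalAnswerStepSketch(activeSkillGuidance?: string): FinalAnswerStep {
  const parts = [FINAL_ANSWER_INSTRUCTION];
  const guidance = activeSkillGuidance?.trim();
  if (guidance !== undefined && guidance !== "") {
    // Appended AFTER the base instruction so the style is the last thing the model reads.
    parts.push(guidance);
  }
  return { system: parts.join("\n\n"), toolsEnabled: false };
}
```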
41 changes: 41 additions & 0 deletions .changeset/anomaly-default-posture-retune.md
@@ -0,0 +1,41 @@
---
"@checkstack/healthcheck-http-backend": patch
"@checkstack/healthcheck-dns-backend": patch
"@checkstack/healthcheck-grpc-backend": patch
"@checkstack/healthcheck-ping-backend": patch
"@checkstack/healthcheck-tcp-backend": patch
"@checkstack/healthcheck-tls-backend": patch
"@checkstack/healthcheck-redis-backend": patch
"@checkstack/healthcheck-postgres-backend": patch
"@checkstack/healthcheck-mysql-backend": patch
"@checkstack/healthcheck-ssh-backend": patch
"@checkstack/healthcheck-script-backend": patch
"@checkstack/healthcheck-jenkins-backend": patch
"@checkstack/healthcheck-rcon-backend": patch
"@checkstack/collector-hardware-backend": patch
---

Retune anomaly-detection defaults across every health-check strategy and the
hardware collector for a low-noise, problem-focused out-of-the-box experience.

The detection engine already learns a per-metric baseline, debounces with a
confirmation window, and applies practical-significance floors. This pass tunes
the per-metric **defaults** so a fresh install alerts only on genuine,
statistically-significant, problem-mapping deviations instead of flooding on
every metric that wiggles. 264 metrics were reviewed:

- **Default-disabled** the high-noise and un-baselineable classes that were
alerting for no good reason: raw identifiers and counts (status codes, error
and row counts, build counts, player and executor counts), config echoes and
near-constants (probe packet counts, CPU core count, total/swap memory),
payload-size and other run-to-run-volatile values, and deterministic values
like certificate days-remaining (governed by the check's own static-threshold
health logic, not statistics). These stay chartable and can be re-enabled per
field.
- **Hardened** the signals that should alert - latency/response/execution time
and availability/success/saturation percentages - with confirmation windows
and absolute + relative floors so brief spikes and sub-threshold jitter no
longer flap, and preferred percentage metrics over their absolute twins.

No detection-engine or schema changes; only per-metric `x-anomaly-*` defaults.
Users who had opted into any now-disabled metric keep their explicit override.
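
As a rough illustration of how a confirmation window combines with absolute and relative floors, consider the sketch below. It is not the detection engine: the field names are hypothetical stand-ins for the `x-anomaly-*` defaults (whose exact keys are not listed here), the baseline is passed in as a plain number, and the statistical-significance test is omitted.

```typescript
// Hypothetical shape; the real defaults are expressed as `x-anomaly-*` annotations.
interface AnomalyDefaults {
  enabled: boolean;
  confirmationWindow: number; // consecutive deviating runs required (assumed >= 1)
  absoluteFloor: number; // minimum |value - baseline| in the metric's own unit
  relativeFloor: number; // minimum |value - baseline| / |baseline|
}

/** A deviation counts only if it clears BOTH the absolute and the relative floor. */
function isSignificant(value: number, baseline: number, d: AnomalyDefaults): boolean {
  const delta = Math.abs(value - baseline);
  const relative = baseline === 0 ? Infinity : delta / Math.abs(baseline);
  return delta >= d.absoluteFloor && relative >= d.relativeFloor;
}

/** Alert only when the last `confirmationWindow` runs all deviate significantly. */
function shouldAlert(recent: readonly number[], baseline: number, d: AnomalyDefaults): boolean {
  if (!d.enabled || recent.length < d.confirmationWindow) return false;
  return recent
    .slice(-d.confirmationWindow)
    .every((value) => isSignificant(value, baseline, d));
}
```

With a latency baseline of 200 ms, a 50 ms absolute floor, a 25% relative floor, and a window of 3, a single 900 ms spike does not alert, while three consecutive runs above 300 ms do.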
11 changes: 11 additions & 0 deletions .changeset/api-docs-muted-color-tokens.md
@@ -0,0 +1,11 @@
---
"@checkstack/api-docs-frontend": minor
---

Align the muted/unknown colors in the API docs viewer with semantic design
tokens. The gray fallback states (default access-type icon, unknown user-type
badge, and the `null`/`any`/`unknown` schema-type labels) now use
`text-muted-foreground` / `bg-muted` instead of hardcoded `text-gray-*` /
`bg-gray-*` classes, so they track the theme in light and dark mode. The
intentional categorical palettes (per-user-type badge colors and the
string/number/boolean syntax-highlight colors) are unchanged.
18 changes: 18 additions & 0 deletions .changeset/auth-dialog-form-quality.md
@@ -0,0 +1,18 @@
---
"@checkstack/auth-frontend": minor
---

Improve form quality in auth dialogs (role, scope-to-team, create application).

The Role and Scope-to-team dialog bodies are now wrapped in `<form onSubmit>`
with a `type="submit"` primary button, so pressing Enter submits the dialog
(matching the catalog System editor and Create User dialog). Mandatory fields
carry the `Label required` affordance and native `required`, the first field of
each dialog auto-focuses on open, and the scope-to-team Team / Access level
selects are now associated with their labels via `htmlFor`/`id`.

The Create Application dialog gains native `required` on the name input, a
disabled-until-`name.trim()`-is-non-empty Create button (aligning with the
Create User / System editor pattern), and an auto-focused name field; its body
is wrapped in `<form onSubmit>` so Enter submits. No behavioral change to the
underlying mutations or role/team logic.
7 changes: 7 additions & 0 deletions .changeset/auth-rate-limit-prune-expired.md
@@ -0,0 +1,7 @@
---
"@checkstack/auth-backend": minor
---

Periodically prune expired `better_auth_rate_limit` rows.

better-auth's shared-Postgres brute-force limiter only ever upserts counters, so the `better_auth_rate_limit` table grew by one row per distinct `(ip, path)` key and nothing ever removed dead rows. auth-backend now schedules an hourly recurring queue job (cron `0 * * * *`, work-queue consumer group) that runs an idempotent `DELETE` of rows whose `lastRequest` is older than a conservative 24h TTL - far past any active limiter window, so a live counter is never removed. The sweep is exposed as `pruneExpiredBetterAuthRateLimits` for reuse and testing. No schema change (pruning is a DELETE). Pod-safe: a single consumer per fire runs the sweep, and because the DELETE is idempotent against the shared database, duplicate fires are harmless.
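
The pruning rule can be sketched against an in-memory array standing in for the table. This is an assumption-laden illustration, not `pruneExpiredBetterAuthRateLimits` itself: the row shape, the millisecond unit of `lastRequest`, and the function name are hypothetical, and the real sweep is a SQL `DELETE`.

```typescript
// Hypothetical row shape; only the `lastRequest` column name comes from the changeset.
interface RateLimitRow {
  key: string;
  count: number;
  lastRequest: number; // epoch milliseconds (unit assumed)
}

// Conservative TTL from the changeset: 24 hours.
const RATE_LIMIT_TTL_MS = 24 * 60 * 60 * 1000;

/**
 * Keeps rows whose last request is within the TTL and drops the rest.
 * Roughly what a `DELETE ... WHERE lastRequest < cutoff` would do (SQL shape assumed).
 */
function pruneExpiredRows(
  rows: readonly RateLimitRow[],
  nowMs: number,
  ttlMs: number = RATE_LIMIT_TTL_MS,
): RateLimitRow[] {
  const cutoff = nowMs - ttlMs;
  return rows.filter((row) => row.lastRequest >= cutoff);
}
```

Running the sweep twice with the same `nowMs` yields the same rows as running it once, which is why a duplicate fire of the job does no harm.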
11 changes: 11 additions & 0 deletions .changeset/automation-extract-error-message.md
@@ -0,0 +1,11 @@
---
"@checkstack/automation-backend": patch
---

refactor: use `extractErrorMessage` instead of `(error as Error).message`

All 24 `(error as Error).message` casts in `automation-backend`'s dispatch and
entity modules are replaced with the project-wide `extractErrorMessage(error)`
helper from `@checkstack/common`. This removes the unsafe `error as Error`
assumption (the same one the lint-banned `instanceof Error` would make) and
correctly handles non-Error throwables (strings, plain objects) in log output.
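
For readers unfamiliar with the pattern, a helper of this kind might look like the sketch below. The real `extractErrorMessage` in `@checkstack/common` is not shown in this diff, so the body, the fallback order, and the `Sketch` suffix are assumptions; the sketch deliberately avoids `instanceof Error`, in line with the lint rule mentioned above.

```typescript
/** Best-effort message extraction from any thrown value, without `instanceof Error`. */
function extractErrorMessageSketch(error: unknown): string {
  // Thrown strings are already the message.
  if (typeof error === "string") return error;

  // Error objects and error-like plain objects expose a string `message`.
  if (typeof error === "object" && error !== null && "message" in error) {
    const message = (error as { message: unknown }).message;
    if (typeof message === "string") return message;
  }

  // Other values: prefer JSON so plain objects stay readable in logs.
  try {
    const json = JSON.stringify(error);
    if (json !== undefined) return json;
  } catch {
    // Circular structures cannot be serialized; fall through to String().
  }
  return String(error);
}
```

By contrast, `(error as Error).message` evaluates to `undefined` when a string or a plain object without `message` is thrown, which is how `undefined` ends up in log lines.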
30 changes: 30 additions & 0 deletions .changeset/command-palette-navigation-coverage.md
@@ -0,0 +1,30 @@
---
"@checkstack/dependency-backend": patch
"@checkstack/status-page-backend": patch
"@checkstack/satellite-backend": patch
"@checkstack/gitops-backend": patch
"@checkstack/secrets-backend": patch
"@checkstack/notification-backend": patch
"@checkstack/script-packages-backend": patch
---

Widen Cmd+K command-palette coverage to every top-level sidebar destination.

The command palette previously only surfaced commands from a handful of plugins,
so large feature areas were silently unreachable from search. Each of these
plugins now registers a "navigate to <feature>" command per top-level route via
`registerSearchProvider`, so every sidebar destination they own is reachable
from Cmd+K (entity search can come later):

- dependency: "Dependency Map"
- status-page: "Status pages"
- satellite: "Satellites"
- gitops: "GitOps", "Kind Registry"
- secrets: "Secrets"
- notification: "Notification Settings"
- script-packages: "Script Packages", "Script Sandbox"

Each command reuses the plugin's own route helper (`resolveRoute`) for its href
and carries the same access rule that gates its sidebar nav entry, so palette
visibility matches sidebar visibility. The notification command carries no
access rule, matching its authenticated-only nav entry.
21 changes: 21 additions & 0 deletions .changeset/confirmation-modal-typed-phrase.md
@@ -0,0 +1,21 @@
---
"@checkstack/ui": minor
"@checkstack/pluginmanager-frontend": patch
---

Fold the typed-phrase confirmation gate into the shared `ConfirmationModal`.

`ConfirmationModal` now accepts an optional `confirmPhrase` (plus
`confirmPhraseLabel` and `confirmPhrasePlaceholder`): when set, it renders an
input and keeps the confirm button disabled until the typed value matches the
phrase exactly. The typed value resets whenever the modal reopens. The `message`
prop is widened from `string` to `React.ReactNode` so callers can pass rich
descriptions; existing string call sites are unaffected. A pure
`isConfirmPhraseSatisfied` predicate backs the enable/disable logic and is unit
tested.

The pluginmanager install and uninstall flows now use `ConfirmationModal` with
`confirmPhrase`, and the parallel hand-rolled `TypedConfirmModal` (which lacked
focus trap, Escape-to-close, scroll-lock, and focus restoration) is removed.
Behavior and UX (phrase gate, danger/warning styling, confirm action) are
preserved, now on the accessible Radix Dialog base.
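
The enable/disable rule is simple enough to sketch as a pure function. The signature below is an assumption (the real `isConfirmPhraseSatisfied` is not shown in this diff); the behavior follows the description: no phrase means no gate, and a phrase must be matched exactly.

```typescript
/**
 * True when the confirm button may be enabled.
 * - No `confirmPhrase`: the modal has no typed gate, so it is always satisfied.
 * - With `confirmPhrase`: the typed value must match exactly (case and whitespace included).
 */
function isConfirmPhraseSatisfiedSketch(typedValue: string, confirmPhrase?: string): boolean {
  if (confirmPhrase === undefined) return true;
  return typedValue === confirmPhrase;
}
```

Keeping the check as a pure function means it can be unit tested without rendering the dialog.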
20 changes: 20 additions & 0 deletions .changeset/consolidate-search-trigger-copy.md
@@ -0,0 +1,20 @@
---
"@checkstack/ui": patch
"@checkstack/command-frontend": patch
---

Consolidate the two search-trigger affordances onto a single source of truth.

The hero `CommandPalette` (in `@checkstack/ui`) and the wired navbar trigger
(`NavbarSearch` in command-frontend) had drifted in copy and shortcut-hint
rendering. Both now draw their wording and keyboard hint from one shared place:

- New `SEARCH_TRIGGER_LABEL` / `SEARCH_TRIGGER_PLACEHOLDER` constants and a
platform-aware `SearchShortcutHint` component (⌘K on Mac, Ctrl+K elsewhere) in
`@checkstack/ui`, consumed by both triggers so the copy and shortcut can no
longer diverge.
- The hero placeholder was corrected from the over-promising "Search systems,
incidents, or run commands..." to the accurate "Search and commands...", and
it now renders the same Mac/non-Mac shortcut hint the navbar uses.

No behavioral change to the global Cmd/Ctrl+K listener.
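
The platform-aware hint boils down to a single decision, sketched here as a plain function. How `SearchShortcutHint` actually detects the platform is not shown in this diff; passing in a platform string (for example the browser's `navigator.platform`) and the function name are assumptions.

```typescript
/** Returns the keyboard hint for the search trigger: ⌘K on Mac, Ctrl+K elsewhere. */
function searchShortcutHintText(platform: string): string {
  return /mac/i.test(platform) ? "⌘K" : "Ctrl+K";
}
```

With both triggers calling one function like this, the hero and navbar hints cannot disagree.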
41 changes: 41 additions & 0 deletions .changeset/crypto-auth-depth-audit.md
@@ -0,0 +1,41 @@
---
"@checkstack/backend-api": minor
"@checkstack/auth-backend": minor
"@checkstack/satellite-backend": patch
---

fix(security): crypto + auth depth hardening (at-rest encryption, brute-force scale, token timing)

Three concrete defects found and fixed during the deferred crypto + auth depth audit:

- **At-rest encryption (`@checkstack/backend-api`)**: AES-256-GCM decrypt now
rejects values whose IV is not exactly 12 bytes or whose auth tag is not the
full 16 bytes (128-bit). GCM accepts truncated tags, which weaken forgery
resistance; the encryptor only ever emits full tags, so short tags now hard-
error instead of being silently accepted. `isEncrypted` is also tightened to
require the exact decoded IV/tag lengths, not just a loose
`base64:base64:base64` shape, so a plaintext secret that merely resembles the
shape can no longer be misclassified as "already encrypted" and stored in
plaintext. The unique-nonce and tamper-rejection guarantees are now covered by
regression tests.

- **Brute-force protection scale bug (`@checkstack/auth-backend`)**: better-auth's
built-in rate limiter (sign-in, password reset) defaulted to per-pod in-memory
storage. With N replicas behind one database that multiplied the effective
limit by N (state-and-scale §14.5). The limiter is now backed by a shared
`better_auth_rate_limit` Postgres table via a `customStorage` adapter, so the
counter is global across all pods. Adds a new append-only migration for the
table. No behavior change in local dev (limiter stays off when not in
production); no configuration required.

- **Satellite token timing oracle (`@checkstack/satellite-backend`)**:
`validateToken` previously skipped the bcrypt verify when the `clientId` did
not exist, leaking client-ID existence via response timing. It now always
verifies the supplied token (against a decoy hash when the row is missing) so
the missing-clientId path costs the same as the wrong-token path.

Audited and found clean (no change needed): the better-auth cookie/session/CSRF
posture (`httpOnly`, `sameSite=lax`, `Secure` derived from the https `BASE_URL`,
single trusted origin, fresh session on internal trusted-login), and
token/secret logging hygiene across the auth, satellite, and secrets paths (no
secret material is logged).
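
The tightened `isEncrypted` check from the first bullet can be sketched as follows. This is an illustration under assumptions: the segment order (`iv:tag:ciphertext`), the function names, and the use of `atob` for decoding are hypothetical; only the `base64:base64:base64` shape and the 12-byte IV / 16-byte tag lengths come from the changeset.

```typescript
const IV_BYTES = 12; // 96-bit GCM IV
const TAG_BYTES = 16; // full 128-bit GCM auth tag

/** Decoded byte length of a padded base64 string, or undefined if it is not valid base64. */
function decodedLength(b64: string): number | undefined {
  if (b64.length === 0 || b64.length % 4 !== 0) return undefined;
  if (!/^[A-Za-z0-9+\/]+={0,2}$/.test(b64)) return undefined;
  try {
    return atob(b64).length; // one character per decoded byte
  } catch {
    return undefined;
  }
}

/** Strict shape check: three base64 segments with exact IV and tag lengths. */
function isEncryptedSketch(value: string): boolean {
  const parts = value.split(":");
  if (parts.length !== 3) return false;
  const [iv = "", tag = "", ciphertext = ""] = parts; // segment order is an assumption
  return (
    decodedLength(iv) === IV_BYTES &&
    decodedLength(tag) === TAG_BYTES &&
    decodedLength(ciphertext) !== undefined
  );
}
```

A plaintext secret such as `YWJj:ZGVm:Z2hp` passes a loose three-segment base64 test but fails here, because its first segment decodes to 3 bytes rather than 12.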
13 changes: 13 additions & 0 deletions .changeset/dependency-editor-form-quality.md
@@ -0,0 +1,13 @@
---
"@checkstack/dependency-frontend": patch
---

Improve form quality and accessibility of the dependency editor.

The "Depends on (upstream)" system picker and the impact-type select are now
associated with proper `<Label htmlFor>`/`id` pairings, so clicking a label
focuses its control and assistive tech announces the field name. Both mandatory
fields carry the `required` affordance (visible `*` plus screen-reader
"(required)"). Opening the add-dependency panel now autofocuses the system
picker so keyboard users can start selecting immediately. No behavioral change
beyond focus, labeling, and the required marker.