diff --git a/deploy/observability/AI_GOVERNANCE_DASHBOARD.md b/deploy/observability/AI_GOVERNANCE_DASHBOARD.md
index 2df200e..40fa98c 100644
--- a/deploy/observability/AI_GOVERNANCE_DASHBOARD.md
+++ b/deploy/observability/AI_GOVERNANCE_DASHBOARD.md
@@ -135,13 +135,14 @@ traffic flows; this is expected, not an error:
   `aibridge_user_prompts` / `aibridge_tool_usages` yet).
 - **Firewall Sessions** (stat and table) read 0 (no rows in
   `boundary_sessions` yet).
-- The Agent Firewall log stream currently carries only Boundary proxy lifecycle
-  lines. The upstream `coder/observability` boundary dashboard parses
-  `boundary_request` allow / deny audit events from the `coderd.agentrpc`
-  logger; those events are not emitted in this stack until egress traffic is
-  audited, so allow / deny breakdown panels are intentionally not included here
-  yet. They become populatable once Boundary audits real egress (and would also
-  benefit from a newer Coder that logs `boundary_request`).
+- The Agent Firewall log stream (namespace `coder-workspaces`) carries Boundary
+  proxy lifecycle lines. The allow / deny audit breakdown is driven separately
+  by coderd's structured `boundary_request` log lines (namespace `coder`),
+  which Loki ingests as JSON with the audit fields nested under `fields`. The
+  Agent Firewall dashboard parses them with the LogQL `json` parser
+  (`fields.decision`, `fields.owner`, `fields.http_url`, and related fields),
+  so the **Egress Audit (allow / deny)** panels show live data while the
+  firewalled workspaces generate egress.

 Panels that already have data: provider health and inventory, total
 interceptions, active sessions, unique users, interceptions by provider / model /
diff --git a/deploy/observability/dashboards-boundary.yaml b/deploy/observability/dashboards-boundary.yaml
index b6ca942..58b83a6 100644
--- a/deploy/observability/dashboards-boundary.yaml
+++ b/deploy/observability/dashboards-boundary.yaml
@@ -22,9 +22,12 @@
 # label here; the boundary_request line filter is the reliable narrow.
 # - Helm template includes resolved to literals (non-workspace-selector,
 # dashboard-range, dashboard-refresh) since this is rendered JSON, not a chart.
-# - The upstream allow/deny audit panels need live audited egress to populate;
-# boundary_request events are not emitted in this stack yet (placeholder AI
-# key, no audited egress, older Coder). They are correct, not broken.
+# - The allow/deny audit panels parse coderd's structured boundary_request log
+# lines from Loki. coderd emits each line as JSON with the audit fields
+# nested under "fields", so the LogQL uses the `json` parser (extracting
+# fields.decision, fields.owner, fields.http_url, etc.) rather than `logfmt`,
+# and the domain regexp matches the JSON-quoted http_url value. The
+# firewalled workspaces emit live allow and deny events, so these populate.
 # - Operations row (batch counters, active agents, sessions, proxy log stream)
 # is retained from the in-repo dashboard so the view has populated panels
 # today: Prometheus (agent_boundary_*) and Loki proxy lifecycle have data;
@@ -162,7 +165,7 @@ data:
 },
 "options": {
 "mode": "markdown",
- "content": "**Agent Firewall** (Coder's Boundary) audits and controls outbound network activity from Coder workspaces, giving security teams visibility into what AI agents reach.\n\n- **Egress Audit (allow / deny)**: per-request allow and deny decisions parsed from Loki `boundary_request` audit events (adapted from the upstream coder/observability boundary dashboard). These populate once Boundary audits real egress; until then they read empty by design, not error.\n- **Agent Firewall Operations**: forwarded log-proxy batch counters and active agents from Prometheus (agent_boundary_*), Boundary sessions from the Coder database, and the live proxy log stream from Loki.\n\nDisplay terminology is product-facing (\"Agent Firewall\"); PromQL still references agent_boundary_*, LogQL still matches the literal log text \"boundary\" / \"boundary_request\", and the Coder database keeps its boundary_* table names."
+ "content": "**Agent Firewall** (Coder's Boundary) audits and controls outbound network activity from Coder workspaces, giving security teams visibility into what AI agents reach.\n\n- **Egress Audit (allow / deny)**: per-request allow and deny decisions parsed from Loki `boundary_request` audit events (adapted from the upstream coder/observability boundary dashboard). These show live allow and deny decisions parsed from coderd's structured boundary_request events; the firewalled workspaces are actively audited.\n- **Agent Firewall Operations**: forwarded log-proxy batch counters and active agents from Prometheus (agent_boundary_*), Boundary sessions from the Coder database, and the live proxy log stream from Loki.\n\nDisplay terminology is product-facing (\"Agent Firewall\"); PromQL still references agent_boundary_*, LogQL still matches the literal log text \"boundary\" / \"boundary_request\", and the Coder database keeps its boundary_* table names."
 }
 },
 {
@@ -260,7 +263,7 @@ data:
 },
 "direction": "backward",
 "editorMode": "code",
- "expr": "sum by (decision) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | logfmt | decision=~`deny|allow` | owner=~`$owner` | domain=~`$domain` | template_id=~`$template_id` | template_version_id=~`$template_version_id` [$__range]))",
+ "expr": "sum by (decision) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | json decision=`fields.decision`, owner=`fields.owner`, template_id=`fields.template_id`, template_version_id=`fields.template_version_id` | decision=~`deny|allow` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `\"http_url\":\"(?P<scheme>https?)://(?P<domain>[^/:\"]+)` | domain=~`$domain` [$__range]))",
 "queryType": "range",
 "refId": "A"
 }
@@ -339,7 +342,7 @@ data:
 },
 "direction": "backward",
 "editorMode": "code",
- "expr": "topk(20, sum by (domain) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | logfmt | decision=`allow` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `http_url=(?P<scheme>https?)://(?P<domain>[^/:]+)` | domain=~`$domain` [$__auto])))",
+ "expr": "topk(20, sum by (domain) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | json decision=`fields.decision`, owner=`fields.owner`, template_id=`fields.template_id`, template_version_id=`fields.template_version_id` | decision=`allow` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `\"http_url\":\"(?P<scheme>https?)://(?P<domain>[^/:\"]+)` | domain=~`$domain` [$__auto])))",
 "legendFormat": "",
 "queryType": "instant",
 "refId": "A"
@@ -448,7 +451,7 @@ data:
 },
 "direction": "backward",
 "editorMode": "code",
- "expr": "topk(20, sum by (domain) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | logfmt | decision=`deny` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `http_url=(?P<scheme>https?)://(?P<domain>[^/:]+)` | domain=~`$domain` [$__auto])))",
+ "expr": "topk(20, sum by (domain) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | json decision=`fields.decision`, owner=`fields.owner`, template_id=`fields.template_id`, template_version_id=`fields.template_version_id` | decision=`deny` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `\"http_url\":\"(?P<scheme>https?)://(?P<domain>[^/:\"]+)` | domain=~`$domain` [$__auto])))",
 "legendFormat": "",
 "queryType": "instant",
 "refId": "A"
@@ -550,7 +553,7 @@ data:
 },
 "direction": "backward",
 "editorMode": "code",
- "expr": "{namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | logfmt | decision=`allow` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `http_url=https?://(?P<domain>[^/?# ]+)(?P<path>/[^?# ]*)?` | domain=~`$domain` | line_format `time=\"{{ .event_time }}\" method=\"{{ .http_method }}\" domain=\"{{ .domain }}\" path=\"{{ .path }}\" owner=\"{{ .owner }}\" workspace_name=\"{{ .workspace_name }}\" template_id=\"{{ .template_id }}\" template_version_id=\"{{ .template_version_id }}\"`",
+ "expr": "{namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | json decision=`fields.decision`, owner=`fields.owner`, workspace_name=`fields.workspace_name`, template_id=`fields.template_id`, template_version_id=`fields.template_version_id`, http_method=`fields.http_method`, event_time=`fields.event_time` | decision=`allow` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `\"http_url\":\"https?://(?P<domain>[^/?#:\" ]+)(?P<path>/[^?#\" ]*)?` | domain=~`$domain` | line_format `time=\"{{ .event_time }}\" method=\"{{ .http_method }}\" domain=\"{{ .domain }}\" path=\"{{ .path }}\" owner=\"{{ .owner }}\" workspace_name=\"{{ .workspace_name }}\" template_id=\"{{ .template_id }}\" template_version_id=\"{{ .template_version_id }}\"`",
 "queryType": "range",
 "refId": "A"
 }
@@ -687,7 +690,7 @@ data:
 },
 "direction": "backward",
 "editorMode": "code",
- "expr": "{namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | logfmt | decision=`deny` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `http_url=https?://(?P<domain>[^/?# ]+)(?P<path>/[^?# ]*)?` | domain=~`$domain` | line_format `time=\"{{ .event_time }}\" method=\"{{ .http_method }}\" domain=\"{{ .domain }}\" path=\"{{ .path }}\" owner=\"{{ .owner }}\" workspace_name=\"{{ .workspace_name }}\" template_id=\"{{ .template_id }}\" template_version_id=\"{{ .template_version_id }}\"`",
+ "expr": "{namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | json decision=`fields.decision`, owner=`fields.owner`, workspace_name=`fields.workspace_name`, template_id=`fields.template_id`, template_version_id=`fields.template_version_id`, http_method=`fields.http_method`, event_time=`fields.event_time` | decision=`deny` | owner=~`$owner` | template_id=~`$template_id` | template_version_id=~`$template_version_id` | regexp `\"http_url\":\"https?://(?P<domain>[^/?#:\" ]+)(?P<path>/[^?#\" ]*)?` | domain=~`$domain` | line_format `time=\"{{ .event_time }}\" method=\"{{ .http_method }}\" domain=\"{{ .domain }}\" path=\"{{ .path }}\" owner=\"{{ .owner }}\" workspace_name=\"{{ .workspace_name }}\" template_id=\"{{ .template_id }}\" template_version_id=\"{{ .template_version_id }}\"`",
 "queryType": "range",
 "refId": "A"
 }
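
Review note: the new log-panel queries combine two extraction steps, a `json` stage for the fields nested under `fields` and a `regexp` stage that reads the domain and path from the raw JSON text. The Python sketch below mimics that pipeline offline so the regexp can be checked against a sample line. It is only an illustration under stated assumptions: `SAMPLE_LINE` and `parse_audit_line` are hypothetical, the sample values are invented, and the sample assumes coderd emits compact JSON (no space after the colon), which the regexp in the diff depends on. The group names `domain` and `path` are taken from the labels the queries reference.

```python
import json
import re

# Hypothetical log line shaped as the diff describes: JSON with the audit
# fields nested under "fields". Compact separators are an assumption; the
# regexp below would not match `"http_url": "https://..."` with a space.
SAMPLE_LINE = json.dumps(
    {
        "msg": "boundary_request",
        "fields": {
            "decision": "deny",
            "owner": "alice",
            "http_method": "GET",
            "http_url": "https://example.com/v1/models?x=1",
        },
    },
    separators=(",", ":"),
)

# Same pattern as the log-panel queries, applied to the raw line text.
URL_RE = re.compile(
    r'"http_url":"https?://(?P<domain>[^/?#:" ]+)(?P<path>/[^?#" ]*)?'
)


def parse_audit_line(line):
    """Return the labels the panels filter on, or None if the line is skipped."""
    # Equivalent of the `|= boundary_request` line filter.
    if "boundary_request" not in line:
        return None
    # Equivalent of the `json` stage pulling values out of "fields".
    fields = json.loads(line).get("fields", {})
    # Equivalent of the `regexp` stage extracting domain and path.
    match = URL_RE.search(line)
    if match is None:
        return None
    return {
        "decision": fields.get("decision"),
        "owner": fields.get("owner"),
        "domain": match.group("domain"),
        "path": match.group("path") or "",
    }


if __name__ == "__main__":
    print(parse_audit_line(SAMPLE_LINE))
```

One behavior worth checking against real traffic: because the domain class excludes `:`, a URL with an explicit port leaves the optional path group unmatched, so `path` would come out empty for such requests.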