
fix(deploy/observability): parse boundary_request JSON in firewall dashboard #43

Open
ausbru87 wants to merge 1 commit into main from ws-2x/fix-boundary-dashboard


ausbru87 (Collaborator) commented Jun 9, 2026

Summary

The Agent Firewall (Coder Boundary) Grafana dashboard showed "No data" on its egress audit panels even though coderd is actively emitting boundary_request events for the austenplatform/firewall-test workspace.

Root cause: the five Loki audit panels parsed the log line with | logfmt, but coderd emits boundary_request as JSON with the audit fields nested under fields (for example fields.decision, fields.http_url). logfmt extracted nothing, so the decision, owner, template_id, and template_version_id label filters never matched and every panel returned empty. The domain/path regexp also assumed the logfmt http_url= form, which does not exist in the JSON line.

Changes

  • deploy/observability/dashboards-boundary.yaml: switch the five audit panels (Request Totals, Top Allowed Domains, Top Denied Domains, Most recent allowed requests, Most recent denied requests) from | logfmt to the LogQL json parser with explicit field extraction (decision=`fields.decision`, owner=`fields.owner`, etc.). Update the domain/path regexp to match the JSON-quoted http_url value. Label names are preserved, so the existing field overrides, transformations, and line_format templates need no changes.
  • Correct the stale "not emitted yet / read empty by design" wording in the dashboard header text panel, the YAML comment block, and deploy/observability/AI_GOVERNANCE_DASHBOARD.md.
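
To illustrate the regexp change described in the list above, here is a minimal Python sketch. The pattern is hypothetical: the dashboard's actual regexp is not shown in this description, and the sample lines and URL are made up. It only shows why a pattern anchored on the JSON-quoted `"http_url":"` form matches the JSON line while the old logfmt-style `http_url=` form has nothing to match.

```python
import re

# Hypothetical pattern (an assumption, not the dashboard's real regexp):
# anchor on the JSON-quoted key, then capture domain and path as named groups.
HTTP_URL_RE = re.compile(
    r'"http_url":"https?://(?P<domain>[^/"]+)(?P<path>/[^"?]*)?'
)

def split_url(line):
    """Return (domain, path) from a JSON log line, or None if there is no match."""
    match = HTTP_URL_RE.search(line)
    if match is None:
        return None
    return match.group("domain"), match.group("path") or "/"

# Made-up sample lines; example.com stands in for a real egress target.
json_line = (
    '{"msg":"boundary_request","fields":{"decision":"deny",'
    '"http_url":"https://example.com/a/b","http_method":"GET"}}'
)
logfmt_line = (
    "msg=boundary_request decision=deny "
    "http_url=https://example.com/a/b http_method=GET"
)

json_result = split_url(json_line)
logfmt_result = split_url(logfmt_line)
print(json_result, logfmt_result)
```

The named-group syntax `(?P<name>...)` is used here because it is accepted by both Python's `re` module and Go's RE2-based `regexp` package.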

No datasource, scrape, or promtail changes were needed: the datasource UIDs (loki, prometheus, aibridge-postgres) already match, Loki ingests coderd logs under namespace="coder", and the Prometheus throughput metric agent_boundary_log_proxy_batches_forwarded_total is scraped and correct.

Root cause and live verification evidence

Live boundary_request line as stored in Loki (JSON, fields nested):

{"ts":"...","msg":"boundary_request","fields":{"owner":"austenplatform","workspace_name":"firewall-test","decision":"allow","http_url":"https://...","http_method":"POST",...}}
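
As a rough illustration of the root cause, the Python sketch below compares a naive key=value scan with dotted-path extraction from parsed JSON on a line shaped like the one above. The scan is a simplification and not Loki's logfmt implementation, `extract_labels` only mimics the idea behind explicit `json` field extraction, and the timestamp and URL are placeholder values.

```python
import json
import re

# Hypothetical sample modeled on the line above; elided values are placeholders.
LINE = json.dumps(
    {
        "ts": "2026-01-01T00:00:00Z",
        "msg": "boundary_request",
        "fields": {
            "owner": "austenplatform",
            "workspace_name": "firewall-test",
            "decision": "allow",
            "http_url": "https://example.com/v1/messages",
            "http_method": "POST",
        },
    },
    separators=(",", ":"),
)

def naive_logfmt(line):
    """Very rough key=value scan (an assumption, not Loki's parser)."""
    return dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))

def extract_labels(line, mapping):
    """Walk dotted paths such as 'fields.decision' into the parsed JSON."""
    doc = json.loads(line)
    labels = {}
    for label, path in mapping.items():
        node = doc
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        if node is not None:
            labels[label] = node
    return labels

logfmt_labels = naive_logfmt(LINE)
json_labels = extract_labels(
    LINE, {"decision": "fields.decision", "owner": "fields.owner"}
)
print(logfmt_labels)  # no key=value pairs exist in the JSON line
print(json_labels)
```

With no `key=value` pairs in the line, the scan yields no labels, so a filter on `decision` has nothing to match. The dotted-path lookup finds both fields.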

Broken vs fixed Loki query (1h range, firewall-test):

# BROKEN (old): returns []
sum by (decision) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | logfmt | decision=~`deny|allow` [1h]))

# FIXED (new): returns allow + deny series
sum by (decision) (count_over_time({namespace=~`(coder|coder-workspaces)`} |= `boundary_request` | json decision=`fields.decision` | decision=~`deny|allow` [1h]))

End-to-end validation on the usgov-coderdemo stack:

  • Embedded dashboard JSON parses cleanly (16 panels); Grafana accepted a temp import of the corrected dashboard.
  • Replaying the corrected stored exprs against Loki returned data for all five panels, e.g. Most recent denied requests parsed GET raw.githubusercontent.com/anthropics/claude-code/.../CHANGELOG.md (deny).
  • The live provisioned dashboard was updated (ConfigMap applied, sidecar reloaded): 0 audit panels on logfmt, 5 on json.
  • Grafana's own /api/ds/query datasource proxy rendered decision=allow (value 3) and decision=deny (value 19) for the corrected dashboard.
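
A check like the "0 audit panels on logfmt, 5 on json" count above could be sketched as follows. This is an assumption about how such a count might be done, not the script used for this PR. It expects the dashboard JSON to be already extracted from the ConfigMap, reads only top-level `panels[].targets[].expr`, and ignores panels nested inside rows. The two-panel dashboard is a made-up sample.

```python
import re

def count_parsers(dashboard):
    """Count panels whose query expressions use `| logfmt` or `| json`."""
    counts = {"logfmt": 0, "json": 0}
    for panel in dashboard.get("panels", []):
        exprs = [t.get("expr", "") for t in panel.get("targets", [])]
        if any(re.search(r"\|\s*logfmt\b", e) for e in exprs):
            counts["logfmt"] += 1
        elif any(re.search(r"\|\s*json\b", e) for e in exprs):
            counts["json"] += 1
    return counts

# Hypothetical two-panel dashboard: one Loki audit panel, one Prometheus panel.
sample = {
    "panels": [
        {
            "title": "Request Totals",
            "targets": [
                {
                    "expr": "sum by (decision) (count_over_time("
                    '{namespace="coder"} |= `boundary_request` '
                    "| json decision=`fields.decision` [1h]))"
                }
            ],
        },
        {
            "title": "Throughput",
            "targets": [
                {"expr": "rate(agent_boundary_log_proxy_batches_forwarded_total[5m])"}
            ],
        },
    ]
}

result = count_parsers(sample)
print(result)
```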

Generated by Coder Agents, on behalf of @ausbru87.

…shboard

The Agent Firewall dashboard's Loki audit panels showed "No data" because
they parsed coderd's boundary_request lines with `| logfmt`. coderd emits
those lines as JSON with the audit fields nested under "fields", so logfmt
extracted nothing and the decision/owner/template filters never matched.

Switch the five audit panels (Request Totals, Top Allowed/Denied Domains,
Most recent allowed/denied requests) to the LogQL `json` parser with explicit
field extraction (decision=`fields.decision`, owner=`fields.owner`, ...), and
update the domain/path regexp to match the JSON-quoted `http_url` value rather
than the logfmt `http_url=` form. Label names are preserved, so panel
overrides, transformations, and line_format templates are unchanged.

Also correct the stale "not emitted yet / read empty by design" wording in the
dashboard header and AI_GOVERNANCE_DASHBOARD.md: boundary_request events are
emitted and the panels now show live allow/deny data.

Verified live against the usgov-coderdemo stack: Loki returns allow and deny
series for the firewall-test workspace, and Grafana's datasource proxy renders
allow=3 / deny=19 for the corrected dashboard.

Generated by Coder Agents.