[maestrod] Add Grafana dashboard (per-route AI token usage + RED + vision)#146

Open: olihou wants to merge 6 commits into master from maestrod-grafana-dashboard

olihou commented Jun 17, 2026

What

Adds a Grafana dashboard to the maestrod chart, mirroring the document-engine Grafana-sidecar pattern (the serviceMonitor already exists).

  • dashboards/maestrod-single-namespace.json — 14 panels across four rows:
    • AI token usage — per-route token rate + range totals by route/model (nutrient.ai.tokens_total), MEAI token distribution, AI call latency p50/p95, attempts/empty/warnings. Token counts only — no cost/pricing applied.
    • Per-route HTTP RED — request rate, error ratio, p95 latency per /run/* route.
    • Vision quality & throughput — document throughput, words/zones per page, classification confidence, artifact export latency.
    • Process health — working-set memory + CPU only (no GC/JIT/threadpool noise).
  • templates/monitoring/grafana-dashboard.ConfigMap.yaml — ConfigMap labeled grafana_dashboard: "1" for sidecar auto-discovery; .Files.Get + placeholder replace + indent 4.
  • values.yaml / values.schema.json: observability.metrics.grafanaDashboard block (disabled by default) with enabled, configMap.labels, title, tags.
  • README.md — regenerated via helm-docs for the new values.
  • Chart bump 0.6.2 → 0.7.0 + CHANGELOG entry.

Panel labels are written to be self-explanatory: every timeseries carries a descriptive y-axis label (tokens/s, latency (s), error ratio, requests/s, cores, count / page, …) and the per-route token table renames raw Prometheus labels to readable column headers (Route / Model / Token type / Tokens (range total)).

How to enable

  1. observability.metrics.serviceMonitor.enabled=true (Prometheus scrapes /metrics).
  2. observability.metrics.grafanaDashboard.enabled=true (renders the labeled ConfigMap).
  3. A Grafana instance with the dashboard sidecar watching the grafana_dashboard label auto-imports it.

Validation

The authoring host had no helm binary, so the chart was validated by parsing values.schema.json and simulating the ConfigMap's placeholder substitution (including the tags join). The rendered dashboard is valid JSON, and all 14 panels reference real metric families emitted by maestrod's /metrics. README.md was regenerated with helm-docs 1.14.2 so the generate CI check passes.
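
For reviewers who want to repeat that check without helm, a simulation along these lines could work. This is a minimal sketch, not the script actually used: the miniature dashboard, the `<<<<DASHBOARD_TAGS>>>>` placeholder name, and the `", "` join are assumptions made for illustration; only `<<<<DASHBOARD_TITLE>>>>` appears in the template shown in this PR.

```python
import json

# Hypothetical miniature of dashboards/maestrod-single-namespace.json.
# The real file has 14 panels; only the placeholder handling matters here.
DASHBOARD_TEMPLATE = """{
  "title": "<<<<DASHBOARD_TITLE>>>>",
  "tags": ["<<<<DASHBOARD_TAGS>>>>"],
  "schemaVersion": 39,
  "panels": []
}"""


def render_dashboard(template, title, tags):
    """Mimic raw placeholder replacement plus a tags join (assumed behaviour)."""
    rendered = template.replace("<<<<DASHBOARD_TITLE>>>>", title)
    # Assumed join: the separator closes one JSON string and opens the next,
    # so each tag becomes its own array element.
    return rendered.replace("<<<<DASHBOARD_TAGS>>>>", '", "'.join(tags))


def is_valid_json(text):
    """Return True when the text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


rendered = render_dashboard(DASHBOARD_TEMPLATE, "Maestrod", ["maestrod", "ai"])
dashboard = json.loads(rendered)
```

Parsing the rendered text only proves the payload is well-formed JSON for these particular inputs; it says nothing about how the dashboard looks once Grafana imports it.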

Notes

  • The per-route token metric (nutrient.ai.tokens_total with an operation label) is added by the companion daemon PR: PSPDFKit/GdPicture#2804. The two per-route token panels stay empty until a daemon build carrying that change is deployed; the by-model MEAI token panel (gen_ai_client_token_usage) works today.
  • Dashboard authored against schemaVersion 39; Grafana 12 auto-migrates on import. Not yet visually QA'd in a live Grafana — worth an import-and-tweak pass on layout if desired.

Companion

Daemon-side metric: PSPDFKit/GdPicture#2804.

🤖 Generated with Claude Code

olihou and others added 2 commits June 17, 2026 18:10
…sion)

Mirror the document-engine Grafana-sidecar pattern for the maestrod chart:

- dashboards/maestrod-single-namespace.json: 18-panel dashboard across
  four rows — AI tokens & cost (per-route nutrient.ai.tokens_total, MEAI
  token distribution, AI latency p50/p95, attempts/empty/warnings),
  per-route HTTP RED, vision quality/throughput, and process health
  (working-set memory + CPU only). Uses the standard placeholder tokens.
- templates/monitoring/grafana-dashboard.ConfigMap.yaml: ConfigMap labeled
  grafana_dashboard:"1" for sidecar auto-discovery; replaces placeholders
  and indents the dashboard JSON.
- values.yaml + values.schema.json: observability.metrics.grafanaDashboard
  block (disabled by default) with configMap.labels, title, tags.
- Bump chart 0.6.2 -> 0.7.0; CHANGELOG entry.

Requires the existing serviceMonitor (or another scrape path) and a Grafana
sidecar watching the grafana_dashboard label.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
helm-docs output for the new observability.metrics.grafanaDashboard
block; fixes the `generate` CI check (README drift).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
olihou marked this pull request as ready for review June 18, 2026 06:46
olihou requested a review from tomassurin June 18, 2026 14:53
olihou and others added 3 commits June 18, 2026 21:57
- Rename the per-route token table's raw label columns to readable
  headers (operation->Route, gen_ai_request_model->Model,
  gen_ai_token_type->Token type, Value->Tokens (range total)); drop the
  Time column and order them.
- Add a descriptive y-axis label to every timeseries panel (tokens/s,
  latency (s), error ratio, requests/s, cores, count / page, …) so each
  graph states what it plots alongside the unit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dashboard never computed price — rename the "AI tokens & cost" row
to "AI token usage" and reword the range-total panel to make clear it
shows token counts only, no cost/pricing applied.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tomassurin requested a review from lazyoldbear June 18, 2026 19:59
data:
maestrod-{{ printf "%s-%s" .Release.Namespace .Release.Name }}.json: |-
{{ .Files.Get "dashboards/maestrod-single-namespace.json"
| replace "<<<<DASHBOARD_TITLE>>>>" (tpl .Values.observability.metrics.grafanaDashboard.title $)

Contributor

observability.metrics.grafanaDashboard.title and tags are user-facing values, but values containing JSON-significant characters break the dashboard payload. A valid Helm value such as title: Maestrod "prod" or a tag containing " renders a ConfigMap whose JSON cannot be imported by the Grafana sidecar.

Evidence: the template inserts raw strings into JSON placeholders via replace/join, while the dashboard keeps those placeholders inside JSON string/array literals.

Fix: JSON-encode these values instead of raw replacement. For example, replace the whole tags literal with a toJson value, and render the title placeholder from an escaped/encoded string.
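
The failure mode and the suggested fix can be reproduced outside Helm. The sketch below is illustrative only: it assumes the dashboard holds the title placeholder inside a JSON string and a hypothetical `<<<<DASHBOARD_TAGS>>>>` placeholder inside an array literal, and it uses Python's `json.dumps` to stand in for Helm's `toJson`.

```python
import json

# Hypothetical one-line stand-in for the dashboard JSON; the tags placeholder
# name is an assumption, the title placeholder is the one used in the template.
TEMPLATE = '{"title": "<<<<DASHBOARD_TITLE>>>>", "tags": ["<<<<DASHBOARD_TAGS>>>>"]}'


def render_raw(title, tags):
    """Raw replace/join: values are pasted into JSON literals unescaped."""
    out = TEMPLATE.replace("<<<<DASHBOARD_TITLE>>>>", title)
    return out.replace("<<<<DASHBOARD_TAGS>>>>", '", "'.join(tags))


def render_encoded(title, tags):
    """Replace each whole literal, quotes and brackets included, with encoded JSON."""
    out = TEMPLATE.replace('"<<<<DASHBOARD_TITLE>>>>"', json.dumps(title))
    return out.replace('["<<<<DASHBOARD_TAGS>>>>"]', json.dumps(tags))


def parses(text):
    """Return True when the text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


title = 'Maestrod "prod"'
tags = ['team "a"', "maestrod"]

raw_ok = parses(render_raw(title, tags))          # unescaped quotes end the string early
encoded = json.loads(render_encoded(title, tags)) # round-trips the original values
```

The design point is that the encoded variant swaps out the complete JSON literal rather than its inner text, so quoting and escaping are handled by the encoder instead of by the template author.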
