☸ Collect Kubernetes logs and cluster context into one archive
Repo: github.com/hrodrig/groot · Releases: GitHub Releases · Spec: docs/SPECIFICATIONS.md · Deploy: deploy/ · Changelog: CHANGELOG.md · Roadmap: docs/ROADMAP.md · Article: GROOT on DEV — one archive for cluster diagnostics
Terminal demo recorded with VHS. Regenerate: make install && bash -c "vhs docs/demo.tape" · docs/demo.tape
GROOT is a read-only Kubernetes log and context collector: a single groot collect pulls pod logs, control-plane logs, events, and selected API snapshots—in parallel, from YAML—and packs them into one .tar.gz. It does not analyze the cluster or render a diagnosis; it archives evidence you can attach to tickets, hand to teammates, or retain for compliance.
That workflow supports incident response, troubleshooting, and root cause analysis (RCA): one reproducible bundle replaces scattered kubectl copy-paste, so you can reconstruct what the cluster looked like when you ran collect and shorten postmortems.
- README badge reference
- Specifications (behavior contract)
- Roadmap (planned work)
- Deploy (Helm / CronJob)
- Features
- Requirements
- Install or update
- Quick start
- First run
- Usage examples
- Config
- Configuration reference (all keys)
- Resolution and precedence
- Output naming
- Console output modes
- Typical collected data
- Notifications
- Upload (S3 / GCS)
- In-cluster deploy (Helm / CronJob)
- Secret redaction
- Rootless container
- Security note
- Get involved
- License
- Cobra CLI with collect command
- Viper YAML config + environment variable override
- Concurrent Kubernetes API calls for faster collection
- Worker/node and control plane oriented log gathering
- Output folder + .tar.gz archive generation
- Optional notifications (Slack, Discord, Teams, PagerDuty, Telegram, email/SMTP, generic webhooks with templates and HMAC)
- Notify on failure (collect abort or partial job failures above a threshold)
- HTTP retry/backoff for transient webhook errors
- Optional secret redaction in collected log files
- Helm chart and flat CronJob manifests for scheduled in-cluster collection (deploy/)
- Rootless container image support
Libraries (see go.mod): Cobra v1.10.2, Viper v1.21.0, client-go for cluster access (no kubectl binary required).
- A valid kubeconfig (or in-cluster config) and network reachability to the Kubernetes API
- RBAC permissions to read logs/resources
- Go 1.26+ if you build from source (make build) or use go install for the CLI
Pre-built .deb, .rpm, .tar.gz (and .zip on Windows) are on GitHub Releases and latest release. The release badge at the top of this README shows the current tag at a glance.
Why not a single latest URL for every file? GitHub’s …/releases/latest/download/<file> only works if the asset filename is identical on every release. From v0.6.0 onward, basenames embed the git tag (for example groot_v0.6.0_amd64.deb), matching the URL path (…/download/v0.6.0/groot_v0.6.0_amd64.deb). Older releases used groot_0.5.0_* (no v in the basename). Options: pick names from the release page, use the snippet below, or use the badge.
# Latest published release tag (python3 or jq). Basename uses the tag (v0.6.0+).
TAG="$(curl -fsSL https://api.github.com/repos/hrodrig/groot/releases/latest | python3 -c 'import json,sys; print(json.load(sys.stdin)["tag_name"])')"
# Alternative: TAG="$(curl -fsSL https://api.github.com/repos/hrodrig/groot/releases/latest | jq -r .tag_name)"
[ -n "$TAG" ] || { echo "Could not resolve tag (empty). Install python3 or jq, or set TAG manually from the Releases page." >&2; exit 1; }
DEB="groot_${TAG}_amd64.deb"
URL="https://github.com/hrodrig/groot/releases/download/${TAG}/${DEB}"
TMP="/tmp/${DEB}"
# Download to /tmp so user _apt can read the file (apt often cannot read ~/.deb when $HOME is mode 700).
curl -fsSL "$URL" -o "$TMP"
rc=$?
# Capture curl's status right away; inside "if ! curl ..." the value of $? is already 0.
if [ "$rc" -ne 0 ]; then
  echo "Download failed (curl exit $rc). Check URL: $URL" >&2
  exit 1
fi
if [ ! -f "$TMP" ]; then
  echo "Expected $TMP after download, but it was not found." >&2
  exit 1
fi
sudo apt install "$TMP"
Paste the block as a whole, or chain with &&, so apt does not run after a failed curl. curl -f exits non‑zero on HTTP errors (404, etc.).
apt + _apt / “Permission denied” under $HOME: if you curl the .deb into ~ and run sudo apt install ./groot_….deb, Debian/Ubuntu may warn that _apt cannot read the file (home directory not world-executable). Use /tmp as above, or sudo cp "$DEB" /tmp/ then sudo apt install "/tmp/$DEB".
404 on groot_v0.1.6_amd64.deb: the file on GitHub is groot_0.1.6_amd64.deb (no v in the basename). Empty TAG: if jq/python3 failed, you get .../download//groot__amd64.deb and ./groot__amd64.deb from apt.
groot is installed to /usr/bin. The package drops a sample at /etc/groot/groot.yml.sample (from configs/groot.yml.sample in the repo) as a template; it is not read unless you pass --config. With no --config, discovery is ./groot.yml, then ~/.groot/groot.yml, then /etc/groot/groot.yml, then built-in defaults. Use a per-user file under ~/.groot/, sudo cp /etc/groot/groot.yml.sample /etc/groot/groot.yml for a machine-wide config, or --config /path/to/file.yaml. Use arm64 in the download filename on ARM64.
| Format | Example (tag v0.6.0 in URL path and basename since v0.6.0) |
|---|---|
| .deb | curl -fsSL -o /tmp/groot_v0.6.0_amd64.deb https://github.com/hrodrig/groot/releases/download/v0.6.0/groot_v0.6.0_amd64.deb then sudo apt install /tmp/groot_v0.6.0_amd64.deb (use /tmp so _apt can read the file if $HOME is 700) |
| .rpm | curl -fsSLO https://github.com/hrodrig/groot/releases/download/v0.6.0/groot_v0.6.0_amd64.rpm then sudo rpm -Uvh groot_v0.6.0_amd64.rpm or sudo dnf install ./groot_v0.6.0_amd64.rpm |
| .tar.gz | curl -fsSLO https://github.com/hrodrig/groot/releases/download/v0.6.0/groot_v0.6.0_linux_amd64.tar.gz then tar xzf groot_v0.6.0_linux_amd64.tar.gz and run ./groot inside the extracted directory |
Update: download a newer release and run the same install command again (rpm -Uvh, apt install over the .deb, or replace the tarball tree).
Basename change in 0.6.x: release artifacts switch from groot_0.5.0_* to groot_v0.6.0_* (v-prefixed tag in the basename, matching pgwd/kzero). Scripts that used VER="${TAG#v}" and groot_${VER}_* must use groot_${TAG}_* instead.
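The naming rule can be sketched as a small helper for scripts that must handle both old and new releases. This is an illustration only: asset_basename is a hypothetical name, the v0.6.0 cut-over is taken from the text above, and the helper covers only the .deb/.rpm pattern with plain MAJOR.MINOR.PATCH tags.

```python
def asset_basename(tag: str, arch: str = "amd64", ext: str = "deb") -> str:
    """Return the expected .deb/.rpm basename for a git tag such as 'v0.6.0'."""
    version = tag[1:] if tag.startswith("v") else tag
    parts = tuple(int(p) for p in version.split("."))
    # From v0.6.0 onward the basename embeds the v-prefixed tag;
    # older releases used the bare version number.
    label = f"v{version}" if parts >= (0, 6, 0) else version
    return f"groot_{label}_{arch}.{ext}"


print(asset_basename("v0.6.0"))             # groot_v0.6.0_amd64.deb
print(asset_basename("v0.5.0", ext="rpm"))  # groot_0.5.0_amd64.rpm
```

Comparing the version as a tuple of integers avoids string-comparison surprises such as "0.10.0" sorting before "0.6.0".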
Supply chain (v0.6.0+): each release attaches SPDX and CycloneDX SBOMs, plus Cosign signatures for checksums.txt and GHCR images. Verify checksums with cosign verify-blob (see release assets); verify images with cosign verify --certificate-identity-regexp=… ghcr.io/hrodrig/groot:v0.6.0.
Windows: use the .zip asset for your arch, unpack, and run groot.exe on a host that can reach the Kubernetes API with a valid kubeconfig (or in-cluster credentials).
Then configure and run groot collect (or groot --print-sample-config > groot.yml first).
Build from a clone of this repository:
make build
./bin/groot --print-sample-config > groot.yml
# Edit groot.yml: replace sample values with your cluster settings (namespaces, targets,
# kubeconfig, output paths, optional notify webhooks/tokens) before collecting.
./bin/groot collect
If you installed from a release package, use groot on your PATH instead of ./bin/groot.
brew tap hrodrig/groot
brew install --cask hrodrig/groot/groot
Upgrading keeps the same tap; new releases are picked up automatically:
brew upgrade --cask hrodrig/groot/groot
The cask installs the groot binary to $(brew --prefix)/bin/groot, which is already on your PATH in default Homebrew setups. A sample config is not bundled with the cask; generate one with mkdir -p ~/.groot && groot --print-sample-config > ~/.groot/groot.yml (a path in the default discovery chain) and edit it.
The tap repo lives at github.com/hrodrig/homebrew-groot (Casks/groot.rb). GoReleaser updates it on every tag via the homebrew_casks: stanza in .goreleaser.yaml. Add the CI secret HOMEBREW_TAP_TOKEN (PAT with repo scope on the tap), or set --skip=homebrew_casks in the release job and run scripts/update-homebrew-cask.sh against a local clone of the tap.
From any machine with Go 1.26+ (installs to $(go env GOPATH)/bin; ensure that directory is on your PATH):
go install github.com/hrodrig/groot/cmd/groot@latest
Use a release tag instead of @latest if you want a pinned version (for example @v0.6.0). Documentation for the module: pkg.go.dev/github.com/hrodrig/groot.
Useful runtime flags (global or with collect):
- --version prints version, commit, branch, and build date
- --test-connection validates Kubernetes connectivity and exits
- --verbose shows each executed command as CMD, plus OK/ERR results
- --quiet suppresses normal console output (INFO/WARN/CMD/OK) and only prints errors; notify integrations still run (Slack, Discord, Teams, PagerDuty, Telegram, generic) unless you disable them in config or use --no-notify
- --no-notify skips all notifications after a successful collect (useful for cron when you only want the archive). Same effect as env GROOT_NO_NOTIFY=1 (or true/yes, case-insensitive)
- --no-upload skips post-collect S3/GCS upload when upload.enabled is true. Same effect as env GROOT_NO_UPLOAD=1
- --no-color disables ANSI colors
- --message "label text" appends a sanitized suffix to archive and capture-related output names
- --kubeconfig /path/to/config overrides kubeconfig from file/env
- collect only: --since limits pod log collection to lines newer than a duration (same semantics as the Kubernetes --since filter on pod logs). A bare number is treated as hours (for example --since=24 → 24h). Other forms follow Go durations (24h, 45m, 90s). Overrides collection.pod_logs_since from config when passed.
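The --since rule (a bare number means hours, anything else must be a Go-style duration) can be sketched as follows. This is not Groot's code: normalize_since is a hypothetical name, and the regular expression is a simplified approximation of Go's duration syntax that ignores signs and a bare "0".

```python
import re

# Simplified Go duration: one or more <number><unit> groups, e.g. 24h, 1h30m, 90s.
_GO_DURATION = re.compile(r"(?:[0-9]+(?:\.[0-9]+)?(?:ns|us|µs|ms|s|m|h))+")


def normalize_since(value: str) -> str:
    """Turn a --since / pod_logs_since value into a duration string."""
    text = value.strip()
    if re.fullmatch(r"[0-9]+", text):
        return f"{int(text)}h"  # digits only: interpret as whole hours
    if _GO_DURATION.fullmatch(text):
        return text
    raise ValueError(f"invalid since value: {value!r}")


print(normalize_since("24"))   # 24h
print(normalize_since("45m"))  # 45m
```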
If you do not have a config file yet, print a sample and save it:
./bin/groot --print-sample-config > groot.yml
The sample YAML is written to standard output, so shell redirection (>) works as shown. If you use an older groot binary where > produced an empty file, redirect stderr instead: groot --print-sample-config 2> groot.yml.
The generated file is a template only. Open groot.yml and set your own values for your environment—for example kubeconfig (if not using the default), collection.namespaces, workloads under collection.targets (deployments, StatefulSets, DaemonSets, Jobs, CronJobs, Helm releases), output_dir / file_prefix, collection.redact_secrets, and any notify.* URLs or secrets. Until you do, the sample names and disabled notification blocks will not match a real cluster.
Then run:
./bin/groot collect
Default config discovery order (when --config is not provided). The first existing file wins; if none exist, built-in defaults apply, then GROOT_* environment variables override where applicable:
- ./groot.yml
- ~/.groot/groot.yml
- /etc/groot/groot.yml
- built-in defaults (then GROOT_* env overrides where applicable)
The .deb / .rpm sample at /etc/groot/groot.yml.sample is not part of this chain; copy it to groot.yml or pass --config /etc/groot/groot.yml.sample explicitly.
You can always override file discovery with --config (see Usage examples).
Paths below use ./bin/groot after make build; if you installed from Releases or make install, use groot on your PATH the same way (for example groot collect ...).
Paths ./groot.yml, ~/.groot/groot.yml, and /etc/groot/groot.yml are discovered automatically (see First run). Any other path—including /etc/groot/groot.yml.sample—must be passed explicitly:
./bin/groot collect --config /path/to/my-groot.yml
./bin/groot collect --config ./groot-mi-test.yml
From the repository root, after editing your copy:
./bin/groot collect --config groot-mi-test.yml
./bin/groot --config ./groot-mi-test.yml --test-connection
./bin/groot collect --config ./groot-mi-test.yml --test-connection
Console only (Slack/Discord/etc. still run if enabled in YAML):
./bin/groot collect --config /path/to/groot.yml --quiet
Skip all notify channels for this run (archive still created); same as env GROOT_NO_NOTIFY=1 / true / yes:
./bin/groot collect --config /path/to/groot.yml --quiet --no-notify
0 * * * * GROOT_NO_NOTIFY=1 /usr/local/bin/groot collect --config /home/you/.groot/prod.yml --quiet
Print API jobs and output paths without writing to disk or sending notifications:
./bin/groot collect --config groot.yml --list-jobs
# pod-logs-default-api -> default/my-pod__node-1.log args=[logs -n default my-pod --all-containers]
When notify.on_failure.enabled: true, Groot can alert on abort (archive error, timeout, …) or when failed >= min_failed_jobs on a completed run (in addition to the normal success notify). Respects --no-notify.
notify:
  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/..."
  on_failure:
    enabled: true
    on_abort: true
    min_failed_jobs: 2
Scan collected *.log files and replace likely secrets before archiving:
collection:
  redact_secrets: true
  redact_patterns:
    - '(?i)my-internal-token\s*=\s*\S+'
./bin/groot collect --config groot.yml --message "staging-network-audit-2026-04-28"
Same as collection.pod_logs_since in YAML; bare number = hours (here, last 24 hours of pod logs):
./bin/groot collect --config groot.yml --since=24
Empty *.log files are normal when you narrow the window: with --since, the API only returns lines newer than that duration. If a pod was quiet during the window, Groot still writes the file (often 0 bytes). That is not a Groot bug; it means no stdout/stderr in that interval. Widen the window, drop --since for a test run, or raise pod_log_tail_lines to confirm the workload emitted output during the capture.
./bin/groot collect --config groot.yml --kubeconfig /path/to/other-kubeconfig
Edit groot.yml (or any file passed with --config) and align every section with your cluster and operational needs. Do not rely on the shipped sample as a drop-in configuration.
The annotated template (every key explained in comments) is configs/groot.yml.sample — identical to groot --print-sample-config output. Use that file when you want line-by-line guidance next to the YAML.
Sample config (abbreviated; full annotated template in configs/groot.yml.sample — same as groot --print-sample-config):
kubeconfig: ""
output_dir: "./out"
file_prefix: "groot-capture"
collection:
  timeout: 20m
  worker_concurrency: 6
  namespaces:
    - kube-system
    - default
  targets:
    default:
      deployments:
        - api
      jobs:
        - batch-import
      cronjobs:
        - nightly-sync
  include_pod_logs: true
  pod_log_tail_lines: 1500
  # pod_logs_since: "24"
  include_node_details: true
  include_node_logs: true
  include_pod_metrics: true
  redact_secrets: false
  extra_kubectl:
    - "get ingress -A"
    - "get pvc -A"
notify:
  on_failure:
    enabled: false
    on_abort: true
    min_failed_jobs: 1
  retry:
    max_attempts: 3
    initial_backoff: 1s
    max_backoff: 10s
  slack:
    enabled: false
    webhook_url: ""
  generic:
    enabled: false
    webhook_url: ""
    json_key: "text"
    # body_template: '{"text":"{{summary}}","failed":{{failed}},"event":"{{event}}"}'
    # hmac_secret: "" # or env GROOT_NOTIFY_GENERIC_HMAC_SECRET
  email:
    enabled: false
    host: ""
    port: 587
    from: ""
    to: ""
| Key | What it does |
|---|---|
| kubeconfig | Path to the kubeconfig file used to build the client-go REST config (same discovery rules as client-go / clientcmd). Empty: use KUBECONFIG if set, then the default kubeconfig locations (for example ~/.kube/config), or in-cluster credentials when Groot runs as a pod. groot --kubeconfig overrides this for a single run (see Resolution and precedence). |
| output_dir | Base directory: each run creates <file_prefix>-<timestamp>[-since-<slug>]/, then <sessionBase>-<cluster>[-<message>].tar.gz beside it. Supports ~ and ${VAR} expansion. |
| file_prefix | Prefix for capture directory and archive basename (default groot-capture). Example session: groot-capture-20260606-120000-my-cluster.tar.gz. |
| collection | Tuning for timeouts, parallelism, namespaces, pod logs, optional extra_kubectl argv lines, redaction, etc. (see below). |
| notify | Optional webhooks, email, and failure alerts after collect (see Notifications). |
Pod ↔ node placement at capture start is in extras/all-pod-node-placement.tsv (fourth column pod_log_file when Groot collects that pod’s log). After all jobs finish, extras/all-pods-rca.tsv merges that placement with cluster-wide pod metrics from metrics.k8s.io (when include_pod_metrics is on — the same snapshot top pods -A would show) so you get namespace, pod, node, cpu_cores, memory_bytes, pod_log_file in one table — cluster-wide and aligned with Groot’s log paths for RCA handoff.
| Key | What it does |
|---|---|
| timeout | Maximum wall time for the whole groot collect run (Go context deadline). |
| worker_concurrency | Number of parallel collection workers (concurrent API jobs). |
| namespaces | For each entry, Groot lists namespace-scoped resources through the API (pods, services, Deployments, ReplicaSets, StatefulSets, DaemonSets), writes JSON sections to <ns>/resources.txt, and ensures <ns>/ exists under the capture tree. |
| targets | Per-namespace pod log filters only. Keys are namespace names. Under each: deployments, statefulsets, daemonsets, jobs, cronjobs, helm_releases (string lists). If a namespace has at least one non-empty list, only pods whose labels match those workloads get log jobs. Empty/missing entry → broad pod logs for that NS. |
| include_pod_logs | When true, collects pod logs for workload and control-plane pods via the API (subject to targets, pod_log_tail_lines, pod_logs_since). When false, skips all pod log jobs. |
| include_previous_logs | When true, also collects previous-container logs into *.previous.log (same semantics as --previous on pod logs; marked optional so a missing previous container does not fail the run). |
| pod_log_tail_lines | When >0, passes --tail N to pod log commands. 0 means no --tail (full log stream, which can be very large). |
| pod_logs_since | When set, passes --since=… to pod log commands only (digits-only = hours, e.g. 24 → 24h; otherwise a Go duration like 24h, 45m). groot collect --since overrides this when the flag is set. The capture directory and .tar.gz basename include since-<slug> after the timestamp so runs with a log window are identifiable on disk (see Output naming). |
| include_node_details | When true, for each node writes describe-style summaries and node metrics (when the metrics API is available) under nodes/. |
| include_node_logs | When true, for each node: (1) GET /api/v1/nodes/<node>/proxy/logs/?query=kubelet (optional &tailLines=N) → nodes/<node>-kubelet.log (kubelet via node log query; requires Kubernetes 1.27+, RBAC nodes/proxy, and kubelet log-query settings; see Node log query); (2) GET /api/v1/nodes/<node>/proxy/logs/messages → nodes/<node>-messages.log (host /var/log/messages when the kubelet serves it). The messages job is optional (failure does not fail the run) because many nodes use journald only or do not expose that path. |
| node_log_tail_lines | When >0, appends tailLines to the kubelet log query (default 5000). 0 omits tailLines (server default limit). |
| include_pod_metrics | When true, writes cluster-wide pod CPU/memory to extras/all-pods-top.txt (via metrics.k8s.io; requires metrics-server or an equivalent metrics provider). |
| redact_secrets | When true, scans collected *.log files and replaces likely secret values with [REDACTED] before the manifest and archive (see Secret redaction). Default false. |
| redact_patterns | Optional list of extra regex patterns (RE2 syntax). Invalid patterns fail at collect time. |
| extra_kubectl | List of extra read-only argv lines (allowlisted verbs; split on whitespace, no shell). Groot executes them in-process with client-go. See the note below on allowed verbs. |
| Block / field | What it does |
|---|---|
| on_failure: enabled, on_abort, min_failed_jobs | Optional alerts when collect aborts or when failed >= min_failed_jobs on a completed run. Respects --no-notify. |
| retry: max_attempts, initial_backoff, max_backoff | Retries transient 5xx and network errors for HTTP notify clients (webhooks, PagerDuty). |
| slack, discord, teams: enabled, webhook_url | POST a one-line summary to incoming webhook URL(s). Multiple URLs: ;. Env: GROOT_NOTIFY_*_WEBHOOK_URL. |
| pagerduty: enabled, routing_key, severity, source | Events API v2 trigger. Env: GROOT_NOTIFY_PAGERDUTY_ROUTING_KEY. |
| telegram: enabled, token, chat_id | Bot API. Env: GROOT_NOTIFY_TELEGRAM_TOKEN, GROOT_NOTIFY_TELEGRAM_CHAT_ID. |
| generic: webhook_url, json_key, headers, extra_fields, body_template, hmac_secret, hmac_header | Custom JSON POST; see Notifications. Env: GROOT_NOTIFY_GENERIC_WEBHOOK_URL, GROOT_NOTIFY_GENERIC_HMAC_SECRET. |
| email: host, port, username, password, from, to, use_tls | Plain-text summary via SMTP (STARTTLS on 587 by default). Env: GROOT_NOTIFY_EMAIL_*. |
Environment variables use the GROOT_ prefix (Viper). Nested YAML keys map to env names by replacing . with _ (for example collection.timeout → GROOT_COLLECTION_TIMEOUT). kubeconfig in YAML still loses to the process KUBECONFIG env when that is set (see Resolution and precedence).
Common examples:
- GROOT_OUTPUT_DIR, GROOT_FILE_PREFIX
- GROOT_COLLECTION_TIMEOUT, GROOT_COLLECTION_WORKER_CONCURRENCY, GROOT_COLLECTION_INCLUDE_POD_LOGS (boolean), GROOT_COLLECTION_POD_LOG_TAIL_LINES, GROOT_COLLECTION_POD_LOGS_SINCE, …
- Notify secrets (also read when enabled: true and the YAML field is empty): GROOT_NOTIFY_SLACK_WEBHOOK_URL, GROOT_NOTIFY_DISCORD_WEBHOOK_URL, GROOT_NOTIFY_TEAMS_WEBHOOK_URL, GROOT_NOTIFY_TELEGRAM_TOKEN, GROOT_NOTIFY_TELEGRAM_CHAT_ID, GROOT_NOTIFY_GENERIC_WEBHOOK_URL, GROOT_NOTIFY_GENERIC_HMAC_SECRET, GROOT_NOTIFY_PAGERDUTY_ROUTING_KEY, GROOT_NOTIFY_EMAIL_HOST, GROOT_NOTIFY_EMAIL_USERNAME, GROOT_NOTIFY_EMAIL_PASSWORD, GROOT_NOTIFY_EMAIL_FROM, GROOT_NOTIFY_EMAIL_TO
- GROOT_NO_NOTIFY=1 (or true/yes): same as --no-notify for a run
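The key-to-variable mapping is mechanical, so a short helper can derive the name for any nested key. This is a sketch of the rule as stated here (GROOT_ prefix, dots replaced by underscores, upper case); env_name is a hypothetical helper, not part of Groot.

```python
def env_name(key: str, prefix: str = "GROOT") -> str:
    """Map a dotted YAML key to its environment variable name."""
    return f"{prefix}_{key.replace('.', '_').upper()}"


print(env_name("collection.timeout"))        # GROOT_COLLECTION_TIMEOUT
print(env_name("notify.slack.webhook_url"))  # GROOT_NOTIFY_SLACK_WEBHOOK_URL
```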
collection.extra_kubectl: Each string is split on whitespace into argv tokens (no shell). At load time, Groot only accepts read-oriented leading verbs: get, describe, top, logs, api-resources, api-versions, version, cluster-info, plus config view … and auth can-i …. Anything else fails collect immediately with a configuration error so a typo or copy-paste cannot turn extras into destructive verbs (delete, exec, apply, etc.).
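To make the allowlist concrete, here is a minimal sketch of that kind of check. It is not Groot's implementation: validate_extra_kubectl is a hypothetical name, and the verb sets simply copy the list in the paragraph above.

```python
from typing import List

ALLOWED_VERBS = {
    "get", "describe", "top", "logs",
    "api-resources", "api-versions", "version", "cluster-info",
}
ALLOWED_PAIRS = {("config", "view"), ("auth", "can-i")}


def validate_extra_kubectl(line: str) -> List[str]:
    """Split one extra_kubectl entry into argv and reject non-read verbs."""
    argv = line.split()  # whitespace split only; no shell expansion or quoting
    if not argv:
        raise ValueError("extra_kubectl entry is empty")
    if argv[0] in ALLOWED_VERBS or tuple(argv[:2]) in ALLOWED_PAIRS:
        return argv
    raise ValueError(f"extra_kubectl verb not allowed: {argv[0]!r}")


print(validate_extra_kubectl("get ingress -A"))  # ['get', 'ingress', '-A']
```

Checking at config load time, before any API call, means a bad entry stops the run before anything is collected.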
When a notification channel is enabled and required credentials are missing, groot fails fast with a clear configuration error.
Configuration file precedence:
- --config explicit path
- ./groot.yml
- ~/.groot/groot.yml
- /etc/groot/groot.yml
- defaults
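The precedence list reads as a first-match search. The sketch below illustrates that order; resolve_config and its parameters are hypothetical, and it does not model how Groot treats an explicit --config path that does not exist.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_config(
    explicit: Optional[str],
    cwd: str,
    home: str,
    exists: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the config file to load, or None to fall back to built-in defaults."""
    if explicit:
        return Path(explicit)  # --config always wins
    candidates = (
        Path(cwd) / "groot.yml",
        Path(home) / ".groot" / "groot.yml",
        Path("/etc/groot/groot.yml"),
    )
    for candidate in candidates:
        if exists(candidate):
            return candidate  # first existing file wins
    return None
```

The exists parameter is injectable only so the order can be exercised without touching the real filesystem.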
kubeconfig precedence:
- --kubeconfig /path/to/config
- KUBECONFIG
- kubeconfig value in YAML
- if all empty, client-go / clientcmd default kubeconfig discovery (including in-cluster when applicable)
Workload filter behavior (collection.targets):
- per namespace, you can define deployments, statefulsets, daemonsets, jobs, cronjobs, and helm_releases
- if a namespace has targets with at least one non-empty list, pod logs for that namespace are limited to matching pods
- if a namespace has no targets entry, or all lists are empty, pod logs stay broad for that namespace
- jobs / cronjobs match label keys plus job-name on Job pods
- helm_releases matches app.kubernetes.io/instance
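The "filtered or broad" decision can be expressed in a few lines. This sketch covers only that decision, not the label matching itself; namespace_is_filtered is a hypothetical helper, and targets is assumed to be the parsed collection.targets mapping.

```python
TARGET_KINDS = (
    "deployments", "statefulsets", "daemonsets",
    "jobs", "cronjobs", "helm_releases",
)


def namespace_is_filtered(targets: dict, namespace: str) -> bool:
    """True when pod logs for the namespace are limited to matching workloads."""
    entry = targets.get(namespace) or {}
    # A single non-empty list is enough to switch the namespace to filtered mode.
    return any(entry.get(kind) for kind in TARGET_KINDS)


targets = {"default": {"deployments": ["api"], "jobs": []}, "kube-system": {}}
print(namespace_is_filtered(targets, "default"))      # True
print(namespace_is_filtered(targets, "kube-system"))  # False
```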
pod_log_tail_lines behavior:
- 0: collect full logs (no --tail; use when you need the entire log stream)
- >0: collect only the last N lines per pod
- applies to both current and --previous pod logs
pod_logs_since and collect --since (pod logs only):
- applies --since / time-window filtering to workload and control-plane pod log jobs; other capture jobs are unchanged
- in YAML or env, a string of digits only is interpreted as whole hours ("24" → 24h); otherwise the value must parse as a Go duration (24h, 45m, …)
- groot collect --since=… overrides collection.pod_logs_since for that run when the flag is set
include_previous_logs behavior:
- true: also collects previous-container pod logs into *.previous.log (optional jobs; same idea as --previous on pod logs)
- false: collects only current pod logs
output_dir path expansion:
- supports ~ (home directory), for example ~/tmp/groot-out
- supports environment variables, for example ${HOME}/tmp/groot-out
Capture output names use file_prefix (default groot-capture):
- directory: <file_prefix>-<timestamp> or <file_prefix>-<timestamp>-since-<slug> when pod_logs_since / --since is set
- archive: <sessionBase>-<cluster>[-<message>].tar.gz (for example groot-capture-20260606-120000-my-cluster.tar.gz)
When pod_logs_since is set, <slug> is a filesystem-safe form of the duration (for example 12h, 45m).
--message is sanitized before use:
- lowercase
- trims leading/trailing spaces
- removes accents/diacritics
- converts spaces and _ to -
- removes unsupported filesystem characters
- collapses repeated dashes
Example:
- input: --message "network routing issue"
- suffix: network-routing-issue
- output: groot-capture-20260428-123200-my-cluster-network-routing-issue.tar.gz
- with pod_logs_since (or --since) set to 12h and no message: groot-capture-20260428-123200-since-12h-my-cluster.tar.gz
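The sanitization steps and the archive naming pattern can be approximated as below. This is a sketch under assumptions: sanitize_message and archive_name are hypothetical names, and the set of characters kept (a-z, 0-9 and -) is a guess at what "unsupported filesystem characters" excludes.

```python
import re
import unicodedata


def sanitize_message(message: str) -> str:
    """Approximate the documented --message sanitization steps."""
    text = message.strip().lower()
    # Remove accents: decompose characters, then drop the combining marks.
    text = "".join(
        ch for ch in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(ch)
    )
    text = re.sub(r"[\s_]+", "-", text)    # spaces and _ become -
    text = re.sub(r"[^a-z0-9-]", "", text)  # assumption: keep only a-z, 0-9, -
    text = re.sub(r"-{2,}", "-", text)      # collapse repeated dashes
    return text.strip("-")


def archive_name(prefix: str, timestamp: str, cluster: str,
                 message: str = "", since: str = "") -> str:
    """Build <prefix>-<timestamp>[-since-<slug>]-<cluster>[-<message>].tar.gz."""
    parts = [prefix, timestamp]
    if since:
        parts += ["since", since]
    parts.append(cluster)
    suffix = sanitize_message(message)
    if suffix:
        parts.append(suffix)
    return "-".join(parts) + ".tar.gz"


print(archive_name("groot-capture", "20260428-123200", "my-cluster",
                   message="network routing issue"))
# groot-capture-20260428-123200-my-cluster-network-routing-issue.tar.gz
```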
Directory layout:
- nodes/
- extras/
- one directory per configured namespace (for example kube-system/, default/)
- pod log files: <pod>__<node>.log (and .previous.log when enabled), same pattern for control-plane pods under kube-system/
- after archive creation, the timestamp directory is automatically removed
Inside the .tar.gz, every path is prefixed with the capture folder name (<session>/…, for example 20260502-174207/kube-system/… or 20260503-081049-since-12h/kube-system/…). Extracting into a shared directory (for example ~/tmp/groot-out) keeps each run under its own subdirectory instead of mixing kube-system/, production/, etc. at the extraction root. Archives produced by older Groot versions may still have a flat layout at the tar root.
- default: summary INFO lines
- --verbose: adds per-command CMD/OK/ERR
- --quiet: suppresses normal console output, prints only errors; does not disable webhooks/API notifications
- --no-notify: skips every notify channel for this run (config can still have enabled: true; use from cron when you want silence to external systems). Env equivalent: GROOT_NO_NOTIFY=1
- --no-color: disables ANSI colors
These artifacts mirror common read-only inspection commands (all via client-go):
- extras/cluster-info.txt: discovery / server summary (cluster-info)
- extras/nodes-wide.txt: all nodes, wide columns (get nodes -o wide)
- extras/all-pods-wide.txt: all pods cluster-wide, wide columns (get pods -A -o wide)
- extras/all-cluster-events.log: all events, sorted by last timestamp (get events -A)
- Under nodes/: per-node describe-style output and node metrics when enabled (describe node, top node)
- Pod logs: streams all containers like logs -n <ns> <pod> --all-containers → files named <pod>__<node>.log under each namespace directory (pending/unscheduled pods use unknown-node)
- Control plane pod logs in kube-system (tier=control-plane, when available) use the same <pod>__<node>.log pattern
- extras/kubeconfig.txt derived from kubeconfig (context, cluster, user, server)
- extras/manifest.json: archive manifest (version, cluster, job counts, file paths)
Groot sends a one-line summary after collect. Channels are independent; enable only what you need.
Success message (all channels):
GROOT finished. total=42 success=40 failed=2 duration=3m12s output=/out/groot-capture-… archive=/out/groot-capture-….tar.gz
Failure messages (when notify.on_failure.enabled: true):
GROOT FAILED. reason=archive logs: … total=42 success=40 failed=2 …
GROOT finished with failures. total=42 success=40 failed=2 …
notify:
  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/T…/B…/…;https://hooks.slack.com/services/…"
  discord:
    enabled: true
    webhook_url: "https://discord.com/api/webhooks/…"
Env fallbacks: GROOT_NOTIFY_SLACK_WEBHOOK_URL, GROOT_NOTIFY_DISCORD_WEBHOOK_URL, GROOT_NOTIFY_TEAMS_WEBHOOK_URL. Discord truncates content to 2000 runes.
notify:
  pagerduty:
    enabled: true
    routing_key: "your-integration-key"
    severity: warning # critical | error | warning | info
    source: groot
Env: GROOT_NOTIFY_PAGERDUTY_ROUTING_KEY. Expects HTTP 202. custom_details includes job counts, duration, paths.
notify:
  telegram:
    enabled: true
    token: "123456:ABC…"
    chat_id: "-1001234567890;123456789"
Env: GROOT_NOTIFY_TELEGRAM_TOKEN, GROOT_NOTIFY_TELEGRAM_CHAT_ID.
Simple (default shape):
notify:
  generic:
    enabled: true
    webhook_url: "https://internal.example/hooks/groot"
    json_key: "text"
    extra_fields:
      source: "groot"
      environment: "production"
POST body: {"text":"<summary>","source":"groot","environment":"production"}.
Custom JSON template (placeholders: {{summary}}, {{text}}, {{event}}, {{total}}, {{success}}, {{failed}}, {{duration}}, {{output_dir}}, {{archive_path}}, {{reason}}):
notify:
  generic:
    enabled: true
    webhook_url: "https://internal.example/hooks/groot"
    body_template: '{"event":"{{event}}","message":"{{summary}}","stats":{"total":{{total}},"failed":{{failed}}}}'
    headers:
      Authorization: "Bearer ${INTERNAL_TOKEN}"
    hmac_secret: "shared-signing-key"
    hmac_header: "X-Groot-Signature"
When hmac_secret is set, Groot sends X-Groot-Signature: sha256=<hex> (HMAC-SHA256 over the raw POST body). Env: GROOT_NOTIFY_GENERIC_HMAC_SECRET.
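On the receiving side, a webhook handler can recompute the signature and compare it with the header. The sketch below assumes only what is stated above (sha256=<hex>, HMAC-SHA256 over the raw body); sign_body and verify_signature are hypothetical helper names, and treating the secret as UTF-8 bytes is an assumption.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Compute the header value for a raw POST body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the received header with the expected value."""
    expected = sign_body(secret, body).encode("utf-8")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

Verify against the exact bytes received, before any JSON parsing or re-serialization, because re-encoding can change whitespace or key order and so change the digest.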
notify:
  email:
    enabled: true
    host: smtp.example.com
    port: 587
    username: groot-bot
    password: "${SMTP_PASSWORD}"
    from: groot@example.com
    to: "ops@example.com;oncall@example.com"
    use_tls: false # true for implicit TLS (e.g. port 465)
Env: GROOT_NOTIFY_EMAIL_HOST, GROOT_NOTIFY_EMAIL_USERNAME, GROOT_NOTIFY_EMAIL_PASSWORD, GROOT_NOTIFY_EMAIL_FROM, GROOT_NOTIFY_EMAIL_TO.
notify:
  on_failure:
    enabled: true
    on_abort: true # alert when collect aborts (timeout, archive error, …)
    min_failed_jobs: 1 # also alert when failed jobs >= this (success path)
  retry:
    max_attempts: 3
    initial_backoff: 1s
    max_backoff: 10s
  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/…"
Partial-failure alerts are in addition to the normal success notify. --no-notify / GROOT_NO_NOTIFY=1 skips all channels including failure alerts.
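The failure-alert rules can be summarized as a small decision function. This is one reading of the behavior described in this section, not Groot's code; should_send_failure_alert and its parameters are hypothetical.

```python
def should_send_failure_alert(
    enabled: bool,
    on_abort: bool,
    min_failed_jobs: int,
    aborted: bool,
    failed: int,
    no_notify: bool = False,
) -> bool:
    """Decide whether a failure alert goes out for one collect run."""
    if no_notify or not enabled:
        return False  # --no-notify / GROOT_NO_NOTIFY skips every channel
    if aborted:
        return on_abort  # timeout, archive error, ...
    return failed >= min_failed_jobs  # completed run with partial failures


# A completed run with 2 failed jobs and min_failed_jobs: 2 triggers an alert.
print(should_send_failure_alert(True, True, 2, aborted=False, failed=2))  # True
```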
HTTP clients retry transient 5xx and network errors only; 4xx fails immediately.
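A retry loop that follows this rule might look like the sketch below. It is an illustration, not Groot's client: post_with_retry is a hypothetical name, send stands for any callable that returns an HTTP status code or raises OSError on a network failure, and doubling the delay between attempts is an assumption (the config only names initial_backoff and max_backoff).

```python
import time
from typing import Callable, Optional


def post_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    initial_backoff: float = 1.0,
    max_backoff: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call send() until it yields a non-5xx status or attempts run out."""
    delay = initial_backoff
    status: Optional[int] = None
    last_error: Optional[OSError] = None
    for attempt in range(1, max_attempts + 1):
        try:
            status, last_error = send(), None
        except OSError as exc:  # network error: treated as transient
            status, last_error = None, exc
        if status is not None and status < 500:
            return status  # 2xx success, or 4xx which is never retried
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_backoff)  # assumed exponential growth, capped
    if last_error is not None:
        raise last_error
    return status  # still 5xx after the final attempt
```

The sleep parameter is injectable so the schedule can be checked without actually waiting.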
After a successful collect, optionally upload the .tar.gz to object storage. Credentials come from the standard AWS / Google SDK env vars (not from long-lived keys in YAML).
upload:
  enabled: true
  continue_on_error: true
  s3:
    enabled: true
    bucket: my-archives
    region: us-east-1
    key_prefix: groot/prod
Env overrides: GROOT_UPLOAD_S3_BUCKET, GROOT_UPLOAD_S3_REGION, GROOT_UPLOAD_S3_KEY_PREFIX, GROOT_UPLOAD_S3_ENDPOINT (S3-compatible), GROOT_UPLOAD_GCS_BUCKET, GROOT_UPLOAD_GCS_KEY_PREFIX. Upload runs after notify; failures are logged but do not fail the collect. Skip with --no-upload or GROOT_NO_UPLOAD=1.
Minimum IAM: S3 s3:PutObject on the bucket/prefix; GCS roles/storage.objectCreator (or tighter custom role).
Run groot collect on a schedule inside the cluster. Image: ghcr.io/hrodrig/groot (see Releases).
helm upgrade --install groot ./deploy/helm/groot \
  --namespace groot --create-namespace \
  --set image.tag=0.6.0 \
  --set schedule="0 */6 * * *"
Embed your config (notify, namespaces, redaction):
helm upgrade --install groot ./deploy/helm/groot \
  --namespace groot --create-namespace \
  --set-file config.grootYml=./groot.yml \
  --set image.tag=0.5.0
Archives land on the /out volume (PVC by default). See deploy/helm/groot/README.md for values reference.
kubectl apply -f deploy/k8s/cronjob.yaml
Edit the ConfigMap groot-config and image tag before production use. Includes Namespace, ServiceAccount, ClusterRole, PVC, and CronJob.
More detail: deploy/README.md.
Collected logs may contain passwords, tokens, or API keys. Enable an optional scrub pass before the archive is written:
collection:
  redact_secrets: true
  redact_patterns:
    - '(?i)corp-internal-key\s*=\s*\S+'
Behavior:
- Scans only *.log files under the capture tree
- Built-in patterns match common key names (password, token, Bearer …, api_key, …)
- Matches are replaced with [REDACTED]
- Off by default; does not guarantee all secrets are removed, so treat archives as sensitive
Example line before/after:
authorization: Bearer eyJhbGciOiJIUzI1NiIs…
authorization: [REDACTED]
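The before/after pair can be reproduced with a short pattern-based scrubber. This is a sketch, not Groot's redactor: the two patterns are illustrative stand-ins (Groot's built-in list is not reproduced here), redact_line is a hypothetical name, and Python's re module is used in place of RE2, which is fine for these simple patterns.

```python
import re
from typing import Iterable, Pattern

REDACTED = "[REDACTED]"

# Illustrative patterns only: one bearer-token rule and the custom pattern from the YAML above.
PATTERNS = [
    re.compile(r"(?i)bearer\s+\S+"),
    re.compile(r"(?i)corp-internal-key\s*=\s*\S+"),
]


def redact_line(line: str, patterns: Iterable[Pattern[str]] = PATTERNS) -> str:
    """Replace every match of every pattern with the redaction marker."""
    for pattern in patterns:
        line = pattern.sub(REDACTED, line)
    return line


print(redact_line("authorization: Bearer eyJhbGciOiJIUzI1NiIs"))
# authorization: [REDACTED]
```

As the Behavior list notes, pattern matching cannot catch every secret, so redacted archives still need to be handled as sensitive.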
make docker-build
make docker-buildx
make scan
docker run --rm \
  -v "$HOME/.kube:/home/nonroot/.kube:ro" \
  -v "$(pwd)/out:/app/out" \
  groot:local
For strict rootless runtime, use Podman:
podman build -t groot:local .
podman run --rm \
  -v "$HOME/.kube:/home/nonroot/.kube:ro" \
  -v "$(pwd)/out:/app/out" \
  groot:local
Collected logs may contain sensitive data (secrets, PII, credentials). Handle archives according to your security policy.
Optional collection.redact_secrets reduces accidental exposure in *.log files but is not a guarantee — review before sharing archives externally. Restrict access to output_dir and in-cluster PVCs; keep notify credentials in env/Secrets, not committed ConfigMaps.
Found Groot useful? We'd love your help to make it better. You can:
- Report bugs or suggest features — open an issue
- Contribute code — see CONTRIBUTING.md for how to submit a pull request
- Star the repo — it helps others discover Groot
Thanks for using Groot.
Groot is distributed under the MIT License. The full text is in LICENSE in this repository.

