diff --git a/README.md b/README.md index 6794ab95..0970e599 100644 --- a/README.md +++ b/README.md @@ -19,6 +19,7 @@ Investigate faults proactively and improve CI. - [🏥 CI Doctor](docs/ci-doctor.md) - Monitor CI workflows and investigate failures automatically - [🚀 CI Coach](docs/ci-coach.md) - Optimize CI workflows for speed and cost efficiency - [💰 Cost Tracker](docs/cost-tracker.md) - Post per-run agent spend summaries on pull requests using token-usage.jsonl from gh-aw's firewall +- [🔍 Log Watcher](docs/log-watcher.md) - Scan run logs and token data for errors, retry loops, and anomalies after every agent workflow run ### Code Review Workflows diff --git a/docs/log-watcher.md b/docs/log-watcher.md new file mode 100644 index 00000000..07027475 --- /dev/null +++ b/docs/log-watcher.md @@ -0,0 +1,125 @@ +# 🔍 Log Watcher + +> For an overview of all available workflows, see the [main README](../README.md). + +**Automated agent run diagnostics that scan logs and token data for errors, retry loops, and anomalies after every agent workflow run** + +The [Log Watcher workflow](../workflows/log-watcher.md?plain=1) fires after your configured agent workflows complete, downloads the `agent-artifacts` artifact written by gh-aw's firewall, scans the run logs for error patterns and retry loops, analyses token usage for anomalies, and posts a health diagnosis on the associated pull request or creates a diagnosis issue. + +## Installation + +```bash +# Install the 'gh aw' extension +gh extension install github/gh-aw + +# Add the workflow to your repository +gh aw add-wizard githubnext/agentics/log-watcher +``` + +This walks you through adding the workflow to your repository. + +## How It Works + +```mermaid +graph LR + A[Agent Workflow Completes] --> B[Download agent-artifacts] + B --> C{artifact found?} + C -->|No| D[Exit silently] + C -->|Yes| E[Download run logs] + E --> F[Scan for errors / retries / timeouts] + F --> G[Analyse token-usage.jsonl] + G --> H{Health level} + H -->|Healthy| I[Brief summary comment] + H -->|Degraded / Failed| J[Full diagnosis comment or issue] + G --> K{High-cost failure?} + K -->|Yes| L[Alert issue] +``` + +The workflow reads both the GitHub Actions run logs (via `gh run view --log`) and the +`token-usage.jsonl` file from the `agent-artifacts` artifact. It combines log signals +(errors, timeouts, retry loops) with token metrics (output ratio, cache efficiency, model +mixing) to assign a health level and write a plain-English diagnosis. + +Runs that do not produce an `agent-artifacts` artifact (non-agent CI workflows) are +skipped silently. + +## Usage + +### Configuration + +After installing, open the workflow file and update the `workflows` list under `on.workflow_run` +to match the names of your agent workflows: + +```yaml +on: + workflow_run: + workflows: ["agent-implement", "agent-pr-fix"] # your workflow names here + types: + - completed +``` + +To adjust the high-cost failure alert threshold, find the `50 000` token value in the +workflow body and change it to match your budget. + +After editing run `gh aw compile` to update the workflow and commit all changes to the +default branch. + +### Health levels + +| Level | Meaning | +|-------|---------| +| ✅ Healthy | No errors or anomalies; run completed successfully | +| ⚠️ Degraded | Warnings or token anomalies present, but run completed | +| ❌ Failed | Run failed or was cancelled, or critical errors were found | + +Healthy runs produce a brief, collapsed summary. Degraded and failed runs produce a full +diagnosis with log excerpts and token metric details. + +### What it detects + +**Log patterns** +- Errors, exceptions, and fatal messages +- Timeout and rate-limit signals (including HTTP 429) +- Retry loops - tools called more than 5 times in a single run +- Context-window truncation warnings + +**Token anomalies** +- High output ratio - agent producing far more tokens than it reads (sign of looping) +- Low cache efficiency - cache misses on long, expensive runs +- Unusually high total token count +- Unexpected model mixing within a single run + +### Data sources + +Log Watcher reads two data sources from the completed run: + +1. **Run logs** - downloaded via `gh run view --log`; these are the standard GitHub Actions + step logs for every job in the workflow. +2. **`token-usage.jsonl`** - read from `sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl` + inside the `agent-artifacts` artifact. This file is written automatically by gh-aw's + firewall on every agent run. + +No additional configuration is needed beyond enabling the firewall (the default). + +## Learn More + +- [token-usage.jsonl reference](https://github.github.com/gh-aw/reference/token-usage/) +- [gh-aw firewall documentation](https://github.github.com/gh-aw/reference/firewall/) +- [CI Doctor workflow](ci-doctor.md) - investigate CI failures automatically +- [Cost Tracker workflow](cost-tracker.md) - post per-run spend summaries on pull requests + +## Going Further + +Log Watcher works standalone - no external services required. For teams that want +persistent run history, cross-repo anomaly trends, and budget alerts over time, add +[AgentMeter](https://agentmeter.app) to your agent workflow: + +```yaml +- uses: agentmeter/agentmeter-action@v1 + with: + api-key: ${{ secrets.AGENTMETER_API_KEY }} +``` + +AgentMeter ingests the same token data and surfaces per-repo trend charts, so you can +spot gradual drift - rising output ratios, declining cache efficiency, model changes - +across dozens of runs rather than one at a time. diff --git a/workflows/log-watcher.md b/workflows/log-watcher.md new file mode 100644 index 00000000..0936fdc5 --- /dev/null +++ b/workflows/log-watcher.md @@ -0,0 +1,215 @@ +--- +description: | + Automated agent run log watcher that fires after monitored workflows complete, downloads + the agent-artifacts artifact written by gh-aw's firewall, scans run logs and token data + for error patterns, retry loops, timeout signals, and token anomalies, then posts a + diagnostic summary on the associated pull request or creates a diagnosis issue. + +on: + workflow_run: + workflows: ["agent-implement", "agent-pr-fix"] # Edit to match your agent workflow names + types: + - completed + branches: + - main + +permissions: read-all + +network: defaults + +safe-outputs: + add-comment: + target: "*" + create-issue: + title-prefix: "[log-watcher] " + labels: [automation, agent-health] + max: 3 + +tools: + github: + toolsets: [default] + bash: true + +timeout-minutes: 10 + +--- + +# Agent Run Log Watcher + +You are the Agent Run Log Watcher. Your job is to analyse the logs and token data from a +completed agent workflow run, detect anomalies and error patterns, and post a concise +diagnostic summary where the team will see it. + +## Current Context + +- **Repository**: ${{ github.repository }} +- **Run**: [#${{ github.event.workflow_run.run_number }}](${{ github.event.workflow_run.html_url }}) +- **Run ID**: ${{ github.event.workflow_run.id }} +- **Conclusion**: ${{ github.event.workflow_run.conclusion }} +- **Head SHA**: ${{ github.event.workflow_run.head_sha }} + +## Instructions + +### Step 1: Download the agent-artifacts artifact + +```bash +gh run download ${{ github.event.workflow_run.id }} \ + --name agent-artifacts \ + --dir /tmp/agent-artifacts \ + --repo ${{ github.repository }} 2>&1 +echo "exit: $?" +``` + +**If this command fails** (artifact does not exist), the run did not come from an agent +workflow or the gh-aw firewall was not enabled. Exit silently - produce no output. + +### Step 2: Download the run logs + +```bash +gh run view ${{ github.event.workflow_run.id }} \ + --log \ + --repo ${{ github.repository }} > /tmp/run-logs.txt 2>&1 +wc -l /tmp/run-logs.txt +``` + +If the log download fails, continue with token analysis only. Note the failure in the +diagnosis. + +### Step 3: Scan run logs for anomalies + +Read `/tmp/run-logs.txt` and scan for the following patterns. Record every match with its +line number and a short excerpt (≤ 120 characters). + +**Error signals** + +```bash +grep -in "error\|exception\|fatal\|failed\|failure" /tmp/run-logs.txt | head -40 +``` + +**Timeout and rate-limit signals** + +```bash +grep -in "timeout\|timed out\|rate.limit\|429\|too many requests\|context deadline" /tmp/run-logs.txt | head -20 +``` + +**Retry and loop signals** (repeated tool calls are the most common agent failure mode) + +```bash +grep -in "retry\|retrying\|attempt [0-9]\|tool_call\|function_call" /tmp/run-logs.txt | head -40 +``` + +Count how many times each distinct tool name appears across all tool call lines. Flag any +tool called more than 5 times as a **possible retry loop**. + +**Truncation signals** + +```bash +grep -in "context.window\|max.token\|truncat\|token limit" /tmp/run-logs.txt | head -20 +``` + +### Step 4: Analyse token-usage.jsonl + +Read token data: + +```bash +cat /tmp/agent-artifacts/sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl 2>/dev/null +``` + +Each line is a JSON object: + +```json +{"model":"claude-sonnet-4-5","input_tokens":1200,"output_tokens":340,"cache_read_input_tokens":500,"cache_creation_input_tokens":100} +``` + +Calculate the following metrics across all lines: + +| Metric | Formula | Flag if… | +|--------|---------|----------| +| **Output ratio** | total_output / total_input | > 0.5 (agent producing more than it reads) | +| **Cache efficiency** | cache_read / (cache_read + cache_creation) | < 0.2 on runs with > 5000 total tokens | +| **Total tokens** | sum of all token fields | > 100 000 (high-cost run) | +| **Model count** | distinct model names | > 2 (unexpected model mixing) | + +Flagged metrics are anomalies - include them in the diagnosis. + +### Step 5: Determine run health + +Assign one of three health levels: + +| Level | Criteria | +|-------|----------| +| ✅ **Healthy** | No errors, no flagged metrics, conclusion is `success` | +| ⚠️ **Degraded** | Warnings or flagged metrics present, but conclusion is `success` | +| ❌ **Failed** | Conclusion is `failure` or `cancelled`, or critical errors found | + +### Step 6: Find the associated pull request + +```bash +gh api "repos/${{ github.repository }}/actions/runs/${{ github.event.workflow_run.id }}" \ + --jq '.pull_requests[0].number // empty' +``` + +### Step 7: Post the diagnosis + +Build the report using this template. Fill in `$HEALTH`, `$SUMMARY`, and the findings +tables: + +```markdown +## Agent run diagnosis $HEALTH + +| | | +|---|---| +| **Run** | [#${{ github.event.workflow_run.run_number }}](${{ github.event.workflow_run.html_url }}) | +| **Conclusion** | ${{ github.event.workflow_run.conclusion }} | +| **Health** | $HEALTH | + +$SUMMARY + +
+Log findings + +| Category | Count | Sample | +|----------|------:|-------| +[one row per finding category that had matches; omit empty categories] + +
+ +
+Token anomalies + +| Metric | Value | Status | +|--------|------:|-------| +[one row per metric from Step 4; mark anomalies with ⚠️] + +
+ +*Logs and token data from gh-aw's firewall artifact.* +``` + +**$SUMMARY** should be 1-3 plain-English sentences that state what happened and, if the +run is degraded or failed, the most likely cause. + +**If a PR number was found**: post as a comment on that PR using `add_comment`. + +**If no PR was found**: create an issue using `create_issue` with title: +`[log-watcher] #${{ github.event.workflow_run.run_number }}: $HEALTH` + +### Step 8: Critical anomaly alert (optional) + +If health is ❌ **Failed** AND total tokens exceed 50 000 (high-cost failure), create a +second issue using `create_issue` with title: +`[log-watcher] High-cost failure: run #${{ github.event.workflow_run.run_number }}` + +Include the full diagnosis and a direct link to the run. Adjust the 50 000-token threshold +in the workflow to match your budget. + +## Guidelines + +- **Silent on non-agent runs**: If the artifact does not exist, produce no output at all. +- **One report per run**: Do not create more than one comment or issue per triggering run. +- **Healthy runs are brief**: If health is ✅, keep the report short - one-line summary, + collapsed details. Do not create noise for runs that are working fine. +- **Be specific**: When flagging an error, quote the relevant log line. Vague warnings are + not useful. +- **No retries**: Exit silently on transient download failures; the next run produces its + own report.