1 change: 1 addition & 0 deletions README.md
@@ -19,6 +19,7 @@ Investigate faults proactively and improve CI.
- [🏥 CI Doctor](docs/ci-doctor.md) - Monitor CI workflows and investigate failures automatically
- [🚀 CI Coach](docs/ci-coach.md) - Optimize CI workflows for speed and cost efficiency
- [💰 Cost Tracker](docs/cost-tracker.md) - Post per-run agent spend summaries on pull requests using token-usage.jsonl from gh-aw's firewall
- [🔍 Log Watcher](docs/log-watcher.md) - Scan run logs and token data for errors, retry loops, and anomalies after every agent workflow run

### Code Review Workflows

125 changes: 125 additions & 0 deletions docs/log-watcher.md
@@ -0,0 +1,125 @@
# 🔍 Log Watcher

> For an overview of all available workflows, see the [main README](../README.md).

**Automated agent run diagnostics that scan logs and token data for errors, retry loops, and anomalies after every agent workflow run**

The [Log Watcher workflow](../workflows/log-watcher.md?plain=1) fires after your configured agent workflows complete. It downloads the `agent-artifacts` artifact written by gh-aw's firewall, scans the run logs for error patterns and retry loops, analyses token usage for anomalies, and posts a health diagnosis on the associated pull request (or creates a diagnosis issue when no pull request is linked).

## Installation

```bash
# Install the 'gh aw' extension
gh extension install github/gh-aw

# Add the workflow to your repository
gh aw add-wizard githubnext/agentics/log-watcher
```

This walks you through adding the workflow to your repository.

## How It Works

```mermaid
graph LR
A[Agent Workflow Completes] --> B[Download agent-artifacts]
B --> C{artifact found?}
C -->|No| D[Exit silently]
C -->|Yes| E[Download run logs]
E --> F[Scan for errors / retries / timeouts]
F --> G[Analyse token-usage.jsonl]
G --> H{Health level}
H -->|Healthy| I[Brief summary comment]
H -->|Degraded / Failed| J[Full diagnosis comment or issue]
G --> K{High-cost failure?}
K -->|Yes| L[Alert issue]
```

The workflow reads both the GitHub Actions run logs (via `gh run view --log`) and the
`token-usage.jsonl` file from the `agent-artifacts` artifact. It combines log signals
(errors, timeouts, retry loops) with token metrics (output ratio, cache efficiency, model
mixing) to assign a health level and write a plain-English diagnosis.

Runs that do not produce an `agent-artifacts` artifact (non-agent CI workflows) are
skipped silently.

## Usage

### Configuration

After installing, open the workflow file and update the `workflows` list under `on.workflow_run`
to match the names of your agent workflows:

```yaml
on:
workflow_run:
workflows: ["agent-implement", "agent-pr-fix"] # your workflow names here
types:
- completed
```

To adjust the high-cost failure alert threshold, find the `50 000` token value in the
workflow body and change it to match your budget.

After editing, run `gh aw compile` to regenerate the compiled workflow, then commit all changes to the
default branch.

### Health levels

| Level | Meaning |
|-------|---------|
| ✅ Healthy | No errors or anomalies; run completed successfully |
| ⚠️ Degraded | Warnings or token anomalies present, but run completed |
| ❌ Failed | Run failed or was cancelled, or critical errors were found |

Healthy runs produce a brief, collapsed summary. Degraded and failed runs produce a full
diagnosis with log excerpts and token metric details.

### What it detects

**Log patterns**
- Errors, exceptions, and fatal messages
- Timeout and rate-limit signals (including HTTP 429)
- Retry loops - tools called more than 5 times in a single run
- Context-window truncation warnings

**Token anomalies**
- High output ratio - agent producing far more tokens than it reads (sign of looping)
- Low cache efficiency - cache misses on long, expensive runs
- Unusually high total token count
- Unexpected model mixing within a single run
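These checks reduce to a few threshold comparisons. The sketch below uses the default thresholds from the workflow body (output ratio above 0.5, cache efficiency below 0.2 on runs over 5 000 tokens, totals above 100 000, more than two distinct models); tune them to your own agents.

```python
def token_anomalies(output_ratio, cache_efficiency, total_tokens, model_count):
    """Return the anomaly flags listed above, using the workflow's default thresholds."""
    flags = []
    if output_ratio > 0.5:
        flags.append("high output ratio")        # possible looping
    if total_tokens > 5_000 and cache_efficiency < 0.2:
        flags.append("low cache efficiency")     # cache misses on a long run
    if total_tokens > 100_000:
        flags.append("high total token count")
    if model_count > 2:
        flags.append("model mixing")
    return flags

print(token_anomalies(0.8, 0.1, 120_000, 3))
# ['high output ratio', 'low cache efficiency', 'high total token count', 'model mixing']
```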

### Data sources

Log Watcher reads two data sources from the completed run:

1. **Run logs** - downloaded via `gh run view --log`; these are the standard GitHub Actions
step logs for every job in the workflow.
2. **`token-usage.jsonl`** - read from `sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl`
inside the `agent-artifacts` artifact. This file is written automatically by gh-aw's
firewall on every agent run.

No additional configuration is needed beyond enabling the firewall (the default).

## Learn More

- [token-usage.jsonl reference](https://github.github.com/gh-aw/reference/token-usage/)
- [gh-aw firewall documentation](https://github.github.com/gh-aw/reference/firewall/)
- [CI Doctor workflow](ci-doctor.md) - investigate CI failures automatically
- [Cost Tracker workflow](cost-tracker.md) - post per-run spend summaries on pull requests

## Going Further

Log Watcher works standalone - no external services required. For teams that want
persistent run history, cross-repo anomaly trends, and budget alerts over time, add
[AgentMeter](https://agentmeter.app) to your agent workflow:

```yaml
- uses: agentmeter/agentmeter-action@v1
with:
api-key: ${{ secrets.AGENTMETER_API_KEY }}
```

AgentMeter ingests the same token data and surfaces per-repo trend charts, so you can
spot gradual drift - rising output ratios, declining cache efficiency, model changes -
across dozens of runs rather than one at a time.
215 changes: 215 additions & 0 deletions workflows/log-watcher.md
@@ -0,0 +1,215 @@
---
description: |
Automated agent run log watcher that fires after monitored workflows complete, downloads
the agent-artifacts artifact written by gh-aw's firewall, scans run logs and token data
for error patterns, retry loops, timeout signals, and token anomalies, then posts a
diagnostic summary on the associated pull request or creates a diagnosis issue.

on:
workflow_run:
workflows: ["agent-implement", "agent-pr-fix"] # Edit to match your agent workflow names
types:
- completed
branches:
- main

permissions: read-all

network: defaults

safe-outputs:
add-comment:
target: "*"
create-issue:
title-prefix: "[log-watcher] "
labels: [automation, agent-health]
max: 3

tools:
github:
toolsets: [default]
bash: true

timeout-minutes: 10

---

# Agent Run Log Watcher

You are the Agent Run Log Watcher. Your job is to analyse the logs and token data from a
completed agent workflow run, detect anomalies and error patterns, and post a concise
diagnostic summary where the team will see it.

## Current Context

- **Repository**: ${{ github.repository }}
- **Run**: [#${{ github.event.workflow_run.run_number }}](${{ github.event.workflow_run.html_url }})
- **Run ID**: ${{ github.event.workflow_run.id }}
- **Conclusion**: ${{ github.event.workflow_run.conclusion }}
- **Head SHA**: ${{ github.event.workflow_run.head_sha }}

## Instructions

### Step 1: Download the agent-artifacts artifact

```bash
gh run download ${{ github.event.workflow_run.id }} \
--name agent-artifacts \
--dir /tmp/agent-artifacts \
--repo ${{ github.repository }} 2>&1
echo "exit: $?"
```

**If this command fails** (artifact does not exist), the run did not come from an agent
workflow or the gh-aw firewall was not enabled. Exit silently - produce no output.

### Step 2: Download the run logs

```bash
gh run view ${{ github.event.workflow_run.id }} \
--log \
--repo ${{ github.repository }} > /tmp/run-logs.txt 2>&1
wc -l /tmp/run-logs.txt
```

If the log download fails, continue with token analysis only. Note the failure in the
diagnosis.

### Step 3: Scan run logs for anomalies

Read `/tmp/run-logs.txt` and scan for the following patterns. Record every match with its
line number and a short excerpt (≤ 120 characters).

**Error signals**

```bash
grep -in "error\|exception\|fatal\|failed\|failure" /tmp/run-logs.txt | head -40
```

**Timeout and rate-limit signals**

```bash
grep -in "timeout\|timed out\|rate.limit\|429\|too many requests\|context deadline" /tmp/run-logs.txt | head -20
```

**Retry and loop signals** (repeated tool calls are the most common agent failure mode)

```bash
grep -in "retry\|retrying\|attempt [0-9]\|tool_call\|function_call" /tmp/run-logs.txt | head -40
```

Count how many times each distinct tool name appears across all tool call lines. Flag any
tool called more than 5 times as a **possible retry loop**.
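The counting step can be sketched in a few lines; this is a hypothetical helper that assumes tool names appear after a `tool_call:` marker - adjust the regex to match the log format your runs actually produce.

```python
import re
from collections import Counter

def find_retry_loops(log_lines, threshold=5):
    """Count tool-call occurrences per tool name; flag tools over the threshold."""
    counts = Counter()
    for line in log_lines:
        # Assumed format: "tool_call: <name>" - real logs may differ.
        match = re.search(r"tool_call:\s*(\w+)", line)
        if match:
            counts[match.group(1)] += 1
    return {tool: n for tool, n in counts.items() if n > threshold}

sample = ["tool_call: fetch_url"] * 7 + ["tool_call: read_file"] * 2
print(find_retry_loops(sample))  # {'fetch_url': 7} - a possible retry loop
```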

**Truncation signals**

```bash
grep -in "context.window\|max.token\|truncat\|token limit" /tmp/run-logs.txt | head -20
```

### Step 4: Analyse token-usage.jsonl

Read token data:

```bash
cat /tmp/agent-artifacts/sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl 2>/dev/null
```

Each line is a JSON object:

```json
{"model":"claude-sonnet-4-5","input_tokens":1200,"output_tokens":340,"cache_read_input_tokens":500,"cache_creation_input_tokens":100}
```

Calculate the following metrics across all lines:

| Metric | Formula | Flag if… |
|--------|---------|----------|
| **Output ratio** | total_output / total_input | > 0.5 (agent producing more than it reads) |
| **Cache efficiency** | cache_read / (cache_read + cache_creation) | < 0.2 on runs with > 5000 total tokens |
| **Total tokens** | sum of all token fields | > 100 000 (high-cost run) |
| **Model count** | distinct model names | > 2 (unexpected model mixing) |

Flagged metrics are anomalies - include them in the diagnosis.
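The arithmetic above can be sketched as follows (field names taken from the example record; the sample data is made up for illustration):

```python
import json

def token_metrics(jsonl_text):
    """Aggregate the Step 4 metrics across all token-usage.jsonl records."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    inp = sum(r["input_tokens"] for r in records)
    out = sum(r["output_tokens"] for r in records)
    cache_read = sum(r["cache_read_input_tokens"] for r in records)
    cache_new = sum(r["cache_creation_input_tokens"] for r in records)
    cache_total = cache_read + cache_new
    return {
        "output_ratio": out / inp if inp else 0.0,
        "cache_efficiency": cache_read / cache_total if cache_total else 0.0,
        "total_tokens": inp + out + cache_total,
        "model_count": len({r["model"] for r in records}),
    }

sample = (
    '{"model":"claude-sonnet-4-5","input_tokens":1200,"output_tokens":340,'
    '"cache_read_input_tokens":500,"cache_creation_input_tokens":100}\n'
    '{"model":"claude-sonnet-4-5","input_tokens":4000,"output_tokens":2600,'
    '"cache_read_input_tokens":0,"cache_creation_input_tokens":900}\n'
)
m = token_metrics(sample)
# output_ratio = 2940/5200 ≈ 0.57, so this sample would be flagged (> 0.5)
```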

### Step 5: Determine run health

Assign one of three health levels:

| Level | Criteria |
|-------|----------|
| ✅ **Healthy** | No errors, no flagged metrics, conclusion is `success` |
| ⚠️ **Degraded** | Warnings or flagged metrics present, but conclusion is `success` |
| ❌ **Failed** | Conclusion is `failure` or `cancelled`, or critical errors found |
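As a minimal sketch of this table (a hypothetical helper; the real judgment should also weigh how severe the individual findings are):

```python
def health_level(conclusion, warning_count, flagged_metrics, critical_errors=False):
    """Map a run's conclusion plus Step 3/4 findings to a health level."""
    if conclusion in ("failure", "cancelled") or critical_errors:
        return "failed"
    if conclusion == "success" and warning_count == 0 and not flagged_metrics:
        return "healthy"
    return "degraded"

print(health_level("success", 0, []))                  # healthy
print(health_level("success", 2, ["output_ratio"]))    # degraded
print(health_level("cancelled", 0, []))                # failed
```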

### Step 6: Find the associated pull request

```bash
gh api "repos/${{ github.repository }}/actions/runs/${{ github.event.workflow_run.id }}" \
--jq '.pull_requests[0].number // empty'
```

### Step 7: Post the diagnosis

Build the report using this template. Fill in `$HEALTH`, `$SUMMARY`, and the findings
tables:

```markdown
## Agent run diagnosis $HEALTH

| | |
|---|---|
| **Run** | [#${{ github.event.workflow_run.run_number }}](${{ github.event.workflow_run.html_url }}) |
| **Conclusion** | ${{ github.event.workflow_run.conclusion }} |
| **Health** | $HEALTH |

$SUMMARY

<details>
<summary>Log findings</summary>

| Category | Count | Sample |
|----------|------:|-------|
[one row per finding category that had matches; omit empty categories]

</details>

<details>
<summary>Token anomalies</summary>

| Metric | Value | Status |
|--------|------:|-------|
[one row per metric from Step 4; mark anomalies with ⚠️]

</details>

*Logs and token data from gh-aw's firewall artifact.*
```

**$SUMMARY** should be 1-3 plain-English sentences that state what happened and, if the
run is degraded or failed, the most likely cause.

**If a PR number was found**: post as a comment on that PR using `add_comment`.

**If no PR was found**: create an issue using `create_issue` with title:
`[log-watcher] #${{ github.event.workflow_run.run_number }}: $HEALTH`

### Step 8: Critical anomaly alert (optional)

If health is ❌ **Failed** AND total tokens exceed 50 000 (high-cost failure), create a
second issue using `create_issue` with title:
`[log-watcher] High-cost failure: run #${{ github.event.workflow_run.run_number }}`

Include the full diagnosis and a direct link to the run. Adjust the 50 000-token threshold
in the workflow to match your budget.

## Guidelines

- **Silent on non-agent runs**: If the artifact does not exist, produce no output at all.
- **One report per run**: Do not create more than one comment or issue per triggering run.
- **Healthy runs are brief**: If health is ✅, keep the report short - one-line summary,
collapsed details. Do not create noise for runs that are working fine.
- **Be specific**: When flagging an error, quote the relevant log line. Vague warnings are
not useful.
- **No retries**: Exit silently on transient download failures; the next run produces its
own report.