1 change: 1 addition & 0 deletions README.md
@@ -19,6 +19,7 @@ Investigate faults proactively and improve CI.
- [🏥 CI Doctor](docs/ci-doctor.md) - Monitor CI workflows and investigate failures automatically
- [🚀 CI Coach](docs/ci-coach.md) - Optimize CI workflows for speed and cost efficiency
- [💰 Cost Tracker](docs/cost-tracker.md) - Post per-run agent spend summaries on pull requests using token-usage.jsonl from gh-aw's firewall
- [🔍 Log Watcher](docs/log-watcher.md) - Scan run logs and token data for errors, retry loops, and anomalies after every agent workflow run

### Code Review Workflows

125 changes: 125 additions & 0 deletions docs/log-watcher.md
@@ -0,0 +1,125 @@
# 🔍 Log Watcher

> For an overview of all available workflows, see the [main README](../README.md).

**Automated agent run diagnostics that scan logs and token data for errors, retry loops, and anomalies after every agent workflow run**

The [Log Watcher workflow](../workflows/log-watcher.md?plain=1) fires after your configured agent workflows complete. It downloads the `agent-artifacts` artifact written by gh-aw's firewall, scans the run logs for error patterns and retry loops, analyses token usage for anomalies, and posts a health diagnosis on the associated pull request (or creates a diagnosis issue when no pull request is linked).

## Installation

```bash
# Install the 'gh aw' extension
gh extension install github/gh-aw

# Add the workflow to your repository
gh aw add-wizard githubnext/agentics/log-watcher
```

This walks you through adding the workflow to your repository.

## How It Works

```mermaid
graph LR
A[Agent Workflow Completes] --> B[Download agent-artifacts]
B --> C{artifact found?}
C -->|No| D[Exit silently]
C -->|Yes| E[Download run logs]
E --> F[Scan for errors / retries / timeouts]
F --> G[Analyse token-usage.jsonl]
G --> H{Health level}
H -->|Healthy| I[Brief summary comment]
H -->|Degraded / Failed| J[Full diagnosis comment or issue]
G --> K{High-cost failure?}
K -->|Yes| L[Alert issue]
```

The workflow reads both the GitHub Actions run logs (via `gh run view --log`) and the
`token-usage.jsonl` file from the `agent-artifacts` artifact. It combines log signals
(errors, timeouts, retry loops) with token metrics (output ratio, cache efficiency, model
mixing) to assign a health level and write a plain-English diagnosis.

Runs that do not produce an `agent-artifacts` artifact (non-agent CI workflows) are
skipped silently.

## Usage

### Configuration

After installing, open the workflow file and update the `workflows` list under `on.workflow_run`
to match the names of your agent workflows:

```yaml
on:
workflow_run:
workflows: ["agent-implement", "agent-pr-fix"] # your workflow names here
types:
- completed
```

To adjust the high-cost failure alert threshold, find the `50 000` token value in the
workflow body and change it to match your budget.

After editing, run `gh aw compile` to regenerate the compiled workflow, then commit all changes to the
default branch.

### Health levels

| Level | Meaning |
|-------|---------|
| ✅ Healthy | No errors or anomalies; run completed successfully |
| ⚠️ Degraded | Warnings or token anomalies present, but run completed |
| ❌ Failed | Run failed or was cancelled, or critical errors were found |

Healthy runs produce a brief, collapsed summary. Degraded and failed runs produce a full
diagnosis with log excerpts and token metric details.

### What it detects

**Log patterns**
- Errors, exceptions, and fatal messages
- Timeout and rate-limit signals (including HTTP 429)
- Retry loops - tools called more than 5 times in a single run
- Context-window truncation warnings

**Token anomalies**
- High output ratio - agent producing far more tokens than it reads (sign of looping)
- Low cache efficiency - cache misses on long, expensive runs
- Unusually high total token count
- Unexpected model mixing within a single run
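These checks reduce to a few threshold comparisons. The sketch below uses the default thresholds from the workflow body (output ratio above 0.5, cache efficiency below 0.2 on runs over 5 000 tokens, totals above 100 000, more than two distinct models); tune them to your own agents.

```python
def token_anomalies(output_ratio, cache_efficiency, total_tokens, model_count):
    """Return the anomaly flags listed above, using the workflow's default thresholds."""
    flags = []
    if output_ratio > 0.5:
        flags.append("high output ratio")        # possible looping
    if total_tokens > 5_000 and cache_efficiency < 0.2:
        flags.append("low cache efficiency")     # cache misses on a long run
    if total_tokens > 100_000:
        flags.append("high total token count")
    if model_count > 2:
        flags.append("model mixing")
    return flags

print(token_anomalies(0.8, 0.1, 120_000, 3))
# ['high output ratio', 'low cache efficiency', 'high total token count', 'model mixing']
```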

### Data sources

Log Watcher reads two data sources from the completed run:

1. **Run logs** - downloaded via `gh run view --log`; these are the standard GitHub Actions
step logs for every job in the workflow.
2. **`token-usage.jsonl`** - read from `sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl`
inside the `agent-artifacts` artifact. This file is written automatically by gh-aw's
firewall on every agent run.

No additional configuration is needed beyond enabling the firewall (the default).

## Learn More

- [token-usage.jsonl reference](https://github.github.com/gh-aw/reference/token-usage/)
- [gh-aw firewall documentation](https://github.github.com/gh-aw/reference/firewall/)
- [CI Doctor workflow](ci-doctor.md) - investigate CI failures automatically
- [Cost Tracker workflow](cost-tracker.md) - post per-run spend summaries on pull requests

## Going Further

Log Watcher works standalone - no external services required. For teams that want
persistent run history, cross-repo anomaly trends, and budget alerts over time, add
[AgentMeter](https://agentmeter.app) to your agent workflow:

```yaml
- uses: agentmeter/agentmeter-action@v1
with:
api-key: ${{ secrets.AGENTMETER_API_KEY }}
```

AgentMeter ingests the same token data and surfaces per-repo trend charts, so you can
spot gradual drift - rising output ratios, declining cache efficiency, model changes -
across dozens of runs rather than one at a time.
215 changes: 215 additions & 0 deletions workflows/log-watcher.md
@@ -0,0 +1,215 @@
---
description: |
Automated agent run log watcher that fires after monitored workflows complete, downloads
the agent-artifacts artifact written by gh-aw's firewall, scans run logs and token data
for error patterns, retry loops, timeout signals, and token anomalies, then posts a
diagnostic summary on the associated pull request or creates a diagnosis issue.

on:
workflow_run:
workflows: ["agent-implement", "agent-pr-fix"] # Edit to match your agent workflow names
types:
- completed
branches:
- main

permissions: read-all

network: defaults

safe-outputs:
add-comment:
target: "*"
create-issue:
title-prefix: "[log-watcher] "
labels: [automation, agent-health]
max: 3

tools:
github:
toolsets: [default]
bash: true

timeout-minutes: 10

---

# Agent Run Log Watcher

You are the Agent Run Log Watcher. Your job is to analyse the logs and token data from a
completed agent workflow run, detect anomalies and error patterns, and post a concise
diagnostic summary where the team will see it.

## Current Context

- **Repository**: ${{ github.repository }}
- **Run**: [#${{ github.event.workflow_run.run_number }}](${{ github.event.workflow_run.html_url }})
- **Run ID**: ${{ github.event.workflow_run.id }}
- **Conclusion**: ${{ github.event.workflow_run.conclusion }}
- **Head SHA**: ${{ github.event.workflow_run.head_sha }}

## Instructions

### Step 1: Download the agent-artifacts artifact

```bash
gh run download ${{ github.event.workflow_run.id }} \
--name agent-artifacts \
--dir /tmp/agent-artifacts \
--repo ${{ github.repository }} 2>&1
echo "exit: $?"
```

**If this command fails** (artifact does not exist), the run did not come from an agent
workflow or the gh-aw firewall was not enabled. Exit silently - produce no output.

### Step 2: Download the run logs

```bash
gh run view ${{ github.event.workflow_run.id }} \
--log \
--repo ${{ github.repository }} > /tmp/run-logs.txt 2>&1
wc -l /tmp/run-logs.txt
```

If the log download fails, continue with token analysis only. Note the failure in the
diagnosis.

### Step 3: Scan run logs for anomalies

Read `/tmp/run-logs.txt` and scan for the following patterns. Record every match with its
line number and a short excerpt (≤ 120 characters).

**Error signals**

```bash
grep -in "error\|exception\|fatal\|failed\|failure" /tmp/run-logs.txt | head -40
```

**Timeout and rate-limit signals**

```bash
grep -in "timeout\|timed out\|rate.limit\|429\|too many requests\|context deadline" /tmp/run-logs.txt | head -20
```

**Retry and loop signals** (repeated tool calls are the most common agent failure mode)

```bash
grep -in "retry\|retrying\|attempt [0-9]\|tool_call\|function_call" /tmp/run-logs.txt | head -40
```

Count how many times each distinct tool name appears across all tool call lines. Flag any
tool called more than 5 times as a **possible retry loop**.
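The counting step can be sketched in a few lines; this is a hypothetical helper that assumes tool names appear after a `tool_call:` marker - adjust the regex to match the log format your runs actually produce.

```python
import re
from collections import Counter

def find_retry_loops(log_lines, threshold=5):
    """Count tool-call occurrences per tool name; flag tools over the threshold."""
    counts = Counter()
    for line in log_lines:
        # Assumed format: "tool_call: <name>" - real logs may differ.
        match = re.search(r"tool_call:\s*(\w+)", line)
        if match:
            counts[match.group(1)] += 1
    return {tool: n for tool, n in counts.items() if n > threshold}

sample = ["tool_call: fetch_url"] * 7 + ["tool_call: read_file"] * 2
print(find_retry_loops(sample))  # {'fetch_url': 7} - a possible retry loop
```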

**Truncation signals**

```bash
grep -in "context.window\|max.token\|truncat\|token limit" /tmp/run-logs.txt | head -20
```

### Step 4: Analyse token-usage.jsonl

Read token data:

```bash
cat /tmp/agent-artifacts/sandbox/firewall/logs/api-proxy-logs/token-usage.jsonl 2>/dev/null
```

Each line is a JSON object:

```json
{"model":"claude-sonnet-4-5","input_tokens":1200,"output_tokens":340,"cache_read_input_tokens":500,"cache_creation_input_tokens":100}
```

Calculate the following metrics across all lines:

| Metric | Formula | Flag if… |
|--------|---------|----------|
| **Output ratio** | total_output / total_input | > 0.5 (agent producing more than it reads) |
| **Cache efficiency** | cache_read / (cache_read + cache_creation) | < 0.2 on runs with > 5000 total tokens |
| **Total tokens** | sum of all token fields | > 100 000 (high-cost run) |
| **Model count** | distinct model names | > 2 (unexpected model mixing) |

Flagged metrics are anomalies - include them in the diagnosis.
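The arithmetic above can be sketched as follows (field names taken from the example record; the sample data is made up for illustration):

```python
import json

def token_metrics(jsonl_text):
    """Aggregate the Step 4 metrics across all token-usage.jsonl records."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    inp = sum(r["input_tokens"] for r in records)
    out = sum(r["output_tokens"] for r in records)
    cache_read = sum(r["cache_read_input_tokens"] for r in records)
    cache_new = sum(r["cache_creation_input_tokens"] for r in records)
    cache_total = cache_read + cache_new
    return {
        "output_ratio": out / inp if inp else 0.0,
        "cache_efficiency": cache_read / cache_total if cache_total else 0.0,
        "total_tokens": inp + out + cache_total,
        "model_count": len({r["model"] for r in records}),
    }

sample = (
    '{"model":"claude-sonnet-4-5","input_tokens":1200,"output_tokens":340,'
    '"cache_read_input_tokens":500,"cache_creation_input_tokens":100}\n'
    '{"model":"claude-sonnet-4-5","input_tokens":4000,"output_tokens":2600,'
    '"cache_read_input_tokens":0,"cache_creation_input_tokens":900}\n'
)
m = token_metrics(sample)
# output_ratio = 2940/5200 ≈ 0.57, so this sample would be flagged (> 0.5)
```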

### Step 5: Determine run health

Assign one of three health levels:

| Level | Criteria |
|-------|----------|
| ✅ **Healthy** | No errors, no flagged metrics, conclusion is `success` |
| ⚠️ **Degraded** | Warnings or flagged metrics present, but conclusion is `success` |
| ❌ **Failed** | Conclusion is `failure` or `cancelled`, or critical errors found |
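As a minimal sketch of this table (a hypothetical helper; the real judgment should also weigh how severe the individual findings are):

```python
def health_level(conclusion, warning_count, flagged_metrics, critical_errors=False):
    """Map a run's conclusion plus Step 3/4 findings to a health level."""
    if conclusion in ("failure", "cancelled") or critical_errors:
        return "failed"
    if conclusion == "success" and warning_count == 0 and not flagged_metrics:
        return "healthy"
    return "degraded"

print(health_level("success", 0, []))                  # healthy
print(health_level("success", 2, ["output_ratio"]))    # degraded
print(health_level("cancelled", 0, []))                # failed
```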

### Step 6: Find the associated pull request

```bash
gh api "repos/${{ github.repository }}/actions/runs/${{ github.event.workflow_run.id }}" \
--jq '.pull_requests[0].number // empty'
```

### Step 7: Post the diagnosis

Build the report using this template. Fill in `$HEALTH`, `$SUMMARY`, and the findings
tables:

```markdown
## Agent run diagnosis $HEALTH

| | |
|---|---|
| **Run** | [#${{ github.event.workflow_run.run_number }}](${{ github.event.workflow_run.html_url }}) |
| **Conclusion** | ${{ github.event.workflow_run.conclusion }} |
| **Health** | $HEALTH |

$SUMMARY

<details>
<summary>Log findings</summary>

| Category | Count | Sample |
|----------|------:|-------|
[one row per finding category that had matches; omit empty categories]

</details>

<details>
<summary>Token anomalies</summary>

| Metric | Value | Status |
|--------|------:|-------|
[one row per metric from Step 4; mark anomalies with ⚠️]

</details>

*Logs and token data from gh-aw's firewall artifact.*
```

**$SUMMARY** should be 1-3 plain-English sentences that state what happened and, if the
run is degraded or failed, the most likely cause.

**If a PR number was found**: post as a comment on that PR using `add_comment`.

**If no PR was found**: create an issue using `create_issue` with title:
`[log-watcher] #${{ github.event.workflow_run.run_number }}: $HEALTH`

### Step 8: Critical anomaly alert (optional)

If health is ❌ **Failed** AND total tokens exceed 50 000 (high-cost failure), create a
second issue using `create_issue` with title:
`[log-watcher] High-cost failure: run #${{ github.event.workflow_run.run_number }}`

Include the full diagnosis and a direct link to the run. Adjust the 50 000-token threshold
in the workflow to match your budget.

## Guidelines

- **Silent on non-agent runs**: If the artifact does not exist, produce no output at all.
- **One report per run**: Do not create more than one comment or issue per triggering run.
- **Healthy runs are brief**: If health is ✅, keep the report short - one-line summary,
collapsed details. Do not create noise for runs that are working fine.
- **Be specific**: When flagging an error, quote the relevant log line. Vague warnings are
not useful.
- **No retries**: Exit silently on transient download failures; the next run produces its
own report.