[agentic-optimization-kit] Weekly Agentic Optimization Report — 2026-04-28 #28810
Closed

This discussion was automatically closed because it expired on 2026-05-04T22:12:57.011Z.
📊 Executive Summary
No risky runs, no escalation-eligible episodes. Repository is operating within safe operational bounds.
📈 Visual Diagnostics
1. Token Usage by Workflow
Decision: Test Quality Sentinel and Contribution Check lead at 6.2M tokens each; every workflow is below the 30% dominance threshold, indicating a healthy distribution.
2. Historical Token Trend
Decision: 18 daily data points are available; the token trend is rising week-over-week as more workflows are added, but per-run averages remain stable.
3. Episode Risk–Cost Frontier
Decision: Agentic Optimization Kit and Agentic Observability Kit sit at the frontier — highest token cost AND highest composite risk scores driven by 50 blocked requests and resource-heavy/poor-control assessments.
Why it matters: Cost and risk are co-located in the same two workflows; addressing either one also reduces the other. No escalation threshold has been crossed, but both are strong optimization candidates.
4. Workflow Stability Matrix
Decision: Instability is concentrated in single-run workflows (Agent Persona Explorer, Agentic Observability Kit, Draft PR Cleanup) — these show high resource-heavy rates but zero risky or MCP-failure signals.
Why it matters: Single-run instability scores are noisy; only Test Quality Sentinel (20 runs, consistently partially_reducible) provides statistically reliable signal for action.
5. Repository Portfolio Map
Decision: 32 workflows in `simplify`, 36 in `review`, 5 in `keep`, 1 in `optimize` (Test Quality Sentinel). Most single-run workflows land in `review` due to insufficient run history.
Why it matters: The dominant portfolio tradeoff is run frequency vs. token cost: low-frequency expensive workflows are the primary levers; high-frequency cheap ones (Smoke CI, small triage agents) are already right-sized.
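A minimal sketch of how workflows might be routed into the four quadrants. The kit's actual rules are not documented in this report, so the cutoffs (`min_runs`, `heavy`) and the function name are illustrative assumptions that mirror the run-frequency vs. token-cost tradeoff described above:

```python
# Hypothetical quadrant assignment; thresholds are illustrative assumptions,
# not the kit's actual rules.
def quadrant(runs, avg_tokens, min_runs=3, heavy=1_000_000):
    if runs < min_runs:
        return "review"    # not enough run history to judge
    if avg_tokens >= heavy:
        return "optimize"  # frequent and expensive: the primary lever
    if avg_tokens >= heavy / 10:
        return "simplify"  # moderately costly: trim prompts/tools
    return "keep"          # frequent and cheap: already right-sized

print(quadrant(1, 1_986_286))   # review (single-run, insufficient history)
print(quadrant(50, 40_000))     # keep (high-frequency, cheap)
```

Under these toy cutoffs a single-run workflow always lands in `review` regardless of cost, matching the report's observation that most single-run workflows do.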
🚨 Escalation Targets
No escalation thresholds crossed. Zero risky run classifications, zero escalation-eligible episodes, zero MCP failures, zero new blocked-request increases across all 141 episodes. Repository is clean.
🎯 Optimization Target: Agentic Observability Kit
Why selected: Highest total tokens among non-recently-optimized, non-self-referential workflows (1,986,286 tokens in 1 run). Carries 4 distinct agentic assessments:
`resource_heavy_for_domain` (high), `poor_agentic_control` (medium), `partially_reducible` (medium), `model_downgrade_available` (low).
Runs analyzed: 1 over 7 days | Avg tokens/run: 1,986,286 | Avg cost/run: $0.00 | Avg turns/run: 33 | Action minutes: 29 | Cache efficiency: 49%
Recommended changes:
- Run `agenticworkflows logs --count 400 --start_date -30d > /tmp/gh-aw/agent/logs30d.json` as a pre-step; pass the file path to the agent
- Reduce the `logs` `count` to 200 and remove the duplicate `logs` call (change `count: 400` to `count: 200`); the run shows 2 `logs` calls — merge them into one pre-step invocation
- The `model_downgrade_available` assessment confirms eligibility; add model routing or split heavy analysis to a sub-step
- Address `poor_agentic_control` by defining explicit section checkpoints in the prompt
- Add an `mcp-scripts` pre-step to pre-load baseline data deterministically
- `agentic_fraction: 0.06` confirms 94% of work is non-agentic; deterministic pre-steps can replace most data-collection turns
💡 5 Actionable Prompts
🔧 Prompt 1 — Optimization (Agentic Observability Kit, highest ROI)
🛡️ Prompt 2 — Stability Fix (Test Quality Sentinel, repeat partially_reducible)
🔀 Prompt 3 — Consolidation (Issue Monster + Auto-Triage Issues, top overlap pair)
✂️ Prompt 4 — Right-sizing (Auto-Triage Issues, overkill workflow)
🚀 Prompt 5 — Portfolio Maintenance (review-quadrant workflows)
Full Per-Workflow Baseline Breakdown (7 days)
Episode Detail
All 141 episodes are `standalone` with `high` confidence. No multi-run episodes or DAG lineage detected — each workflow run is independent with no shared lineage markers.
Top episodes by composite risk score (weighted: risky×1.0, poor_control×1.2, mcp_fail×1.2, blocked×1.0, new_mcp_fail×1.4, blocked_increase×1.4, escalation×2.0):
A `blocked_request_count` of 50 appearing in multiple high-token workflows suggests a shared network-firewall limit is being hit consistently. This is expected behavior (firewall enforcement), not a regression, but it contributes to episode risk scores.
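The composite risk score can be sketched from the weights quoted above. How the kit normalizes each signal before weighting is not stated, so treating each signal as a 0/1 flag here is an assumption:

```python
# Weights quoted in the report; signal normalization is an assumption
# (0/1 flags are used here for illustration).
WEIGHTS = {
    "risky": 1.0,
    "poor_control": 1.2,
    "mcp_fail": 1.2,
    "blocked": 1.0,
    "new_mcp_fail": 1.4,
    "blocked_increase": 1.4,
    "escalation": 2.0,
}

def composite_risk(signals):
    """Weighted sum of per-episode risk signals."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

# An episode flagged for poor control plus blocked requests, nothing else:
score = composite_risk({"poor_control": 1, "blocked": 1})  # 1.2 + 1.0
```

Escalation carries the largest weight (2.0), which is why zero escalation-eligible episodes keeps every score low.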
Zero episodes are `escalation_eligible`. Zero risky run classifications. Zero MCP failures.
Portfolio Opportunities
Overkill workflows (repeated `overkill_for_agentic` assessments):
Partially reducible workflows (consistent across runs):
Overlap pairs (same domain + behavior cluster):
- `triage` domain: Auto-Triage Issues ↔ Issue Triage Agent ↔ PR Triage Agent (3-way overlap; all standalone, exploratory, selective_write)
- `issue_response` domain: Issue Monster ↔ Sub-Issue Closer ↔ MCP Inspector Agent ↔ Agentic Observability Kit
- `repo_maintenance` domain: Glossary Maintainer ↔ Layout Specification Maintainer ↔ Slide Deck Maintainer (similar output shape and schedule)

Stale workflows (1 run in window, 0 follow-up signals): 66 workflows ran only once. Without a longer window, stale and new cannot be distinguished — revisit in 4 weeks with a 30-day dataset.
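The overlap-pair detection described above (same domain + behavior cluster) amounts to grouping workflows by a shared key. A minimal sketch, assuming a simple `(domain, behavior)` key (the behavior strings here are hypothetical labels, not the kit's actual fingerprint encoding):

```python
# Hypothetical overlap clustering: group workflows by (domain, behavior) key
# and report any cluster with two or more members.
from collections import defaultdict

def overlap_clusters(workflows):
    """workflows: {name: (domain, behavior)} -> clusters with 2+ members."""
    groups = defaultdict(list)
    for name, key in workflows.items():
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

wf = {
    "Auto-Triage Issues": ("triage", "exploratory/selective_write"),
    "Issue Triage Agent": ("triage", "exploratory/selective_write"),
    "PR Triage Agent": ("triage", "exploratory/selective_write"),
    "Glossary Maintainer": ("repo_maintenance", "scheduled_doc"),
}
print(overlap_clusters(wf))  # one 3-way triage cluster; the maintainer is alone
```

The 3-way `triage` overlap in the report is exactly this kind of cluster: three workflows sharing one domain and one behavior fingerprint.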
Optimization Analysis Detail: Agentic Observability Kit
Run analyzed: `24986097479` | 2026-04-27 | 1,986,286 tokens | 33 turns | 29 action minutes
Behavior fingerprint:
`execution_style=exploratory`, `tool_breadth=narrow`, `actuation_style=selective_write`, `resource_profile=heavy`, `dispatch_mode=standalone`, `agentic_fraction=0.06`
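The `agentic_fraction` figure follows directly from the run's turn counts: 2 judgment turns out of 33 total. A one-line sketch (the function name is an assumption; the arithmetic matches the report):

```python
# agentic_fraction: share of turns doing genuine agentic judgment rather
# than deterministic data gathering. Function name is illustrative.
def agentic_fraction(agentic_turns, total_turns):
    return agentic_turns / total_turns if total_turns else 0.0

# The run above: 33 turns, of which only 2 were judgment turns.
print(round(agentic_fraction(2, 33), 2))  # 0.06
```
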
Tool usage (from episode tool_calls): `logs` (2 calls) → `create_discussion` (1) → `upload_asset` (4). Only `agenticworkflows.logs` and `safeoutputs.*` tools were used. The tool set is minimal and appropriate. However, `logs` is called twice (once per data slice), consuming ~1.9M input tokens.
- Merge the two `logs` calls into a pre-step; `logs` payloads dominate input token cost.
- `poor_agentic_control` (medium) + 33 turns suggests the 26KB prompt is generating many back-and-forth data-fetch cycles.
- `agentic_fraction=0.06` confirms only 2 turns are truly agentic judgment.
Assessment details:
- `resource_heavy_for_domain` (high): 1,986,286 tokens for the `issue_response` domain is extremely heavy; a typical issue response is 50–200k
- `poor_agentic_control` (medium): 33 turns with only a 0.06 agentic fraction indicates many small exploratory turns
- `partially_reducible` (medium): 94% of turns are data gathering replaceable with deterministic steps
- `model_downgrade_available` (low): the model can be downgraded for formatting/charting subtasks
References: