Problem Statement
Two co-scheduled workflows that fire at the same cron time (23:44 UTC) exhausted the GitHub App installation-level API rate limit during guard/firewall policy determination, causing both to fail before any agent work was done.
Root Cause
Explicit error from Daily Observability Report (§24642041999):
Failed to determine automatic guard policy: API rate limit exceeded for installation.
Request ID: BC40:E2DAD:30788C:C36940:69E5694A
Timestamp: 2026-04-19 23:46:18 UTC
The guard policy check hits the GitHub REST API using the shared installation token. When multiple workflows start concurrently, their pre-agent firewall/guard initialization steps race to consume the same installation rate limit budget.
Daily Safe Output Tool Optimizer (§24642045134) failed at the same second with only exit code 1 — 0 tokens used, 0 tool calls, 2.1m duration. Audit confirmed stable behavior vs. prior success, indicating an infrastructure-level pre-agent failure consistent with the same rate limit event.
Both workflows were triggered by the same scheduled push to SHA 5285e620ea8613b56ed99803643062193bfbdafd at 23:44 UTC.
Affected Workflows & Run IDs
| Workflow |
Run |
Engine |
Created (UTC) |
Conclusion |
| Daily Observability Report for AWF Firewall and MCP Gateway |
§24642041999 |
codex |
2026-04-19 23:44:31 |
❌ failure |
| Daily Safe Output Tool Optimizer |
§24642045134 |
claude |
2026-04-19 23:44:41 |
❌ failure |
Proposed Remediation
- Stagger cron schedules — offset co-scheduled daily workflows by 3–5 minutes so their guard/firewall initialization steps do not compete for the same installation rate limit bucket simultaneously. These two workflows currently share the same cron time (
44 23 * * * or equivalent).
- Add retry with backoff at the guard policy fetch layer — a single transient rate limit should be retryable rather than fatal.
- Investigate rate limit headroom — if many workflows fire at the same time daily, a broader audit of shared-schedule clustering may be needed.
Success Criteria
- Next scheduled run of both workflows completes successfully with guard policy determined
- No
API rate limit exceeded for installation errors in the 23:44–00:00 UTC window after cron schedules are staggered
References
References:
Generated by [aw] Failure Investigator (6h) · ● 328.8K · ◷
Problem Statement
Two co-scheduled workflows that fire at the same cron time (23:44 UTC) exhausted the GitHub App installation-level API rate limit during guard/firewall policy determination, causing both to fail before any agent work was done.
Root Cause
Explicit error from Daily Observability Report (§24642041999):
The guard policy check hits the GitHub REST API using the shared installation token. When multiple workflows start concurrently, their pre-agent firewall/guard initialization steps race to consume the same installation rate limit budget.
Daily Safe Output Tool Optimizer (§24642045134) failed at the same second with only
exit code 1— 0 tokens used, 0 tool calls, 2.1m duration. Audit confirmed stable behavior vs. prior success, indicating an infrastructure-level pre-agent failure consistent with the same rate limit event.Both workflows were triggered by the same scheduled push to SHA
5285e620ea8613b56ed99803643062193bfbdafdat 23:44 UTC.Affected Workflows & Run IDs
Proposed Remediation
44 23 * * *or equivalent).Success Criteria
API rate limit exceeded for installationerrors in the 23:44–00:00 UTC window after cron schedules are staggeredReferences
References: