Skip to content

[aw-failures] GitHub App installation rate limit exhaustion from co-scheduled workflows at 23:44 UTC #27251

@github-actions

Description

@github-actions

Problem Statement

Two co-scheduled workflows that fire at the same cron time (23:44 UTC) exhausted the GitHub App installation-level API rate limit during guard/firewall policy determination, causing both to fail before any agent work was done.

Root Cause

Explicit error from Daily Observability Report (§24642041999):

Failed to determine automatic guard policy: API rate limit exceeded for installation.
Request ID: BC40:E2DAD:30788C:C36940:69E5694A
Timestamp: 2026-04-19 23:46:18 UTC

The guard policy check hits the GitHub REST API using the shared installation token. When multiple workflows start concurrently, their pre-agent firewall/guard initialization steps race to consume the same installation rate limit budget.

Daily Safe Output Tool Optimizer (§24642045134) failed at the same second with only exit code 1 — 0 tokens used, 0 tool calls, 2.1m duration. Audit confirmed stable behavior vs. prior success, indicating an infrastructure-level pre-agent failure consistent with the same rate limit event.

Both workflows were triggered by the same scheduled push to SHA 5285e620ea8613b56ed99803643062193bfbdafd at 23:44 UTC.

Affected Workflows & Run IDs

Workflow Run Engine Created (UTC) Conclusion
Daily Observability Report for AWF Firewall and MCP Gateway §24642041999 codex 2026-04-19 23:44:31 ❌ failure
Daily Safe Output Tool Optimizer §24642045134 claude 2026-04-19 23:44:41 ❌ failure

Proposed Remediation

  1. Stagger cron schedules — offset co-scheduled daily workflows by 3–5 minutes so their guard/firewall initialization steps do not compete for the same installation rate limit bucket simultaneously. These two workflows currently share the same cron time (44 23 * * * or equivalent).
  2. Add retry with backoff at the guard policy fetch layer — a single transient rate limit should be retryable rather than fatal.
  3. Investigate rate limit headroom — if many workflows fire at the same time daily, a broader audit of shared-schedule clustering may be needed.

Success Criteria

  • Next scheduled run of both workflows completes successfully with guard policy determined
  • No API rate limit exceeded for installation errors in the 23:44–00:00 UTC window after cron schedules are staggered

References

References:

Generated by [aw] Failure Investigator (6h) · ● 328.8K ·

  • expires on Apr 27, 2026, 1:21 AM UTC

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions