## Summary

**Analysis Period:** Last 30 days
**Total PRs:** 1,000 | Merged: 771 (78.0%) | Closed: 214 (21.4%) | Open: 15 (1.5%)

Today's overall success rate of 78.0% is consistent with recent historical averages, continuing a stable pattern since mid-April.
## Prompt Categories and Success Rates

| Category | Total | Merged | Success Rate |
|---|---|---|---|
| Feature Addition | 39 | 32 | 82.1% ✅ |
| Other | 16 | 13 | 81.3% ✅ |
| Bug Fix | 598 | 470 | 78.6% |
| Testing | 289 | 225 | 77.9% |
| Documentation | 35 | 26 | 74.3% |
| Refactoring | 8 | 5 | 62.5% ⚠️ |
Note: Today's dataset shows a significant shift — testing PRs jumped from ~1% to 29% of all PRs. Bug fix PRs dropped from ~89% to 60%. This likely reflects a recent change in workflow focus.
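The per-category success rates above are simple merge ratios. As a minimal sketch (assuming a hypothetical list of `(category, state)` records, not the report's actual data source), they can be computed like this:

```python
from collections import Counter

def category_success_rates(prs):
    """Compute per-category merge rates from (category, state) records.

    `prs` is a list of (category, state) tuples where state is one of
    "merged", "closed", or "open" -- a hypothetical record shape used
    for illustration only.
    """
    totals, merged = Counter(), Counter()
    for category, state in prs:
        totals[category] += 1
        if state == "merged":
            merged[category] += 1
    # Return (total, merged, success rate in percent) per category.
    return {
        cat: (totals[cat], merged[cat], 100.0 * merged[cat] / totals[cat])
        for cat in totals
    }

prs = [("Refactoring", "merged")] * 5 + [("Refactoring", "closed")] * 3
rates = category_success_rates(prs)
print(rates["Refactoring"])  # (8, 5, 62.5)
```

With the Refactoring row's counts (5 merged of 8 total) this reproduces the table's 62.5%.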
## Prompt Analysis

### ✅ Successful Prompt Patterns

Common characteristics in merged PRs:

- Average prompt length: 1,230 words (slightly longer)
- Frequent keywords: `changes`, `missing`, `string`, `added`, `model`, `experiment`, `span`, `comment`, `error`, `template`; action verbs: `fix`, `add`, `implement`, `upgrade`

Example successful prompts:

- PR #30210: "Four workflows were generating `[aw] X failed` issues because the agent finished successfully without calling any safe-output tool..." → Merged ✅
- PR #30199: "Fixes the CI lint failure in [run link]. The job failed because..." → Merged ✅
- PR #30198: "`GH_AW_INFO_VERSION` was only set in the `generate_aw_info` step's env block, which runs after `Setup Scripts`..." → Merged ✅
### ❌ Unsuccessful Prompt Patterns

Common characteristics in closed PRs:

- Average prompt length: 1,140 words (slightly shorter)
- Frequent keywords: `workflows`, `github`, `copilot`, `smoke`, `description`, `details`, `lock`, `plan`, `resolve`, `progress`; meta-planning language (`plan`, `progress`, `details`, `description`); scope creep (broad workflow/system changes)

Example unsuccessful prompts:

- PR #30211: "The Design Decision Gate agent was hitting the `max-turns: 12` hard limit on every run..." → Closed ❌
- PR #30165: "All 10 MCP tool handlers returned `nil` as their structured output, forcing AI clients to re-parse..." → Closed ❌
- PR #30070: "Several functions making GitHub API calls for action SHA resolution used hardcoded `context.Background()`..." → Closed ❌
## Key Insights

- Feature prompts outperform bug fix prompts (82.1% vs 78.6%) when they clearly describe what is being added and why. Feature PRs that specify the missing capability and its impact tend to merge well.
- Merged prompts are ~8% longer (1,230 vs 1,140 words), suggesting that extra detail and context correlate with success. Prompts referencing specific code paths (`string`, `model`, `span`, `template`) perform better than vague meta-descriptions (`plan`, `progress`, `details`).
- Refactoring PRs have the lowest success rate (62.5%), consistent with previous weeks. Structural/refactoring changes face more scrutiny and are more likely to be superseded or deprioritized.
## Recommendations

- **DO:** Reference the specific error, output, or behavior being changed. Use concrete identifiers such as function names, step IDs, or error messages rather than general descriptions.
- **DO:** For feature additions, specify both the missing capability AND its impact (e.g., "X was missing, causing Y; this PR adds Z").
- **AVOID:** Meta-planning language (`plan`, `resolve`, `progress`, `details`) in prompt bodies. These terms correlate with closed PRs and suggest the prompt was written before the solution was fully understood.
## Historical Trends

| Date | PRs | Success Rate | Notable |
|---|---|---|---|
| 2026-05-04 | 1,000 | 78.0% | Testing PRs surge (29%) |
| 2026-04-30 | 1,000 | 77.8% | Bug fix dominant (88%) |
| 2026-04-27 | 1,000 | 78.0% | Feature PRs high (18%) |
| 2026-04-26 | 1,000 | 78.1% | Stable pattern |
| 2026-04-25 | 1,000 | 78.2% | Stable pattern |
| 2026-04-24 | 1,000 | 78.1% | Stable pattern |
| 2026-04-23 | 1,000 | 78.8% | Recent high |
Trend: Success rates have been remarkably stable at ~78% over the past 2 weeks. Today's notable shift is the dramatic increase in testing-related PRs (from ~0.5% to 29%), possibly reflecting a focused testing sprint. The overall rate held steady despite this compositional change, indicating testing PRs merge at similar rates to bug fix PRs.
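The claim that the compositional shift left the headline rate unchanged can be sanity-checked by recomputing the blended rate directly from today's category table (note the six categories cover 985 of the 1,000 PRs; the headline 78.0% presumably uses a slightly different denominator):

```python
# (total, merged) per category, taken from today's table above.
categories = {
    "Feature Addition": (39, 32),
    "Other": (16, 13),
    "Bug Fix": (598, 470),
    "Testing": (289, 225),
    "Documentation": (35, 26),
    "Refactoring": (8, 5),
}

total = sum(t for t, _ in categories.values())
merged = sum(m for _, m in categories.values())
blended = 100.0 * merged / total  # weighted average of category rates

print(total, merged, round(blended, 1))  # 985 771 78.3
```

Because Testing (77.9%) and Bug Fix (78.6%) merge within a point of each other, shifting volume between them barely moves the blend, which is why the overall rate held steady.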