[copilot-cli-research] Copilot CLI Deep Research - 2026-05-07 #30757

2026-05-07T04:57:47Z

github-actions[bot]
Bot May 7, 2026

Analysis Date: 2026-05-07
Repository: github/gh-aw
Scope: 217 total workflows, 96 using Copilot engine (44%), 17th consecutive research run

📊 Executive Summary

Research Topic: Copilot CLI Optimization Opportunities
Key Findings:

startup-timeout and tool-timeout remain at 0% adoption for 17 consecutive runs — the most persistent unaddressed gap in the repository
51 Copilot workflows (53%) operate without network restrictions, including 33 with code-editing capabilities (edit/bash tools)
5 custom agent files have gone unused since they were created (April 2026), an unnecessary maintenance burden
max-continuations adoption is stagnant at 2 workflows despite being a Copilot-exclusive feature with high impact for long-running tasks
Model selection is implicit in 93/96 Copilot workflows — no explicit model specified, relying on org-level defaults

This is the 17th consecutive run of this research workflow. Persistent gaps from previous runs remain unaddressed. The repository has grown from 93 to 96 Copilot workflows since last analysis, but feature adoption rates have barely changed.

Critical Findings

🔴 High Priority Issues

1. Timeout Configuration Never Used (17 consecutive runs)
startup-timeout and tool-timeout have been at 0% adoption in every analysis run since this workflow began. startup-timeout sets how long Copilot CLI waits for MCP tools to start, and tool-timeout caps each MCP tool call; together they stop hung runs from wasting GitHub Actions minutes. Without them, any complex workflow that calls external MCP servers can hang until it reaches its timeout-minutes limit.

2. 51 Copilot Workflows Without Network Restrictions
Of 96 Copilot workflows, 51 (53%) have no network: configuration. Critically, 33 of these have edit or bash tools — meaning the agent can make code changes AND potentially reach arbitrary external endpoints. Only 11 workflows use the AWF sandbox.

3. 10 Workflows with Unrestricted bash: true
These workflows give Copilot CLI unrestricted shell access: artifacts-summary.md, daily-cli-performance.md, daily-secrets-analysis.md, daily-sentrux-report.md, daily-workflow-updater.md, dead-code-remover.md, mcp-inspector.md, q.md, refactoring-cadence.md, video-analyzer.md. All should be scoped to specific command patterns.

🟡 Medium Priority Opportunities

4. max-continuations Adoption at 2/96 (Copilot-exclusive feature)
Only smoke-copilot.md and test-quality-sentinel.md use max-continuations. It is the only way to let Copilot continue a multi-step task beyond a single autopilot continuation. Scheduled analysis workflows such as daily-malicious-code-scan.md, daily-architecture-diagram.md, and repository-quality-improver.md could produce significantly more thorough results with it enabled.

5. 5 Unused Custom Agent Files (Since April)
The following agent files exist in .github/agents/ but are referenced by zero workflows:

grumpy-reviewer.agent.md — code review persona
w3c-specification-writer.agent.md — spec writing
create-safe-output-type.agent.md — safe output scaffolding
custom-engine-implementation.agent.md — engine development
interactive-agent-designer.agent.md — interactive agent design

These should either be wired to workflows or removed to reduce confusion.

1️⃣ Current State Analysis

Copilot CLI Capabilities Inventory

Engine Configuration Options (from pkg/workflow/copilot_engine_execution.go):

engine.version — Pin CLI version (e.g., "0.0.422")
engine.model — Override model via COPILOT_MODEL env var
engine.agent — Pass --agent <id> to use a .github/agents/ file
engine.args — Additional raw CLI arguments
engine.bare — Add --no-custom-instructions to skip AGENTS.md/context loading
engine.harness — Replace the built-in copilot_harness.cjs retry wrapper
engine.api-target — Custom GHEC/GHES API endpoint hostname
engine.command — Custom executable path
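
As a rough illustration of how several of these options might sit together in a workflow's frontmatter, the sketch below combines values that appear elsewhere in this report. The particular combination, and writing bare as a boolean, are assumptions; check docs/src/content/docs/reference/engines.md before copying it.

```yaml
engine:
  id: copilot
  version: "0.0.422"      # pinned CLI version (example value from this report)
  model: gpt-5-mini       # explicit model instead of the org-level default
  agent: grumpy-reviewer  # would resolve to .github/agents/grumpy-reviewer.agent.md
  bare: true              # assumption: boolean form; adds --no-custom-instructions
```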

Copilot-Exclusive Features:

max-continuations — Autopilot mode via --autopilot --max-autopilot-continues N
tools.startup-timeout — GH_AW_STARTUP_TIMEOUT env var (seconds to wait for MCP tools)
tools.tool-timeout — GH_AW_TOOL_TIMEOUT env var (timeout per MCP tool call)
AWF sandbox (agent: awf) — Full network firewall with allowlist
BYOK mode — Via COPILOT_PROVIDER_* env vars

Always-On Flags (set by compiler automatically):

--add-dir /tmp/gh-aw/ — Grants access to workflow artifacts
--log-level all --log-dir <logsFolder> — Full logging for audit
--disable-builtin-mcps — Disables Copilot's own built-in MCPs (uses gh-aw gateway instead)
--add-dir "${GITHUB_WORKSPACE}" — Grants access to repo files

Usage Statistics (Copilot workflows, n=96)

Feature	Count	%	Trend
Network config	45	47%	→ stable
Strict mode	62	65%	↑ +6 from last run
Safe-outputs	78	81%	↑ growing
Cache-memory tool	30	31%	→ stable
Sandbox (AWF)	11	11%	→ stable
Custom agent (any)	22	23%	→ stable
Version pinning	14	15%	↑ up from 6
Bare mode	9	9%	→ stable
Max-continuations	2	2%	→ stagnant
Model specified	3	3%	→ stable
startup-timeout	0	0%	→ 17th run at 0%
tool-timeout	0	0%	→ 17th run at 0%
api-target	0	0%	→ never used
harness	0	0%	→ never used

2️⃣ Feature Usage Matrix

Feature Category	Available Features	Used	Not Used	Usage Rate
Engine Config	version, model, agent, args, bare, harness, api-target, command	version(15%), model(3%), agent(23%), bare(9%)	args, harness, api-target, command	~38%
Runtime Tuning	startup-timeout, tool-timeout, max-continuations	max-continuations(2%)	startup-timeout, tool-timeout	~1%
Security	network, strict, sandbox, bash-scoping	network(47%), strict(65%), sandbox(11%)	Full bash scoping	~41%
MCP Toolsets	default, repos, issues, pull_requests, discussions, actions, code_security, all	default, most	all (over-broad)	~85%
Agent Files	11 files in .github/agents/	6 (55%)	5 (45%)	55%
Persistence	cache-memory, repo-memory	cache(31%), repo(17%)	n/a	varies

3️⃣ Missed Opportunities

🔴 High Priority

Opportunity 1: Add `startup-timeout` and `tool-timeout` to Complex Workflows

What: tools.startup-timeout (seconds to wait for MCP initialization) and tools.tool-timeout (timeout per tool call) are Copilot CLI features exposed via GH_AW_STARTUP_TIMEOUT and GH_AW_TOOL_TIMEOUT.

Why It Matters: Without these, a single hung MCP server call can keep a workflow spinning until it hits its timeout-minutes limit, burning GitHub Actions minutes for the entire run. These settings add graceful error recovery.

Where: Every workflow using external MCP servers (brave, playwright, http MCPs) and complex bash operations.

How to Implement:

tools:
  startup-timeout: 120  # seconds to wait for MCP server to initialize
  tool-timeout: 60      # seconds per individual tool call
  github:
    toolsets: [default]

Opportunity 2: Add Network Restrictions to Code-Editing Workflows

What: 33 workflows with edit or bash tools have no network: configuration. This means the agent can reach any external host during code-editing sessions.

Why It Matters: Network restrictions limit exfiltration during code modification sessions. The defaults allowlist, which includes github.com and the npm/pip registries, is the recommended minimum.

Where: All workflows with edit: or bash: tools that lack network: config.

How to Implement:

network:
  allowed:
    - defaults  # includes github.com, npm, pip registries

For workflows that only need GitHub access:

network:
  allowed:
    - github

Opportunity 3: Scope `bash: true` to Specific Commands

What: 10 workflows use bash: true (unrestricted shell access). The compiler supports per-command scoping.

Where: artifacts-summary.md, daily-cli-performance.md, daily-secrets-analysis.md, daily-sentrux-report.md, daily-workflow-updater.md, dead-code-remover.md, mcp-inspector.md, q.md, refactoring-cadence.md, video-analyzer.md

How to Implement:

tools:
  bash:
    - "jq *"
    - "cat *"
    - "grep *"
    - "git *"

🟡 Medium Priority

Opportunity 4: Expand `max-continuations` for Long-Running Scheduled Tasks

What: Only 2 workflows use max-continuations. This Copilot-exclusive feature lets the CLI chain multiple autopilot runs for tasks too complex for one pass.

Why It Matters: Long scheduled workflows hitting timeout-minutes limits could instead run multiple focused continuations.

Where: Scheduled workflows with timeout ≥ 30 minutes: daily-malicious-code-scan.md, daily-architecture-diagram.md, repository-quality-improver.md, weekly-blog-post-writer.md

How to Implement:

engine:
  id: copilot
  model: gpt-5
max-continuations: 3  # Run up to 3 autopilot continuations

Opportunity 5: Wire or Remove 5 Unused Agent Files

What: 5 custom agent files in .github/agents/ have never been referenced by any workflow. They define specialized personas/behaviors but aren't used.

Unused files:

grumpy-reviewer.agent.md — useful for PR review workflows
w3c-specification-writer.agent.md — useful for documentation workflows
create-safe-output-type.agent.md — useful for gh-aw development
custom-engine-implementation.agent.md — useful for engine development
interactive-agent-designer.agent.md — useful for workflow creation

How to Implement: Either wire grumpy-reviewer to a PR review workflow:

engine:
  id: copilot
  agent: grumpy-reviewer

Or delete unused files to reduce repository clutter.

Opportunity 6: Add `cache-memory` to 15+ Scheduled Analysis Workflows

What: 66 Copilot workflows run on schedules but don't use cache-memory for state persistence, so each run starts from scratch and may repeat earlier analysis.

Where: architecture-guardian.md, cli-consistency-checker.md, copilot-token-audit.md, daily-assign-issue-to-user.md, daily-compiler-threat-spec-optimizer.md

How to Implement:

tools:
  cache-memory:
    path: /tmp/gh-aw/cache-memory/workflow-state.json

🟢 Low Priority

Opportunity 7: Version Pinning for Production-Critical Workflows

What: 82/96 Copilot workflows use version: latest (implicit). 14 already pin versions — a healthy practice.

Why: Pinning prevents unexpected breakage from CLI updates in stable production workflows.

Where: auto-triage-issues.md, daily-community-attribution.md, bot-detection.md

engine:
  id: copilot
  version: "0.0.422"  # pin current stable version

Opportunity 8: Explicit Model Selection for Cost-Sensitive Workflows

What: Only 3 Copilot workflows specify a model explicitly. Lightweight tasks could use gpt-5.4-mini to reduce token costs.

Where: Read-only analysis workflows, label-only workflows, simple classification tasks.

engine:
  id: copilot
  model: gpt-5.4-mini  # for lightweight classification/labeling

4️⃣ Specific Workflow Recommendations

`auto-triage-issues.md`

Current: Uses model: gpt-5-mini ✅ — good lightweight model choice
Missing: No startup-timeout/tool-timeout
Recommendation: Add tools.startup-timeout: 60 to handle github MCP init delays
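
A minimal sketch of what that could look like in the frontmatter of auto-triage-issues.md. The model line reflects what the report says is already configured; the github toolsets entry is an assumption carried over from the Opportunity 1 example, not something read from the workflow itself.

```yaml
engine:
  id: copilot
  model: gpt-5-mini      # existing setting per this report
tools:
  startup-timeout: 60    # seconds to wait for MCP servers (including github) to initialize
  github:
    toolsets: [default]  # assumed; keep whatever the workflow already declares
```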

`dead-code-remover.md`

Current: bash: true (unrestricted) + no network config
Recommendation: Scope bash to ["grep *", "cat *", "git *"]; add network: allowed: [defaults]
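
Spelled out as frontmatter, the recommendation might look like the following. This is a sketch using only the patterns named above; a workflow that removes dead code probably also needs its build or test commands on the bash list, and those are left out because the report does not name them.

```yaml
network:
  allowed:
    - defaults   # github.com plus npm/pip registries, per Opportunity 2
tools:
  bash:
    - "grep *"   # search for references
    - "cat *"    # read files
    - "git *"    # inspect history and stage changes
```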

`weekly-blog-post-writer.md`

Current: Uses AWF sandbox + 45-minute timeout + no max-continuations
Recommendation: Ideal candidate for max-continuations: 2 to allow deeper research phases

`daily-malicious-code-scan.md`

Current: Runs scheduled, 45-minute timeout, no cache-memory, no max-continuations
Recommendation: Add cache-memory for previously scanned patterns + max-continuations: 2
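
Combined, the two additions could be sketched as below. The cache-memory path and the top-level placement of max-continuations simply mirror the Opportunity 4 and Opportunity 6 examples in this report; the file name scanned-patterns.json is hypothetical.

```yaml
engine:
  id: copilot
max-continuations: 2   # placement mirrors the Opportunity 4 example
tools:
  cache-memory:
    path: /tmp/gh-aw/cache-memory/scanned-patterns.json  # hypothetical file name
```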

5️⃣ Trends & Insights

Changes Since Run #16 (2026-05-06)

📈 Total workflows: 214 → 217 (+3)
📈 Copilot workflows: 93 → 96 (+3)
📈 Strict mode: 56/93 → 62/96 (+6 — significant jump)
📈 Version pinning: 6 → 14 (+8 — healthy adoption)
→ Max-continuations: 2 (unchanged — stagnant)
→ startup/tool-timeout: 0 (unchanged — 17th consecutive run)
→ Unused agent files: 5 (unchanged — since April)

Multi-Run Persistent Gaps

All of the following have been 0% for every analysis run since this workflow began:

startup-timeout — never used, ever
tool-timeout — never used, ever
engine.api-target — never used, ever
engine.harness — never used, ever

Positive Trends

Version pinning grew from 0 → 6 → 14 over recent runs
Strict mode adoption growing steadily
Safe-outputs adoption is high (81%)
Toolset specificity is generally good (most workflows avoid [all])

6️⃣ Best Practice Guidelines

Based on 17 runs of analysis, recommended best practices:

Always set startup-timeout and tool-timeout: 60s startup / 30s per-tool is a safe default for workflows using external MCPs
Network restrictions for code-editing workflows: Any workflow with edit: or bash: should have at least network: allowed: [defaults]
Scope bash: to specific commands: Never use bash: true — enumerate the commands you actually need
Use strict: true by default: 35% of Copilot workflows still don't have this
Explicit model for lightweight tasks: Use gpt-5.4-mini for classification/labeling workflows to reduce costs
Wire or remove agent files: Don't maintain agent files that aren't used in any workflow
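
Taken together, the first four guidelines suggest a baseline along these lines. It is a sketch assembled from the fragments in this report, not a tested template: the bash patterns are placeholders to be replaced with the commands a given workflow actually runs, and strict: true is assumed to be a top-level key.

```yaml
strict: true
engine:
  id: copilot
network:
  allowed:
    - defaults
tools:
  startup-timeout: 60  # seconds
  tool-timeout: 30     # seconds per tool call
  bash:
    - "jq *"           # placeholder patterns; enumerate only what is needed
    - "git *"
  github:
    toolsets: [default]
```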

7️⃣ Action Items

Immediate Actions (this week):

Add tools.startup-timeout: 60 and tools.tool-timeout: 30 to at least 5 complex MCP-heavy workflows
Scope bash: true → specific commands in the 10 affected workflows

Short-term (this month):

Add network restrictions to 33 code-editing workflows without network: config
Wire grumpy-reviewer.agent.md to a PR review workflow, or delete unused agent files
Add max-continuations: 2 to weekly-blog-post-writer.md and daily-malicious-code-scan.md

Long-term (this quarter):

Adopt startup-timeout/tool-timeout as standard defaults across all MCP-using workflows
Evaluate whether engine.harness customization would benefit complex retry scenarios
Explore engine.api-target for GHES/GHEC deployments

📚 References

Copilot Engine source: pkg/workflow/copilot_engine_execution.go
Engine documentation: docs/src/content/docs/reference/engines.md
Agent files: .github/agents/
Previous research: /tmp/gh-aw/repo-memory/default/copilot-research-notes.md

Research Methodology

Analysis performed by:

Searching all pkg/workflow/copilot_*.go files for available CLI flags and configuration options
Scanning all 217 .github/workflows/*.md files for frontmatter configuration patterns
Cross-referencing feature availability (Phase 1) with actual usage (Phase 2)
Comparing against 16 prior analysis runs stored in repo-memory
Filtering for Copilot-specific features vs. cross-engine features

Generated by Copilot CLI Deep Research (§25476599555) — Run #17

expires on May 8, 2026, 4:57 AM UTC

2026-05-08T04:51:31Z

github-actions[bot]
Bot May 8, 2026
Author

This discussion has been marked as outdated by Copilot CLI Deep Research Agent.

A newer discussion is available at Discussion #30930.
