Add harness engineering skill 🤖🤖🤖 #1945
🔍 Skill Validator Results
✅ All checks passed
Summary
Full validator output
```text
Found 1 skill(s)
[harness-engineering] 📊 harness-engineering: 1,605 BPE tokens [chars/4: 1,971] (detailed ✓), 18 sections, 3 code blocks
✅ All checks passed (1 skill(s))
```
aaronpowell
left a comment
This reads a lot like reimplementing a memory system, which is already built into Copilot.
How do you see it as different, and what behaviours do you expect to change by adding this on top of memory?
@aaronpowell Thanks, fair pushback. I agree there is overlap with Copilot Memory. If this were only docs/decisions and docs/failures, I’d also describe it as a repo-local memory system.

The distinction I’m trying to make is that memory is only one part of the harness. Copilot Memory can remember repo facts; the harness makes the important facts repo-owned and operational: versioned in the codebase, reviewable in PRs, tied to checks where practical, and measurable through Doctor/review/effectiveness reports.

A related distinction is human judgment, but I would frame it narrowly. This is not a human-approval system for every agent step. The harness defines where human judgment is required when automation cannot prove safety: structural changes may need ADR review or an explicit skip reason, repeated failures need a named detection/prevention path, and if no automated check is practical the repo records the manual review point. That makes memory auditable and reviewable instead of hidden context.

So the behavior change I expect is not simply “the agent remembers more.” It is that repeated failures should turn into a failure note plus a named detection/prevention path, structural changes should trigger ADR review, agents should run the repo’s documented gates, and maintainers can see when rules, checks, memory, and evidence are disconnected.

The benchmark evidence supports that narrower claim around repo-local workflow/schema preservation and diagnosability. It does not prove that memory-harness is generally more correct than workflow-only guidance, and I would not claim that.
Pull Request Checklist
`npm start` and verified that `README.md` is up to date.
`staged` branch for this pull request.
Description
Adds a new `harness-engineering` skill for adopting repository-level guardrails for coding agents.

The skill helps users turn repeated AI coding-agent mistakes into durable repository artifacts: agent instructions, enforceable checks, failure memory, drift checks, adoption reports, and review workflows. It is intentionally prompt-first and repository-specific, so it tells the agent to inspect the target repository before adding harness pieces instead of copying a generic template.
This is distinct from existing AI-readiness or AGENTS.md generation resources because it focuses on recurrence prevention after a concrete failure or repeated review pattern, and on tying every high-risk rule to a test, lint rule, CI gate, drift check, or manual review point.
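To make the "tie every high-risk rule to a check" idea concrete, here is a minimal Python sketch of a drift check that flags high-risk rules with no named enforcement. It is illustrative only: the `Rule` fields, the `ENFORCEMENT_KINDS` labels, and the example rule ids and paths are assumptions made for this sketch, not the skill's actual file format or tooling.

```python
from dataclasses import dataclass
from typing import List, Optional

# Enforcement kinds mirroring the list in the description above.
# The exact label strings are hypothetical.
ENFORCEMENT_KINDS = {"test", "lint", "ci-gate", "drift-check", "manual-review"}


@dataclass
class Rule:
    """A hypothetical harness rule; field names are illustrative only."""
    rule_id: str
    high_risk: bool
    enforcement_kind: Optional[str] = None  # expected to be one of ENFORCEMENT_KINDS
    enforcement_ref: Optional[str] = None   # e.g. a test path or a CI job name


def find_unenforced(rules: List[Rule]) -> List[str]:
    """Return ids of high-risk rules that lack a recognised, named enforcement."""
    problems = []
    for rule in rules:
        if not rule.high_risk:
            continue  # low-risk rules may stay as guidance only
        has_kind = rule.enforcement_kind in ENFORCEMENT_KINDS
        has_ref = bool(rule.enforcement_ref)
        if not (has_kind and has_ref):
            problems.append(rule.rule_id)
    return problems


if __name__ == "__main__":
    # Hypothetical example rules, invented for this sketch.
    example_rules = [
        Rule("no-raw-sql", True, "lint", "lint/no_raw_sql.cfg"),
        Rule("schema-change-needs-adr", True, "manual-review", "docs/decisions/"),
        Rule("retry-on-timeout", True),             # high risk, nothing enforces it
        Rule("prefer-short-functions", False),      # low risk, ignored by the check
    ]
    for rule_id in find_unenforced(example_rules):
        print(f"high-risk rule without enforcement: {rule_id}")
```

A check like this could run as a CI gate so that a rule added to the agent instructions without a matching test, lint rule, or recorded manual review point shows up as a visible gap rather than silent drift.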
Type of Contribution
Additional Notes
Validation run locally:
- `npm ci`
- `npm run skill:validate`
- `npm run plugin:validate`
- `npm start`
- `git diff --check`
- `node_modules`
- `gh skill install /tmp/awesome-copilot-pr-work harness-engineering --from-local --agent github-copilot --scope project --force` in a temporary smoke-test repository
- `copilot -p "Use the harness-engineering skill..."` dry-run confirmed GitHub Copilot CLI loaded the skill and summarized the first three workflow steps without editing files

By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.