Add SafeSkill security badge (20/100 — Blocked)#2
Open

OyaAIProd wants to merge 1 commit into iampantherr:main from
Conversation
Signed-off-by: SafeSkill Scanner <mk@oya.ai>
iampantherr added a commit that referenced this pull request on Apr 30, 2026
The biggest single release yet. Closes every 🔴 high and 🟠 medium item from
the v0.19.0 honest-completion audit, ships the Sprint 4 retrieval upgrades,
and verifies the entire stack with live agents.
🔴 HIGH FIXES
#6 file-ownership 409: PG store-postgres.recallBroadcasts was missing
the v0.15.0 §8.1 columns from its SELECT — the overlap guard always saw
an empty exclusive set. Fix: explicit SELECT + JSON parse on read.
Verified live.
#3 vitest test isolation: new vitest.setup.ts forces ZC_POSTGRES_DB to
securecontext_test (auto-creates if missing); destructive helpers
refuse unless DB matches /test/i AND VITEST is set.
#7 REJECT resolver works in Docker: writes to learnings_pg directly
(in parallel to the JSONL append, which is best-effort). The container
can reach PG but can't reach the host's Windows path for JSONL.
🟠 MEDIUM
#1 skill auto-import: src/skill_auto_import.ts walks skills/*.skill.md
at API server startup, UPSERTs into skills_pg by skill_id with
body_hmac idempotency. POST /dashboard/skills/import for manual
trigger. Dockerfile copies skills/ in. First run imported 25 skills.
#2 LLM 'Generate skill body from rejection cluster':
src/skill_candidate_generator.ts. Default backend = Anthropic Sonnet
when ANTHROPIC_API_KEY set, else Ollama qwen2.5-coder:14b. Three
new endpoints: /generate, /approve (writes to skills/ + auto-import
+ marks installed_skill_id), /reject (with notes). Dashboard panel
shows status-tier action buttons. Live verified: 1.6KB skill body
generated in 12s via Ollama.
#4 context-budget: src/context_budget.ts tracks per-session tokens,
formatCostHeader appends [ctx: X% / 200K] suffix that upgrades to
⚠ WARN / 🚨 ALERT / ⛔ EMERGENCY at 70/85/95%. New zc_context_status
MCP tool. Hard enforcement (block Read at 70%) deferred to v0.21.
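The suffix logic in #4 could look roughly like this; the function name and rounding behavior are assumptions, only the 70/85/95% tiers come from the notes above:

```typescript
// Sketch of the [ctx: X% / 200K] suffix with the 70/85/95% tiers.
const CONTEXT_LIMIT = 200_000;

function contextSuffix(usedTokens: number): string {
  const pct = Math.floor((usedTokens * 100) / CONTEXT_LIMIT);
  let tier = "";
  if (pct >= 95) tier = " ⛔ EMERGENCY";
  else if (pct >= 85) tier = " 🚨 ALERT";
  else if (pct >= 70) tier = " ⚠ WARN";
  return `[ctx: ${pct}% / 200K]${tier}`;
}
```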
SPRINT 4
#8 reranker: zc_search([q], { rerank: true }) cross-encoder rerank
via Ollama embeddings.
#9 HyDE: zc_search([q], { mode: 'hyde' }) generates hypothetical
answer first, embeds THAT for the search.
#10 multi-hop: zc_search([q], { mode: 'multihop', hopDepth: 2 })
extracts file/URL refs from initial results, recurses with score
decay 0.7 per hop.
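The per-hop decay in #10 amounts to weighting each result by 0.7 raised to its hop depth, so hop-2 hits must score much higher to outrank hop-0 hits. A sketch, with an illustrative result shape rather than the project's actual API:

```typescript
const HOP_DECAY = 0.7;

interface HopResult {
  id: string;
  rawScore: number; // similarity score from that hop's search
  hop: number;      // 0 = original query, 1..hopDepth = recursive hops
}

// A hit found at hop h keeps only rawScore * 0.7^h.
function decayedScore(r: HopResult): number {
  return r.rawScore * Math.pow(HOP_DECAY, r.hop);
}
```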
#5 rolling compaction MVP: src/compaction.ts + zc_compact_window MCP
tool + POST /api/v1/compact endpoint. Pulls last N broadcasts +
tool_calls, generates structured summary via Ollama, writes to
working_memory. Live verified: 20 turns → 1538-char summary.
E2E RESULTS
Unit tests: 803 pass / 36 skip (test isolation working — the fresh test
DB has no seed data, so those tests skip)
Direct API E2E: 14/14 pass
Live agent E2E: 14/14 pass on Test_Agent_Coordination
- ASSIGN→MERGE cycle, REJECT resolver, file-ownership 409,
skill candidate cluster + LLM generation, context budget
tracking on real agent activity, rolling compaction.
DEFERRED (honest gaps documented)
- Full live mutator loop (skill_run failure → mutator agent spawns →
candidates → operator approves) — infra verified, but live test
requires an agent to explicitly invoke a skill (~$0.20 + 5-10 min)
- Hard context-budget enforcement (block Read at 70%) — needs hook
integration, ships in v0.21
- Background compaction daemon — v0.20 ships on-demand only
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
iampantherr added a commit that referenced this pull request on May 1, 2026
The v0.20.0 E2E flagged "full live mutator loop" as deferred — this release
closes that gap by actually running it end-to-end with real Claude agents
(orchestrator + developer + auto-spawned mutator-engineering pool agent).
The first fully verified self-improvement cycle.

The exercise caught 5 bugs that synthetic tests had missed:

#1 L1 hook couldn't see PG-imported skills: maybeTriggerL1Mutation in
outcomes.ts called getSkillById against local SQLite. v0.20.0 imports
skills to skills_pg (Postgres). Lookup missed → hook bailed silently.
Fix: PG fallback when SQLite misses.
#2 ZC_L1_MUTATION_ENABLED not propagated to agent MCP servers:
start-agents.ps1 set the env in its own shell but never injected it into
the per-agent launcher templates. Agents' MCP servers saw undefined → L1
hook never fired. Fix: orchestrator + worker launcher templates now
propagate ZC_L1_MUTATION_ENABLED + ZC_MUTATOR_MODEL.
#3 Dispatcher process env didn't have PG creds: the dispatcher's launcher
(a2a-launch-dispatcher.ps1 template) only set ZC_API_URL/KEY. Auto-spawned
pool agents inherit the dispatcher's env via spawn-agent.ps1, so they too
lacked PG creds → mutator-eng broadcast "BLOCKED: zc_claim_task fails with
Postgres pool unavailable". Fix: dispatcherEnvBlock now propagates all PG
vars + L1 + mutator model.
#4 Skill auto-import used plain SHA256 instead of HMAC-keyed hash:
v0.20.0's computeBodyHmac used createHash('sha256'). The skill loader uses
computeSkillBodyHmac with HMAC-SHA256 keyed by machine_secret. Dashboard
rendered "Skill body HMAC mismatch — refusing to load". Fix: auto-importer
now uses the canonical computeSkillBodyHmac.
#5 Dashboard auto-refresh wiped operator's typed text: the #pending /
#skills / #skill-candidates panels polled every 10/30s with
hx-swap=innerHTML, destroying any input/textarea content mid-edit.
Type-confirm IDs reset, blocking approval. Fix: focus-aware HTMX trigger
filter — polling skips while any input/textarea/select is focused or any
<details> is open.
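Bug #4 above boils down to two different digests over the same body. A minimal illustration, where the function names are sketches rather than the project's real computeBodyHmac/computeSkillBodyHmac:

```typescript
import { createHash, createHmac } from "node:crypto";

// A plain SHA-256 of the body can never match an HMAC-SHA256 keyed by
// machine_secret, so the loader's integrity check failed for every
// auto-imported skill until the importer switched to the keyed digest.
function plainSha256(body: string): string {
  return createHash("sha256").update(body).digest("hex");
}

function keyedBodyHmac(body: string, machineSecret: string): string {
  return createHmac("sha256", machineSecret).update(body).digest("hex");
}
```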
LIVE VERIFICATION (the moment of truth)
After all 5 fixes, on Test_Agent_Coordination with real agents:
- skill_runs.run-9341abbf-dd1 (status=failed, score=0.2)
- L1 hook fired, resolved skill via PG fallback
- mut-... task enqueued in task_queue_pg
- Dispatcher auto-spawned mutator-engineering pool
- claude-sonnet-4-6 generated 5 candidates (best=0.86)
- mres-423a388e-08b in mutation_results_pg
- Dashboard rendered candidates + approval form
- Operator approved candidate #0
- developer-debugging-methodology@1@global archived
- developer-debugging-methodology@1.1@global active

This is HARNESS_EVOLUTION_PLAN.md Tier S item #2 (Skills + continuous
self-improvement loop) — the highest-leverage item in the plan — verified
working end-to-end with live agents for the first time.

DEFERRED to v0.20.2: auto-reassign-on-approve only fires when the failed
skill_run has original_role populated. Synthetic zc_record_skill_outcome
calls don't preserve that chain. Verifying the full
REJECT→mutate→approve→auto-reassign cycle requires a live REJECT-driven
test (queued for v0.21.0, which adds skill enforcement levers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
iampantherr added a commit that referenced this pull request on May 1, 2026
The v0.20.1 mutator-loop verification exposed the single biggest
reliability gap: nothing forces a Claude agent to invoke a skill. Even
with "you MUST" language in role prompts, agents freelance with
Read/Edit/Bash because that's "simpler" — and when they do, skill_runs_pg
stays empty and the mutator has no inputs.

This release ships THREE reinforcing soft-enforcement levers (designed to
work together) and DESIGNS but DELIBERATELY DEFERS two more — see
docs/SKILL_ENFORCEMENT.md for the full design + decision log.

Lever #1 (shipped): inject "## YOUR SKILLS" block at agent spawn
- New endpoint: GET /api/v1/skills/by-role?role=<role>
- Returns active skills with intended_roles containing the role
- Companion: A2A_dispatcher/generate-role-skill-block.mjs (HTTP-based, no
  extra deps) called from start-agents.ps1 once per agent role
- Block formats every skill with description + the canonical
  zc_skill_show + zc_record_skill_outcome workflow, then is injected into
  the agent's deepPrompt before the system-prompt file is written
- Verified: 8 developer skills correctly listed for the developer role

Lever #2 (shipped): auto-inject skills into zc_recall_context response
- /api/v1/recall now accepts a ?role=<role> param
- Returns {skills: [...]} alongside facts
- The MCP server's zc_recall_context tool reads the ZC_AGENT_ROLE env
  var, forwards it to the API, and appends a "## Skills available for
  role 'X' (N)" section
- Fires automatically on every session start (the SessionStart hook calls
  zc_recall_context) — skill awareness is reinforced not just at spawn
  but on every recall

Lever #4 (shipped): MERGE-time skill-record mandate in role prompts
- Both $orchSystem and every $workerSystem now end with a "SKILL-OUTCOME
  RECORDING (MANDATORY before MERGE)" section
- Prescribes the exact zc_record_skill_outcome call shape + parameters
- Explains why it matters ("the system learns from your work")

Lever #3 (deferred to v0.22+): PreTool hook nudge on Edit/Bash without a
recent skill_run.
Defer until we observe v0.21.0 skill-record rates for a week. Hint-fatigue
risk; ship only if needed.

Lever #5 (DESIGNED but DELIBERATELY UNSHIPPED): hard PreTool block. Refuse
Edit/Write/Bash until a skill_run is recorded this session. Full design +
impl notes + escape-hatch spec preserved in docs/SKILL_ENFORCEMENT.md so
the work isn't lost. Operator should ship #5 ONLY if the v0.21.0
skill-record rate is observed below 50% for a week — the rigidity risk is
too high otherwise. Gate: ZC_SKILL_HARD_ENFORCE=1.

Combined effect: 3 reinforcing signals (spawn + every recall + closing
instruction) make skill-recording natural rather than forced. Expected
~70-85% skill-record rate; that's enough to make the loop work without
ever needing #5.

Operator action required: stop-agents.ps1 + start-agents.ps1 to spawn
fresh agents that get the v0.21.0 prompt injection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
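The role filter behind Lever #1's GET /api/v1/skills/by-role endpoint could look roughly like this; the types and names are assumptions, not the project's real interfaces:

```typescript
// Sketch: return active skills whose intended_roles include the role.
interface RoleSkill {
  skill_id: string;
  status: "active" | "archived";
  intended_roles: string[];
}

function skillsByRole(skills: RoleSkill[], role: string): RoleSkill[] {
  return skills.filter(
    (s) => s.status === "active" && s.intended_roles.includes(role),
  );
}
```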
iampantherr pushed a commit that referenced this pull request on May 4, 2026
Four Phase-1 features that lift the quality + safety + signal bar for
every skill that lands in skills_pg, regardless of source (operator,
mutator, marketplace pull, auto-import).

#4 Lint (src/skills/lint.ts + lint.test.ts)
9-rule structural quality bar enforced at loadSkillFromPath AND
storage_dual.upsertSkill. Errors reject; warnings are logged. 28/28
tests. npm scripts: `lint:skills` and `lint:skills:strict`.

#2 Polisher (src/skills/polisher.ts)
LLM refines skill descriptions. Two backends: local-mock (deterministic)
+ Sonnet (Anthropic API). Output runs through lint before being returned.
Two new dashboard endpoints: POST /dashboard/skills/:id/polish (suggest)
and POST /dashboard/skills/:id/apply-polish (apply).

#1 Security scan (src/skills/security_scan.ts + PG migration 20)
8-point check: secrets, prompt injection, tool spawn, filesystem escape,
network exfil, sleep abuse, body length, frontmatter integrity.
Severity-aware gate: ANY block-severity failure rejects regardless of
score (a leaked OpenAI key scoring 7/8 is still rejected). Audit log in
skill_security_scans_pg with source attribution.

F Operator exemplars (PG migration 21 + dashboard ⭐ endpoint + mutator)
skill_runs_pg.is_exemplar + tagging metadata. New endpoint: POST
/dashboard/skill-runs/:run_id/tag-exemplar. MutationContext gains an
optional exemplars[] field; the orchestrator pulls the top 5 via
getExemplarRuns before invoking the mutator. buildProposerPrompt injects
an "Operator-tagged exemplars" section so mutator candidates preserve the
patterns the operator marked as good.

Tests: 858/862 pass (5 pre-existing failures on baseline, unrelated).
Live E2E verified on Test_Agent_Coordination: 5/5 gate scenarios pass,
exemplar pipeline pass, polisher pass, audit log writes, container
running v0.23.0, terminal agents launched + registered + stopped cleanly.

Bugs found + fixed during E2E:
1. Initial scan gate was score-only — a skill with a leaked key scored
   7/8 and would pass.
   Fixed to severity-based: a block-severity failure rejects regardless
   of score.
2. getExemplarRuns referenced a non-existent `evidence` column on
   skill_runs_pg → caught when wiring exemplars into the proposer prompt.
   Dropped from SELECT; the mutator template handles undefined
   gracefully.

Test fixtures: synthetic round-trip fixtures use the new "test"
SkillUpsertSource to bypass the gates (intentionally short bodies).
Production callers always get the gates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
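The severity-aware gate described above can be sketched as follows; the types and names are assumptions drawn from the notes, not the project's real interfaces:

```typescript
// ANY failing block-severity finding rejects the skill, regardless of
// the aggregate score (a leaked key scoring 7/8 is still rejected).
interface ScanFinding {
  check: string;
  passed: boolean;
  severity: "info" | "warn" | "block";
}

function gateDecision(findings: ScanFinding[]): { score: number; blocked: boolean } {
  const score = findings.filter((f) => f.passed).length;
  const blocked = findings.some((f) => !f.passed && f.severity === "block");
  return { score, blocked };
}
```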
🔴 SafeSkill Security Scan Results
Top Findings
- security-tests/run-all.mjs:1157:2
- security-tests/run-all.mjs:674:15
- security-tests/run-all.mjs:674:17

View full report on SafeSkill
About SafeSkill
SafeSkill is a free, open-source security scanner for AI tools, MCP servers, and Claude Code skills. We scan for code exploits, prompt injection, and data exfiltration risks.
False positive? We take accuracy seriously. If any finding above is incorrect, please open an issue and we will fix it immediately.