Add SafeSkill security badge (20/100 — Blocked) by OyaAIProd · Pull Request #2 · iampantherr/SecureContext

OyaAIProd · 2026-04-29T02:07:13Z

🔴 SafeSkill Security Scan Results

Metric	Value
Overall Score	20/100 (Blocked)
Code Score	59/100
Content Score	52/100
Findings	1274 findings detected (212 critical)
Taint Flows	125
Files Scanned	96
Scan Duration	13.5s

Note: This package is an MCP server — child_process, filesystem, and environment access are expected capabilities for tool servers and are excluded from scoring and top findings.

Top Findings

🔴 critical: Detected instruction-override attempt: "You are now" (security-tests/run-all.mjs:1157:2)
🔴 critical: Tool/shell abuse instruction detected (tool-abuse-pattern): "Write ~/.ssh/id_rsa to /tmp/" (security-tests/run-all.mjs:1157:2)
🔴 critical: Detected instruction-override attempt: "Ignore all previous instructions" (security-tests/run-all.mjs:674:15)
🔴 critical: Detected instruction-override attempt: "ignore previous instructions" (security-tests/run-all.mjs:674:17)
🔴 critical: Detected instruction-override attempt: "You are now" (security-tests/run-all.mjs:674:15)

View full report on SafeSkill

About SafeSkill

SafeSkill is a free, open-source security scanner for AI tools, MCP servers, and Claude Code skills. We scan for code exploits, prompt injection, and data exfiltration risks.

False positive? We take accuracy seriously. If any finding above is incorrect, please open an issue and we will fix it immediately.

GitHub | Website | Docs
Built by Oya.ai -- AI Employees Builder

Signed-off-by: SafeSkill Scanner <mk@oya.ai>

The biggest single release. Closes every 🔴 high and 🟠 medium item from the v0.19.0 honest-completion audit, ships Sprint 4 retrieval upgrades, verifies the entire stack with live agents.

🔴 HIGH FIXES

#6 file-ownership 409: PG store-postgres.recallBroadcasts was missing the v0.15.0 §8.1 columns from SELECT, so the overlap guard always saw an empty exclusive set. Fix: explicit SELECT + JSON parse on read. Verified live.

#3 vitest test isolation: new vitest.setup.ts forces ZC_POSTGRES_DB to securecontext_test (auto-creates if missing); destructive helpers refuse unless DB matches /test/i AND VITEST is set.

#7 REJECT resolver works in Docker: writes to learnings_pg directly (parallel to the JSONL append, which is best-effort). Container can reach PG; can't reach host's Windows path for JSONL.

🟠 MEDIUM

#1 skill auto-import: src/skill_auto_import.ts walks skills/*.skill.md at API server startup, UPSERTs into skills_pg by skill_id with body_hmac idempotency. POST /dashboard/skills/import for manual trigger. Dockerfile copies skills/ in. First run imported 25 skills.

#2 LLM 'Generate skill body from rejection cluster': src/skill_candidate_generator.ts. Default backend = Anthropic Sonnet when ANTHROPIC_API_KEY set, else Ollama qwen2.5-coder:14b. Three new endpoints: /generate, /approve (writes to skills/ + auto-import + marks installed_skill_id), /reject (with notes). Dashboard panel shows status-tier action buttons. Live verified: 1.6KB skill body generated in 12s via Ollama.

#4 context-budget: src/context_budget.ts tracks per-session tokens, formatCostHeader appends [ctx: X% / 200K] suffix that upgrades to ⚠ WARN / 🚨 ALERT / ⛔ EMERGENCY at 70/85/95%. New zc_context_status MCP tool. Hard enforcement (block Read at 70%) deferred to v0.21.

SPRINT 4

#8 reranker: zc_search([q], { rerank: true }) cross-encoder rerank via Ollama embeddings.

#9 HyDE: zc_search([q], { mode: 'hyde' }) generates hypothetical answer first, embeds THAT for the search.

#10 multi-hop: zc_search([q], { mode: 'multihop', hopDepth: 2 }) extracts file/URL refs from initial results, recurses with score decay 0.7 per hop.

#5 rolling compaction MVP: src/compaction.ts + zc_compact_window MCP tool + POST /api/v1/compact endpoint. Pulls last N broadcasts + tool_calls, generates structured summary via Ollama, writes to working_memory. Live verified: 20 turns → 1538-char summary.

E2E RESULTS

Unit tests: 803 pass / 36 skip (test isolation working: a fresh test DB has no seed data, so those tests skip)
Direct API E2E: 14/14 pass
Live agent E2E: 14/14 pass on Test_Agent_Coordination, covering the ASSIGN→MERGE cycle, REJECT resolver, file-ownership 409, skill candidate cluster + LLM generation, context budget tracking on real agent activity, and rolling compaction.

DEFERRED (honest gaps documented)

- Full live mutator loop (skill_run failure → mutator agent spawns → candidates → operator approves): infra verified, but live test requires an agent to explicitly invoke a skill (~$0.20 + 5-10 min)
- Hard context-budget enforcement (block Read at 70%): needs hook integration, ships in v0.21
- Background compaction daemon: v0.20 ships on-demand only

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
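The context-budget tiers described in item #4 can be illustrated with a small sketch. It is not the code in src/context_budget.ts: the function names, the placement of the tier label after the bracketed suffix, and the rounding-down of the percentage are all assumptions made for illustration. Only the 200K window, the [ctx: X% / 200K] shape, and the 70/85/95% thresholds come from the commit message.

```typescript
// Hypothetical sketch of tiered context-budget reporting.
// Names and exact output format are assumptions, not the project's API.
type BudgetTier = "OK" | "WARN" | "ALERT" | "EMERGENCY";

const CONTEXT_WINDOW_TOKENS = 200_000; // the "200K" in the header suffix

const TIER_LABEL: Record<BudgetTier, string> = {
  OK: "",
  WARN: "⚠ WARN",
  ALERT: "🚨 ALERT",
  EMERGENCY: "⛔ EMERGENCY",
};

// Map token usage to a tier using the 70/85/95% thresholds.
function budgetTier(
  usedTokens: number,
  windowTokens: number = CONTEXT_WINDOW_TOKENS,
): BudgetTier {
  const pct = (usedTokens * 100) / windowTokens;
  if (pct >= 95) return "EMERGENCY";
  if (pct >= 85) return "ALERT";
  if (pct >= 70) return "WARN";
  return "OK";
}

// Build the suffix; the percentage is rounded down so the displayed
// number never reaches a threshold before the tier itself changes.
function formatContextSuffix(
  usedTokens: number,
  windowTokens: number = CONTEXT_WINDOW_TOKENS,
): string {
  const pct = Math.floor((usedTokens * 100) / windowTokens);
  const base = `[ctx: ${pct}% / ${Math.round(windowTokens / 1000)}K]`;
  const tier = budgetTier(usedTokens, windowTokens);
  return tier === "OK" ? base : `${base} ${TIER_LABEL[tier]}`;
}

console.log(formatContextSuffix(150_000)); // [ctx: 75% / 200K] ⚠ WARN
```

Keeping the tier decision separate from the string formatting means a later hard-enforcement hook could reuse budgetTier without parsing the header text.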

The v0.20.0 E2E flagged "full live mutator loop" as deferred. This release closes that gap by actually running it end-to-end with real Claude agents (orchestrator + developer + auto-spawned mutator-engineering pool agent). The first fully verified self-improvement cycle.

The exercise caught 5 bugs that synthetic tests had missed:

#1 L1 hook couldn't see PG-imported skills
maybeTriggerL1Mutation in outcomes.ts called getSkillById against local SQLite. v0.20.0 imports skills to skills_pg (Postgres). Lookup missed → hook bailed silently. Fix: PG fallback when SQLite misses.

#2 ZC_L1_MUTATION_ENABLED not propagated to agent MCP servers
start-agents.ps1 set the env in its own shell but never injected it into the per-agent launcher templates. Agents' MCP servers saw undefined → L1 hook never fired. Fix: orchestrator + worker launcher templates now propagate ZC_L1_MUTATION_ENABLED + ZC_MUTATOR_MODEL.

#3 Dispatcher process env didn't have PG creds
The dispatcher's launcher (a2a-launch-dispatcher.ps1 template) only set ZC_API_URL/KEY. Auto-spawned pool agents inherit the dispatcher's env via spawn-agent.ps1, so they too lacked PG creds → mutator-eng broadcast "BLOCKED: zc_claim_task fails with Postgres pool unavailable". Fix: dispatcherEnvBlock now propagates all PG vars + L1 + mutator model.

#4 Skill auto-import used plain SHA256 instead of HMAC-keyed hash
v0.20.0's computeBodyHmac used createHash('sha256'). The skill loader uses computeSkillBodyHmac with HMAC-SHA256 keyed by machine_secret. Dashboard rendered "Skill body HMAC mismatch: refusing to load". Fix: auto-importer now uses the canonical computeSkillBodyHmac.

#5 Dashboard auto-refresh wiped operator's typed text
#pending / #skills / #skill-candidates panels polled every 10/30s with hx-swap=innerHTML, destroying any input/textarea content mid-edit. Type-confirm IDs reset, blocking approval. Fix: focus-aware HTMX trigger filter; polling skips while any input/textarea/select is focused or any <details> is open.
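Bug #4 comes down to the difference between an unkeyed digest and a keyed one. The sketch below uses Node's built-in crypto module to show why a body hashed with createHash('sha256') can never verify against an HMAC-SHA256 check. The function names and the demo secret are hypothetical; they stand in for the project's computeSkillBodyHmac and machine_secret, whose real signatures are not shown in this PR.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Unkeyed digest: anyone who can edit the body can also recompute it.
function plainSha256(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

// Keyed digest: recomputing it requires the secret.
// (Hypothetical stand-in for the project's computeSkillBodyHmac.)
function keyedHmac(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Loader-side check: compare the stored value against the keyed digest
// in constant time. A stored plain SHA-256 value always fails here.
function verifyBody(body: string, storedHex: string, secret: string): boolean {
  const expected = Buffer.from(keyedHmac(body, secret), "hex");
  const stored = Buffer.from(storedHex, "hex");
  return expected.length === stored.length && timingSafeEqual(expected, stored);
}

const demoSecret = "demo-machine-secret"; // assumption: illustrative value only
const demoBody = "# example skill body";

console.log(verifyBody(demoBody, keyedHmac(demoBody, demoSecret), demoSecret)); // true
console.log(verifyBody(demoBody, plainSha256(demoBody), demoSecret)); // false
```

Both digests are 32 bytes long, so the mismatch only shows up at comparison time, which is consistent with the importer succeeding and the dashboard loader later refusing the skill.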
LIVE VERIFICATION (the moment of truth):

After all 5 fixes, on Test_Agent_Coordination with real agents:
- skill_runs.run-9341abbf-dd1 (status=failed, score=0.2)
- L1 hook fired, resolved skill via PG fallback
- mut-... task enqueued in task_queue_pg
- Dispatcher auto-spawned mutator-engineering pool
- claude-sonnet-4-6 generated 5 candidates (best=0.86)
- mres-423a388e-08b in mutation_results_pg
- Dashboard rendered candidates + approval form
- Operator approved candidate #0
- developer-debugging-methodology@1@global archived
- developer-debugging-methodology@1.1@global active

This is HARNESS_EVOLUTION_PLAN.md Tier S item #2 (Skills + continuous self-improvement loop), the highest-leverage item in the plan, verified working end-to-end with live agents for the first time.

DEFERRED to v0.20.2: auto-reassign-on-approve only fires when failed skill_run has original_role populated. Synthetic zc_record_skill_outcome calls don't preserve that chain. Verifying the full REJECT→mutate→approve→auto-reassign cycle requires a live REJECT-driven test (queued for v0.21.0 which adds skill enforcement levers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The v0.20.1 mutator-loop verification exposed the single biggest reliability gap: nothing forces a Claude agent to invoke a skill. Even with "you MUST" language in role prompts, agents freelance with Read/Edit/Bash because that's "simpler", and when they do, skill_runs_pg stays empty and the mutator has no inputs.

This release ships THREE reinforcing soft-enforcement levers (designed to work together) and DESIGNS but DELIBERATELY DEFERS two more. See docs/SKILL_ENFORCEMENT.md for full design + decision log.

Lever #1 (shipped): inject "## YOUR SKILLS" block at agent spawn
- New endpoint: GET /api/v1/skills/by-role?role=<role>
- Returns active skills with intended_roles containing the role
- Companion: A2A_dispatcher/generate-role-skill-block.mjs (HTTP-based, no extra deps) called from start-agents.ps1 once per agent role
- Block formats every skill with description + the canonical zc_skill_show + zc_record_skill_outcome workflow, then injected into the agent's deepPrompt before the system-prompt file is written
- Verified: 8 developer skills correctly listed for the developer role

Lever #2 (shipped): auto-inject skills into zc_recall_context response
- /api/v1/recall now accepts ?role=<role> param
- Returns {skills: [...]} alongside facts
- MCP server's zc_recall_context tool reads ZC_AGENT_ROLE env, forwards to API, appends "## Skills available for role 'X' (N)" section
- Fires automatically on every session start (SessionStart hook calls zc_recall_context), so skill awareness is reinforced not just at spawn but on every recall

Lever #4 (shipped): MERGE-time skill-record mandate in role prompts
- Both $orchSystem and every $workerSystem now end with "SKILL-OUTCOME RECORDING (MANDATORY before MERGE)" section
- Prescribes the exact zc_record_skill_outcome call shape + parameters
- Explains why it matters ("system learns from your work")

Lever #3 (deferred to v0.22+): PreTool hook nudge on Edit/Bash without recent skill_run. Defer until we observe v0.21.0 skill-record rates for a week. Hint fatigue risk; ship only if needed.

Lever #5 (DESIGNED but DELIBERATELY UNSHIPPED): hard PreTool block. Refuse Edit/Write/Bash until skill_run recorded this session. Full design + impl notes + escape hatch spec preserved in docs/SKILL_ENFORCEMENT.md so the work isn't lost. Operator should ship #5 ONLY if v0.21.0 skill-record rate observed below 50% for a week; rigidity risk too high otherwise. Gate: ZC_SKILL_HARD_ENFORCE=1.

Combined effect: 3 reinforcing signals (spawn + every recall + closing instruction) make skill-recording natural rather than forced. Expected ~70-85% skill-record rate; that's enough to make the loop work without ever needing #5.

Operator action required: stop-agents.ps1 + start-agents.ps1 to spawn fresh agents that get the v0.21.0 prompt injection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
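Lever #1 amounts to filtering skills by role and rendering them into a prompt block. The following is a rough sketch of that step only, not the contents of generate-role-skill-block.mjs. The SkillSummary shape, the function names, and the wording of the workflow line are assumptions; the field names skill_id and intended_roles, the "## YOUR SKILLS" heading, and the two tool names come from the commit messages. The HTTP call to /api/v1/skills/by-role is deliberately left out.

```typescript
// Hypothetical shape of one skill as returned for a role.
interface SkillSummary {
  skill_id: string;
  description: string;
  intended_roles: string[];
}

// Keep only skills whose intended_roles contain the given role.
function skillsForRole(skills: SkillSummary[], role: string): SkillSummary[] {
  return skills.filter((s) => s.intended_roles.includes(role));
}

// Render the block that would be appended to the agent's prompt.
// Returns an empty string when the role has no skills, so callers can
// append the result unconditionally.
function buildSkillBlock(skills: SkillSummary[], role: string): string {
  const matching = skillsForRole(skills, role);
  if (matching.length === 0) return "";
  const lines: string[] = ["## YOUR SKILLS", ""];
  for (const s of matching) {
    lines.push(`- ${s.skill_id}: ${s.description}`);
  }
  // Assumed wording; the real block prescribes its own workflow text.
  lines.push(
    "",
    "Workflow: read a skill with zc_skill_show before using it, then report the result with zc_record_skill_outcome.",
  );
  return lines.join("\n");
}

const demoSkills: SkillSummary[] = [
  {
    skill_id: "developer-debugging-methodology",
    description: "Structured approach to isolating a failing change",
    intended_roles: ["developer"],
  },
  {
    skill_id: "example-orchestrator-skill", // hypothetical id
    description: "Illustrative entry for another role",
    intended_roles: ["orchestrator"],
  },
];

console.log(buildSkillBlock(demoSkills, "developer"));
```

Because the same filtered list could feed both the spawn-time block and the zc_recall_context section from lever #2, keeping the filter and the renderer as separate functions avoids duplicating the role-matching rule.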

Four Phase-1 features that lift the quality + safety + signal bar for every skill that lands in skills_pg, regardless of source (operator, mutator, marketplace pull, auto-import).

#4 Lint (src/skills/lint.ts + lint.test.ts)
9-rule structural quality bar enforced at loadSkillFromPath AND storage_dual.upsertSkill. Errors reject; warnings logged. 28/28 tests. npm scripts: `lint:skills` and `lint:skills:strict`.

#2 Polisher (src/skills/polisher.ts)
LLM refines skill descriptions. Two backends: local-mock (deterministic) + Sonnet (Anthropic API). Output runs through lint before being returned. Two new dashboard endpoints: POST /dashboard/skills/:id/polish (suggest) and POST /dashboard/skills/:id/apply-polish (apply).

#1 Security scan (src/skills/security_scan.ts + PG migration 20)
8-point check: secrets, prompt injection, tool spawn, filesystem escape, network exfil, sleep abuse, body length, frontmatter integrity. Severity-aware gate: ANY block-severity failure rejects regardless of score (a leaked OpenAI key scoring 7/8 is still rejected). Audit log in skill_security_scans_pg with source attribution.

F Operator exemplars (PG migration 21 + dashboard ⭐ endpoint + mutator)
skill_runs_pg.is_exemplar + tagging metadata. New endpoint: POST /dashboard/skill-runs/:run_id/tag-exemplar. MutationContext gains optional exemplars[] field; orchestrator pulls top-5 via getExemplarRuns before invoking mutator. buildProposerPrompt injects an "Operator-tagged exemplars" section so mutator candidates preserve the patterns the operator marked as good.

Tests: 858/862 pass (5 pre-existing failures on baseline, unrelated).

Live E2E verified on Test_Agent_Coordination: 5/5 gate scenarios pass, exemplar pipeline pass, polisher pass, audit log writes, container running v0.23.0, terminal agents launched + registered + stopped cleanly.

Bugs found + fixed during E2E:
1. Initial scan gate was score-only: a skill with a leaked key scored 7/8 and would pass. Fixed to severity-based: block-severity failure rejects regardless of score.
2. getExemplarRuns referenced non-existent `evidence` column on skill_runs_pg → caught when wiring exemplars into proposer prompt. Dropped from SELECT; mutator template handles undefined gracefully.

Test fixtures: synthetic round-trip fixtures use new "test" SkillUpsertSource to bypass the gates (intentionally short bodies). Production callers always get the gates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
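The first E2E bug above (score-only versus severity-aware gating) is easy to see in a small sketch. This is not the code in src/skills/security_scan.ts: the types, the function names, and the minimum-score threshold are assumptions for illustration. What comes from the commit message is the rule itself, that any block-severity failure rejects regardless of score.

```typescript
// Hypothetical result of one of the scan's checks.
type Severity = "block" | "warn";

interface CheckResult {
  name: string;
  passed: boolean;
  severity: Severity;
}

interface GateDecision {
  accepted: boolean;
  score: number; // number of checks passed
  total: number; // number of checks run
  blockingFailures: string[];
}

// The flawed original behaviour: only the pass count matters.
function scoreOnlyGate(results: CheckResult[], minScore: number): boolean {
  return results.filter((r) => r.passed).length >= minScore;
}

// Severity-aware gate: a failed block-severity check rejects outright;
// the score threshold (an assumed parameter) applies only after that.
function severityAwareGate(results: CheckResult[], minScore: number): GateDecision {
  const score = results.filter((r) => r.passed).length;
  const blockingFailures = results
    .filter((r) => !r.passed && r.severity === "block")
    .map((r) => r.name);
  return {
    accepted: blockingFailures.length === 0 && score >= minScore,
    score,
    total: results.length,
    blockingFailures,
  };
}

// A skill that leaks a key but passes the other seven checks.
// The severity assigned to each check here is illustrative.
const leakedKeyScan: CheckResult[] = [
  { name: "secrets", passed: false, severity: "block" },
  { name: "prompt injection", passed: true, severity: "block" },
  { name: "tool spawn", passed: true, severity: "block" },
  { name: "filesystem escape", passed: true, severity: "block" },
  { name: "network exfil", passed: true, severity: "block" },
  { name: "sleep abuse", passed: true, severity: "warn" },
  { name: "body length", passed: true, severity: "warn" },
  { name: "frontmatter integrity", passed: true, severity: "warn" },
];

console.log(scoreOnlyGate(leakedKeyScan, 6)); // true: 7/8 slips through
console.log(severityAwareGate(leakedKeyScan, 6).accepted); // false
```

Returning the list of blocking failures alongside the decision gives an audit log something concrete to record per rejection.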

Add SafeSkill security badge (20/100)

a318307

Signed-off-by: SafeSkill Scanner <mk@oya.ai>

OyaAIProd wants to merge 1 commit into iampantherr:main from OyaAIProd:safeskill-scan-1777428432016
