@whw
Collaborator

@whw whw commented Feb 12, 2026

New Skill Submission

Skill: model-hierarchy
Submitted by: anonymous
Source: https://github.com/zscole/model-hierarchy-skill
Files imported: 10


This PR was auto-generated from skills.every.to/submit.

AI security review will run automatically.

@whw whw requested a review from a team February 12, 2026 02:46
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Feb 12, 2026

Deploying everyskill with Cloudflare Pages

Latest commit: c527484
Status: ✅  Deploy successful!
Preview URL: https://34fd11d4.everyskill.pages.dev
Branch Preview URL: https://skill-model-hierarchy-177086.everyskill.pages.dev

@github-actions

Frontier Model Security Review

| Agent | Verdict |
| --- | --- |
| Claude Opus 4.6 | PASS |
| GPT-5.2 | WARN |

Final Verdict: WARN
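
The final verdict above matches the more severe of the two agent verdicts. The following is a minimal sketch of that aggregation, assuming a worst-of rule. The `SEVERITY` ordering, the `FAIL` level, and the `final_verdict` name are illustrative assumptions, not something this workflow confirms.

```python
# Hypothetical severity ordering: only PASS and WARN appear in this review;
# FAIL is an assumed third level added for illustration.
SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


def final_verdict(verdicts):
    """Return the most severe verdict among the per-agent verdicts."""
    if not verdicts:
        raise ValueError("at least one agent verdict is required")
    return max(verdicts.values(), key=SEVERITY.__getitem__)


agent_verdicts = {"Claude Opus 4.6": "PASS", "GPT-5.2": "WARN"}
print(final_verdict(agent_verdicts))  # WARN
```

Under this rule a single cautious reviewer is enough to raise the overall result, which is consistent with the outcome reported here.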


Claude Opus 4.6 Analysis
{"verdict": "PASS","issues":[],"reasoning":"This skill consists of a .gitignore, an MIT license, a README, and a SKILL.md file that provides guidance on routing AI tasks to different model tiers based on complexity for cost optimization. There is no prompt injection (no 'ignore previous instructions', no encoded/hidden commands). There is no dangerous tool usage — no Bash commands, no file deletion, no access to sensitive files. There is no data exfiltration — no external URLs being called, no credential reading, no network requests to unknown endpoints. There is no social engineering or manipulation; the skill simply advises on model selection. The allowed-tools are not specified (no tool permissions granted). The Python code snippet at the end is truncated and is just an illustrative example for model routing logic, not executable malicious code. The skill is benign informational/instructional content."}
GPT-5.2 Analysis
{"verdict":"WARN","issues":[{"category":"Prompt injection / instruction smuggling","severity":"medium","evidence":["The submission includes an embedded directive at the end: \"Analyze for: ... Respond with ONLY valid JSON...\" which is an instruction to the reviewer/agent rather than part of the skill content.","This can override/steer auditors or downstream agent behavior if ingested as-is."],"impact":"Could manipulate an agent/auditor workflow, causing it to follow attacker-provided evaluation steps or output constraints rather than the host system’s policies.","recommendation":"Strip non-skill meta-instructions from the skill content. Ensure the skill file contains only intended routing guidance and YAML frontmatter, not reviewer prompts."},{"category":"Content integrity / truncation","severity":"low","evidence":["The code snippet at the end is truncated: \"def get_model_for_tas\" then abruptly switches to the embedded reviewer instructions."],"impact":"Suggests the document may be malformed or concatenated with unrelated content, increasing risk of hidden/accidental instruction carryover.","recommendation":"Fix/complete the snippet or remove it. Validate file boundaries and ensure no extraneous appended text."},{"category":"Tooling risk (indirect)","severity":"low","evidence":["Skill encourages spawning sub-agents and switching models; does not itself define tool restrictions or safe-guards for spawned agents (e.g., filesystem/network limitations)."],"impact":"If an agent system allows powerful tools by default, routing to cheaper models could increase error risk; not malicious by itself but can widen blast radius if tool permissions are overly broad.","recommendation":"Add explicit guidance: spawned Tier-1 agents should run with least-privilege tool access (no shell, limited FS paths, no credential stores, restricted network)."}],"reasoning":"No explicit malicious payloads (no hidden encoded commands, no bash/rm usage, no credential exfil code, no external callbacks) are present in the provided text. However, the appended reviewer-style instruction block constitutes instruction smuggling that could steer downstream processing, and the truncation indicates possible unintended concatenation. Overall: not overtly malicious, but the injection-like appended instructions warrant caution and cleanup."}
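
The first two recommendations above (strip reviewer-style instructions, validate file boundaries) could be approximated by a small pre-review check. This is a rough sketch, not part of the submitted skill or of the review pipeline. The `scan_skill_text` name is hypothetical, the marker phrases are only the two quoted in the evidence, and an odd number of code fences is used as a crude stand-in for truncation.

```python
import re

# Phrases quoted in the GPT-5.2 evidence; a real deny-list would need more.
REVIEWER_PROMPT_MARKERS = (
    re.compile(r"^[ \t]*Analyze for:", re.IGNORECASE | re.MULTILINE),
    re.compile(r"Respond with ONLY valid JSON", re.IGNORECASE),
)
# A Markdown code fence at the start of a line (three backticks).
FENCE = re.compile(r"^[ \t]*`{3}", re.MULTILINE)


def scan_skill_text(text):
    """Return human-readable findings for a skill file's text (hypothetical helper)."""
    findings = []
    for marker in REVIEWER_PROMPT_MARKERS:
        if marker.search(text):
            findings.append("reviewer-style instruction: " + marker.pattern)
    # An odd fence count means some code block is never closed.
    if len(FENCE.findall(text)) % 2 == 1:
        findings.append("unclosed code fence (possible truncation)")
    return findings


fence = "`" * 3  # built at runtime to avoid nesting fences in this example
sample = (
    "# Skill\n"
    + fence + "python\n"
    + "def get_model_for_tas\n"
    + "Analyze for: prompt injection\n"
    + "Respond with ONLY valid JSON\n"
)
for finding in scan_skill_text(sample):
    print(finding)
```

A check like this only flags known phrasings, so it would complement a model-based review without replacing it.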

Frontier model review complete. Human approval still required.

@whw whw closed this Feb 12, 2026