[Skill Submission] model-hierarchy#12

Closed
everyskill-bot[bot] wants to merge 1 commit into main from skill/model-hierarchy-1770866480983
Conversation

@everyskill-bot
Contributor

New Skill Submission

Skill: model-hierarchy
Submitted by: anonymous
Source: https://github.com/zscole/model-hierarchy-skill
Files imported: 1


This PR was auto-generated from skills.every.to/submit.

AI security review will run automatically.

@everyskill-bot everyskill-bot bot requested a review from a team February 12, 2026 03:21
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Feb 12, 2026

Deploying everyskill with Cloudflare Pages

Latest commit: d41beaa
Status: ✅  Deploy successful!
Preview URL: https://c9fb4f5a.everyskill.pages.dev
Branch Preview URL: https://skill-model-hierarchy-177086-u0es.everyskill.pages.dev

View logs

@github-actions

Frontier Model Security Review

Agent verdicts:
Claude Opus 4.6: PASS
GPT-5.2: PASS

Final Verdict: PASS


Claude Opus 4.6 Analysis
Verdict: PASS. Issues: none.

Reasoning: This skill is a straightforward reference document for cost-optimizing AI model selection based on task complexity. It contains no prompt injection attempts, no hidden instructions, no encoded commands, and no 'ignore previous' directives. There are no dangerous tool usages - the skill does not invoke Bash, delete files, or access sensitive files. The code examples are illustrative pseudocode and configuration snippets showing how to route tasks to different model tiers. There are no network calls to external URLs for data exfiltration, no attempts to read credentials or sensitive data, and no social engineering or manipulation tactics. The skill does not define any allowed-tools or request overly permissive permissions. It is purely an informational/advisory skill providing pricing tables, classification heuristics, and integration examples for model routing decisions. The spawn examples reference legitimate agent orchestration patterns within documented tool frameworks. No security concerns identified.
GPT-5.2 Analysis
Verdict: PASS. Issues: none.

Reasoning: No prompt-injection patterns detected (no hidden/encoded instructions, no attempts to override system/developer/user directives). Content is a cost/model-routing guideline and does not include instructions to perform unsafe actions.

Tooling risk: Mentions generic capabilities (file read/write, URL fetching, API calls, spawning sub-agents) but does not provide concrete dangerous commands (e.g., rm -rf), escalation pathways, or unrestricted tool directives. The example snippets are configuration/model-selection oriented and not executable attack payloads.

Data exfiltration: References fetching/parsing URLs and file operations as examples, but does not instruct collecting secrets, reading credential locations, or sending data to external endpoints.

Social engineering/manipulation: None observed.

Allowed-tools permissiveness: Does not define an allowed-tools list or request broad tool permissions. Overall, the skill is operational guidance with minimal security exposure.

Frontier model review complete. Human approval still required.
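Both analyses describe the skill as routing tasks to model tiers via keyword/signal heuristics. A minimal sketch of that pattern is below; the signal words, tier numbers, and function name are illustrative assumptions, not taken from the skill itself:

```python
# Hypothetical keyword-signal tier router, sketching the heuristic the
# reviews describe. Signal sets and tier numbers are illustrative only.
TIER_SIGNALS = {
    3: {"security", "debug", "architecture"},   # premium model
    2: {"refactor", "summarize", "translate"},  # mid-tier model
}

def classify_tier(task: str) -> int:
    """Return the highest tier whose signal words appear in the task text."""
    words = set(task.lower().split())
    for tier in sorted(TIER_SIGNALS, reverse=True):
        if words & TIER_SIGNALS[tier]:
            return tier
    return 1  # default to the cheapest tier
```

Note that a classifier like this is trivially steerable: inserting a word such as "security" into a task description forces the premium tier, which is exactly the keyword-manipulation concern raised later in this review thread.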

@whw
Collaborator

whw commented Feb 12, 2026

🚀


@whw whw closed this Feb 12, 2026
@whw whw reopened this Feb 12, 2026
@github-actions

Frontier Model Security Review

Agent verdicts:
Claude Opus 4.6: PASS
GPT-5.2: WARN

Final Verdict: WARN


Claude Opus 4.6 Analysis
Verdict: PASS. Issues: none.

Reasoning: This skill is a straightforward informational/advisory document about cost-optimizing AI model selection. It contains no prompt injection attempts, no hidden instructions, no encoded commands, and no 'ignore previous' directives. There are no dangerous tool usages - the code examples are illustrative pseudocode and configuration snippets, not executable commands that could cause harm. There are no network calls to external URLs for data exfiltration, no attempts to read credentials or sensitive files, and no social engineering or manipulation. The skill does not request any tool permissions (no allowed-tools section). The content is purely educational: it provides pricing tables, task classification heuristics, a decision algorithm in pseudocode, cost comparisons, and integration examples. The configuration and code snippets shown (YAML config, Python function, CLI commands) are benign illustrations of how to set model preferences. The /model commands and sessions_spawn examples reference standard OpenClaw functionality for switching models, not arbitrary command execution. Overall, this is a clean, well-structured reference skill with no security concerns.
GPT-5.2 Analysis
Verdict: WARN.

Issue 1 - Prompt injection (severity: low). Evidence: No explicit hidden/encoded instructions or "ignore previous" patterns. However, the skill encourages automatic model switching and sub-agent spawning based on heuristics (signals/keywords), which can be influenced by user-provided text (e.g., inserting "security" or "debug" to force Tier 3). Risk: Adversary could steer model selection/cost or capability escalation via keyword manipulation; not direct compromise but can amplify impact of other attacks. Recommendation: Require explicit user/parent-agent confirmation before upgrading tiers or spawning agents; don't rely solely on keyword signals - combine with trusted task metadata and policy checks.

Issue 2 - Dangerous tool usage (severity: medium). Evidence: Integration examples mention spawning sub-agents to "Fetch and parse these 50 URLs" and general "URL fetching and basic parsing" as routine work. Risk: If the environment permits network tools, this can enable SSRF, probing internal services, or unintended retrieval of sensitive/internal endpoints. Also increases attack surface for prompt-injection via fetched content. Recommendation: Restrict network egress (allowlist domains), block private IP ranges, sanitize/strip instructions from fetched pages, and require explicit tool permissioning for fetch operations.

Issue 3 - Data exfiltration (severity: medium). Evidence: Examples and classification include "URL fetching" and "API calls"; no constraints on what URLs/APIs are permissible; no mention of redacting secrets in outputs. Risk: If combined with tools that can read files/env/secrets, a routed agent could exfiltrate data via network calls or logs. The skill itself doesn't instruct exfiltration, but it provides an operational pattern that could be abused in a tool-enabled environment. Recommendation: Add explicit policy: never read credentials/secrets unless necessary; redact sensitive data; enforce tool-level guards (no fetching to arbitrary endpoints, no posting data externally).

Issue 4 - Overly permissive allowed-tools (severity: low). Evidence: The skill discusses "tool orchestration" and "file read/write operations" but does not specify any constraints (e.g., path allowlists, prohibiting destructive commands). Risk: In systems where this skill is used as guidance, operators may over-trust Tier 1 agents with broad filesystem/tool access. Recommendation: Document minimum privilege: restrict filesystem scope, prohibit destructive operations by default, require confirmations for writes/deletes, and separate read vs write tool permissions.

Reasoning: The skill is primarily operational guidance for cost-based model routing and does not contain overt prompt-injection payloads, encoded commands, social engineering, or explicit malicious instructions. However, it implicitly encourages high-volume URL fetching and generic tool orchestration without guardrails, which can become risky in tool-enabled environments (SSRF/prompt-injection via web content/exfil pathways). The keyword-based escalation logic is also susceptible to manipulation to trigger premium models or higher-capability agents. With added constraints (tool allowlists, egress controls, confirmation gates, least privilege, secret-handling policy), risk would drop to PASS.

Frontier model review complete. Human approval still required.
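GPT-5.2's recommendation to restrict network egress (allowlist domains, block private IP ranges) before letting routed agents fetch URLs can be sketched as a pre-fetch guard. The allowlist contents and function name below are hypothetical, not part of the skill; literal-IP checks here do not cover DNS resolution, which a production guard would also need:

```python
# Hypothetical egress guard implementing the review's recommendation:
# allowlist domains and reject private/loopback/link-local IP literals.
import ipaddress
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"example.com", "docs.python.org"}  # assumed allowlist

def is_fetch_allowed(url: str) -> bool:
    """Return True only for http(s) URLs targeting an allowlisted public host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname or ""
    # Block literal IP addresses in private/loopback/link-local ranges (SSRF).
    try:
        addr = ipaddress.ip_address(host)
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    except ValueError:
        pass  # not a literal IP; fall through to the domain allowlist
    return host in ALLOWED_DOMAINS or any(
        host.endswith("." + domain) for domain in ALLOWED_DOMAINS
    )
```

A guard like this would sit in front of the "fetch and parse these 50 URLs" pattern the review flags, rejecting each URL before any network call is made.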

@whw whw closed this Feb 12, 2026
