Closed
Deploying everyskill with Cloudflare Pages
| Latest commit: | d41beaa |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://c9fb4f5a.everyskill.pages.dev |
| Branch Preview URL: | https://skill-model-hierarchy-177086-u0es.everyskill.pages.dev |
Frontier Model Security Review
Final Verdict: PASS
Claude Opus 4.6 Analysis
{"verdict": "PASS", "issues": [], "reasoning": "This skill is a straightforward reference document for cost-optimizing AI model selection based on task complexity. It contains no prompt injection attempts, no hidden instructions, no encoded commands, and no 'ignore previous' directives. There are no dangerous tool usages - the skill does not invoke Bash, delete files, or access sensitive files. The code examples are illustrative pseudocode and configuration snippets showing how to route tasks to different model tiers. There are no network calls to external URLs for data exfiltration, no attempts to read credentials or sensitive data, and no social engineering or manipulation tactics. The skill does not define any allowed-tools or request overly permissive permissions. It is purely an informational/advisory skill providing pricing tables, classification heuristics, and integration examples for model routing decisions. The spawn examples reference legitimate agent orchestration patterns within documented tool frameworks. No security concerns identified."}
GPT-5.2 Analysis
{"verdict":"PASS","issues":[],"reasoning":"No prompt-injection patterns detected (no hidden/encoded instructions, no attempts to override system/developer/user directives). Content is a cost/model-routing guideline and does not include instructions to perform unsafe actions.\n\nTooling risk: Mentions generic capabilities (file read/write, URL fetching, API calls, spawning sub-agents) but does not provide concrete dangerous commands (e.g., rm -rf), escalation pathways, or unrestricted tool directives. The example snippets are configuration/model-selection oriented and not executable attack payloads.\n\nData exfiltration: References fetching/parsing URLs and file operations as examples, but does not instruct collecting secrets, reading credential locations, or sending data to external endpoints.\n\nSocial engineering/manipulation: None observed.\n\nAllowed-tools permissiveness: Does not define an allowed-tools list or request broad tool permissions. Overall, the skill is operational guidance with minimal security exposure."}
Frontier model review complete. Human approval still required.
Frontier Model Security Review
Final Verdict: WARN
Claude Opus 4.6 Analysis
{"verdict": "PASS", "issues": [], "reasoning": "This skill is a straightforward informational/advisory document about cost-optimizing AI model selection. It contains no prompt injection attempts, no hidden instructions, no encoded commands, and no 'ignore previous' directives. There are no dangerous tool usages - the code examples are illustrative pseudocode and configuration snippets, not executable commands that could cause harm. There are no network calls to external URLs for data exfiltration, no attempts to read credentials or sensitive files, and no social engineering or manipulation. The skill does not request any tool permissions (no allowed-tools section). The content is purely educational: it provides pricing tables, task classification heuristics, a decision algorithm in pseudocode, cost comparisons, and integration examples. The configuration and code snippets shown (YAML config, Python function, CLI commands) are benign illustrations of how to set model preferences. The /model commands and sessions_spawn examples reference standard OpenClaw functionality for switching models, not arbitrary command execution. Overall, this is a clean, well-structured reference skill with no security concerns."}
GPT-5.2 Analysis
{"verdict":"WARN","issues":[{"category":"Prompt injection","severity":"low","evidence":"No explicit hidden/encoded instructions or \"ignore previous\" patterns. However, the skill encourages automatic model switching and sub-agent spawning based on heuristics (signals/keywords), which can be influenced by user-provided text (e.g., inserting \"security\" or \"debug\" to force Tier 3).","risk":"Adversary could steer model selection/cost or capability escalation via keyword manipulation; not direct compromise but can amplify impact of other attacks.","recommendation":"Require explicit user/parent-agent confirmation before upgrading tiers or spawning agents; don’t rely solely on keyword signals—combine with trusted task metadata and policy checks."},{"category":"Dangerous tool usage","severity":"medium","evidence":"Integration examples mention spawning sub-agents to \"Fetch and parse these 50 URLs\" and general \"URL fetching and basic parsing\" as routine work.","risk":"If the environment permits network tools, this can enable SSRF, probing internal services, or unintended retrieval of sensitive/internal endpoints. Also increases attack surface for prompt-injection via fetched content.","recommendation":"Restrict network egress (allowlist domains), block private/IP ranges, sanitize/strip instructions from fetched pages, and require explicit tool permissioning for fetch operations."},{"category":"Data exfiltration","severity":"medium","evidence":"Examples and classification include \"URL fetching\" and \"API calls\"; no constraints on what URLs/APIs are permissible; no mention of redacting secrets in outputs.","risk":"If combined with tools that can read files/env/secrets, a routed agent could exfiltrate data via network calls or logs. The skill itself doesn’t instruct exfiltration, but it provides an operational pattern that could be abused in a tool-enabled environment.","recommendation":"Add explicit policy: never read credentials/secrets unless necessary; redact sensitive data; enforce tool-level guards (no fetching to arbitrary endpoints, no posting data externally)."},{"category":"Overly permissive allowed-tools","severity":"low","evidence":"The skill discusses \"tool orchestration\" and \"file read/write operations\" but does not specify any constraints (e.g., path allowlists, prohibiting destructive commands).","risk":"In systems where this skill is used as guidance, operators may over-trust Tier 1 agents with broad filesystem/tool access.","recommendation":"Document minimum-privilege: restrict filesystem scope, prohibit destructive operations by default, require confirmations for writes/deletes, and separate read vs write tool permissions."}],"reasoning":"The skill is primarily operational guidance for cost-based model routing and does not contain overt prompt-injection payloads, encoded commands, social engineering, or explicit malicious instructions. However, it implicitly encourages high-volume URL fetching and generic tool orchestration without guardrails, which can become risky in tool-enabled environments (SSRF/prompt-injection via web content/exfil pathways). The keyword-based escalation logic is also susceptible to manipulation to trigger premium models or higher-capability agents. With added constraints (tool allowlists, egress controls, confirmation gates, least privilege, secret-handling policy), risk would drop to PASS."}
Frontier model review complete. Human approval still required.
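The GPT-5.2 recommendation to allowlist domains and block private IP ranges before fetching could be sketched roughly as below. This is a minimal illustration, not code from the skill or from OpenClaw: `is_fetch_allowed` and `ALLOWED_HOSTS` are hypothetical names, the allowlisted hosts are placeholders, and the check only inspects the URL string (it does not resolve DNS).

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; not taken from the skill or the review.
ALLOWED_HOSTS = frozenset({"example.com", "docs.example.org"})


def _is_internal_ip(host: str) -> bool:
    """Return True if host is an IP literal in a non-public range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal at all
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def is_fetch_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only http(s) URLs whose host is allowlisted and not internal."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL (e.g. broken IPv6 brackets)
    if parts.scheme not in ("http", "https") or not host:
        return False
    if _is_internal_ip(host):
        return False  # blocked even if someone put it on the allowlist
    host = host.rstrip(".")
    # Accept an exact match or a subdomain of an allowlisted host.
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/page",
        "http://169.254.169.254/latest/meta-data/",
        "file:///etc/passwd",
    ):
        print(candidate, is_fetch_allowed(candidate))
```

A string-level check like this would likely need to be paired with a check on the resolved address at connection time, since an allowlisted hostname could still resolve to an internal address.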
New Skill Submission
Skill: model-hierarchy
Submitted by: anonymous
Source: https://github.com/zscole/model-hierarchy-skill
Files imported: 1
This PR was auto-generated from skills.every.to/submit.
AI security review will run automatically.