[Skill Submission] model-hierarchy #13
Merged
Conversation
Deploying everyskill with Cloudflare Pages

| | |
| --- | --- |
| Latest commit: | c5b00e5 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://19aec8cf.everyskill.pages.dev |
| Branch Preview URL: | https://skill-model-hierarchy-177086-e8j5.everyskill.pages.dev |
Frontier Model Security Review

Final Verdict: WARN

Claude Opus 4.6 Analysis:

{"verdict": "PASS", "issues": [], "reasoning": "This skill is a straightforward informational guide for cost-optimizing AI model selection based on task complexity. It contains no prompt injection attempts, no hidden instructions, no encoded commands, and no 'ignore previous' directives. There are no dangerous tool usages - the skill does not invoke Bash, delete files, or access sensitive paths. The code examples are illustrative pseudocode and configuration snippets, not executable commands that could cause harm. There is no data exfiltration risk - no external URLs are contacted, no credentials are read or transmitted. The skill does not attempt social engineering or manipulation; it provides practical cost-optimization advice. It does not define any allowed-tools or grant overly permissive access. The only minor note is that the pricing table references 'Feb 2026' which may be speculative, but this is not a security concern. Overall, this is a benign, well-structured reference skill."}

GPT-5.2 Analysis:

{
"verdict": "WARN",
"issues": [
{
"category": "Prompt injection",
"severity": "low",
"evidence": "No hidden/encoded directives or 'ignore previous instructions' patterns found. However, the skill includes behavioral directives like 'Default to Tier 2' and 'Switching to [premium model]' that could override a host system’s safety/cost policies if not constrained by the orchestrator."
},
{
"category": "Dangerous tool usage",
"severity": "medium",
"evidence": "Encourages Tier-1 sub-agents to perform 'File read/write operations', 'URL fetching and basic parsing', 'API calls with known parameters', and 'Fetch and parse these 50 URLs' without specifying sandboxing, path allowlists, command restrictions, or safe I/O policies."
},
{
"category": "Data exfiltration",
"severity": "medium",
"evidence": "Explicitly promotes URL fetching and API calls plus multi-agent batching. Without guardrails, a compromised prompt could steer sub-agents to transmit sensitive data to external endpoints. No mention of restricting destinations, blocking credentials, or redaction."
},
{
"category": "Overly permissive allowed-tools",
"severity": "medium",
"evidence": "Tooling is implied (file I/O, network fetch, tool orchestration) but not bounded (no allowlist of tools, no safe-mode defaults, no prohibition on shell commands, no constraints on filesystem scope)."
},
{
"category": "Social engineering/manipulation",
"severity": "low",
"evidence": "No explicit manipulation attempts. Some language could pressure upgrades ('needs Opus-level reasoning') but is not inherently malicious."
}
],
"reasoning": "The content is primarily operational guidance for cost-based model routing and does not contain overt prompt-injection payloads or encoded commands. The main risk is indirect: it normalizes broad file/network/tool actions and sub-agent spawning without specifying security boundaries (sandboxing, path/network allowlists, secrets handling, or tool restrictions). In an agentic environment, these omissions can enable prompt-driven misuse (e.g., reading sensitive files or exfiltrating via URL/API calls). Recommend adding explicit constraints: filesystem scope, secret redaction rules, approved domains/endpoints, rate limits, and a strict tool allowlist (and explicitly disallow shell/rm -rf patterns) for Tier-1 sub-agents."
}

Frontier model review complete. Human approval still required.
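The mitigations the WARN review recommends (a strict tool allowlist, a bounded filesystem scope, and approved network endpoints for Tier-1 sub-agents) could be sketched as a small pre-dispatch guard. Everything below is a hypothetical illustration of that recommendation; `ALLOWED_TOOLS`, `SANDBOX_ROOT`, and `api.example.com` are assumed names, not part of the submitted skill:

```python
from pathlib import Path
from urllib.parse import urlparse

ALLOWED_TOOLS = {"read_file", "write_file", "fetch_url"}  # strict allowlist; no shell
ALLOWED_DOMAINS = {"api.example.com"}                     # approved endpoints only
SANDBOX_ROOT = Path("/workspace/sandbox")                 # filesystem scope

def is_allowed(tool: str, target: str) -> bool:
    """Return True only if a sub-agent's requested action stays inside policy."""
    if tool not in ALLOWED_TOOLS:
        return False  # blocks shell commands and any unlisted tool outright
    if tool == "fetch_url":
        # Only fetch from explicitly approved hosts.
        return urlparse(target).hostname in ALLOWED_DOMAINS
    # File tools must resolve inside the sandbox (blocks ../ escapes).
    resolved = (SANDBOX_ROOT / target).resolve()
    return resolved.is_relative_to(SANDBOX_ROOT.resolve())
```

A real orchestrator would also add rate limits and secret redaction, per the review's reasoning, but the allowlist-by-default shape is the core of the fix.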
Frontier Model Security Review

Final Verdict: PASS

Claude Opus 4.6 Analysis:

{"verdict": "PASS", "issues": [], "reasoning": "This skill is a straightforward informational/guidance document about cost-optimizing AI model selection based on task complexity. It contains no prompt injection attempts, no hidden instructions, no encoded commands, and no 'ignore previous' directives. There are no dangerous tool usages - the code examples are illustrative pseudocode and configuration snippets showing how to select models, not executable commands that delete files or access sensitive resources. There are no network calls to external URLs for data exfiltration, no attempts to read credentials or sensitive files, and no social engineering or manipulation attempts. The skill does not define any allowed-tools or invoke any tools directly - it is purely advisory content providing a decision framework for model routing. The integration examples reference standard configuration patterns (YAML config, Python helper functions) that are benign. The only minor note is that the pricing table references 'Feb 2026' which appears to be speculative/future-dated, but this is not a security concern."}

GPT-5.2 Analysis:

{"verdict":"PASS","issues":[],"reasoning":"No prompt-injection patterns detected (no hidden/encoded directives, no \"ignore previous\" or instruction hierarchy bypass). Content is a cost/model-routing guideline and does not include executable payloads.\n\nTooling risk: Mentions benign examples of file operations, URL fetching, API calls, and spawning sub-agents, but provides no concrete dangerous commands (e.g., rm -rf), no unrestricted shell snippets, and no instructions to bypass safeguards. The YAML/Python examples select models only; they do not grant permissions or invoke tools directly.\n\nData exfiltration: References fetching URLs and parsing data as examples, but does not instruct harvesting credentials, reading secrets, or sending sensitive data to external endpoints.\n\nSocial engineering/manipulation: None observed.\n\nAllowed-tools permissiveness: No allowed-tools configuration present; no broad tool enablement directives. Overall the skill is operational guidance and appears non-malicious."}

Frontier model review complete. Human approval still required.
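Both reviews describe the skill's examples as helpers that "select models only" and "do not grant permissions or invoke tools directly." A minimal sketch of that kind of helper, assuming hypothetical tier names and model identifiers (the actual skill file is not reproduced in this thread):

```python
# Hypothetical tier-to-model mapping of the kind the reviews describe.
# Model names and tier boundaries are illustrative assumptions only.
TIERS = {
    1: "small-model",    # mechanical tasks: file I/O, known API calls
    2: "mid-model",      # the "Default to Tier 2" routine-reasoning tier
    3: "premium-model",  # escalated only when a task needs deep reasoning
}

def select_model(complexity: int) -> str:
    """Map a task-complexity score (1-3) to a model name, defaulting to Tier 2."""
    return TIERS.get(complexity, TIERS[2])
```

Note that a helper like this returns a string; it grants no tool permissions, which is why both analyses treat the skill's snippets as advisory rather than executable.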
New Skill Submission
Skill: model-hierarchy
Submitted by: anonymous
Source: https://github.com/zscole/model-hierarchy-skill
Files imported: 1
This PR was auto-generated from skills.every.to/submit.
AI security review will run automatically.