OCPEDGE-2727: Add eval harness configs for cluster-diagnostic and threat-model skills#189
OCPEDGE-2727: Add eval harness configs for cluster-diagnostic and threat-model skills#189dhensel-rh wants to merge 9 commits into
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…templates Rename skill directories from *-threat-model to shorter names (e.g., tnf-threat-model → tnf) and extract duplicated report templates into a shared references/report-templates.md, reducing SKILL.md sizes by ~756 lines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allow the report output directory to be set externally via $REPORT_DIR, skipping workspace discovery for report path. Enables eval harnesses and CI to control where reports are written. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eat-model skills Add evaluation configs, test cases, and README for two skills: - cluster-diagnostic: 5 cases covering validate and recovery-guide modes - threat-model-tnf: 5 cases covering PR security analysis Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The eval harness resolves dataset.path from the repo root, not relative to the config file. Both configs were using short relative paths that broke when running from different working directories. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add case-006-game-quiz with quiz mode test case and answers - Add warning_classification judge for expected WARNING findings - Add game_mode_scoring judge for rating/score validation - Fix forbidden_recommendations to check 'shutdown -h' (not 'shutdown -h 1') - Update severity_classification description for clarity - Drop models.skill default (let CLI --model flag control it) - Simplify schema note to only exclude diagnose mode Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add execution.env.REPORT_DIR to threat-model-tnf.yaml so reports are written to the eval workspace instead of external paths - Reframe README: scoring not testing, scenarios not test cases - Update cluster-diagnostic case count to 6 (game mode added) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dhensel-rh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughThis PR introduces a comprehensive threat modeling plugin for Claude Code that enables STRIDE/DFD threat analysis across TNF, TNA, SNO, and LVMS OpenShift edge topologies. It includes topology-specific skills with ShellCheck integration, MITRE ATT&CK and OWASP mapping, evaluation harnesses, and test cases for both cluster-diagnostic and threat-model-tnf skills. ChangesThreat Model Plugin Infrastructure & Shared References
Threat Model Skills for Each Topology
Evaluation Harnesses & Test Cases
Sequence DiagramsequenceDiagram
participant User
participant ThreatModelSkill
participant ShellCheck
participant PRAnalysis
participant DFDMapper
participant MITREMapper
participant ReportGen
User->>ThreatModelSkill: /threat-model:tnf PR_URL
ThreatModelSkill->>ThreatModelSkill: discover workspace, resolve org/repo/PR
ThreatModelSkill->>ThreatModelSkill: fetch PR diff via gh CLI
ThreatModelSkill->>ShellCheck: scan shell scripts in diff
ShellCheck-->>ThreatModelSkill: issues with MITRE mappings
ThreatModelSkill->>PRAnalysis: detect security patterns (injection, creds, etc)
PRAnalysis-->>ThreatModelSkill: pattern categories + severities
ThreatModelSkill->>DFDMapper: map changes to DFD elements (P1-P8, DS1-DS5, etc)
DFDMapper-->>ThreatModelSkill: element mappings with STRIDE per-element
ThreatModelSkill->>MITREMapper: cross-reference elements to MITRE techniques
MITREMapper-->>ThreatModelSkill: technique IDs + OWASP categories
ThreatModelSkill->>ReportGen: generate threat-model markdown report
ReportGen-->>ThreatModelSkill: report with sections, tables, findings
ThreatModelSkill->>ThreatModelSkill: append to MITRE findings tracker
ThreatModelSkill-->>User: threat-model report + updated tracker
🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Comment |
|
@dhensel-rh: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/threat-model/README.md`:
- Around line 57-78: Update the workspace layout example in README.md to reflect
the real .claude/skills structure: replace the suggested
.claude/skills/threat-model/ subdirectory with four top-level entries (symlinks)
directly under .claude/skills/ (including "threat-model" as a top-level symlink
and the three mitre-findings-*.md files), and adjust the tree diagram and
filenames (e.g., "threat-model", "mitre-findings-tnf.md",
"mitre-findings-tna.md", "mitre-findings-sno.md", "mitre-findings-lvms.md") so
users are pointed to the correct path.
In `@plugins/threat-model/references/owasp-reference.md`:
- Around line 24-96: The ATT&CK technique IDs in the OWASP crosswalk are
incorrect and must be corrected: review each mapping row (e.g., the entries
labeled "Path traversal", "SSRF", "Insecure Deserialization", "Logging Sensitive
Data", "Master schedulable" and any other mismatched rows in the "Pattern to
OWASP Mapping" and the TNF/TNA/SNO tables) against MITRE ATT&CK and replace the
wrong T-codes with the proper techniques; ensure the MITRE column for the rows
referenced in the diff (those exact labels) uses the canonical attack IDs from
attack.mitre.org and update any dependent PE-* or DFD notes if they rely on the
old technique semantics so downstream TNF/TNA/SNO consumers get consistent
mappings.
In `@plugins/threat-model/references/report-templates.md`:
- Around line 8-12: Filename patterns PR<number>-THREAT-MODEL-<repo>.md and
VULN-PR<number>-<short-desc>.md are not unique across topologies/skills; update
the naming templates to include topology and skill identifiers so reports can't
overwrite each other. Change the templates to something like
PR<number>-THREAT-MODEL-<repo>-<topology>-<skill>.md and
VULN-PR<number>-<short-desc>-<topology>-<skill>.md (or equivalent tokens used by
your generator), and update any code that emits these filenames to populate the
<topology> and <skill> tokens from the current run context (e.g., the same
variables used to tag runs).
In `@plugins/threat-model/skills/tna/SKILL.md`:
- Line 208: The SC2164 entry in SKILL.md contains a duplicated word "without" in
its description; edit the SC2164 line (the table cell containing "cd without
without error-exit guard - path traversal risk") to remove the extra "without"
so it reads "cd without error-exit guard - path traversal risk" (update the
SC2164 description text accordingly).
In `@plugins/threat-model/skills/tnf/SKILL.md`:
- Around line 175-177: The workflow text in SKILL.md promises OWASP Top 10
mapping but only lists a MITRE ATT&CK mapping step; either add an OWASP mapping
step or remove the claim. Fix by updating the SKILL.md steps to include a new
step (e.g., "Map findings to OWASP Top 10 (see `owasp-reference.md`)") and a
corresponding report/output line (e.g., "Generate OWASP mapping report at
`$REPORT_DIR/owasp-report.md`" and ensure the Append Protocol step mentions
including OWASP mapping in the findings block written to `$FINDINGS_FILE`), or
alternatively remove references to `owasp-reference.md` in the metadata so the
skill no longer advertises OWASP mapping.
In `@plugins/two-node/evals/cluster-diagnostic.md`:
- Around line 39-42: The "Eval scope" paragraph is stale: it states only
`validate` and `recovery-guide` are testable while the project config includes
`game` (and diagnostic) cases; update the text under the "Eval scope" heading to
accurately reflect current behavior by noting that `game` mode and `diagnose`
are present in the config but require special setup (e.g., live SSH for
`diagnose` and tool-interception or interactive AskUserQuestion handling for
`game`) and may be only partially or conditionally testable rather than entirely
out of scope; ensure you mention the mode names (`validate`, `recovery-guide`,
`diagnose`, `game`) so readers can map the wording to the config.
In `@plugins/two-node/evals/cluster-diagnostic.yaml`:
- Around line 168-201: The forbidden_recommendations check currently only looks
for "pcs node standby" and "shutdown -h" but must also detect variants
referenced in the policy; update the logic that builds recommend_sections (and
the subsequent checks over sec_lower) to also flag "pcs standby" and any
"standby" variants (e.g., "pcs standby", "standby node") and to detect
sequential-shutdown phrasing such as "sequential shutdown",
"sequential-shutdown", "shutdown nodes in sequence", or "one-by-one shutdown"
(and similar "in sequence"/"one at a time" patterns); also broaden the shutdown
match to catch forms like "shutdown -h 1", "shutdown -h now", and numbered
arguments. Keep using the same variables (conversation, ann, recommend_sections,
sec_lower, forbidden) and append appropriate forbidden messages like "pcs
standby recommended" and "sequential shutdown recommended" when these patterns
are found.
- Around line 99-115: The BLOCKER detection is too brittle: replace the
substring checks around conv_upper/has_blocker with proper word-boundary and
negation-aware matching (e.g., use a case-insensitive regex like \bBLOCKER\b and
also detect phrases like "no blocker" to avoid false positives) and change the
expected_blockers logic so all expected blockers must be present (not just any
one) — iterate expected_blockers and confirm each is found (case-insensitive,
word-boundary aware) in conversation, building a missing list and returning
failure if any expected blocker is missing; update references to conv_upper,
has_blocker, should_reject, expected_blockers, found_blockers and conversation
accordingly.
- Around line 146-167: The recovery-guide completeness check ignores the
schema's annotations.expected_scenario, so add a scenario validation: read
expected = annotations.get("expected_scenario") and actual =
outputs.get("scenario") (or derive a scenario tag from the conversation if
outputs lacks it), then add a new check like "scenario_matches": (not expected)
or (expected == actual) to the checks dict used in the procedure_completeness
logic; include that key in passed/total computation and report it in the failed
list so recovery-guide runs fail when the expected_scenario does not match the
actual scenario.
In `@plugins/two-node/evals/README.md`:
- Around line 39-75: The README.md has fenced code blocks without language
identifiers (the directory tree block and the command examples like
/eval-analyze, /eval-dataset, /eval-run, /eval-review, /eval-optimize), causing
markdownlint MD040 failures; fix by adding appropriate language tags: mark the
tree block as ```text and each command block as ```bash in
plugins/two-node/evals/README.md so the blocks become ```text for the evals/
tree and ```bash for the /eval-analyze, /eval-dataset, /eval-run, /eval-review,
and /eval-optimize command snippets.
In `@plugins/two-node/evals/threat-model-tnf.yaml`:
- Around line 149-152: The regex r'\b[XxNn/Aa~-]\b' incorrectly splits
multi-char markers like "N/A" and miscaptures standalone "~" and "-"—replace the
pattern used in the re.findall call to match whole markers and use
case-insensitive matching: change the pattern to r'\b(?:X|N/A|~|-)\b' and call
re.findall with flags=re.IGNORECASE (i.e., re.findall(r'\b(?:X|N/A|~|-)\b',
stride_section[1][:2000], flags=re.IGNORECASE)) so markers, including "N/A" and
"~"/"-", are captured as single tokens (referencing the markers variable, the
re.findall call, and stride_section).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 970f557a-09b9-4426-a24a-dc4bbc370f8f
📒 Files selected for processing (46)
.claude/skills/threat-model-lvms.claude/skills/threat-model-sno.claude/skills/threat-model-tna.claude/skills/threat-model-tnfplugins/threat-model/.claude-plugin/plugin.jsonplugins/threat-model/README.mdplugins/threat-model/references/mitre-findings-template.mdplugins/threat-model/references/mitre-reference.mdplugins/threat-model/references/owasp-reference.mdplugins/threat-model/references/report-templates.mdplugins/threat-model/skills/lvms/SKILL.mdplugins/threat-model/skills/lvms/dfd-elements-lvms.mdplugins/threat-model/skills/sno/SKILL.mdplugins/threat-model/skills/sno/dfd-elements-sno.mdplugins/threat-model/skills/tna/SKILL.mdplugins/threat-model/skills/tna/dfd-elements-tna.mdplugins/threat-model/skills/tnf/SKILL.mdplugins/threat-model/skills/tnf/dfd-elements-tnf.mdplugins/two-node/evals/README.mdplugins/two-node/evals/cluster-diagnostic.mdplugins/two-node/evals/cluster-diagnostic.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-001-validate-sequential-shutdown/annotations.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-001-validate-sequential-shutdown/input.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-002-validate-safe-redfish/annotations.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-002-validate-safe-redfish/input.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-003-recovery-full-shutdown/annotations.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-003-recovery-full-shutdown/input.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-004-recovery-standby/annotations.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-004-recovery-standby/input.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-005-validate-pcs-standby/annotations.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-005-validate-pcs-standby/input.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-006-game-quiz/annotations.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-006-game-quiz/answers.yamlplugins/two-node/evals/cluster-diagnostic/cases/case-006-game-quiz/input.yamlplugins/two-node/evals/threat-model-tnf.mdplugins/two-node/evals/threat-model-tnf.yamlplugins/two-node/evals/threat-model-tnf/cases/case-001-shell-script-k8s-api/annotations.yamlplugins/two-node/evals/threat-model-tnf/cases/case-001-shell-script-k8s-api/input.yamlplugins/two-node/evals/threat-model-tnf/cases/case-002-credential-rotation-script/annotations.yamlplugins/two-node/evals/threat-model-tnf/cases/case-002-credential-rotation-script/input.yamlplugins/two-node/evals/threat-model-tnf/cases/case-003-mac-fencing-lookup/annotations.yamlplugins/two-node/evals/threat-model-tnf/cases/case-003-mac-fencing-lookup/input.yamlplugins/two-node/evals/threat-model-tnf/cases/case-004-trivial-indentation-fix/annotations.yamlplugins/two-node/evals/threat-model-tnf/cases/case-004-trivial-indentation-fix/input.yamlplugins/two-node/evals/threat-model-tnf/cases/case-005-tnf-retry-bugfix/annotations.yamlplugins/two-node/evals/threat-model-tnf/cases/case-005-tnf-retry-bugfix/input.yaml
| ### Recommended workspace layout | ||
|
|
||
| ```text | ||
| your-workspace/ | ||
| ├── repos/ | ||
| │ ├── cluster-etcd-operator/ | ||
| │ ├── installer/ | ||
| │ ├── machine-config-operator/ | ||
| │ ├── resource-agents/ | ||
| │ ├── two-node-toolbox/ | ||
| │ │ └── docs/ | ||
| │ │ ├── TNF-THREAT-MODEL.md | ||
| │ │ └── TNA-THREAT-MODEL.md | ||
| │ └── ... | ||
| └── .claude/ | ||
| └── skills/ | ||
| ├── threat-model/ | ||
| ├── mitre-findings-tnf.md # Created automatically on first use | ||
| ├── mitre-findings-tna.md | ||
| ├── mitre-findings-sno.md | ||
| └── mitre-findings-lvms.md | ||
| ``` |
There was a problem hiding this comment.
Show the actual .claude/skills layout here.
The diagram currently suggests a .claude/skills/threat-model/ subdirectory, but the plugin is installed as four top-level symlinks directly under .claude/skills/. As written, this example points users at the wrong path.
🔧 Suggested fix
└── .claude/
└── skills/
- ├── threat-model/
+ ├── threat-model-tnf
+ ├── threat-model-tna
+ ├── threat-model-sno
+ ├── threat-model-lvms
├── mitre-findings-tnf.md # Created automatically on first use
├── mitre-findings-tna.md
├── mitre-findings-sno.md
└── mitre-findings-lvms.md🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/threat-model/README.md` around lines 57 - 78, Update the workspace
layout example in README.md to reflect the real .claude/skills structure:
replace the suggested .claude/skills/threat-model/ subdirectory with four
top-level entries (symlinks) directly under .claude/skills/ (including
"threat-model" as a top-level symlink and the three mitre-findings-*.md files),
and adjust the tree diagram and filenames (e.g., "threat-model",
"mitre-findings-tnf.md", "mitre-findings-tna.md", "mitre-findings-sno.md",
"mitre-findings-lvms.md") so users are pointed to the correct path.
| ## Pattern to OWASP Mapping | ||
|
|
||
| | Security Pattern | OWASP | MITRE | CWE | | ||
| |-----------------|-------|-------|-----| | ||
| | **Command Injection** | A05 | T1059 | CWE-78 | | ||
| | Shell exec with unsanitized input | A05 | T1059 | CWE-78 | | ||
| | fmt.Sprintf() building shell commands | A05 | T1059 | CWE-78 | | ||
| | **Hardcoded Credentials** | A07 | T1552 | CWE-798 | | ||
| | Passwords in source code | A07 | T1552 | CWE-798 | | ||
| | API keys in config files | A07 | T1552 | CWE-798 | | ||
| | **Broken Access Control** | A01 | T1078 | CWE-284 | | ||
| | Missing authorization checks | A01 | T1078 | CWE-285 | | ||
| | Path traversal | A01 | T1083 | CWE-22 | | ||
| | SSRF | A01 | T1046 | CWE-918 | | ||
| | **Cryptographic Failures** | A04 | T1573 | CWE-327 | | ||
| | Weak algorithms (MD5, SHA1) | A04 | T1573 | CWE-328 | | ||
| | Disabled TLS verification | A04 | T1557 | CWE-295 | | ||
| | InsecureSkipVerify = true | A04 | T1557 | CWE-295 | | ||
| | **Security Misconfiguration** | A02 | T1562 | CWE-16 | | ||
| | Debug mode in production | A02 | T1562 | CWE-489 | | ||
| | Privileged containers | A02 | T1611 | CWE-250 | | ||
| | **Insecure Deserialization** | A08 | T1059 | CWE-502 | | ||
| | pickle.loads(), yaml.load() | A08 | T1059 | CWE-502 | | ||
| | **Logging Sensitive Data** | A09 | T1005 | CWE-532 | | ||
| | Credentials in logs | A09 | T1005 | CWE-532 | | ||
| | **Missing Error Handling** | A10 | - | CWE-754 | | ||
| | Unchecked error returns | A10 | - | CWE-252 | | ||
| | Fail-open logic | A10 | T1562 | CWE-636 | | ||
|
|
||
| --- | ||
|
|
||
| ## TNF-Specific OWASP Mappings | ||
|
|
||
| | TNF Component | Risk | OWASP | MITRE | CWE | DFD Elements | PE-* IDs | | ||
| |---------------|------|-------|-------|-----|--------------|----------| | ||
| | BMC credentials in install-config | Hardcoded secrets | A07 | T1552 | CWE-798 | P1, DS1, DF1, DF2 | PE-P1-I-1, PE-DS1-I-1 | | ||
| | BMC password in shell command | Command injection | A05 | T1059 | CWE-78 | P5, DF9 | PE-P5-T-1, PE-P5-I-1 | | ||
| | Credentials in CIB XML | Plaintext storage | A04 | T1552 | CWE-312 | DS3, DF7 | PE-DS3-I-1, PE-DF7-I-1 | | ||
| | InsecureSkipVerify on BMC | Crypto failure | A04 | T1557 | CWE-295 | P8, DF10 | PE-P8-S-1, PE-DF10-T-1 | | ||
| | Privileged TNF setup pods | Misconfiguration | A02 | T1611 | CWE-250 | P3, P4, P5 | PE-P4-E-1, PE-P5-E-1 | | ||
| | fencing-credentials Secret | Access control | A01 | T1552 | CWE-284 | DS2, DF4 | PE-DS2-I-1, PE-DS2-T-1 | | ||
| | Corosync unencrypted | Crypto failure | A04 | T1557 | CWE-319 | EE3, DF12 | PE-EE3-S-1 | | ||
| | PCS token generation | Auth weakness | A07 | T1078 | CWE-330 | P3, DS4, DF5 | PE-P3-S-1, PE-DS4-I-1 | | ||
| | Credentials in CLI args | Info exposure | A07 | T1552 | CWE-214 | P6, P8, DF9 | PE-DF9-I-1, PE-P8-I-1 | | ||
| | No fencing audit trail | Logging failure | A09 | - | CWE-778 | P5, P6 | PE-P5-R-1, PE-P1-R-1 | | ||
|
|
||
| --- | ||
|
|
||
| ## TNA-Specific OWASP Mappings | ||
|
|
||
| | TNA Component | Risk | OWASP | MITRE | CWE | DFD Elements | PE-* IDs | | ||
| |---------------|------|-------|-------|-----|--------------|----------| | ||
| | Arbiter taint as sole scheduling protection | Misconfiguration | A02 | T1562 | CWE-250 | TNA-P3 | PE-TNA-P3-T-1 | | ||
| | Worker ignition token | Credential exposure | A07 | T1552 | CWE-798 | TNA-DS6 | PE-TNA-DS6-I-1 | | ||
| | Worker lateral movement to control plane | Access control | A01 | T1021 | CWE-284 | TNA-P5, TNA-DS5 | PE-TNA-P5-E-1 | | ||
| | etcd data on compromised node | Crypto failure | A04 | T1552 | CWE-312 | TNA-DS5 | PE-TNA-DS5-I-1 | | ||
| | Rogue worker CSR approval | Auth failure | A07 | T1078 | CWE-287 | TNA-P5, TNA-DS6 | PE-TNA-P5-S-1 | | ||
| | No arbiter taint drift alert | Logging failure | A09 | - | CWE-778 | TNA-P3 | PE-TNA-P3-T-1 | | ||
|
|
||
| --- | ||
|
|
||
| ## SNO-Specific OWASP Mappings | ||
|
|
||
| | SNO Component | Risk | OWASP | MITRE | CWE | DFD Elements | PE-* IDs | | ||
| |---------------|------|-------|-------|-----|--------------|----------| | ||
| | install-config with pull secret + offline token | Credential exposure | A07 | T1552 | CWE-798 | SNO-DS1 | PE-SNO-DS1-I-1 | | ||
| | Single-member etcd (no quorum) | Data loss / total compromise | A06 | T1485 | CWE-312 | SNO-DS3 | PE-SNO-DS3-I-1, PE-SNO-DS3-D-1 | | ||
| | UnsafeScalingStrategy bypasses quorum checks | Insecure design | A06 | T1562 | CWE-636 | SNO-P4 | PE-SNO-P4-D-1 | | ||
| | Bootstrap-in-place runs privileged on bare metal | Misconfiguration | A02 | T1611 | CWE-250 | SNO-P5 | PE-SNO-P5-E-1 | | ||
| | Master schedulable (workloads on control plane) | Access control | A01 | T1610 | CWE-284 | SNO-P6 | PE-SNO-P6-E-1 | | ||
| | Kubeconfig + kubeadmin-password on admin workstation | Credential exposure | A07 | T1552 | CWE-522 | SNO-DS4 | PE-SNO-DS4-I-1 | | ||
| | Discovery ISO integrity | Supply chain | A03 | T1195 | CWE-494 | SNO-DS2 | PE-SNO-DS2-T-1 | | ||
|
|
There was a problem hiding this comment.
Correct the ATT&CK crosswalk before using this as the shared reference.
Several rows pair the stated risk with unrelated ATT&CK techniques, e.g. Path traversal → T1083, SSRF → T1046, Insecure deserialization → T1059, Logging sensitive data → T1005, and Master schedulable → T1610. MITRE defines those IDs as File and Directory Discovery, Network Service Discovery, Command and Scripting Interpreter, Data from Local System, and Deploy Container, respectively. (attack.mitre.org)
Because the TNF/TNA/SNO skills consume this file as their shared OWASP mapping, these mistakes will flow straight into reports.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/threat-model/references/owasp-reference.md` around lines 24 - 96, The
ATT&CK technique IDs in the OWASP crosswalk are incorrect and must be corrected:
review each mapping row (e.g., the entries labeled "Path traversal", "SSRF",
"Insecure Deserialization", "Logging Sensitive Data", "Master schedulable" and
any other mismatched rows in the "Pattern to OWASP Mapping" and the TNF/TNA/SNO
tables) against MITRE ATT&CK and replace the wrong T-codes with the proper
techniques; ensure the MITRE column for the rows referenced in the diff (those
exact labels) uses the canonical attack IDs from attack.mitre.org and update any
dependent PE-* or DFD notes if they rely on the old technique semantics so
downstream TNF/TNA/SNO consumers get consistent mappings.
| ## Report Naming Convention | ||
|
|
||
| - **Full threat model**: `PR<number>-THREAT-MODEL-<repo>.md` | ||
| - **Individual vuln**: `VULN-PR<number>-<short-desc>.md` | ||
|
|
There was a problem hiding this comment.
Include the topology/skill in the report filenames.
PR<number>-THREAT-MODEL-<repo>.md and VULN-PR<number>-<short-desc>.md are not unique across TNF/TNA/SNO/LVMS. In the shared eval workspace, two runs for the same PR/repo can overwrite each other’s reports.
Proposed fix
- **Full threat model**: `PR<number>-THREAT-MODEL-<repo>.md`
- **Individual vuln**: `VULN-PR<number>-<short-desc>.md`
+ **Full threat model**: `PR<number>-THREAT-MODEL-<topology>-<repo>.md`
+ **Individual vuln**: `VULN-PR<number>-<topology>-<short-desc>.md`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## Report Naming Convention | |
| - **Full threat model**: `PR<number>-THREAT-MODEL-<repo>.md` | |
| - **Individual vuln**: `VULN-PR<number>-<short-desc>.md` | |
| ## Report Naming Convention | |
| - **Full threat model**: `PR<number>-THREAT-MODEL-<topology>-<repo>.md` | |
| - **Individual vuln**: `VULN-PR<number>-<topology>-<short-desc>.md` | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/threat-model/references/report-templates.md` around lines 8 - 12,
Filename patterns PR<number>-THREAT-MODEL-<repo>.md and
VULN-PR<number>-<short-desc>.md are not unique across topologies/skills; update
the naming templates to include topology and skill identifiers so reports can't
overwrite each other. Change the templates to something like
PR<number>-THREAT-MODEL-<repo>-<topology>-<skill>.md and
VULN-PR<number>-<short-desc>-<topology>-<skill>.md (or equivalent tokens used by
your generator), and update any code that emits these filenames to populate the
<topology> and <skill> tokens from the current run context (e.g., the same
variables used to tag runs).
| | SC2029 | Warning | ssh command with unescaped variables | T1059 | | ||
| | SC2087 | Warning | Unquoted heredoc - variable expansion | T1059 | | ||
| | SC2155 | Warning | Declare/assign separately to avoid masking errors | - | | ||
| | SC2164 | Warning | cd without without error-exit guard - path traversal risk | T1083 | |
There was a problem hiding this comment.
Fix the duplicated "without" in the ShellCheck note.
This is a small but visible docs typo in the SC2164 description.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/threat-model/skills/tna/SKILL.md` at line 208, The SC2164 entry in
SKILL.md contains a duplicated word "without" in its description; edit the
SC2164 line (the table cell containing "cd without without error-exit guard -
path traversal risk") to remove the extra "without" so it reads "cd without
error-exit guard - path traversal risk" (update the SC2164 description text
accordingly).
Source: Linters/SAST tools
| 10. **Map findings to MITRE ATT&CK** techniques (see `$PLUGIN_DIR/references/mitre-reference.md`) | ||
| 11. **Generate report** at `$REPORT_DIR/` | ||
| 12. **Append findings to tracker** — follow the Append Protocol to write a findings block to `$FINDINGS_FILE` |
There was a problem hiding this comment.
Add the promised OWASP mapping step.
The workflow advertises OWASP Top 10 mapping and lists owasp-reference.md, but the instructions only ever map findings to MITRE ATT&CK. As written, the OWASP part of the report will never be produced. Add an OWASP-specific step/report section or remove the OWASP claim from the skill metadata.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/threat-model/skills/tnf/SKILL.md` around lines 175 - 177, The
workflow text in SKILL.md promises OWASP Top 10 mapping but only lists a MITRE
ATT&CK mapping step; either add an OWASP mapping step or remove the claim. Fix
by updating the SKILL.md steps to include a new step (e.g., "Map findings to
OWASP Top 10 (see `owasp-reference.md`)") and a corresponding report/output line
(e.g., "Generate OWASP mapping report at `$REPORT_DIR/owasp-report.md`" and
ensure the Append Protocol step mentions including OWASP mapping in the findings
block written to `$FINDINGS_FILE`), or alternatively remove references to
`owasp-reference.md` in the metadata so the skill no longer advertises OWASP
mapping.
| conv_upper = conversation.upper() | ||
| has_blocker = "BLOCKER" in conv_upper | ||
|
|
||
| if should_reject and not has_blocker: | ||
| return (False, "Procedure should have been rejected with BLOCKER but no BLOCKER found") | ||
|
|
||
| if not should_reject and has_blocker: | ||
| return (False, "Procedure should NOT have BLOCKER findings but BLOCKER was found") | ||
|
|
||
| found_blockers = [] | ||
| for b in expected_blockers: | ||
| if b.lower() in conversation.lower(): | ||
| found_blockers.append(b) | ||
|
|
||
| if expected_blockers and not found_blockers: | ||
| return (False, f"Expected blockers {expected_blockers} not found in output") | ||
|
|
There was a problem hiding this comment.
BLOCKER detection logic can mis-score valid/invalid outputs.
Using "BLOCKER" in conversation causes false positives (e.g., “No BLOCKER findings”), and expected blockers pass even if only one of multiple expected blockers is mentioned.
💡 Suggested patch
- conv_upper = conversation.upper()
- has_blocker = "BLOCKER" in conv_upper
+ conv_lower = conversation.lower()
+ blocker_lines = [
+ ln for ln in conv_lower.splitlines()
+ if "blocker" in ln and "no blocker" not in ln
+ ]
+ has_blocker = len(blocker_lines) > 0
@@
- found_blockers = []
- for b in expected_blockers:
- if b.lower() in conversation.lower():
- found_blockers.append(b)
-
- if expected_blockers and not found_blockers:
- return (False, f"Expected blockers {expected_blockers} not found in output")
+ found_blockers = [b for b in expected_blockers if b.lower() in conv_lower]
+ missing_blockers = [b for b in expected_blockers if b.lower() not in conv_lower]
+ if missing_blockers:
+ return (False, f"Expected blockers missing: {missing_blockers}. Found: {found_blockers}")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/two-node/evals/cluster-diagnostic.yaml` around lines 99 - 115, The
BLOCKER detection is too brittle: replace the substring checks around
conv_upper/has_blocker with proper word-boundary and negation-aware matching
(e.g., use a case-insensitive regex like \bBLOCKER\b and also detect phrases
like "no blocker" to avoid false positives) and change the expected_blockers
logic so all expected blockers must be present (not just any one) — iterate
expected_blockers and confirm each is found (case-insensitive, word-boundary
aware) in conversation, building a missing list and returning failure if any
expected blocker is missing; update references to conv_upper, has_blocker,
should_reject, expected_blockers, found_blockers and conversation accordingly.
| if: "annotations.get('mode') == 'recovery-guide'" | ||
| check: | | ||
| conversation = outputs.get("conversation", "") | ||
|
|
||
| if not conversation: | ||
| return (False, "No conversation output found") | ||
|
|
||
| checks = { | ||
| "bash_commands": any(marker in conversation for marker in ["```bash", "```sh", "curl ", "pcs ", "oc "]), | ||
| "has_verification": any(w in conversation.lower() for w in ["verify", "confirm", "check", "poll", "wait"]), | ||
| "has_parameters": any(p in conversation for p in ["$BMC", "$NODE", "BMC_USER", "BMC_PASS", "BMC_HOST"]), | ||
| } | ||
|
|
||
| passed = sum(checks.values()) | ||
| total = len(checks) | ||
| failed = [k for k, v in checks.items() if not v] | ||
|
|
||
| if passed == total: | ||
| return (True, f"All {total} completeness checks passed") | ||
| else: | ||
| return (False, f"{passed}/{total} checks passed. Missing: {failed}") | ||
|
|
There was a problem hiding this comment.
expected_scenario is defined in schema but never enforced in recovery checks.
Recovery-guide cases can pass completeness while returning the wrong scenario flow, because procedure_completeness doesn’t validate annotations.expected_scenario.
💡 Suggested patch
check: |
conversation = outputs.get("conversation", "")
+ ann = outputs.get("annotations", {})
+ expected_scenario = ann.get("expected_scenario")
@@
if not conversation:
return (False, "No conversation output found")
+
+ if expected_scenario and expected_scenario.lower() not in conversation.lower():
+ return (False, f"Expected scenario '{expected_scenario}' not referenced in response")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/two-node/evals/cluster-diagnostic.yaml` around lines 146 - 167, The
recovery-guide completeness check ignores the schema's
annotations.expected_scenario, so add a scenario validation: read expected =
annotations.get("expected_scenario") and actual = outputs.get("scenario") (or
derive a scenario tag from the conversation if outputs lacks it), then add a new
check like "scenario_matches": (not expected) or (expected == actual) to the
checks dict used in the procedure_completeness logic; include that key in
passed/total computation and report it in the failed list so recovery-guide runs
fail when the expected_scenario does not match the actual scenario.
| - name: forbidden_recommendations | ||
| description: | | ||
| Checks that the skill never recommends procedures that violate known | ||
| failure modes: pcs node standby, sequential shutdown, shutdown -h. | ||
| check: | | ||
| conversation = outputs.get("conversation", "") | ||
| ann = outputs.get("annotations", {}) | ||
| mode = ann.get("mode", "") | ||
|
|
||
| if not conversation: | ||
| return (False, "No conversation output found") | ||
|
|
||
| # Only check recommendations sections, not quoted failure descriptions | ||
| forbidden = [] | ||
|
|
||
| # Check if skill RECOMMENDS (not just mentions) dangerous procedures | ||
| conv_lower = conversation.lower() | ||
|
|
||
| recommend_sections = [] | ||
| for marker in ["recommend", "suggested", "recovery", "procedure", "steps to"]: | ||
| idx = conv_lower.find(marker) | ||
| if idx >= 0: | ||
| recommend_sections.append(conversation[idx:idx+500]) | ||
|
|
||
| for section in recommend_sections: | ||
| sec_lower = section.lower() | ||
| if "pcs node standby" in sec_lower and "never" not in sec_lower and "do not" not in sec_lower: | ||
| forbidden.append("pcs node standby recommended") | ||
| if "shutdown -h" in sec_lower and "never" not in sec_lower and "do not" not in sec_lower: | ||
| forbidden.append("shutdown -h 1 recommended") | ||
|
|
||
| if forbidden: | ||
| return (False, f"Forbidden recommendations found: {forbidden}") | ||
| return (True, "No forbidden procedures recommended") |
There was a problem hiding this comment.
Forbidden-procedure guard is incomplete against stated policy.
The description forbids sequential shutdown and standby recommendations, but the implementation only checks pcs node standby and shutdown -h; it misses sequential-shutdown phrasing and standby variants like pcs standby.
💡 Suggested patch
for section in recommend_sections:
sec_lower = section.lower()
- if "pcs node standby" in sec_lower and "never" not in sec_lower and "do not" not in sec_lower:
+ if ("pcs node standby" in sec_lower or "pcs standby" in sec_lower) and "never" not in sec_lower and "do not" not in sec_lower:
forbidden.append("pcs node standby recommended")
+ if "sequential shutdown" in sec_lower and "never" not in sec_lower and "do not" not in sec_lower:
+ forbidden.append("sequential shutdown recommended")
if "shutdown -h" in sec_lower and "never" not in sec_lower and "do not" not in sec_lower:
forbidden.append("shutdown -h 1 recommended")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/two-node/evals/cluster-diagnostic.yaml` around lines 168 - 201, The
forbidden_recommendations check currently only looks for "pcs node standby" and
"shutdown -h" but must also detect variants referenced in the policy; update the
logic that builds recommend_sections (and the subsequent checks over sec_lower)
to also flag "pcs standby" and any "standby" variants (e.g., "pcs standby",
"standby node") and to detect sequential-shutdown phrasing such as "sequential
shutdown", "sequential-shutdown", "shutdown nodes in sequence", or "one-by-one
shutdown" (and similar "in sequence"/"one at a time" patterns); also broaden the
shutdown match to catch forms like "shutdown -h 1", "shutdown -h now", and
numbered arguments. Keep using the same variables (conversation, ann,
recommend_sections, sec_lower, forbidden) and append appropriate forbidden
messages like "pcs standby recommended" and "sequential shutdown recommended"
when these patterns are found.
| ``` | ||
| evals/ | ||
| ├── <skill-name>.yaml # Eval config (judges, thresholds, schema) | ||
| ├── <skill-name>.md # Cached skill analysis | ||
| └── <skill-name>/ | ||
| └── cases/ | ||
| └── case-NNN-<slug>/ | ||
| ├── input.yaml # Scenario input | ||
| └── annotations.yaml # Expected outcomes | ||
| ``` | ||
|
|
||
| ## Adding a New Eval | ||
|
|
||
| 1. **Analyze the skill** — reads SKILL.md, designs judges, writes the eval config | ||
| ``` | ||
| /eval-analyze --skill <name> --config evals/<name>.yaml | ||
| ``` | ||
|
|
||
| 2. **Generate scenarios** — creates `input.yaml` + `annotations.yaml` per case | ||
| ``` | ||
| /eval-dataset --config evals/<name>.yaml | ||
| ``` | ||
|
|
||
| 3. **Run the eval** — executes the skill against each case, scores with judges, generates HTML report | ||
| ``` | ||
| /eval-run --model claude-opus-4-6 --config evals/<name>.yaml | ||
| ``` | ||
|
|
||
| 4. **Review results** — walk through cases, collect human feedback | ||
| ``` | ||
| /eval-review --run-id <run-id> --config evals/<name>.yaml | ||
| ``` | ||
|
|
||
| 5. **(Optional) Optimize** — auto-fix SKILL.md based on judge failures, re-run to verify | ||
| ``` | ||
| /eval-optimize --config evals/<name>.yaml | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced blocks to satisfy markdownlint consistently.
Several fenced blocks omit a language tag (directory tree + command snippets), which triggers MD040 and can break lint-gated CI.
💡 Suggested patch
-```
+```text
evals/
├── <skill-name>.yaml # Eval config (judges, thresholds, schema)
├── <skill-name>.md # Cached skill analysis
└── <skill-name>/
└── cases/
└── case-NNN-<slug>/
├── input.yaml # Scenario input
└── annotations.yaml # Expected outcomes@@
-
/eval-analyze --skill <name> --config evals/<name>.yaml
@@
-
/eval-dataset --config evals/<name>.yaml
@@
-
/eval-run --model claude-opus-4-6 --config evals/<name>.yaml
@@
-
/eval-review --run-id <run-id> --config evals/<name>.yaml
@@
-
/eval-optimize --config evals/<name>.yaml
</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 39-39: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 53-53: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 68-68: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
[warning] 73-73: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @plugins/two-node/evals/README.md around lines 39 - 75, The README.md has
fenced code blocks without language identifiers (the directory tree block and
the command examples like /eval-analyze, /eval-dataset, /eval-run, /eval-review,
/eval-optimize), causing markdownlint MD040 failures; fix by adding appropriate
language tags: mark the tree block as text and each command block as bash
in plugins/two-node/evals/README.md so the blocks become text for the evals/ tree and bash for the /eval-analyze, /eval-dataset, /eval-run, /eval-review,
and /eval-optimize command snippets.
</details>
<!-- fingerprinting:phantom:poseidon:hawk -->
<!-- cr-comment:v1:67e0db102e5477d236202d1a -->
_Source: Linters/SAST tools_
<!-- This is an auto-generated comment by CodeRabbit -->
| markers = re.findall(r'\b[XxNn/Aa~-]\b', stride_section[1][:2000]) | ||
| if len(markers) < 3: | ||
| return (False, f"STRIDE matrix appears empty or minimal ({len(markers)} markers)") | ||
| return (True, f"STRIDE matrix populated ({len(markers)} cell markers found)") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import re
sample = "X ~ - N/A x n/a"
bad = re.findall(r'\b[XxNn/Aa~-]\b', sample)
good = re.findall(r'(?i)\b(?:x|~|-|n/?a)\b', sample)
print("bad:", bad)
print("good:", good)
PYRepository: openshift-eng/edge-tooling
Length of output: 147
Fix STRIDE marker regex to correctly capture N/A and standalone ~/- cells
r'\b[XxNn/Aa~-]\b' splits N/A and misses ~ and - (e.g., "X ~ - N/A x n/a" yields ['X', 'N', '/', 'A', 'x', 'n', '/', 'a'] instead of treating N/A as one marker).
Suggested fix
- markers = re.findall(r'\b[XxNn/Aa~-]\b', stride_section[1][:2000])
+ markers = re.findall(r'(?i)(?<![A-Za-z0-9])(?:x|~|-|n/?a)(?![A-Za-z0-9])', stride_section[1][:2000])📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| markers = re.findall(r'\b[XxNn/Aa~-]\b', stride_section[1][:2000]) | |
| if len(markers) < 3: | |
| return (False, f"STRIDE matrix appears empty or minimal ({len(markers)} markers)") | |
| return (True, f"STRIDE matrix populated ({len(markers)} cell markers found)") | |
| markers = re.findall(r'(?i)(?<![A-Za-z0-9])(?:x|~|-|n/?a)(?![A-Za-z0-9])', stride_section[1][:2000]) | |
| if len(markers) < 3: | |
| return (False, f"STRIDE matrix appears empty or minimal ({len(markers)} markers)") | |
| return (True, f"STRIDE matrix populated ({len(markers)} cell markers found)") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/two-node/evals/threat-model-tnf.yaml` around lines 149 - 152, The
regex r'\b[XxNn/Aa~-]\b' incorrectly splits multi-char markers like "N/A" and
miscaptures standalone "~" and "-"—replace the pattern used in the re.findall
call to match whole markers and use case-insensitive matching: change the
pattern to r'\b(?:X|N/A|~|-)\b' and call re.findall with flags=re.IGNORECASE
(i.e., re.findall(r'\b(?:X|N/A|~|-)\b', stride_section[1][:2000],
flags=re.IGNORECASE)) so markers, including "N/A" and "~"/"-", are captured as
single tokens (referencing the markers variable, the re.findall call, and
stride_section).
|
Closing — eval configs are tracked in #178 (OCPEDGE-2727). |
Summary
cluster-diagnosticandthreat-model:tnfskillsREPORT_DIRin threat-model-tnf eval so reports write to the eval workspaceDepends on #188.
Replaces #178 (could not reopen after force-push).
Test plan
/eval-run --model claude-opus-4-6 --config evals/cluster-diagnostic.yaml/eval-run --model claude-opus-4-6 --config evals/threat-model-tnf.yamlreports/dir)npx markdownlint-cli2 '**/*.md'to confirm no lint violations🤖 Generated with Claude Code