Skip to content

OBSINTA-1287: Agentic test iteration skills and Docker sandbox#879

Open
DavidRajnoha wants to merge 6 commits intoopenshift:mainfrom
DavidRajnoha:infra/OBSINTA-1287-agentic-test-iteration
Open

OBSINTA-1287: Agentic test iteration skills and Docker sandbox#879
DavidRajnoha wants to merge 6 commits intoopenshift:mainfrom
DavidRajnoha:infra/OBSINTA-1287-agentic-test-iteration

Conversation

@DavidRajnoha
Copy link
Copy Markdown
Contributor

@DavidRajnoha DavidRajnoha commented Apr 15, 2026

Summary

  • Add four Claude Code skills for agentic Cypress test iteration:
    /analyze-ci-results, /diagnose-test-failure, /iterate-incident-tests, and
    /iterate-ci-flaky
  • Add supporting scripts for CI polling, Slack notifications, and GitHub PR
    review
  • Add Docker-based sandbox for isolated Claude Code agent execution with
    security policy and blast radius documentation
  • Reorganize .claude/commands/ directory structure into
    cypress/test-iteration/ and cypress/test-development/ subdirectories

Summary by CodeRabbit

  • New Features

    • Docker-based isolated sandbox for running agentic test tooling and local CI-like runs
    • End-to-end agentic test-iteration system with CI analysis and automated fix/validate loops
    • Slack notifications and GitHub PR comment review integrations for iterative test fixes
    • CLI helpers to poll CI status and clean test artifacts
  • Documentation

    • Detailed sandbox, security, and test-iteration architecture guides
    • Test stability ledger report added for tracking stability across runs

DavidRajnoha and others added 4 commits April 15, 2026 15:47
Add architecture documentation for the agentic test iteration system
and a roadmap with future improvement ideas including Slack notifications,
cloud execution options, and interaction models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four Claude Code skills for agentic test iteration:
- /analyze-ci-results: analyze OpenShift CI (Prow) test results
- /diagnose-test-failure: diagnose Cypress test failures
- /iterate-incident-tests: autonomous local test iteration loop
- /iterate-ci-flaky: CI-based flaky test iteration loop

Supporting scripts:
- poll-ci-status.py: poll Prow CI job status
- notify-slack.py: Slack notifications for CI results
- review-github.py: GitHub PR comment review flow

Also adds test stability ledger for tracking test reliability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a Docker sandbox environment for running Claude Code agents in
isolation with controlled blast radius. Includes:
- Dockerfile and run script for containerized execution
- Security policy (sandbox-policy.yaml) for resource limits
- Blast radius analysis documentation
- Setup guide and Bun crash workaround report

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename cypress-run.md → run.md, cypress-setup.md → setup.md
- Move test development commands to cypress/test-development/ as symlinks
- Remove old command symlinks from commands root
- Add clean-test-artifacts.sh utility script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 15, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 15, 2026

@DavidRajnoha: This pull request references OBSINTA-1287 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Add four Claude Code skills for agentic Cypress test iteration:
    /analyze-ci-results, /diagnose-test-failure, /iterate-incident-tests, and
    /iterate-ci-flaky
  • Add supporting scripts for CI polling, Slack notifications, and GitHub PR
    review
  • Add Docker-based sandbox for isolated Claude Code agent execution with
    security policy and blast radius documentation
  • Reorganize .claude/commands/ directory structure into
    cypress/test-iteration/ and cypress/test-development/ subdirectories

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from etmurasaki and zhuje April 15, 2026 13:52
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: DavidRajnoha
Once this PR has been reviewed and has the lgtm label, please assign jgbernalp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 15, 2026

@DavidRajnoha: This pull request references OBSINTA-1287 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Add four Claude Code skills for agentic Cypress test iteration:
    /analyze-ci-results, /diagnose-test-failure, /iterate-incident-tests, and
    /iterate-ci-flaky
  • Add supporting scripts for CI polling, Slack notifications, and GitHub PR
    review
  • Add Docker-based sandbox for isolated Claude Code agent execution with
    security policy and blast radius documentation
  • Reorganize .claude/commands/ directory structure into
    cypress/test-iteration/ and cypress/test-development/ subdirectories

Summary by CodeRabbit

Release Notes

  • New Features

  • Added isolated Docker sandbox environment for running tests with controlled security permissions

  • Added CI test analysis capabilities to automatically diagnose failures from build artifacts

  • Added interactive test iteration workflow for fixing failing and flaky tests locally and on CI

  • Added Slack notification integration for monitoring long-running test iterations

  • Added GitHub PR comment-based review interface for collaborative test fixes

  • Documentation

  • Added comprehensive guides for Docker sandbox setup and security considerations

  • Added architectural documentation for the test iteration system

  • Added test stability ledger for tracking test health across iterations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 15, 2026

Walkthrough

Replaces the old Cypress setup doc with a non-interactive setup workflow, renames cypress-run to run, and adds a comprehensive agentic test-iteration system (CI analysis, failure diagnosis, automated fix loops). Introduces Docker sandbox tooling, sandbox policy, utility scripts, test-stability ledger, and multiple supporting docs and scripts.

Changes

Cohort / File(s) Summary
Cypress Command Restructuring
/.claude/commands/cypress/.../cypress-setup.md (deleted), /.claude/commands/cypress/setup.md, /.claude/commands/cypress/run.md
Removed old interactive cypress-setup doc; added new non-interactive setup command that creates web/cypress/export-env.sh, validates Node and credentials, and instructs sourcing; renamed command metadata cypress-runrun and updated cross-reference /cypress-setup/cypress:setup.
Cypress Test Development Redirects
/.claude/commands/cypress/test-development/fixture-schema-reference.md, .../generate-incident-fixture.md, .../generate-regression-test.md, .../refactor-regression-test.md, .../validate-incident-fixtures.md
Added lightweight redirect/reference files pointing to .cursor/commands/* counterparts under the new test-development namespace.
Removed Old Redirects
/.claude/commands/fixture-schema-reference.md, /.claude/commands/generate-incident-fixture.md, /.claude/commands/generate-regression-test.md, /.claude/commands/refactor-regression-test.md, /.claude/commands/validate-incident-fixtures.md
Deleted five legacy redirect files that previously pointed to ../../.cursor/commands/*.
Cypress Test-Iteration Commands
/.claude/commands/cypress/test-iteration/analyze-ci-results.md, .../diagnose-test-failure.md, .../iterate-ci-flaky.md, .../iterate-incident-tests.md
Added large agentic command docs: CI artifact analysis/parsing and classification; multimodal test-failure diagnosis; flaky-test iteration; end-to-end incident-test iteration with diagnose→fix→validate loops and stability ledger update rules.
Test-Iteration Utility Scripts
/.claude/commands/cypress/test-iteration/scripts/notify-slack.py, .../poll-ci-status.py, .../review-github.py
Added three executable Python scripts: Slack notifications+thread-reply polling; GitHub PR check polling via gh; PR comment post + author-reply wait for two-phase review.
Productize Iterations Doc
/.claude/commands/cypress/test-iteration/productize-iterations.md
Added workflow doc for consolidating agent branches into a clean branch with overlap analysis, staged application, and verification including flakiness probes.
Sandbox Infrastructure
sandbox/Dockerfile, sandbox/run.sh, sandbox-start.sh, sandbox-policy.yaml
Added Docker sandbox image/build/run scripts and a sandbox policy manifest: Node image with Claude Code, gh/oc, non-root user; run script clones branch into isolated worktree and mounts scoped credentials; policy enumerates filesystem and per-binary network allowlists.
Agentic Development Docs
docs/agentic-development/architecture/test-iteration-system.md, docs/agentic-development/architecture/security/docker-sandbox-blast-radius.md
Added architecture for the agentic test-iteration system and a blast-radius/threat-model doc for Docker sandbox credential exposure and mitigations.
Sandbox Roadmap & Guides
docs/agentic-development/setup/docker-sandbox-guide.md, docs/agentic-development/roadmap/openshell/sandbox-bun-crash-report.md, docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md, docs/agentic-development/roadmap/test-iteration-ideas.md
Added Docker sandbox quick-start and credential guidance, recorded deprecated openshell approach and crash report, openshell setup guide (deprecated), and a roadmap of test-iteration enhancements and options.
Test Utilities & Reports
web/cypress/reports/test-stability.md, web/scripts/clean-test-artifacts.sh
Added a test-stability ledger markdown with machine-readable JSON block and a cleanup script to remove Cypress reports/screenshots/videos.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Agent as Iterate Agent
    participant Git as Git Server
    participant Cypress as Cypress Runner
    participant Parser as Report Parser
    participant Diagnoser as Diagnosis Agent
    participant Fixer as Fix Agent
    participant Verifier as Validation Step
    participant Ledger as Stability Ledger

    User->>Agent: /iterate-incident-tests (target, params)
    Agent->>Git: create/checkout working branch
    Agent->>Ledger: read stability ledger
    Agent->>Cypress: run tests (headless, filtered)
    Cypress-->>Agent: mochawesome JSON + screenshots
    Agent->>Parser: parse results -> failures list
    Parser-->>Agent: failing tests
    loop per failing test
        Agent->>Diagnoser: analyze (screenshot, spec, fixture, error)
        Diagnoser-->>Agent: classification + fix proposal
        alt fixable
            Agent->>Fixer: apply constrained fix
            Fixer->>Git: commit changes
            Agent->>Cypress: rerun failing tests
            Cypress-->>Agent: updated results
            Agent->>Verifier: validate (bounded retries)
        else blocking
            Agent->>User: report blocking issue
        end
    end
    Agent->>Cypress: flakiness probe (N runs)
    Cypress-->>Agent: probe results
    Agent->>Ledger: update ledger (JSON block + run history)
    Agent->>Git: commit ledger updates
    Agent-->>User: final structured report
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main changes: agentic test iteration skills and Docker sandbox infrastructure for the monitoring-plugin project.
Docstring Coverage ✅ Passed Docstring coverage is 82.35% which is sufficient. The required threshold is 80.00%.
Stable And Deterministic Test Names ✅ Passed The custom check for stable and deterministic Ginkgo test names is not applicable to this pull request. This PR adds and modifies only documentation files (Markdown), shell scripts, Python utilities, Docker configuration, and YAML policy files—no Go test files (*.go) containing Ginkgo tests are introduced or modified.
Test Structure And Quality ✅ Passed This PR does not contain any Ginkgo test code modifications. All modified files are documentation, scripts, and configuration files—no Go test files were changed.
Microshift Test Compatibility ✅ Passed This PR does not add any new Ginkgo e2e tests. Changes consist of documentation, Docker configuration, and Cypress test infrastructure files only.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add any Ginkgo e2e tests. The repository is TypeScript/JavaScript-based with Cypress tests, not Go-based Ginkgo tests.
Topology-Aware Scheduling Compatibility ✅ Passed PR contains only documentation, Docker configuration, and utility scripts—no Kubernetes deployment manifests or pod scheduling constraints.
Ote Binary Stdout Contract ✅ Passed This PR contains no Go files or OTE binaries that could violate stdout/stderr contracts; check is not applicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Custom check for IPv6 and disconnected network test compatibility applies only to new Ginkgo e2e tests in Go. This PR adds no new Go test files, only documentation, Python scripts, shell scripts, Dockerfile, and YAML configuration. The six existing Go files remain unmodified, so check criteria do not apply.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

🧹 Nitpick comments (20)
.claude/commands/cypress/test-development/fixture-schema-reference.md (1)

1-1: Consider adding a trailing newline.

This file (and the other test-development pointer files) lacks a trailing newline at EOF. While this doesn't break functionality for indirection files, POSIX defines a text line as ending with a newline character, and many tools expect this convention.

📝 Proposed fix
-../../../.cursor/commands/fixture-schema-reference.md
+../../../.cursor/commands/fixture-schema-reference.md
+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-development/fixture-schema-reference.md at
line 1, Add a single trailing newline character to the end of
fixture-schema-reference.md so the file ends with a newline (POSIX text file
convention); simply ensure the last line is terminated with "\n" and commit the
change.
web/scripts/clean-test-artifacts.sh (1)

7-8: Add cleanup for the downloads folder.

The script cleans screenshots/ and videos/ but not downloads/. According to cypress.config.ts, the downloadsFolder is set to './cypress/downloads', which can accumulate files from download-related tests.

📁 Proposed fix to include downloads cleanup
 rm -rf "$SCRIPT_DIR/cypress/screenshots/"*
 rm -rf "$SCRIPT_DIR/cypress/videos/"*
+rm -rf "$SCRIPT_DIR/cypress/downloads/"*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/scripts/clean-test-artifacts.sh` around lines 7 - 8, The cleanup script
clean-test-artifacts.sh currently removes screenshots and videos but omits the
Cypress downloads folder; update the script to also remove contents of the
downloads directory by adding a removal command that targets
"$SCRIPT_DIR/cypress/downloads/"* (use the same pattern as the existing rm -rf
lines), referencing the SCRIPT_DIR variable and matching the downloadsFolder
from cypress.config.ts so downloaded test artifacts are cleaned up.
sandbox-policy.yaml (1)

66-73: Avoid hard-coding ephemeral cluster endpoints in policy

The OpenShift endpoint list is tightly bound to one CI lease/environment, which is likely to expire and break reuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sandbox-policy.yaml` around lines 66 - 73, The policy currently hard-codes
ephemeral CI cluster endpoints (the host entries "10.200.0.1",
"*.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org", and
"console-openshift-console.apps.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org")
which will expire; change these to stable, parameterized values or well-scoped
patterns by removing the specific CI lease names and either use a DNS wildcard
that matches the intended domain scope, reference a templated/variable
placeholder (e.g., a Helm/Kustomize/ENV variable like {{
.Values.openshift_console }} or ${OPENSHIFT_CONSOLE}) or point to cluster-local
service names so the policy is reusable across environments; update the three
host entries in sandbox-policy.yaml accordingly and ensure any callers/CI
manifests set the variable at runtime.
.claude/commands/cypress/test-iteration/diagnose-test-failure.md (2)

111-111: Add language specifier to fenced code block.

The linter flags this code block as missing a language specifier. Since this is a template for diagnosis output in markdown format, add markdown as the language.

📝 Suggested fix
-```
+```markdown
 ## Diagnosis
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md at line
111, Add the missing fenced-code language tag by changing the opening ``` before
the "## Diagnosis" block to include "markdown" (i.e., use ```markdown); locate
the fenced block that contains the "## Diagnosis" heading in
diagnose-test-failure.md and update its opening fence so the linter recognizes
the block language.

78-81: Path prefix inconsistency with other file references.

Line 60 references web/cypress/views/incidents-page.ts (with web/ prefix), but lines 78-81 reference cypress/support/incidents_prometheus_query_mocks/ without the web/ prefix. For consistency and clarity, consider adding the web/ prefix here as well.

📝 Suggested fix
-If the error suggests an API/intercept issue, read relevant files in `cypress/support/incidents_prometheus_query_mocks/`:
-- `prometheus-mocks.ts` — intercept setup and route matching
-- `mock-generators.ts` — response data generation
-- `types.ts` — type definitions for fixtures
+If the error suggests an API/intercept issue, read relevant files in `web/cypress/support/incidents_prometheus_query_mocks/`:
+- `prometheus-mocks.ts` — intercept setup and route matching
+- `mock-generators.ts` — response data generation
+- `types.ts` — type definitions for fixtures
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md around
lines 78 - 81, The documentation has an inconsistent path prefix: earlier it
referenced web/cypress/views/incidents-page.ts but the lines mentioning
cypress/support/incidents_prometheus_query_mocks/ lack the web/ prefix; update
those references (prometheus-mocks.ts, mock-generators.ts, types.ts) to use the
same web/ prefix (i.e., web/cypress/support/incidents_prometheus_query_mocks/)
so all file paths in this guide are consistent and unambiguous.
docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md (1)

87-89: Fix blockquote formatting.

The linter flags a blank line inside the blockquote. These two related notes should be combined into a single blockquote block.

📝 Suggested fix
 > **Note:** `openshell sandbox upload` treats the destination as a directory and places the source file inside it. So the actual file path inside the sandbox will be `/sandbox/adc.json/application_default_credentials.json`.
-
-> **Security tradeoff:** The ADC file contains a refresh token (not raw credentials). The agent inside the sandbox can read it. This is a known limitation — see the Troubleshooting section for alternatives.
+>
+> **Security tradeoff:** The ADC file contains a refresh token (not raw credentials). The agent inside the sandbox can read it. This is a known limitation — see the Troubleshooting section for alternatives.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md` around
lines 87 - 89, Combine the two separate blockquote notes into a single
blockquote to remove the blank line linter error: merge the "**Note:**
`openshell sandbox upload` treats the destination as a directory..." and the
"**Security tradeoff:** The ADC file contains a refresh token..." lines into one
contiguous blockquote (remove the empty line between them) so both sentences
appear under the same blockquote marker and preserve both messages.
.claude/commands/cypress/test-iteration/scripts/notify-slack.py (2)

206-214: Edge case in remaining time calculation.

When remaining is calculated as 0 or negative (deadline reached), the sleep call at line 214 will still execute with sleep(1) due to max(1, remaining). This causes an extra 1-second delay after timeout. Consider checking remaining > 0 before sleeping.

📝 Suggested fix
         remaining = int(deadline - time.time())
-        if remaining > 0:
+        if remaining > poll_interval:
             print(
                 f"No reply yet, {remaining}s remaining...",
                 file=sys.stderr,
                 flush=True,
             )
-
-        time.sleep(min(poll_interval, max(1, remaining)))
+            time.sleep(poll_interval)
+        elif remaining > 0:
+            time.sleep(remaining)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/scripts/notify-slack.py around lines
206 - 214, The current loop computes remaining = int(deadline - time.time()) and
always calls time.sleep(min(poll_interval, max(1, remaining))) which forces a 1s
sleep even when remaining <= 0; update the logic around the print and sleep so
that you only call time.sleep when remaining > 0 (e.g., after the existing if
remaining > 0: print(...) compute sleep_duration = min(poll_interval, remaining)
and then time.sleep(sleep_duration)), using the existing variables remaining,
poll_interval and deadline to locate the code and avoid the extra 1-second delay
after timeout.

97-113: Consider URL scheme validation for webhook.

The webhook_url comes from an environment variable and is passed directly to urllib.request.Request. While this is typically set by administrators, validating the URL scheme provides defense in depth against accidental misconfiguration or malicious injection.

🛡️ Suggested fix
 def send_webhook(webhook_url, blocks):
     """Option A: Send via incoming webhook."""
+    # Validate URL scheme for security
+    if not webhook_url.startswith(("https://", "http://")):
+        print(f"Invalid webhook URL scheme: {webhook_url}", file=sys.stderr)
+        return {"ok": False, "error": "Invalid URL scheme"}
+
     payload = json.dumps({"blocks": blocks}).encode("utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/scripts/notify-slack.py around lines
97 - 113, The send_webhook function currently accepts webhook_url from env and
passes it directly to urllib.request.Request; add a URL scheme validation step
before constructing the Request: parse webhook_url (e.g., via
urllib.parse.urlparse), ensure scheme is either "https" (preferably) or "http"
and reject any other schemes (file:, javascript:, data:, etc.), and if invalid
log/print a clear error and return {"ok": False, "error": "invalid webhook URL
scheme"} without making the outbound request; keep this validation localized to
send_webhook and include webhook_url in the log only as needed (avoid leaking
secrets).
docs/agentic-development/architecture/test-iteration-system.md (3)

98-101: Path prefix inconsistency.

Lines 98-101 reference paths without the web/ prefix (e.g., cypress/e2e/incidents/), but the actual file paths in the repository include web/ (e.g., web/cypress/e2e/incidents/). This inconsistency could cause confusion when the agent attempts to edit files.

📝 Suggested fix
 Scope — may only edit:
-- `cypress/e2e/incidents/**/*.cy.ts` (test files)
-- `cypress/fixtures/incident-scenarios/*.yaml` (fixtures)
-- `cypress/views/incidents-page.ts` (page object)
-- `cypress/support/incidents_prometheus_query_mocks/**` (mock layer)
+- `web/cypress/e2e/incidents/**/*.cy.ts` (test files)
+- `web/cypress/fixtures/incident-scenarios/*.yaml` (fixtures)
+- `web/cypress/views/incidents-page.ts` (page object)
+- `web/cypress/support/incidents_prometheus_query_mocks/**` (mock layer)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agentic-development/architecture/test-iteration-system.md` around lines
98 - 101, Update the documented file paths to match the repository layout by
adding the missing "web/" prefix to each listed path: change
cypress/e2e/incidents/**/*.cy.ts to web/cypress/e2e/incidents/**/*.cy.ts,
cypress/fixtures/incident-scenarios/*.yaml to
web/cypress/fixtures/incident-scenarios/*.yaml, cypress/views/incidents-page.ts
to web/cypress/views/incidents-page.ts, and
cypress/support/incidents_prometheus_query_mocks/** to
web/cypress/support/incidents_prometheus_query_mocks/** so the examples in
test-iteration-system.md align with actual repo paths.

156-159: Add language specifier for usage examples.

The linter flags this code block. Use text or bash for command usage examples.

📝 Suggested fix
-```
+```bash
 /cypress:test-iteration:analyze-ci-results ci-url=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../{RUN_ID}/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agentic-development/architecture/test-iteration-system.md` around lines
156 - 159, The fenced code blocks containing the example commands starting with
"/cypress:test-iteration:analyze-ci-results ci-url=..." are missing a language
specifier; update each block to use a language tag like bash (e.g., replace the
opening "```" with "```bash" or "```text") and ensure the closing "```" remains,
so the examples render and pass the linter.

14-50: Add language specifier for ASCII diagram.

The linter flags this code block. For ASCII diagrams, use text as the language specifier.

📝 Suggested fix
-```
+```text
 User: /cypress:test-iteration:iterate-incident-tests target=regression max-iterations=3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/agentic-development/architecture/test-iteration-system.md` around lines
14 - 50, The ASCII diagram block beginning with "User:
/cypress:test-iteration:iterate-incident-tests target=regression
max-iterations=3" needs a language specifier to satisfy the linter; change the
opening fence from ``` to ```text so the block reads ```text ... ``` (leave the
content and closing fence unchanged) to mark it as plain text ASCII diagram.
docs/agentic-development/setup/docker-sandbox-guide.md (1)

115-117: Add language specifier to error message code block.

The linter flags this code block as missing a language specifier. Use text or plaintext for error message output.

📝 Suggested fix
-```
+```text
 ERROR: GCP ADC credentials not found at ...
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/agentic-development/setup/docker-sandbox-guide.md around lines 115 -
117, Update the fenced code block that contains the GCP ADC error message to
include a language specifier (e.g., use text or plaintext) so the linter
recognizes it as a plain text code block; locate the block in
docs/agentic-development/setup/docker-sandbox-guide.md that currently contains
"ERROR: GCP ADC credentials not found at ..." and change the opening fence to
include the language specifier.


</details>

</blockquote></details>
<details>
<summary>.claude/commands/cypress/test-iteration/iterate-ci-flaky.md (2)</summary><blockquote>

`142-148`: **Add language specifiers to code blocks.**

Multiple code blocks are flagged by the linter for missing language specifiers. These template/output blocks should use appropriate language identifiers.


<details>
<summary>📝 Suggested fixes</summary>

Line 142: Use `text` for the PR summary template
```diff
-```
+```text
 PR #{pr}: {title}
```

Line 156: Use `text` for the stability context template
```diff
-```
+```text
 Stability context (from {N} previous runs):
```

Line 345: Use `text` for the stability report
```diff
-```
+```text
 Stability Report:
```

Line 354: Use `markdown` for the final report template
```diff
-```
+```markdown
 # CI Flaky Test Iteration Report
```

Line 409: Use `markdown` for the table row template
```diff
-```
+```markdown
 | {next_number} | {YYYY-MM-DD} | ci | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} |
```
</details>


Also applies to: 156-165, 345-350, 354-389, 409-411

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-ci-flaky.md around lines 142

  • 148, Several fenced code blocks in the CI flaky iteration template are missing
    language specifiers; update the code fence headers for the identified templates
    so the linter stops flagging them — specifically, add text to the "PR #{pr}: {title}" PR summary block and to the "Stability context (from {N} previous runs):" and "Stability Report:" blocks, and add markdown to the final report
    header "# CI Flaky Test Iteration Report" and the table row template "|
    {next_number} | {YYYY-MM-DD} | ci | {branch} | {total_tests} | {passed} |
    {failed} | {flaky} | {commit_sha} |"; ensure the same change is applied to the
    other occurrences referenced (lines 156-165, 345-350, 354-389, 409-411) so each
    triple-backtick fence includes the appropriate language identifier.

</details>

---

`269-270`: **Path prefix inconsistency with repository structure.**

The file paths listed here (e.g., `cypress/e2e/incidents/**`) should include the `web/` prefix to match the actual repository structure (e.g., `web/cypress/e2e/incidents/**`). This inconsistency appears in multiple documentation files in this PR.


<details>
<summary>📝 Suggested fix</summary>

```diff
    - May edit: `cypress/e2e/incidents/**`, `cypress/fixtures/incident-scenarios/**`, `cypress/views/incidents-page.ts`, `cypress/support/incidents_prometheus_query_mocks/**`
+   - May edit: `web/cypress/e2e/incidents/**`, `web/cypress/fixtures/incident-scenarios/**`, `web/cypress/views/incidents-page.ts`, `web/cypress/support/incidents_prometheus_query_mocks/**`
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-ci-flaky.md around lines 269
- 270, Update the path prefixes in the listed lines of
.claude/commands/cypress/test-iteration/iterate-ci-flaky.md so they match the
repo layout by prepending "web/" to each path: change "cypress/e2e/incidents/**"
to "web/cypress/e2e/incidents/**", "cypress/fixtures/incident-scenarios/**" to
"web/cypress/fixtures/incident-scenarios/**", "cypress/views/incidents-page.ts"
to "web/cypress/views/incidents-page.ts",
"cypress/support/incidents_prometheus_query_mocks/**" to
"web/cypress/support/incidents_prometheus_query_mocks/**", and any other listed
entries such as "src/**" or cypress config mentions to "web/src/**" or
"web/<cypress-config-path>" as appropriate so all documented paths consistently
include the web/ prefix.
```

</details>

</blockquote></details>
<details>
<summary>.claude/commands/cypress/test-iteration/scripts/review-github.py (1)</summary><blockquote>

`185-186`: **Unused variable `created_at`.**

The `created_at` value is unpacked but never used. Prefix with underscore to indicate it's intentionally ignored.


<details>
<summary>📝 Suggested fix</summary>

```diff
-    comment_id, created_at = post_comment(args.pr, formatted, args.repo)
+    comment_id, _created_at = post_comment(args.pr, formatted, args.repo)
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/scripts/review-github.py around
lines 185 - 186, The tuple unpacking from post_comment assigns comment_id and
created_at but created_at is unused; update the assignment in the post_comment
call site (the line assigning comment_id, created_at) to ignore the unused value
by renaming it to _created_at (or simply _), e.g., change "comment_id,
created_at = post_comment(...)" to "comment_id, _created_at = post_comment(...)"
so linters understand the value is intentionally ignored while leaving
post_comment and comment_id usage intact.
```

</details>

</blockquote></details>
<details>
<summary>.claude/commands/cypress/test-iteration/analyze-ci-results.md (3)</summary><blockquote>

`41-54`: **Specify error handling for parallel metadata fetches.**

Step 2 mentions fetching three files in parallel using WebFetch but doesn't specify what should happen if one or more fetches fail. Missing metadata files could indicate an incomplete CI run or incorrect URL.



Consider adding error handling guidance:
- If `started.json` or `finished.json` is missing → likely incorrect URL or incomplete run, report and stop
- If `prowjob.json` is missing → continue but note that PR metadata will be incomplete
- Document which fields are critical vs. optional for downstream analysis

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/analyze-ci-results.md around lines
41 - 54, Add explicit error handling when using WebFetch in Step 2 for
started.json, finished.json, and prowjob.json: treat started.json and
finished.json fetch failures as fatal (log an error that the job root/URL may be
wrong and abort analysis), treat prowjob.json failure as non-fatal (continue but
mark PR metadata incomplete), and surface which fields are critical (from
started.json: timestamp, pull, repos base/PR SHAs; from finished.json: passed,
result, revision) versus optional (prowjob.json fields like PR
title/author/branch/cluster/duration). Ensure the implementation of the parallel
fetch routine records any missing files, returns a clear error for fatal missing
files, and annotates downstream data with a flag when prowjob.json is absent.
```

</details>

---

`31-33`: **Add URL validation before normalization.**

The URL normalization logic assumes well-formed input but doesn't validate the URL pattern before processing. Malformed URLs could cause issues in subsequent WebFetch operations.



<details>
<summary>💡 Suggested validation step</summary>

Add a validation step before normalization:

```markdown
### Step 1: Normalize the URL

**First, validate the input URL:**

The URL must match one of these patterns:
- `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/...`
- `https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/...`

If the URL doesn't match, report an error: "Invalid CI URL. Expected a gcsweb or Prow URL for an OpenShift monitoring-plugin PR."

**Then normalize to the job root:**
[existing content continues...]
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/analyze-ci-results.md around lines
31 - 33, Add a validation step before the existing "Normalize the URL" logic:
ensure the input matches either the gcsweb pattern
(https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/...)
or the Prow pattern
(https://prow.ci.openshift.org/view/gs/test-platform-results/...), and if not,
return/report the error "Invalid CI URL. Expected a gcsweb or Prow URL for an
OpenShift monitoring-plugin PR." After validation, continue with the existing
normalization (replace the Prow prefix with the gcsweb prefix and append a
trailing slash if missing); hook this validation into the same flow that runs
the URL normalization step so subsequent WebFetch calls only receive validated,
normalized URLs.
```

</details>

---

`27-27`: **Add language specifiers to fenced code blocks.**

Multiple fenced code blocks lack language specifiers, which reduces syntax highlighting and editor support. Consider adding appropriate language tags (e.g., `text`, `bash`, `json`, `markdown`) to improve readability.



<details>
<summary>📝 Example fixes for some blocks</summary>

Line 27-29 (URL pattern):
```diff
-```
+```text
 https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/
 ```
```

Lines 56-66 (summary output):
```diff
-```
+```text
 CI Run Summary:
   PR:          #{PR_NUMBER} - {PR_TITLE}
 ```
```

Lines 192-197 (git correlation output):
```diff
-```
+```text
 Commit correlation for {test_name}:
 ```
```

Lines 211-261 (final report):
```diff
-```
+```markdown
 # CI Analysis Report
 ```
```
</details>


Also applies to: 56-56, 77-77, 94-94, 106-106, 117-117, 133-133, 192-192, 200-200, 211-211

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/analyze-ci-results.md at line 27,
Several fenced code blocks in
.claude/commands/cypress/test-iteration/analyze-ci-results.md lack language
specifiers; add appropriate tags (e.g., text, bash, json, markdown) to each
block so editors can syntax-highlight them. Locate the unlabeled blocks
referenced in the comment such as the "URL pattern" block (lines showing the GCS
URL), the "CI Run Summary" block, the "Commit correlation for {test_name}"
block, and the final "# CI Analysis Report" block and update their opening
fences to include the correct language (suggest text for plain output blocks and
markdown for the report header) and similarly annotate the other listed blocks
(56, 77, 94, 106, 117, 133, 192, 200, 211) with appropriate specifiers.
```

</details>

</blockquote></details>
<details>
<summary>.claude/commands/cypress/test-iteration/iterate-incident-tests.md (2)</summary><blockquote>

`136-159`: **Consider explicit state tracking for the iteration loop.**

The workflow manages complex state across multiple iterations (failure lists, fix attempts, retry counts, stability data) but doesn't specify how this state is persisted between steps. This could lead to state loss if the agent is interrupted or crashes.



Consider adding:

1. **A state file** (e.g., `web/cypress/.iteration-state.json`) that tracks:
   ```json
   {
     "iteration": 2,
     "failures": [...],
     "fix_attempts": {...},
     "retry_counts": {...},
     "commit_hashes": [...]
   }
   ```

2. **State checkpoints** after each major step (diagnosis complete, fixes applied, validation complete)

3. **Resume capability**: If the workflow is interrupted, the agent can resume from the last checkpoint instead of starting over

This is especially important for the autonomous "run in background" execution model mentioned in Step 5 (line 193).


Also applies to: 199-228, 435-484

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 136 - 159, Add explicit iteration state persistence by writing and
checkpointing a JSON state file (e.g., web/cypress/.iteration-state.json) that
captures stability_context, iteration number, failures, fix_attempts,
retry_counts, and commit_hashes after each major step (diagnosis, fixes applied,
validation); implement save_state() and load_state() helpers used wherever
stability_context is produced or consumed (Step 2) and around the loop that runs
in background (Step 5) so the workflow can resume from the last checkpoint if
interrupted, and ensure resume logic checks for an existing state file and
restores iteration/failure lists before continuing (used by the Diagnosis Agents
in Step 8).
```

</details>

---

`142-142`: **Add language specifiers to fenced code blocks.**

Multiple fenced code blocks lack language specifiers, reducing readability and editor support.



<details>
<summary>📝 Suggested language tags</summary>

- Line 142: `text` (stability context output)
- Line 209: `typescript` or `javascript` (test result object structure)
- Line 225: `text` (mochawesome suite structure)
- Line 239: `text` (screenshot naming pattern)
- Line 280: `markdown` (fix constraints heading)
- Line 341: `text` (commit message template)
- Line 377: `text` (flakiness report)
- Line 401: `markdown` (final report)
- Line 448: `text` (table row format)
</details>


Also applies to: 209-209, 225-225, 239-239, 280-280, 341-341, 377-377, 401-401, 448-448

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md at line
142, Update the fenced code blocks in
.claude/commands/cypress/test-iteration/iterate-incident-tests.md to include
proper language specifiers: add `text` to the stability context output block,
`typescript` or `javascript` to the test result object structure block, `text`
to the mochawesome suite structure and screenshot naming pattern blocks,
`markdown` to the fix constraints and final report headings, and `text` to the
commit message template, flakiness report, and table row format blocks so
editors and linters can correctly highlight and validate those snippets.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/commands/cypress/setup.md:

  • Around line 23-26: The docs change directory with the line "cd web/cypress"
    but later commands still reference "web/cypress/export-env.sh"; update those
    occurrences (e.g., the commands at the locations referencing
    "web/cypress/export-env.sh" around Lines 49-50 and 79-80) to reference
    "export-env.sh" (or "./export-env.sh") so they correctly target the script
    relative to the already-changed working directory and avoid sourcing/writing the
    wrong path.
  • Around line 51-53: The exported credential lines in export-env.sh
    (CYPRESS_BASE_URL, CYPRESS_LOGIN_IDP, CYPRESS_LOGIN_USERS) persist kubeadmin
    credentials in plaintext; update export-env.sh to add an explicit “DO NOT COMMIT
    / secrets” header, enforce restrictive file permissions (e.g., create or
    instruct to run chmod 600 on the file), and add a .gitignore entry to ignore
    this file by name; also include a short comment explaining the secure workflow
    (temporary file only, remove after use) and a one-line check in the script that
    exits if the file is tracked in git to prevent accidental commits.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:

  • Around line 90-93: Update the "3. Unsigned Commits" section in
    iterate-incident-tests.md to document security implications and prescribe a
    mandatory pre-merge validation step: add language in the "final report"
    instructions that requires the user to review every generated commit, verify no
    malicious changes, and sign the squash-merge with their GPG key; also add a
    short threat-model paragraph describing risks if an agent introduces backdoor
    code and attribution difficulties for unsigned commits; finally, note
    alternative mitigations (use a dedicated agent signing key, require human
    approval per commit batch, or run in the Docker sandbox with an automated
    commit-verification step before any push) and call out the Docker sandbox as the
    recommended place to implement commit verification.
  • Around line 80-81: Replace the hard-coded absolute paths in the permission and
    screenshot commands: change the "Bash(cd
    /home/drajnoha/Code/monitoring-plugin:)" and "Bash(find
    /home/drajnoha/Code/monitoring-plugin/web/cypress:
    )" entries to use a
    repository-root variable (e.g. "${REPO_ROOT}") or relative paths (e.g. "cd
    ${REPO_ROOT}" or "find web/cypress/screenshots -name "*.png" -type f" when
    running from the repo root or "find ./web/cypress/screenshots ..." from the web/
    working dir); also add brief docs noting that REPO_ROOT must be set in the
    environment or CI before running the commands.
  • Around line 99-502: Add explicit execution limits to the orchestrator: define
    and check a global wall-clock timeout (e.g., max_wall_clock_time) at the top of
    the main orchestration loop (Steps 0–15) and abort with a clear message when
    exceeded; enforce a per-test/run timeout by passing/integrating a configured
    per_test_timeout into the Cypress invocation (the source cypress/export-env.sh && npx cypress run ... call in Step 5) and into any retry loops; track total
    test executions with a total_run_budget counter (used in Step 12/Step 13) and
    stop spawning Diagnosis/Fix agents when exceeded; implement a circuit_breaker
    threshold (e.g., circuit_breaker_failure_rate) observed after the first
    iteration (Step 5/Step 6) to pause for human review if >50% failures; surface
    these config values (max_wall_clock_time, per_test_timeout, total_run_budget,
    circuit_breaker_threshold) near the orchestrator entry and check them before
    each major loop/agent spawn.

In @.claude/commands/cypress/test-iteration/scripts/poll-ci-status.py:

  • Around line 89-90: The parsing of attempts and interval from sys.argv in
    poll-ci-status.py (variables attempts and interval) lacks validation and can
    allow zero/negative or non-integer values; update the code to parse inside
    try/except to catch ValueError, coerce to int, and enforce positive bounds
    (e.g., attempts >= 1 and interval >= 1), falling back to the existing defaults
    (30 and 300) or exiting with a clear error if values are invalid; ensure both
    the attempts and interval checks are applied after parsing so negative/zero
    values are clamped or rejected before they are used for sleeping/polling.
  • Around line 54-68: When scanning checks (the checks list) for job_substring,
    don't break on the first match; instead filter all entries with
    check.get("name","") containing job_substring, pick the most recent run (e.g.,
    by check.get("completed_at") or fallback to check.get("created_at") or id) and
    evaluate that one; treat terminal_states =
    ("SUCCESS","FAILURE","CANCELLED","TIMED_OUT","SKIPPED") as CI_COMPLETE (print
    CI_COMPLETE state=... url=...), return 0 only for "SUCCESS" and return non-zero
    for other terminal states, and remove the current break so the loop continues
    polling when no terminal state is found.

In @docs/agentic-development/roadmap/openshell/sandbox-bun-crash-report.md:

  • Around line 64-65: Update product name capitalization in the sandbox docs:
    change lowercase "claude" to the official "Claude" in the phrase "Create a
    claude provider and use -- claude flag" (and any other occurrences in this
    section), and ensure "GitHub" is capitalized where referenced (notably the
    occurrences around lines 84–85); make these edits for consistency across the
    file while preserving the surrounding text and formatting.
  • Around line 29-41: The unlabeled fenced code block containing the Bun crash
    log should be marked with a language tag text to satisfy MD040; locate the
    triple-backtick block that starts with the Bun crash output (the fenced crash
    log block shown in the diff) and change the opening fence from totext so
    the block is recognized as plain text by the linter.

In @docs/agentic-development/roadmap/test-iteration-ideas.md:

  • Line 107: The heading "## Test Stability Ledger" is indented and triggers
    markdownlint MD023; remove any leading spaces before the "## Test Stability
    Ledger" line so the heading is left-aligned (start the line with "##"
    immediately) to satisfy the lint rule and preserve the exact heading text.
  • Around line 148-150: Several fenced code blocks are missing language labels
    (MD040); update each unlabeled triple-backtick block (notably the block
    containing "Push fix ──→ [PAUSE: fix_applied] ──→ CI runs (~2h) ──→ [PAUSE:
    ci_complete] ──→ Analyze") to include an explicit language: use text for message
    examples and bash for shell/command examples, and apply the same labeling to the
    other unlabeled blocks called out in the review so all code fences specify
    either text or bash as appropriate.

In @sandbox-policy.yaml:

  • Around line 11-16: The policy grants broad write access under the read_write
    list by including "/home"; narrow this to the minimal required home directory
    (e.g. replace "/home" with "/home/sandbox" or another specific user path) so the
    read_write entry reduces blast radius; update the read_write array to remove
    "/home" and add the exact allowed path (refer to the read_write list and the
    "/home" entry).

In @sandbox-start.sh:

  • Line 15: Replace the risky global TLS bypass "git config --global
    http.sslVerify false" with a scoped change or remove it: either delete that line
    entirely, or change it to a repository-scoped command (e.g., use "git -C
    config --local http.sslVerify false" or run the config only in a temporary CI
    workspace) and add a comment explaining why it's needed and ensure you revert or
    limit its scope; reference the exact token "git config --global http.sslVerify
    false" when updating the script.
  • Line 5: Replace the ephemeral install of @anthropic-ai/claude-code@latest in
    the sandbox-start.sh npm install command with a pinned semver (e.g.,
    @anthropic-ai/claude-code@) to ensure reproducible sandbox runs;
    locate the npm install line that currently contains
    "@anthropic-ai/claude-code@latest" and change it to a fixed version (preferably
    the same version used elsewhere in the repo, such as in web/package.json), then
    commit the updated script.

In @sandbox/Dockerfile:

  • Around line 29-30: The RUN that streams openshift-client-linux.tar.gz into tar
    should first download and verify the artifact integrity: fetch both the tarball
    and its corresponding checksum/signature for openshift-client-linux.tar.gz,
    verify the checksum (e.g., SHA256) or validate the signature before extraction,
    and only proceed to extract oc and kubectl from openshift-client-linux.tar.gz if
    verification succeeds; update the Dockerfile around the RUN that references
    openshift-client-linux.tar.gz to perform the download, verification, and
    conditional extraction steps so the build fails on checksum/signature mismatch.
  • Line 33: The Dockerfile RUN that installs @anthropic-ai/claude-code currently
    uses the floating tag "@latest" which breaks reproducible image builds; change
    the RUN line installing @anthropic-ai/claude-code to pin a specific version (or
    use a build ARG like CLAUDE_CODE_VERSION and reference it in the RUN) so the
    installation uses a fixed version instead of "@latest"—update the RUN command
    that mentions "@anthropic-ai/claude-code@latest" accordingly.

In @sandbox/run.sh:

  • Line 69: Update the printed docs path in sandbox/run.sh: the echo statement
    referencing "docs/docker-sandbox-blast-radius.md" should point to the actual
    documentation location under
    "docs/agentic-development/architecture/security/docker-sandbox-blast-radius.md"
    so users are directed to the existing file; locate the echo call in run.sh and
    replace the old relative path with the full path above.
  • Around line 47-52: CURRENT_BRANCH may be empty when the repo is in a detached
    HEAD, causing git clone --branch "" to fail; update the script to detect an
    empty CURRENT_BRANCH (the variable computed by CURRENT_BRANCH=$(git -C
    "$PROJECT_ROOT" branch --show-current)) and provide a fallback: either resolve
    the repo's default branch (e.g., query origin's HEAD or use a sensible default
    like "main"/"master") and assign that to CURRENT_BRANCH, or exit early with a
    clear error message before running git clone --single-branch --branch
    "$CURRENT_BRANCH" so the subsequent git clone call never receives an empty
    branch name.
  • Around line 73-84: The docker run line uses an unquoted expansion
    ${KUBECONFIG_MOUNT:-} which can word-split paths with spaces; replace it with an
    arguments array: build a shell array (e.g. DOCKER_OPTS or MOUNTS) and
    conditionally append the KUBECONFIG_MOUNT value only when non-empty, then call
    docker run with "${DOCKER_OPTS[@]}" between the other fixed flags so the
    KUBECONFIG_MOUNT is passed as a single safe argument; update the docker run
    invocation that references CONTAINER_NAME and IMAGE_NAME to use the array
    element expansion instead of the unquoted ${KUBECONFIG_MOUNT:-}.

Nitpick comments:
In @.claude/commands/cypress/test-development/fixture-schema-reference.md:

  • Line 1: Add a single trailing newline character to the end of
    fixture-schema-reference.md so the file ends with a newline (POSIX text file
    convention); simply ensure the last line is terminated with "\n" and commit the
    change.

In @.claude/commands/cypress/test-iteration/analyze-ci-results.md:

  • Around line 41-54: Add explicit error handling when using WebFetch in Step 2
    for started.json, finished.json, and prowjob.json: treat started.json and
    finished.json fetch failures as fatal (log an error that the job root/URL may be
    wrong and abort analysis), treat prowjob.json failure as non-fatal (continue but
    mark PR metadata incomplete), and surface which fields are critical (from
    started.json: timestamp, pull, repos base/PR SHAs; from finished.json: passed,
    result, revision) versus optional (prowjob.json fields like PR
    title/author/branch/cluster/duration). Ensure the implementation of the parallel
    fetch routine records any missing files, returns a clear error for fatal missing
    files, and annotates downstream data with a flag when prowjob.json is absent.
  • Around line 31-33: Add a validation step before the existing "Normalize the
    URL" logic: ensure the input matches either the gcsweb pattern
    (https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/...)
    or the Prow pattern
    (https://prow.ci.openshift.org/view/gs/test-platform-results/...), and if not,
    return/report the error "Invalid CI URL. Expected a gcsweb or Prow URL for an
    OpenShift monitoring-plugin PR." After validation, continue with the existing
    normalization (replace the Prow prefix with the gcsweb prefix and append a
    trailing slash if missing); hook this validation into the same flow that runs
    the URL normalization step so subsequent WebFetch calls only receive validated,
    normalized URLs.
  • Line 27: Several fenced code blocks in
    .claude/commands/cypress/test-iteration/analyze-ci-results.md lack language
    specifiers; add appropriate tags (e.g., text, bash, json, markdown) to each
    block so editors can syntax-highlight them. Locate the unlabeled blocks
    referenced in the comment such as the "URL pattern" block (lines showing the GCS
    URL), the "CI Run Summary" block, the "Commit correlation for {test_name}"
    block, and the final "# CI Analysis Report" block and update their opening
    fences to include the correct language (suggest text for plain output blocks and
    markdown for the report header) and similarly annotate the other listed blocks
    (56, 77, 94, 106, 117, 133, 192, 200, 211) with appropriate specifiers.

In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md:

  • Line 111: Add the missing fenced-code language tag by changing the opening before the "## Diagnosis" block to include "markdown" (i.e., usemarkdown);
    locate the fenced block that contains the "## Diagnosis" heading in
    diagnose-test-failure.md and update its opening fence so the linter recognizes
    the block language.
  • Around line 78-81: The documentation has an inconsistent path prefix: earlier
    it referenced web/cypress/views/incidents-page.ts but the lines mentioning
    cypress/support/incidents_prometheus_query_mocks/ lack the web/ prefix; update
    those references (prometheus-mocks.ts, mock-generators.ts, types.ts) to use the
    same web/ prefix (i.e., web/cypress/support/incidents_prometheus_query_mocks/)
    so all file paths in this guide are consistent and unambiguous.

In @.claude/commands/cypress/test-iteration/iterate-ci-flaky.md:

  • Around line 142-148: Several fenced code blocks in the CI flaky iteration
    template are missing language specifiers; update the code fence headers for the
    identified templates so the linter stops flagging them — specifically, add
context (from {N} previous runs):" and "Stability Report:" blocks, and add
```markdown to the final report header "# CI Flaky Test Iteration Report" and
the table row template "| {next_number} | {YYYY-MM-DD} | ci | {branch} |
{total_tests} | {passed} | {failed} | {flaky} | {commit_sha} |"; ensure the same
change is applied to the other occurrences referenced (lines 156-165, 345-350,
354-389, 409-411) so each triple-backtick fence includes the appropriate
language identifier.
- Around line 269-270: Update the path prefixes in the listed lines of
.claude/commands/cypress/test-iteration/iterate-ci-flaky.md so they match the
repo layout by prepending "web/" to each path: change "cypress/e2e/incidents/**"
to "web/cypress/e2e/incidents/**", "cypress/fixtures/incident-scenarios/**" to
"web/cypress/fixtures/incident-scenarios/**", "cypress/views/incidents-page.ts"
to "web/cypress/views/incidents-page.ts",
"cypress/support/incidents_prometheus_query_mocks/**" to
"web/cypress/support/incidents_prometheus_query_mocks/**", and any other listed
entries such as "src/**" or cypress config mentions to "web/src/**" or
"web/<cypress-config-path>" as appropriate so all documented paths consistently
include the web/ prefix.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Around line 136-159: Add explicit iteration state persistence by writing and
checkpointing a JSON state file (e.g., web/cypress/.iteration-state.json) that
captures stability_context, iteration number, failures, fix_attempts,
retry_counts, and commit_hashes after each major step (diagnosis, fixes applied,
validation); implement save_state() and load_state() helpers used wherever
stability_context is produced or consumed (Step 2) and around the loop that runs
in background (Step 5) so the workflow can resume from the last checkpoint if
interrupted, and ensure resume logic checks for an existing state file and
restores iteration/failure lists before continuing (used by the Diagnosis Agents
in Step 8).
- Line 142: Update the fenced code blocks in
.claude/commands/cypress/test-iteration/iterate-incident-tests.md to include
proper language specifiers: add `text` to the stability context output block,
`typescript` or `javascript` to the test result object structure block, `text`
to the mochawesome suite structure and screenshot naming pattern blocks,
`markdown` to the fix constraints and final report headings, and `text` to the
commit message template, flakiness report, and table row format blocks so
editors and linters can correctly highlight and validate those snippets.

In @.claude/commands/cypress/test-iteration/scripts/notify-slack.py:
- Around line 206-214: The current loop computes remaining = int(deadline -
time.time()) and always calls time.sleep(min(poll_interval, max(1, remaining)))
which forces a 1s sleep even when remaining <= 0; update the logic around the
print and sleep so that you only call time.sleep when remaining > 0 (e.g., after
the existing if remaining > 0: print(...) compute sleep_duration =
min(poll_interval, remaining) and then time.sleep(sleep_duration)), using the
existing variables remaining, poll_interval and deadline to locate the code and
avoid the extra 1-second delay after timeout.
- Around line 97-113: The send_webhook function currently accepts webhook_url
from env and passes it directly to urllib.request.Request; add a URL scheme
validation step before constructing the Request: parse webhook_url (e.g., via
urllib.parse.urlparse), ensure scheme is either "https" (preferably) or "http"
and reject any other schemes (file:, javascript:, data:, etc.), and if invalid
log/print a clear error and return {"ok": False, "error": "invalid webhook URL
scheme"} without making the outbound request; keep this validation localized to
send_webhook and include webhook_url in the log only as needed (avoid leaking
secrets).

In @.claude/commands/cypress/test-iteration/scripts/review-github.py:
- Around line 185-186: The tuple unpacking from post_comment assigns comment_id
and created_at but created_at is unused; update the assignment in the
post_comment call site (the line assigning comment_id, created_at) to ignore the
unused value by renaming it to _created_at (or simply _), e.g., change
"comment_id, created_at = post_comment(...)" to "comment_id, _created_at =
post_comment(...)" so linters understand the value is intentionally ignored
while leaving post_comment and comment_id usage intact.

In `@docs/agentic-development/architecture/test-iteration-system.md`:
- Around line 98-101: Update the documented file paths to match the repository
layout by adding the missing "web/" prefix to each listed path: change
cypress/e2e/incidents/**/*.cy.ts to web/cypress/e2e/incidents/**/*.cy.ts,
cypress/fixtures/incident-scenarios/*.yaml to
web/cypress/fixtures/incident-scenarios/*.yaml, cypress/views/incidents-page.ts
to web/cypress/views/incidents-page.ts, and
cypress/support/incidents_prometheus_query_mocks/** to
web/cypress/support/incidents_prometheus_query_mocks/** so the examples in
test-iteration-system.md align with actual repo paths.
- Around line 156-159: The fenced code blocks containing the example commands
starting with "/cypress:test-iteration:analyze-ci-results ci-url=..." are
missing a language specifier; update each block to use a language tag like bash
(e.g., replace the opening "```" with "```bash" or "```text") and ensure the
closing "```" remains, so the examples render and pass the linter.
- Around line 14-50: The ASCII diagram block beginning with "User:
/cypress:test-iteration:iterate-incident-tests target=regression
max-iterations=3" needs a language specifier to satisfy the linter; change the
opening fence from ``` to ```text so the block reads ```text ... ``` (leave the
content and closing fence unchanged) to mark it as plain text ASCII diagram.

In `@docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md`:
- Around line 87-89: Combine the two separate blockquote notes into a single
blockquote to remove the blank line linter error: merge the "**Note:**
`openshell sandbox upload` treats the destination as a directory..." and the
"**Security tradeoff:** The ADC file contains a refresh token..." lines into one
contiguous blockquote (remove the empty line between them) so both sentences
appear under the same blockquote marker and preserve both messages.

In `@docs/agentic-development/setup/docker-sandbox-guide.md`:
- Around line 115-117: Update the fenced code block that contains the GCP ADC
error message to include a language specifier (e.g., use ```text or
```plaintext) so the linter recognizes it as a plain text code block; locate the
block in docs/agentic-development/setup/docker-sandbox-guide.md that currently
contains "ERROR: GCP ADC credentials not found at ..." and change the opening
fence to include the language specifier.

In `@sandbox-policy.yaml`:
- Around line 66-73: The policy currently hard-codes ephemeral CI cluster
endpoints (the host entries "10.200.0.1",
"*.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org", and
"console-openshift-console.apps.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org")
which will expire; change these to stable, parameterized values or well-scoped
patterns by removing the specific CI lease names and either use a DNS wildcard
that matches the intended domain scope, reference a templated/variable
placeholder (e.g., a Helm/Kustomize/ENV variable like {{
.Values.openshift_console }} or ${OPENSHIFT_CONSOLE}) or point to cluster-local
service names so the policy is reusable across environments; update the three
host entries in sandbox-policy.yaml accordingly and ensure any callers/CI
manifests set the variable at runtime.

In `@web/scripts/clean-test-artifacts.sh`:
- Around line 7-8: The cleanup script clean-test-artifacts.sh currently removes
screenshots and videos but omits the Cypress downloads folder; update the script
to also remove contents of the downloads directory by adding a removal command
that targets "$SCRIPT_DIR/cypress/downloads/"* (use the same pattern as the
existing rm -rf lines), referencing the SCRIPT_DIR variable and matching the
downloadsFolder from cypress.config.ts so downloaded test artifacts are cleaned
up.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: ae39984a-7f8b-449e-9358-5243dbe0f792

📥 Commits

Reviewing files that changed from the base of the PR and between 0234fa4 and fad1ad6.

📒 Files selected for processing (32)
  • .claude/commands/cypress/cypress-setup.md
  • .claude/commands/cypress/run.md
  • .claude/commands/cypress/setup.md
  • .claude/commands/cypress/test-development/fixture-schema-reference.md
  • .claude/commands/cypress/test-development/generate-incident-fixture.md
  • .claude/commands/cypress/test-development/generate-regression-test.md
  • .claude/commands/cypress/test-development/refactor-regression-test.md
  • .claude/commands/cypress/test-development/validate-incident-fixtures.md
  • .claude/commands/cypress/test-iteration/analyze-ci-results.md
  • .claude/commands/cypress/test-iteration/diagnose-test-failure.md
  • .claude/commands/cypress/test-iteration/iterate-ci-flaky.md
  • .claude/commands/cypress/test-iteration/iterate-incident-tests.md
  • .claude/commands/cypress/test-iteration/scripts/notify-slack.py
  • .claude/commands/cypress/test-iteration/scripts/poll-ci-status.py
  • .claude/commands/cypress/test-iteration/scripts/review-github.py
  • .claude/commands/fixture-schema-reference.md
  • .claude/commands/generate-incident-fixture.md
  • .claude/commands/generate-regression-test.md
  • .claude/commands/refactor-regression-test.md
  • .claude/commands/validate-incident-fixtures.md
  • docs/agentic-development/architecture/security/docker-sandbox-blast-radius.md
  • docs/agentic-development/architecture/test-iteration-system.md
  • docs/agentic-development/roadmap/openshell/sandbox-bun-crash-report.md
  • docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md
  • docs/agentic-development/roadmap/test-iteration-ideas.md
  • docs/agentic-development/setup/docker-sandbox-guide.md
  • sandbox-policy.yaml
  • sandbox-start.sh
  • sandbox/Dockerfile
  • sandbox/run.sh
  • web/cypress/reports/test-stability.md
  • web/scripts/clean-test-artifacts.sh
💤 Files with no reviewable changes (6)
  • .claude/commands/refactor-regression-test.md
  • .claude/commands/generate-regression-test.md
  • .claude/commands/fixture-schema-reference.md
  • .claude/commands/generate-incident-fixture.md
  • .claude/commands/validate-incident-fixtures.md
  • .claude/commands/cypress/cypress-setup.md

Comment on lines +23 to +26
1. **Navigate to the cypress directory**:
```bash
cd web/cypress
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Path flow is inconsistent after cd web/cypress

After Line 25, commands in Lines 49 and 79 should target export-env.sh directly, not web/cypress/export-env.sh. Current instructions can write/source the wrong location.

🧩 Proposed fix
-cat > web/cypress/export-env.sh << 'ENVEOF'
+cat > export-env.sh << 'ENVEOF'
...
-source web/cypress/export-env.sh
+source export-env.sh

Also applies to: 49-50, 79-80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/setup.md around lines 23 - 26, The docs change
directory with the line "cd web/cypress" but later commands still reference
"web/cypress/export-env.sh"; update those occurrences (e.g., the commands at the
locations referencing "web/cypress/export-env.sh" around Lines 49-50 and 79-80)
to reference "export-env.sh" (or "./export-env.sh") so they correctly target the
script relative to the already-changed working directory and avoid
sourcing/writing the wrong path.

Comment on lines +51 to +53
export CYPRESS_BASE_URL='<console-url>'
export CYPRESS_LOGIN_IDP='kube:admin'
export CYPRESS_LOGIN_USERS='kubeadmin:<password>'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add explicit secret-handling safeguards for export-env.sh

These lines persist kubeadmin credentials in plaintext. Add guardrails: restrictive permissions, ignore-from-git, and an explicit “do not commit” warning.

🔐 Proposed addition
+chmod 600 export-env.sh
+# IMPORTANT: export-env.sh contains secrets. Do not commit it.
+# Ensure it is listed in .gitignore.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/setup.md around lines 51 - 53, The exported
credential lines in export-env.sh (CYPRESS_BASE_URL, CYPRESS_LOGIN_IDP,
CYPRESS_LOGIN_USERS) persist kubeadmin credentials in plaintext; update
export-env.sh to add an explicit “DO NOT COMMIT / secrets” header, enforce
restrictive file permissions (e.g., create or instruct to run chmod 600 on the
file), and add a .gitignore entry to ignore this file by name; also include a
short comment explaining the secure workflow (temporary file only, remove after
use) and a one-line check in the script that exits if the file is tracked in git
to prevent accidental commits.

Comment on lines +80 to +81
"Bash(cd /home/drajnoha/Code/monitoring-plugin:*)",
"Bash(find /home/drajnoha/Code/monitoring-plugin/web/cypress:*)",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Replace hard-coded absolute paths with variables or relative paths.

The permissions configuration and screenshot find command contain hard-coded absolute paths (/home/drajnoha/Code/monitoring-plugin). This makes the skill non-portable and will fail for other users or execution environments (including the Docker sandbox mentioned in the PR).

🔧 Proposed fix

For the permissions block (lines 80-81), use a placeholder or environment variable:

       "Bash(source cypress/export-env.sh:*)",
-      "Bash(cd /home/drajnoha/Code/monitoring-plugin:*)",
-      "Bash(find /home/drajnoha/Code/monitoring-plugin/web/cypress:*)",
+      "Bash(cd ${REPO_ROOT}:*)",
+      "Bash(find ${REPO_ROOT}/web/cypress:*)",
       "Bash(ls:*)"

Add documentation explaining that ${REPO_ROOT} should be set to the repository root path.

For the find command (line 235), make it relative to the working directory:

 ```bash
-find /home/drajnoha/Code/monitoring-plugin/web/cypress/screenshots -name "*.png" -type f
+find web/cypress/screenshots -name "*.png" -type f

Since the instructions specify running from the `web/` directory in Step 4, the path should be relative:

```diff
-find /home/drajnoha/Code/monitoring-plugin/web/cypress/screenshots -name "*.png" -type f
+find cypress/screenshots -name "*.png" -type f

Also applies to: 235-235

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 80 - 81, Replace the hard-coded absolute paths in the permission and
screenshot commands: change the "Bash(cd
/home/drajnoha/Code/monitoring-plugin:*)" and "Bash(find
/home/drajnoha/Code/monitoring-plugin/web/cypress:*)" entries to use a
repository-root variable (e.g. "${REPO_ROOT}") or relative paths (e.g. "cd
${REPO_ROOT}" or "find web/cypress/screenshots -name \"*.png\" -type f" when
running from the repo root or "find ./web/cypress/screenshots ..." from the web/
working dir); also add brief docs noting that REPO_ROOT must be set in the
environment or CI before running the commands.

Comment on lines +90 to +93
### 3. Unsigned Commits

All commits in this workflow use `--no-gpg-sign` to avoid GPG passphrase prompts blocking the loop. These unsigned commits live on a working branch and are intended to be **squash-merged** by the user with their own signature when approved. Never push unsigned commits directly to main.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Document the security implications and review process for unsigned commits.

The workflow uses --no-gpg-sign to avoid blocking on GPG prompts during autonomous execution. While the text mentions these commits are "intended to be squash-merged by the user with their own signature," this creates a supply-chain risk if the review process isn't followed strictly.

Consider adding explicit safeguards:

  1. Add a pre-merge validation step in the final report that reminds the user to:

    • Review every generated commit before merging
    • Verify no malicious changes were introduced
    • Sign the squash-merge commit with their GPG key
  2. Document the threat model: What happens if a malicious diagnosis/fix agent introduces backdoor code? The unsigned commits make attribution harder.

  3. Consider an alternative approach:

    • Use a separate signing key for the autonomous agent
    • Or require human approval for each commit batch (defeats the "autonomous" goal but increases security)
    • Or run in an isolated environment (Docker sandbox) with commit verification before pushing

The Docker sandbox infrastructure added in this PR might be the right place to implement commit verification before any changes are pushed to the main repository.

Also applies to: 337-338, 483-484

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 90 - 93, Update the "3. Unsigned Commits" section in
iterate-incident-tests.md to document security implications and prescribe a
mandatory pre-merge validation step: add language in the "final report"
instructions that requires the user to review every generated commit, verify no
malicious changes, and sign the squash-merge with their GPG key; also add a
short threat-model paragraph describing risks if an agent introduces backdoor
code and attribution difficulties for unsigned commits; finally, note
alternative mitigations (use a dedicated agent signing key, require human
approval per commit batch, or run in the Docker sandbox with an automated
commit-verification step before any push) and call out the Docker sandbox as the
recommended place to implement commit verification.

Comment on lines +99 to +502
## Instructions

Execute the following steps in order. This is the main orchestrator — it coordinates sub-agents and manages the iteration loop.

### Step 0: CI Context (optional)

If `ci-url` is provided, run `/cypress:test-iteration:analyze-ci-results` first to get CI failure context.

Capture the CI analysis output:
- If **all failures are INFRA_***: Report the infrastructure issues to the user and **STOP**. No test changes will help.
- If **mixed infra + test/code**: Note the infra issues for the user, but proceed with the test/code failures only.
- If **all test/code**: Proceed. Use the CI diagnosis (commit correlation, screenshots) as context for the local iteration.

Store the CI analysis as `ci_context` for later reference by diagnosis agents.

### Step 1: Branch Setup

First, check the current branch:
```bash
git rev-parse --abbrev-ref HEAD
```

**Decision logic:**
- If `skip-branch` is "true": Stay on the current branch, skip to Step 3.
- If already on a `test/incident-robustness-*` branch: Stay on it, skip to Step 3.
- If on any other non-main working branch (e.g., `agentic-test-iteration`, a feature branch): Ask the user whether to create a child branch or work on the current one.
- If on `main`: Create a new branch.

To create a branch (only when needed):
```bash
git checkout -b test/incident-robustness-$(date +%Y-%m-%d)
```

If that branch name already exists, append a suffix: `-2`, `-3`, etc.

**IMPORTANT**: Do NOT combine `cd` and `git` in the same command — compound `cd && git` commands trigger a security approval prompt that blocks autonomous execution. Always use separate Bash calls, or set the working directory before running git.

### Step 2: Read Stability Ledger

Read `web/cypress/reports/test-stability.md` and parse the JSON block between `STABILITY_DATA_START` and `STABILITY_DATA_END`.

Extract a compressed summary for use during diagnosis:

```
Stability context (from {N} previous runs):
Known flaky:
- "test full title" — passed {X}/{Y} runs, last failure: {date}, reason: {reason}
- "test full title" — passed {X}/{Y} runs, last failure: {date}, reason: {reason}
Recently fixed:
- "test full title" — fixed by {commit}, stable since {date}
Persistent failures:
- "test full title" — failed {X}/{Y} runs, never fixed
```

If the ledger has no data yet (first run), note "No prior stability data" and continue.

Store this as `stability_context` — it will be passed to Diagnosis Agents in Step 8 to help them:
- Prioritize known-flaky tests (these are higher value to fix)
- Avoid re-attempting fixes that already failed in previous runs
- Distinguish new failures from recurring ones

### Step 3: Resolve Target

Based on the `target` parameter, determine the Cypress run command:

| Target | Spec | Grep Tags |
|--------|------|-----------|
| `all` | `cypress/e2e/incidents/**/*.cy.ts` | `@incidents --@e2e-real --@flaky --@demo` |
| `regression` | `cypress/e2e/incidents/regression/**/*.cy.ts` | `@incidents --@e2e-real --@flaky` |
| specific file | `cypress/e2e/incidents/{target}` | (none) |
| grep pattern | `cypress/e2e/incidents/**/*.cy.ts` | (none, use `--env grepTags="{target}"`) |

### Step 4: Clean Previous Results

From the `web/` directory:
```bash
bash scripts/clean-test-artifacts.sh
```

### Step 5: Run Tests

Execute Cypress inline (NOT in a separate terminal). From the `web/` directory:

```bash
source cypress/export-env.sh && npx cypress run --spec "{SPEC}" --env grepTags="{GREP_TAGS}"
```

Where `{GREP_TAGS}` comes from the "Grep Tags" column in the target table above. If the target has no grep tags, omit the `--env` flag entirely.

**IMPORTANT**: Use `grepTags` (not `grep`). The `grep` option searches test names as text, while `grepTags` filters by `@tag` annotations. Using `grep` with tag strings like `@incidents --@e2e-real` will match nothing and cause all specs to run unfiltered.

Note: `source && npx` is one logical operation (env setup + run) and is acceptable as a single command.

**IMPORTANT**: Tests can run for a long time, especially e2e tests that wait for alerts to fire. Use `run_in_background` to avoid blocking, and check the output when notified of completion.

Capture the exit code:
- `0` = all passed
- non-zero = failures occurred

### Step 6: Parse Results

Merge mochawesome reports and parse. From the `web/` directory:

```bash
npx mochawesome-merge screenshots/cypress_report_*.json -o screenshots/merged-report.json
```

Read `screenshots/merged-report.json` and extract:

For each test:
```
{
spec_file: string, // from results[].fullFile
suite: string, // from suites[].title
test_name: string, // from tests[].title
full_title: string, // from tests[].fullTitle
state: "passed" | "failed" | "skipped",
error_message: string, // from tests[].err.message (if failed)
stack_trace: string, // from tests[].err.estack (if failed)
duration_ms: number // from tests[].duration
}
```

Build a failure list and a pass list.

**Note**: Mochawesome JSON has nested suites. Walk the tree recursively:
```
results[] -> suites[] -> tests[]
-> suites[] -> tests[] (nested suites)
```

### Step 7: Identify Screenshots

For each failure, find the corresponding screenshot:

```bash
find /home/drajnoha/Code/monitoring-plugin/web/cypress/screenshots -name "*.png" -type f
```

Match screenshots to failures using the naming convention:
```
{Suite Name} -- {Test Title} (failed).png
{Suite Name} -- {Test Title} -- before all hook (failed).png
```

### Step 8: Diagnosis Loop

**If no failures** (exit code 0): Skip to Step 13 (flakiness probe).

**If failures exist**: For each failing test, spawn a **Diagnosis Agent** (Explore-type sub-agent).

Use the `/cypress:test-iteration:diagnose-test-failure` skill prompt. Provide:
- `test-name`: the full title
- `spec-file`: the spec file path
- `error-message`: the error message
- `screenshot-path`: absolute path to the failure screenshot
- `stack-trace`: the error stack trace
- `ci-context`: any relevant context from Step 0
- `stability-context`: the compressed stability summary from Step 2 (prior flakiness history, previous failure patterns, previously attempted fixes)

**Parallelization**: If failures are in **different spec files**, spawn diagnosis agents in parallel. If they're in the **same spec file**, diagnose sequentially (they may share root causes like a broken `before all` hook).

**Before-all hook failures**: If a `before all` hook failed, all tests in that suite were skipped. Diagnose only the hook failure — fixing it will unblock all skipped tests.

Collect all diagnoses. Separate into:
- **Fixable**: `TEST_BUG`, `FIXTURE_ISSUE`, `PAGE_OBJECT_GAP`, `MOCK_ISSUE`
- **Blocking**: `REAL_REGRESSION`, `INFRA_ISSUE`

If any **blocking** issues found: Report them to the user. Continue fixing the fixable issues.

### Step 9: Fix Loop

For each fixable failure, spawn a **Fix Agent** (general-purpose sub-agent).

Provide the Fix Agent with:
1. The full diagnosis from Step 8
2. The test file content (read it)
3. The page object content (read `cypress/views/incidents-page.ts`)
4. The fixture content (if relevant)
5. These constraints:

```
## Fix Constraints

You may ONLY edit files in these paths:
- web/cypress/e2e/incidents/**/*.cy.ts (test files)
- web/cypress/fixtures/incident-scenarios/*.yaml (fixtures)
- web/cypress/views/incidents-page.ts (page object)
- web/cypress/support/incidents_prometheus_query_mocks/** (mock layer)

You must NOT edit:
- web/src/** (source code — that's Phase 2)
- Non-incident test files
- Cypress config or support infrastructure
- Any file outside the web/ directory

## Fix Guidelines

- Prefer the minimal change that fixes the issue
- Don't refactor surrounding code — only fix the failing test
- If adding a wait/timeout, prefer Cypress retry-ability (.should()) over cy.wait()
- If fixing a selector, check that the new selector exists in the current DOM
by reading the relevant React component in src/ (read-only, don't edit)
- If fixing a fixture, validate it against the fixture schema
(run /cypress:test-development:validate-incident-fixtures mentally or reference the schema)
- If adding a page object method, follow existing naming conventions
```

After the Fix Agent returns, verify the fix makes sense:
- Does the edit address the diagnosed root cause?
- Could the edit break other tests?
- Is it the minimal change needed?

If the fix looks wrong, re-diagnose with additional context.

### Step 10: Validate Fixes

After applying fixes, re-run **only the previously failing tests**:

From the `web/` directory:
```bash
source cypress/export-env.sh && npx cypress run --spec "{SPEC}" --env grepTags="{FAILING_TEST_TAGS}"
```

For each test:
- **Now passes**: Stage the fix files with `git add`
- **Still fails**: Re-diagnose (increment retry counter). Max 2 retries per test.
- **After 2 retries still failing**: Mark as `UNRESOLVED` and report to user

### Step 11: Commit Batch

After all fixable failures are addressed (or max retries reached):

Stage and commit as separate commands (never chain `cd && git`):
```bash
git add <fixed-files>
```
```bash
git commit --no-gpg-sign -m "<message>"
```

Commit message format:
```
fix(tests): <summary of what was fixed>

- <file>: <change description>
- <file>: <change description>

Classifications: N TEST_BUG, N FIXTURE_ISSUE, N PAGE_OBJECT_GAP, N MOCK_ISSUE
Unresolved: N (if any)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```

Track commit count. If commit count reaches **5**: Notify the user that the review threshold has been reached and ask whether to continue or pause for review.

### Step 12: Iterate

If there were failures and `current_iteration < max-iterations`:
- Increment iteration counter
- Go back to **Step 4** (clean results and re-run)

This catches cascading fixes — e.g., fixing a `before all` hook unblocks skipped tests that may have their own issues.

If all tests pass: Proceed to Step 13.

### Step 13: Flakiness Probe

Run the full target test suite `flakiness-runs` times (default: 3), even if everything is green.

For each run:
1. Clean previous results (Step 4)
2. Run tests (Step 5)
3. Parse results (Step 6)
4. Record per-test pass/fail

After all runs, compute flakiness:

```
Flakiness Report:
Total tests: N
Stable (all runs passed): N
Flaky (some runs failed): N
Broken (all runs failed): N

Flaky tests:
- "test name" — passed 2/3 runs
Error on failure: <error message>
- "test name" — passed 1/3 runs
Error on failure: <error message>
```

For each **flaky** test:
- Diagnose it using `/cypress:test-iteration:diagnose-test-failure` with the context that it's intermittent
- Common flaky patterns: race conditions, animation timing, network mock timing, DOM detach/reattach
- Apply fix if confident (add `.should('exist')` guards, use `{ timeout: N }`, avoid `.eq(N)` on dynamic lists)
- Re-run flakiness probe on just the fixed tests to verify

### Step 14: Final Report

Output a summary:

```
# Iteration Complete

## Branch: test/incident-robustness-YYYY-MM-DD
## Commits: N
## Iterations: N

## Results
- Tests run: N
- Passing: N
- Fixed in this session: N
- Unresolved: N (details below)
- Flaky (stabilized): N
- Flaky (remaining): N

## Fixes Applied
1. [commit-sha] fix(tests): <summary>
- <file>: <change>

2. [commit-sha] fix(tests): <summary>
- <file>: <change>

## Unresolved Issues
- "test name": REAL_REGRESSION — <description>. Source file X was modified in PR #N.
- "test name": UNRESOLVED after 2 retries — <last error>

## Remaining Flakiness
- "test name": 2/3 passed — timing issue in chart rendering, needs investigation

## Recommendations
- [Next steps for unresolved issues]
- [Whether to merge current fixes or wait]
```

### Step 15: Update Stability Ledger

After the final report, update `web/cypress/reports/test-stability.md`.

Read the file and update both sections:

**1. Current Status table** — for each test in this run:
- If test already in table: update pass rate (rolling average across all recorded runs), update trend
- If test is new: add a row
- Pass rate = total passes / total runs across all recorded iterations
- Trend: compare last 3 runs — improving / stable / degrading

**2. Run History log** — append a new row:
```
| {next_number} | {YYYY-MM-DD} | local | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} |
```

**3. Machine-readable data** — update the JSON block between `STABILITY_DATA_START` and `STABILITY_DATA_END`:
```json
{
"tests": {
"test full title": {
"results": ["pass", "pass", "fail", "pass"],
"last_failure_reason": "Timed out...",
"last_failure_date": "2026-03-23",
"fixed_by": "abc1234"
}
},
"runs": [
{
"date": "2026-03-23",
"type": "local",
"branch": "test/incident-robustness-2026-03-23",
"total": 15,
"passed": 15,
"failed": 0,
"flaky": 0,
"commit": "abc1234"
}
]
}
```

Commit the ledger update together with the final batch of fixes if any, or as a standalone commit:
```bash
git add web/cypress/reports/test-stability.md
```
```bash
git commit --no-gpg-sign -m "docs: update test stability ledger — {passed}/{total} passed, {flaky} flaky"
```

### Error Handling

- **Cypress crashes** (not just test failures): Check if it's an OOM issue (`--max-old-space-size`), a missing dependency, or a config problem. Report to user.
- **No `export-env.sh`**: Create it directly using the cluster credentials from the conversation. Do NOT use the interactive `/cypress:setup`.
- **No mochawesome reports generated**: Check if the reporter config is correct. Fall back to parsing Cypress console output.
- **Git conflicts**: If the working branch has conflicts with changes, report to user and stop.
- **Sub-agent failure**: If a Diagnosis or Fix agent fails, log the error and skip that test. Don't let one broken agent block the whole loop.

### Guardrails

- **Never edit source code** (`src/`) in Phase 1
- **Never disable a test** — if a test can't be fixed, mark it as unresolved, don't add `.skip()`
- **Never add `@flaky` tag** to a test — that's a human decision
- **Never change test assertions to match wrong behavior** — if the UI is wrong, it's a REAL_REGRESSION
- **Max 2 retries per test** to avoid infinite loops
- **Max 5 commits before pausing** for user review
- **Always run flakiness probe** before declaring success
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add timeout and resource limits to prevent runaway autonomous execution.

The autonomous workflow can potentially run indefinitely with nested loops (max iterations × max retries × flakiness runs). There's no global timeout, memory limit, or total execution time budget specified.

Consider adding:

  1. Global timeout: Maximum wall-clock time for the entire workflow (e.g., 4 hours)
  2. Per-test timeout: Maximum time for a single test run (inherited from Cypress config, but should be documented)
  3. Total resource budget: Maximum number of test executions (e.g., max_iterations × flakiness_runs × 2 should not exceed 50 total runs)
  4. Circuit breaker: If >50% of tests fail in iteration 1, pause for human review instead of continuing

These limits protect against:

  • Infinite loops due to bugs in the orchestration logic
  • Resource exhaustion on the execution environment
  • Runaway costs in cloud execution scenarios (mentioned in the PR's architecture docs)
🧰 Tools
🪛 LanguageTool

[style] ~125-~125: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ld branch or work on the current one. - If on main: Create a new branch. To cre...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[style] ~267-~267: Consider using a different verb for a more formal wording.
Context: ...ound: Report them to the user. Continue fixing the fixable issues. ### Step 9: Fix Lo...

(FIX_RESOLVE)

🪛 markdownlint-cli2 (0.22.0)

[warning] 142-142: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 209-209: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 225-225: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 239-239: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 280-280: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 341-341: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 377-377: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 401-401: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 448-448: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 99 - 502, Add explicit execution limits to the orchestrator: define and
check a global wall-clock timeout (e.g., max_wall_clock_time) at the top of the
main orchestration loop (Steps 0–15) and abort with a clear message when
exceeded; enforce a per-test/run timeout by passing/integrating a configured
per_test_timeout into the Cypress invocation (the `source cypress/export-env.sh
&& npx cypress run ...` call in Step 5) and into any retry loops; track total
test executions with a total_run_budget counter (used in Step 12/Step 13) and
stop spawning Diagnosis/Fix agents when exceeded; implement a circuit_breaker
threshold (e.g., circuit_breaker_failure_rate) observed after the first
iteration (Step 5/Step 6) to pause for human review if >50% failures; surface
these config values (max_wall_clock_time, per_test_timeout, total_run_budget,
circuit_breaker_threshold) near the orchestrator entry and check them before
each major loop/agent spawn.

Comment thread sandbox/Dockerfile
Comment on lines +29 to +30
RUN curl -fsSL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz \
| tar xz -C /usr/local/bin oc kubectl
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Verify OpenShift client artifact integrity before extracting

Lines 29–30 stream a remote tarball directly into tar without checksum/signature validation. That is a supply-chain gap for a privileged build step.

🛡️ Proposed fix
-RUN curl -fsSL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz \
-    | tar xz -C /usr/local/bin oc kubectl
+RUN set -euo pipefail; \
+    OC_URL="https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz"; \
+    curl -fsSL "$OC_URL" -o /tmp/openshift-client-linux.tar.gz; \
+    curl -fsSL "${OC_URL}.sha256" -o /tmp/openshift-client-linux.tar.gz.sha256; \
+    sha256sum -c /tmp/openshift-client-linux.tar.gz.sha256; \
+    tar xzf /tmp/openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl; \
+    rm -f /tmp/openshift-client-linux.tar.gz /tmp/openshift-client-linux.tar.gz.sha256
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RUN curl -fsSL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz \
| tar xz -C /usr/local/bin oc kubectl
RUN set -euo pipefail; \
OC_URL="https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz"; \
curl -fsSL "$OC_URL" -o /tmp/openshift-client-linux.tar.gz; \
curl -fsSL "${OC_URL}.sha256" -o /tmp/openshift-client-linux.tar.gz.sha256; \
sha256sum -c /tmp/openshift-client-linux.tar.gz.sha256; \
tar xzf /tmp/openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl; \
rm -f /tmp/openshift-client-linux.tar.gz /tmp/openshift-client-linux.tar.gz.sha256
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sandbox/Dockerfile` around lines 29 - 30, The RUN that streams
openshift-client-linux.tar.gz into tar should first download and verify the
artifact integrity: fetch both the tarball and its corresponding
checksum/signature for openshift-client-linux.tar.gz, verify the checksum (e.g.,
SHA256) or validate the signature before extraction, and only proceed to extract
oc and kubectl from openshift-client-linux.tar.gz if verification succeeds;
update the Dockerfile around the RUN that references
openshift-client-linux.tar.gz to perform the download, verification, and
conditional extraction steps so the build fails on checksum/signature mismatch.

Comment thread sandbox/Dockerfile
| tar xz -C /usr/local/bin oc kubectl

# Install Claude Code via npm (uses Node.js, not Bun)
RUN npm install -g @anthropic-ai/claude-code@latest
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "Dockerfile" -type f

Repository: openshift/monitoring-plugin

Length of output: 129


🏁 Script executed:

cat -n sandbox/Dockerfile | head -50

Repository: openshift/monitoring-plugin

Length of output: 2076


🏁 Script executed:

rg "npm install" --type dockerfile

Repository: openshift/monitoring-plugin

Length of output: 105


🏁 Script executed:

rg "npm install" --type json

Repository: openshift/monitoring-plugin

Length of output: 53


🏁 Script executed:

rg "npm install" --type-list | grep -i docker

Repository: openshift/monitoring-plugin

Length of output: 460


🏁 Script executed:

rg "npm install" sandbox/

Repository: openshift/monitoring-plugin

Length of output: 141


🏁 Script executed:

rg "@latest" sandbox/

Repository: openshift/monitoring-plugin

Length of output: 141


🏁 Script executed:

find . -name "package.json" -type f -exec cat -n {} + | head -100

Repository: openshift/monitoring-plugin

Length of output: 7385


🏁 Script executed:

cat sandbox/Dockerfile | grep -E "(ARG|FROM|npm|RUN)" | head -20

Repository: openshift/monitoring-plugin

Length of output: 669


Pin Claude Code version in image builds

Line 33 uses @latest, which prevents reproducible builds and can cause drift in deployed images, leading to unexplained CI/sandbox regressions.

📌 Proposed fix
-RUN npm install -g `@anthropic-ai/claude-code`@latest
+ARG CLAUDE_CODE_VERSION=<pin-known-good-version>
+RUN npm install -g "@anthropic-ai/claude-code@${CLAUDE_CODE_VERSION}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sandbox/Dockerfile` at line 33, The Dockerfile RUN that installs
`@anthropic-ai/claude-code` currently uses the floating tag "@latest" which breaks
reproducible image builds; change the RUN line installing
`@anthropic-ai/claude-code` to pin a specific version (or use a build ARG like
CLAUDE_CODE_VERSION and reference it in the RUN) so the installation uses a
fixed version instead of "@latest"—update the RUN command that mentions
"@anthropic-ai/claude-code@latest" accordingly.

Comment thread sandbox/run.sh
Comment on lines +47 to +52
CURRENT_BRANCH=$(git -C "$PROJECT_ROOT" branch --show-current)
REMOTE_URL=$(git -C "$PROJECT_ROOT" remote get-url origin 2>/dev/null || true)

echo "Cloning branch '$CURRENT_BRANCH' into $SANDBOX_DIR..."
git clone --single-branch --branch "$CURRENT_BRANCH" --depth 50 "$PROJECT_ROOT" "$SANDBOX_DIR"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle detached HEAD before cloning by branch

If Line 47 returns an empty branch (detached HEAD), Line 51 fails (--branch ""). Add a fallback path (e.g., clone default branch or fail early with a clear error).

🧭 Proposed fix
 CURRENT_BRANCH=$(git -C "$PROJECT_ROOT" branch --show-current)
+if [ -z "$CURRENT_BRANCH" ]; then
+    echo "ERROR: repository is in detached HEAD; cannot determine branch for sandbox clone"
+    echo "Please checkout a branch and retry."
+    exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sandbox/run.sh` around lines 47 - 52, CURRENT_BRANCH may be empty when the
repo is in a detached HEAD, causing git clone --branch "" to fail; update the
script to detect an empty CURRENT_BRANCH (the variable computed by
CURRENT_BRANCH=$(git -C "$PROJECT_ROOT" branch --show-current)) and provide a
fallback: either resolve the repo's default branch (e.g., query origin's HEAD or
use a sensible default like "main"/"master") and assign that to CURRENT_BRANCH,
or exit early with a clear error message before running git clone
--single-branch --branch "$CURRENT_BRANCH" so the subsequent git clone call
never receives an empty branch name.

Comment thread sandbox/run.sh
echo " Kubeconfig: $([ -n "${KUBECONFIG_MOUNT:-}" ] && echo 'mounted (read-only)' || echo 'NOT SET')"
echo ""
echo " Filesystem: Only worktree is writable. Host is isolated."
echo " See docs/docker-sandbox-blast-radius.md for full security analysis."
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix printed documentation path

Line 69 references docs/docker-sandbox-blast-radius.md, but the documented file path is under docs/agentic-development/architecture/security/. The current output can send users to a non-existent path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sandbox/run.sh` at line 69, Update the printed docs path in sandbox/run.sh:
the echo statement referencing "docs/docker-sandbox-blast-radius.md" should
point to the actual documentation location under
"docs/agentic-development/architecture/security/docker-sandbox-blast-radius.md"
so users are directed to the existing file; locate the echo call in run.sh and
replace the old relative path with the full path above.

Comment thread sandbox/run.sh
Comment on lines +73 to +84
docker run -it --rm \
--name "$CONTAINER_NAME" \
-v "${SANDBOX_DIR}:/sandbox" \
-v "${ADC_PATH}:/tmp/adc.json:ro" \
${KUBECONFIG_MOUNT:-} \
-e "GOOGLE_APPLICATION_CREDENTIALS=/tmp/adc.json" \
-e "CLAUDE_CODE_USE_VERTEX=1" \
-e "ANTHROPIC_VERTEX_PROJECT_ID=$VERTEX_PROJECT" \
-e "CLOUD_ML_REGION=$VERTEX_REGION" \
-e "KUBECONFIG=/tmp/kubeconfig" \
-e "GITHUB_TOKEN=${GITHUB_TOKEN:-}" \
"$IMAGE_NAME"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "run.sh" -path "*/sandbox/*" | head -5

Repository: openshift/monitoring-plugin

Length of output: 87


🏁 Script executed:

wc -l sandbox/run.sh 2>/dev/null || echo "File not found at that path"

Repository: openshift/monitoring-plugin

Length of output: 88


🏁 Script executed:

cat -n sandbox/run.sh | sed -n '70,90p'

Repository: openshift/monitoring-plugin

Length of output: 665


🏁 Script executed:

grep -n "KUBECONFIG_MOUNT\|KUBECONFIG_PATH" sandbox/run.sh

Repository: openshift/monitoring-plugin

Length of output: 486


Use an argument array for optional mounts to prevent word-splitting issues

Line 77 expands ${KUBECONFIG_MOUNT:-} unquoted in the docker command. This is unsafe because if the path contains spaces, word-splitting will occur and break the mount specification. Using an array pattern ensures proper argument handling.

🛠️ Proposed fix
+DOCKER_ARGS=(
+  -it --rm
+  --name "$CONTAINER_NAME"
+  -v "${SANDBOX_DIR}:/sandbox"
+  -v "${ADC_PATH}:/tmp/adc.json:ro"
+  -e "GOOGLE_APPLICATION_CREDENTIALS=/tmp/adc.json"
+  -e "CLAUDE_CODE_USE_VERTEX=1"
+  -e "ANTHROPIC_VERTEX_PROJECT_ID=$VERTEX_PROJECT"
+  -e "CLOUD_ML_REGION=$VERTEX_REGION"
+  -e "KUBECONFIG=/tmp/kubeconfig"
+  -e "GITHUB_TOKEN=${GITHUB_TOKEN:-}"
+)
+if [ -n "${KUBECONFIG_MOUNT:-}" ]; then
+  DOCKER_ARGS+=(-v "${KUBECONFIG_PATH}:/tmp/kubeconfig:ro")
+fi
-docker run -it --rm \
-    --name "$CONTAINER_NAME" \
-    -v "${SANDBOX_DIR}:/sandbox" \
-    -v "${ADC_PATH}:/tmp/adc.json:ro" \
-    ${KUBECONFIG_MOUNT:-} \
-    -e "GOOGLE_APPLICATION_CREDENTIALS=/tmp/adc.json" \
-    -e "CLAUDE_CODE_USE_VERTEX=1" \
-    -e "ANTHROPIC_VERTEX_PROJECT_ID=$VERTEX_PROJECT" \
-    -e "CLOUD_ML_REGION=$VERTEX_REGION" \
-    -e "KUBECONFIG=/tmp/kubeconfig" \
-    -e "GITHUB_TOKEN=${GITHUB_TOKEN:-}" \
-    "$IMAGE_NAME"
+docker run "${DOCKER_ARGS[@]}" "$IMAGE_NAME"
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 77-77: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sandbox/run.sh` around lines 73 - 84, The docker run line uses an unquoted
expansion ${KUBECONFIG_MOUNT:-} which can word-split paths with spaces; replace
it with an arguments array: build a shell array (e.g. DOCKER_OPTS or MOUNTS) and
conditionally append the KUBECONFIG_MOUNT value only when non-empty, then call
docker run with "${DOCKER_OPTS[@]}" between the other fixed flags so the
KUBECONFIG_MOUNT is passed as a single safe argument; update the docker run
invocation that references CONTAINER_NAME and IMAGE_NAME to use the array
element expansion instead of the unquoted ${KUBECONFIG_MOUNT:-}.

Claude Agent and others added 2 commits April 16, 2026 11:12
Skill that takes one or more agentic test iteration branches and
produces a clean, shippable fix branch. Key steps:

1. Discover and fetch source branches
2. Analyze each branch's changes and intent
3. Evaluate overlaps (complementary/progressive/redundant/conflicting)
4. Present evaluation and wait for user confirmation
5. Apply changes with inline lint fixing (no separate lint commits)
6. Structure commits by logical concern, not by source branch
7. Verify with regression run + e2e-real + flakiness probe
8. Confirm with user before pushing

Codifies the workflow from the OBSINTA-1290 consolidation session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…osis skills

Agentic iteration loops lack git history context and can re-introduce
patterns that were previously removed for good reasons (e.g. cy.reload()
breaking dynamic plugin chunk loading). This caused a real regression
during the OBSINTA-1290 consolidation.

Added checks to three skills:
- productize-iterations: Step 3 now flags REGRESSIVE overlaps where an
  iteration branch re-introduces a removed pattern
- iterate-incident-tests: Fix Guidelines now require checking git log
  before applying any fix
- diagnose-test-failure: Diagnosis output now includes a History Check
  section that flags previously reverted patterns

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 16, 2026

@DavidRajnoha: This pull request references OBSINTA-1287 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Add four Claude Code skills for agentic Cypress test iteration:
    /analyze-ci-results, /diagnose-test-failure, /iterate-incident-tests, and
    /iterate-ci-flaky
  • Add supporting scripts for CI polling, Slack notifications, and GitHub PR
    review
  • Add Docker-based sandbox for isolated Claude Code agent execution with
    security policy and blast radius documentation
  • Reorganize .claude/commands/ directory structure into
    cypress/test-iteration/ and cypress/test-development/ subdirectories

Summary by CodeRabbit

  • New Features

  • Docker-based isolated sandbox for running agentic test tooling and local CI-like runs

  • End-to-end agentic test-iteration system with CI analysis and automated fix/validate loops

  • Slack notifications and GitHub PR comment review integrations for iterative test fixes

  • CLI helpers to poll CI status and clean test artifacts

  • Documentation

  • Detailed sandbox, security, and test-iteration architecture guides

  • Test stability ledger report added for tracking stability across runs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
.claude/commands/cypress/test-iteration/iterate-incident-tests.md (2)

80-81: ⚠️ Potential issue | 🟠 Major

Replace machine-specific absolute paths with repo-relative or ${REPO_ROOT} paths.

Lines 80–81 and Line 235 still hardcode /home/drajnoha/..., which breaks portability in other developer machines/CI/sandbox environments.

Also applies to: 235-235

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 80 - 81, The hardcoded absolute paths ("Bash(cd
/home/drajnoha/Code/monitoring-plugin:*)" and "Bash(find
/home/drajnoha/Code/monitoring-plugin/web/cypress:*)") must be made portable:
replace those literal /home/drajnoha/... strings with repo-relative paths or use
the ${REPO_ROOT} variable (e.g., "${REPO_ROOT}/monitoring-plugin" and
"${REPO_ROOT}/monitoring-plugin/web/cypress") wherever those two entries and the
similar occurrence around line 235 appear so the commands run on other developer
machines and CI environments.

99-510: ⚠️ Potential issue | 🟠 Major

Add global timeout/run-budget/circuit-breaker guardrails for autonomous loops.

The workflow still has nested retry/iteration/flakiness loops without an explicit wall-clock cutoff, total execution budget, or first-iteration failure-rate circuit breaker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 99 - 510, The orchestration flow (Steps 0–15, variables like
current_iteration, max-iterations, flakiness-runs, run_in_background and the
commit threshold) lacks global timeout/ budget and a first-iteration
failure-rate circuit breaker; add configurable guardrails: introduce global
config variables (e.g., GLOBAL_WALLCLOCK_TIMEOUT_MS, MAX_TOTAL_RUN_SECONDS,
MAX_AUTONOMOUS_COMMITS, and FIRST_ITER_FAIL_RATE_THRESHOLD) and enforce them at
the top-level orchestrator so the loop aborts when wall-clock or total-run
budget is exceeded, when commits reach MAX_AUTONOMOUS_COMMITS (already tracked
at 5), or when the first iteration failure rate >
FIRST_ITER_FAIL_RATE_THRESHOLD—return a clear "budget_exhausted" or
"circuit_breaker_triggered" result and stop spawning Diagnosis/Fix agents;
ensure these checks are evaluated before Step 4 (clean) and between iterations
(before incrementing current_iteration) and persisted in the final report.
🧹 Nitpick comments (4)
.claude/commands/cypress/test-iteration/diagnose-test-failure.md (1)

111-145: Specify fenced code block language.

At Line 111, the fenced block has no language tag (MD040). Use text to keep it lint-clean and unambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md around
lines 111 - 145, The fenced code block under the "## Diagnosis" section is
missing a language tag (MD040); add a language identifier `text` to the opening
triple-backtick fence (the ``` block that contains the Diagnosis template) so it
becomes ```text to satisfy the linter and keep the block unambiguous.
.claude/commands/cypress/test-iteration/productize-iterations.md (2)

36-37: Use named parameters consistently instead of shell positional placeholders.

The document declares named params (branch-name, source-branches, etc.) but operational steps use $1/$2/$3/$4. This mismatch increases execution mistakes when invoked as a command skill.

Also applies to: 131-131, 183-183, 218-218, 240-240, 260-260

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/productize-iterations.md around
lines 36 - 37, The doc mixes named parameters (branch-name, source-branches,
etc.) with shell positional placeholders ($1/$2/$3/$4); update all operational
examples and commands to use the declared named parameters consistently (e.g.,
replace occurrences of $1/$2/$3/$4 with the corresponding named tokens like
source-branches, branch-name, target-branch, iteration-number or a consistent
${source-branches} style) so commands and instructions match the declared
parameter names; apply this change for the referenced occurrences (including the
`$2` glob handling example and the other noted instances) and ensure any example
command parsing logic (the glob check for source-branches) references the named
param instead of $2.

49-49: Specify fenced code block languages (MD040).

Add explicit language tags to these fenced blocks to satisfy markdownlint and improve readability.

Also applies to: 64-64, 104-104, 169-169, 224-224, 237-237

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/productize-iterations.md at line 49,
The markdown contains multiple fenced code blocks that use plain ``` with no
language tag (MD040); update each bare fence (specifically the blocks currently
at lines 49, 64, 104, 169, 224, and 237) to include an appropriate language
identifier (e.g., ```bash, ```sh, ```json, ```yaml, or ```text) that matches the
block content so markdownlint MD040 is satisfied and readability is improved;
after updating the fences, run markdownlint to verify no MD040 violations
remain.
.claude/commands/cypress/test-iteration/iterate-incident-tests.md (1)

142-142: Add language identifiers to fenced code blocks.

MD040 warnings are valid on these blocks; add text, bash, or json as appropriate.

Also applies to: 209-209, 225-225, 239-239, 280-280, 349-349, 385-385, 409-409, 456-456

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md at line
142, Update the Markdown file's fenced code blocks flagged by MD040: replace
bare triple backticks (```) with language-specific fences (e.g., ```bash for
shell commands, ```json for JSON snippets, ```text for plain output/examples)
for each code block in the iterate-incident-tests markdown so the linter no
longer reports MD040; ensure each block's language matches the block content
(commands → bash, JSON → json, plain text/example → text).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md:
- Around line 4-22: The parameters block is missing the stability-context
parameter the orchestrator sends; add a new parameter entry named
"stability-context" in the parameters list (same level as test-name, spec-file,
error-message, etc.) so the handler accepts the orchestrator payload, mark it
required: false, and include a short description like "Context about test
stability from the orchestrator"; update any validation that enumerates accepted
parameter names (if present) to include stability-context and ensure functions
that read inputs (e.g., code that references test-name, spec-file,
error-message) can also access stability-context.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Around line 10-11: The docs are inconsistent about the "target" and test
filtering: the text says target can be a full spec path or a grep pattern for a
test title, but the implementation/documentation prepends
"cypress/e2e/incidents/{target}" (risking double-prefix) and uses grepTags
(which filters tags, not title text). Fix by making "target" semantics explicit
and robust: (1) update the logic/docs so when composing the spec path you check
if target already starts with "cypress/" before prepending
"cypress/e2e/incidents/", and (2) use the correct filter flag—use "grep" when
the intent is to match test titles and "grepTags" only when matching tag
expressions; adjust the relevant mentions/usages of target, grep, and grepTags
in the file so examples and instructions are consistent. Ensure examples show
both a full-spec path and a filename case, and show separate examples for "grep"
(title) vs "grepTags" (tag) usage.

In @.claude/commands/cypress/test-iteration/productize-iterations.md:
- Line 17: The docs for the test-target option are inconsistent and the
specific-file mapping can corrupt full paths: update the description for
"test-target" and the resolution table so they both mention the
grepTags/grep-pattern option (add a "grep-pattern" row to the table), and change
the specific-file resolution rule that prepends "cypress/e2e/incidents/{target}"
so it only prepends when the target is a bare filename (e.g., no '/' and not
already starting with "cypress/"); if the target looks like a full path
(contains '/' or starts with "cypress/") leave it unchanged and document that
behavior in the table and the "test-target" description.
- Around line 1-20: Add a top-level `name` field to the YAML frontmatter for
this command (between the leading and trailing ---) so the command becomes
discoverable; pick a stable identifier like `productize-iterations` (lowercase,
hyphenated) and add it above the `description` field, then run any command
registry/lint that validates frontmatter to ensure it passes discovery checks
and matches other commands’ naming conventions.
- Around line 265-266: Replace the credential-bearing rewrite of the remote URL
(the git remote set-url origin https://$(gh auth
token)@github.com/<owner>/<repo>.git line) with an ephemeral auth header or
credential helper when pushing; for example, remove the set-url step and invoke
push using an in-memory header like using git -c
http.extraheader="AUTHORIZATION: bearer $(gh auth token)" push origin $1 or
configure and rely on the Git credential helper / gh auth login instead of
embedding the token in the remote URL.

---

Duplicate comments:
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Around line 80-81: The hardcoded absolute paths ("Bash(cd
/home/drajnoha/Code/monitoring-plugin:*)" and "Bash(find
/home/drajnoha/Code/monitoring-plugin/web/cypress:*)") must be made portable:
replace those literal /home/drajnoha/... strings with repo-relative paths or use
the ${REPO_ROOT} variable (e.g., "${REPO_ROOT}/monitoring-plugin" and
"${REPO_ROOT}/monitoring-plugin/web/cypress") wherever those two entries and the
similar occurrence around line 235 appear so the commands run on other developer
machines and CI environments.
- Around line 99-510: The orchestration flow (Steps 0–15, variables like
current_iteration, max-iterations, flakiness-runs, run_in_background and the
commit threshold) lacks global timeout/ budget and a first-iteration
failure-rate circuit breaker; add configurable guardrails: introduce global
config variables (e.g., GLOBAL_WALLCLOCK_TIMEOUT_MS, MAX_TOTAL_RUN_SECONDS,
MAX_AUTONOMOUS_COMMITS, and FIRST_ITER_FAIL_RATE_THRESHOLD) and enforce them at
the top-level orchestrator so the loop aborts when wall-clock or total-run
budget is exceeded, when commits reach MAX_AUTONOMOUS_COMMITS (already tracked
at 5), or when the first iteration failure rate >
FIRST_ITER_FAIL_RATE_THRESHOLD—return a clear "budget_exhausted" or
"circuit_breaker_triggered" result and stop spawning Diagnosis/Fix agents;
ensure these checks are evaluated before Step 4 (clean) and between iterations
(before incrementing current_iteration) and persisted in the final report.

---

Nitpick comments:
In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md:
- Around line 111-145: The fenced code block under the "## Diagnosis" section is
missing a language tag (MD040); add a language identifier `text` to the opening
triple-backtick fence (the ``` block that contains the Diagnosis template) so it
becomes ```text to satisfy the linter and keep the block unambiguous.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Line 142: Update the Markdown file's fenced code blocks flagged by MD040:
replace bare triple backticks (```) with language-specific fences (e.g., ```bash
for shell commands, ```json for JSON snippets, ```text for plain
output/examples) for each code block in the iterate-incident-tests markdown so
the linter no longer reports MD040; ensure each block's language matches the
block content (commands → bash, JSON → json, plain text/example → text).

In @.claude/commands/cypress/test-iteration/productize-iterations.md:
- Around line 36-37: The doc mixes named parameters (branch-name,
source-branches, etc.) with shell positional placeholders ($1/$2/$3/$4); update
all operational examples and commands to use the declared named parameters
consistently (e.g., replace occurrences of $1/$2/$3/$4 with the corresponding
named tokens like source-branches, branch-name, target-branch, iteration-number
or a consistent ${source-branches} style) so commands and instructions match the
declared parameter names; apply this change for the referenced occurrences
(including the `$2` glob handling example and the other noted instances) and
ensure any example command parsing logic (the glob check for source-branches)
references the named param instead of $2.
- Line 49: The markdown contains multiple fenced code blocks that use plain ```
with no language tag (MD040); update each bare fence (specifically the blocks
currently at lines 49, 64, 104, 169, 224, and 237) to include an appropriate
language identifier (e.g., ```bash, ```sh, ```json, ```yaml, or ```text) that
matches the block content so markdownlint MD040 is satisfied and readability is
improved; after updating the fences, run markdownlint to verify no MD040
violations remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b44329ab-1bff-4022-b0f8-017b8ec937cf

📥 Commits

Reviewing files that changed from the base of the PR and between fad1ad6 and 15e4532.

📒 Files selected for processing (3)
  • .claude/commands/cypress/test-iteration/diagnose-test-failure.md
  • .claude/commands/cypress/test-iteration/iterate-incident-tests.md
  • .claude/commands/cypress/test-iteration/productize-iterations.md

Comment on lines +4 to +22
parameters:
- name: test-name
description: "Full title of the failing test (from mochawesome 'fullTitle' or Cypress output)"
required: true
- name: spec-file
description: "Path to the spec file (e.g., cypress/e2e/incidents/regression/01.reg_filtering.cy.ts)"
required: true
- name: error-message
description: "The error message from the test failure"
required: true
- name: screenshot-path
description: "Absolute path to the failure screenshot (will be read with multimodal vision)"
required: false
- name: stack-trace
description: "The error stack trace (estack from mochawesome)"
required: false
- name: ci-context
description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)"
required: false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add missing stability-context parameter to match orchestrator contract.

Line 4–22 define accepted inputs, but the orchestrator passes stability-context (see iterate workflow). Without declaring it here, that context can be silently dropped during diagnosis.

🔧 Proposed fix
 parameters:
   - name: test-name
@@
   - name: ci-context
     description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)"
     required: false
+  - name: stability-context
+    description: "Compressed prior stability history from test-stability ledger for flaky/persistent-failure correlation"
+    required: false
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
parameters:
- name: test-name
description: "Full title of the failing test (from mochawesome 'fullTitle' or Cypress output)"
required: true
- name: spec-file
description: "Path to the spec file (e.g., cypress/e2e/incidents/regression/01.reg_filtering.cy.ts)"
required: true
- name: error-message
description: "The error message from the test failure"
required: true
- name: screenshot-path
description: "Absolute path to the failure screenshot (will be read with multimodal vision)"
required: false
- name: stack-trace
description: "The error stack trace (estack from mochawesome)"
required: false
- name: ci-context
description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)"
required: false
parameters:
- name: test-name
description: "Full title of the failing test (from mochawesome 'fullTitle' or Cypress output)"
required: true
- name: spec-file
description: "Path to the spec file (e.g., cypress/e2e/incidents/regression/01.reg_filtering.cy.ts)"
required: true
- name: error-message
description: "The error message from the test failure"
required: true
- name: screenshot-path
description: "Absolute path to the failure screenshot (will be read with multimodal vision)"
required: false
- name: stack-trace
description: "The error stack trace (estack from mochawesome)"
required: false
- name: ci-context
description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)"
required: false
- name: stability-context
description: "Compressed prior stability history from test-stability ledger for flaky/persistent-failure correlation"
required: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md around
lines 4 - 22, The parameters block is missing the stability-context parameter
the orchestrator sends; add a new parameter entry named "stability-context" in
the parameters list (same level as test-name, spec-file, error-message, etc.) so
the handler accepts the orchestrator payload, mark it required: false, and
include a short description like "Context about test stability from the
orchestrator"; update any validation that enumerates accepted parameter names
(if present) to include stability-context and ensure functions that read inputs
(e.g., code that references test-name, spec-file, error-message) can also access
stability-context.

Comment on lines +10 to +11
- a specific spec file path (e.g., "cypress/e2e/incidents/01.incidents.cy.ts")
- a grep pattern for a specific test (e.g., "should filter by severity")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Target resolution is internally inconsistent (grep vs grepTags, and spec path composition).

Line 10 shows target can be a full spec path, but Line 168 prepends cypress/e2e/incidents/{target} (can double-prefix paths).
Line 11 says grep pattern for test title, while Line 169/Line 188 force grepTags, which filters tags, not title text.

🔧 Proposed fix
-| specific file | `cypress/e2e/incidents/{target}` | (none) |
-| grep pattern | `cypress/e2e/incidents/**/*.cy.ts` | (none, use `--env grepTags="{target}"`) |
+| specific file | `{target}` | (none) |
+| grep pattern (test title) | `cypress/e2e/incidents/**/*.cy.ts` | (none, use `--env grep="{target}"`) |
-**IMPORTANT**: Use `grepTags` (not `grep`). The `grep` option searches test names as text, while `grepTags` filters by `@tag` annotations.
+**IMPORTANT**:
+- Use `grepTags` only for `@tag` filtering.
+- Use `grep` for test-title pattern filtering.

Also applies to: 168-169, 188-188

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 10 - 11, The docs are inconsistent about the "target" and test filtering:
the text says target can be a full spec path or a grep pattern for a test title,
but the implementation/documentation prepends "cypress/e2e/incidents/{target}"
(risking double-prefix) and uses grepTags (which filters tags, not title text).
Fix by making "target" semantics explicit and robust: (1) update the logic/docs
so when composing the spec path you check if target already starts with
"cypress/" before prepending "cypress/e2e/incidents/", and (2) use the correct
filter flag—use "grep" when the intent is to match test titles and "grepTags"
only when matching tag expressions; adjust the relevant mentions/usages of
target, grep, and grepTags in the file so examples and instructions are
consistent. Ensure examples show both a full-spec path and a filename case, and
show separate examples for "grep" (title) vs "grepTags" (tag) usage.

Comment on lines +1 to +20
---
description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes
parameters:
- name: branch-name
description: "Target branch name (e.g. test/OBSINTA-1290-incident-stability). Created from main."
required: true
type: string
- name: source-branches
description: "Comma-separated list of iteration branch names or a glob (e.g. 'FIX-PROPOSAL-AGENTIC/*' or 'branch-a,branch-b,branch-c'). Fetched from origin."
required: true
type: string
- name: flakiness-runs
description: "Number of flakiness probe runs after fixes are applied (default: 3)"
required: false
type: string
- name: test-target
description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)"
required: false
type: string
---
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Frontmatter is missing required command identifier (name).

Other command specs in this PR define name, but this file does not. That can prevent the command from being discoverable/invokable.

🔧 Proposed fix
 ---
+name: productize-iterations
 description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes
 parameters:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
---
description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes
parameters:
- name: branch-name
description: "Target branch name (e.g. test/OBSINTA-1290-incident-stability). Created from main."
required: true
type: string
- name: source-branches
description: "Comma-separated list of iteration branch names or a glob (e.g. 'FIX-PROPOSAL-AGENTIC/*' or 'branch-a,branch-b,branch-c'). Fetched from origin."
required: true
type: string
- name: flakiness-runs
description: "Number of flakiness probe runs after fixes are applied (default: 3)"
required: false
type: string
- name: test-target
description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)"
required: false
type: string
---
---
name: productize-iterations
description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes
parameters:
- name: branch-name
description: "Target branch name (e.g. test/OBSINTA-1290-incident-stability). Created from main."
required: true
type: string
- name: source-branches
description: "Comma-separated list of iteration branch names or a glob (e.g. 'FIX-PROPOSAL-AGENTIC/*' or 'branch-a,branch-b,branch-c'). Fetched from origin."
required: true
type: string
- name: flakiness-runs
description: "Number of flakiness probe runs after fixes are applied (default: 3)"
required: false
type: string
- name: test-target
description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)"
required: false
type: string
---
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/productize-iterations.md around
lines 1 - 20, Add a top-level `name` field to the YAML frontmatter for this
command (between the leading and trailing ---) so the command becomes
discoverable; pick a stable identifier like `productize-iterations` (lowercase,
hyphenated) and add it above the `description` field, then run any command
registry/lint that validates frontmatter to ensure it passes discovery checks
and matches other commands’ naming conventions.

required: false
type: string
- name: test-target
description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

test-target contract and resolution table are inconsistent.

Line 17 says test target can be grepTags pattern, but the table has no grep-pattern row. Also, specific-file mapping prepends cypress/e2e/incidents/{target}, which can corrupt full-path inputs.

Also applies to: 185-190

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/productize-iterations.md at line 17,
The docs for the test-target option are inconsistent and the specific-file
mapping can corrupt full paths: update the description for "test-target" and the
resolution table so they both mention the grepTags/grep-pattern option (add a
"grep-pattern" row to the table), and change the specific-file resolution rule
that prepends "cypress/e2e/incidents/{target}" so it only prepends when the
target is a bare filename (e.g., no '/' and not already starting with
"cypress/"); if the target looks like a full path (contains '/' or starts with
"cypress/") leave it unchanged and document that behavior in the table and the
"test-target" description.

Comment on lines +265 to +266
git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git
git push origin $1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid embedding gh auth token into git remote URL.

Line 265 persists a credential-bearing remote URL, which can leak via config/logging/history. Prefer ephemeral auth headers or credential helpers instead of rewriting remote with tokenized URL.

🔧 Safer alternative
-git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git
-git push origin $1
+git -c http.https://github.com/.extraheader="AUTHORIZATION: bearer $(gh auth token)" push origin $1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git
git push origin $1
git -c http.https://github.com/.extraheader="AUTHORIZATION: bearer $(gh auth token)" push origin $1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/cypress/test-iteration/productize-iterations.md around
lines 265 - 266, Replace the credential-bearing rewrite of the remote URL (the
git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git
line) with an ephemeral auth header or credential helper when pushing; for
example, remove the set-url step and invoke push using an in-memory header like
using git -c http.extraheader="AUTHORIZATION: bearer $(gh auth token)" push
origin $1 or configure and rely on the Git credential helper / gh auth login
instead of embedding the token in the remote URL.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Apr 16, 2026

@DavidRajnoha: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants