OBSINTA-1287: Agentic test iteration skills and Docker sandbox#879
OBSINTA-1287: Agentic test iteration skills and Docker sandbox#879DavidRajnoha wants to merge 6 commits intoopenshift:mainfrom
Conversation
Add architecture documentation for the agentic test iteration system and a roadmap with future improvement ideas including Slack notifications, cloud execution options, and interaction models. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four Claude Code skills for agentic test iteration: - /analyze-ci-results: analyze OpenShift CI (Prow) test results - /diagnose-test-failure: diagnose Cypress test failures - /iterate-incident-tests: autonomous local test iteration loop - /iterate-ci-flaky: CI-based flaky test iteration loop Supporting scripts: - poll-ci-status.py: poll Prow CI job status - notify-slack.py: Slack notifications for CI results - review-github.py: GitHub PR comment review flow Also adds test stability ledger for tracking test reliability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a Docker sandbox environment for running Claude Code agents in isolation with controlled blast radius. Includes: - Dockerfile and run script for containerized execution - Security policy (sandbox-policy.yaml) for resource limits - Blast radius analysis documentation - Setup guide and Bun crash workaround report Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename cypress-run.md → run.md, cypress-setup.md → setup.md - Move test development commands to cypress/test-development/ as symlinks - Remove old command symlinks from commands root - Add clean-test-artifacts.sh utility script Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@DavidRajnoha: This pull request references OBSINTA-1287 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: DavidRajnoha The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@DavidRajnoha: This pull request references OBSINTA-1287 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughReplaces the old Cypress setup doc with a non-interactive Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Agent as Iterate Agent
participant Git as Git Server
participant Cypress as Cypress Runner
participant Parser as Report Parser
participant Diagnoser as Diagnosis Agent
participant Fixer as Fix Agent
participant Verifier as Validation Step
participant Ledger as Stability Ledger
User->>Agent: /iterate-incident-tests (target, params)
Agent->>Git: create/checkout working branch
Agent->>Ledger: read stability ledger
Agent->>Cypress: run tests (headless, filtered)
Cypress-->>Agent: mochawesome JSON + screenshots
Agent->>Parser: parse results -> failures list
Parser-->>Agent: failing tests
loop per failing test
Agent->>Diagnoser: analyze (screenshot, spec, fixture, error)
Diagnoser-->>Agent: classification + fix proposal
alt fixable
Agent->>Fixer: apply constrained fix
Fixer->>Git: commit changes
Agent->>Cypress: rerun failing tests
Cypress-->>Agent: updated results
Agent->>Verifier: validate (bounded retries)
else blocking
Agent->>User: report blocking issue
end
end
Agent->>Cypress: flakiness probe (N runs)
Cypress-->>Agent: probe results
Agent->>Ledger: update ledger (JSON block + run history)
Agent->>Git: commit ledger updates
Agent-->>User: final structured report
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 10✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 19
🧹 Nitpick comments (20)
.claude/commands/cypress/test-development/fixture-schema-reference.md (1)
1-1: Consider adding a trailing newline.This file (and the other test-development pointer files) lacks a trailing newline at EOF. While this doesn't break functionality for indirection files, POSIX defines a text line as ending with a newline character, and many tools expect this convention.
📝 Proposed fix
-../../../.cursor/commands/fixture-schema-reference.md +../../../.cursor/commands/fixture-schema-reference.md +🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-development/fixture-schema-reference.md at line 1, Add a single trailing newline character to the end of fixture-schema-reference.md so the file ends with a newline (POSIX text file convention); simply ensure the last line is terminated with "\n" and commit the change.web/scripts/clean-test-artifacts.sh (1)
7-8: Add cleanup for the downloads folder.The script cleans
screenshots/andvideos/but notdownloads/. According tocypress.config.ts, thedownloadsFolderis set to'./cypress/downloads', which can accumulate files from download-related tests.📁 Proposed fix to include downloads cleanup
rm -rf "$SCRIPT_DIR/cypress/screenshots/"* rm -rf "$SCRIPT_DIR/cypress/videos/"* +rm -rf "$SCRIPT_DIR/cypress/downloads/"*🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/scripts/clean-test-artifacts.sh` around lines 7 - 8, The cleanup script clean-test-artifacts.sh currently removes screenshots and videos but omits the Cypress downloads folder; update the script to also remove contents of the downloads directory by adding a removal command that targets "$SCRIPT_DIR/cypress/downloads/"* (use the same pattern as the existing rm -rf lines), referencing the SCRIPT_DIR variable and matching the downloadsFolder from cypress.config.ts so downloaded test artifacts are cleaned up.sandbox-policy.yaml (1)
66-73: Avoid hard-coding ephemeral cluster endpoints in policyThe OpenShift endpoint list is tightly bound to one CI lease/environment, which is likely to expire and break reuse.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sandbox-policy.yaml` around lines 66 - 73, The policy currently hard-codes ephemeral CI cluster endpoints (the host entries "10.200.0.1", "*.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org", and "console-openshift-console.apps.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org") which will expire; change these to stable, parameterized values or well-scoped patterns by removing the specific CI lease names and either use a DNS wildcard that matches the intended domain scope, reference a templated/variable placeholder (e.g., a Helm/Kustomize/ENV variable like {{ .Values.openshift_console }} or ${OPENSHIFT_CONSOLE}) or point to cluster-local service names so the policy is reusable across environments; update the three host entries in sandbox-policy.yaml accordingly and ensure any callers/CI manifests set the variable at runtime..claude/commands/cypress/test-iteration/diagnose-test-failure.md (2)
111-111: Add language specifier to fenced code block.The linter flags this code block as missing a language specifier. Since this is a template for diagnosis output in markdown format, add
markdownas the language.📝 Suggested fix
-``` +```markdown ## Diagnosis🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md at line 111, Add the missing fenced-code language tag by changing the opening ``` before the "## Diagnosis" block to include "markdown" (i.e., use ```markdown); locate the fenced block that contains the "## Diagnosis" heading in diagnose-test-failure.md and update its opening fence so the linter recognizes the block language.
78-81: Path prefix inconsistency with other file references.Line 60 references
web/cypress/views/incidents-page.ts(withweb/prefix), but lines 78-81 referencecypress/support/incidents_prometheus_query_mocks/without theweb/prefix. For consistency and clarity, consider adding theweb/prefix here as well.📝 Suggested fix
-If the error suggests an API/intercept issue, read relevant files in `cypress/support/incidents_prometheus_query_mocks/`: -- `prometheus-mocks.ts` — intercept setup and route matching -- `mock-generators.ts` — response data generation -- `types.ts` — type definitions for fixtures +If the error suggests an API/intercept issue, read relevant files in `web/cypress/support/incidents_prometheus_query_mocks/`: +- `prometheus-mocks.ts` — intercept setup and route matching +- `mock-generators.ts` — response data generation +- `types.ts` — type definitions for fixtures🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md around lines 78 - 81, The documentation has an inconsistent path prefix: earlier it referenced web/cypress/views/incidents-page.ts but the lines mentioning cypress/support/incidents_prometheus_query_mocks/ lack the web/ prefix; update those references (prometheus-mocks.ts, mock-generators.ts, types.ts) to use the same web/ prefix (i.e., web/cypress/support/incidents_prometheus_query_mocks/) so all file paths in this guide are consistent and unambiguous.docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md (1)
87-89: Fix blockquote formatting.The linter flags a blank line inside the blockquote. These two related notes should be combined into a single blockquote block.
📝 Suggested fix
> **Note:** `openshell sandbox upload` treats the destination as a directory and places the source file inside it. So the actual file path inside the sandbox will be `/sandbox/adc.json/application_default_credentials.json`. - -> **Security tradeoff:** The ADC file contains a refresh token (not raw credentials). The agent inside the sandbox can read it. This is a known limitation — see the Troubleshooting section for alternatives. +> +> **Security tradeoff:** The ADC file contains a refresh token (not raw credentials). The agent inside the sandbox can read it. This is a known limitation — see the Troubleshooting section for alternatives.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md` around lines 87 - 89, Combine the two separate blockquote notes into a single blockquote to remove the blank line linter error: merge the "**Note:** `openshell sandbox upload` treats the destination as a directory..." and the "**Security tradeoff:** The ADC file contains a refresh token..." lines into one contiguous blockquote (remove the empty line between them) so both sentences appear under the same blockquote marker and preserve both messages..claude/commands/cypress/test-iteration/scripts/notify-slack.py (2)
206-214: Edge case in remaining time calculation.When
remainingis calculated as 0 or negative (deadline reached), thesleepcall at line 214 will still execute withsleep(1)due tomax(1, remaining). This causes an extra 1-second delay after timeout. Consider checkingremaining > 0before sleeping.📝 Suggested fix
remaining = int(deadline - time.time()) - if remaining > 0: + if remaining > poll_interval: print( f"No reply yet, {remaining}s remaining...", file=sys.stderr, flush=True, ) - - time.sleep(min(poll_interval, max(1, remaining))) + time.sleep(poll_interval) + elif remaining > 0: + time.sleep(remaining)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/scripts/notify-slack.py around lines 206 - 214, The current loop computes remaining = int(deadline - time.time()) and always calls time.sleep(min(poll_interval, max(1, remaining))) which forces a 1s sleep even when remaining <= 0; update the logic around the print and sleep so that you only call time.sleep when remaining > 0 (e.g., after the existing if remaining > 0: print(...) compute sleep_duration = min(poll_interval, remaining) and then time.sleep(sleep_duration)), using the existing variables remaining, poll_interval and deadline to locate the code and avoid the extra 1-second delay after timeout.
97-113: Consider URL scheme validation for webhook.The
webhook_urlcomes from an environment variable and is passed directly tourllib.request.Request. While this is typically set by administrators, validating the URL scheme provides defense in depth against accidental misconfiguration or malicious injection.🛡️ Suggested fix
def send_webhook(webhook_url, blocks): """Option A: Send via incoming webhook.""" + # Validate URL scheme for security + if not webhook_url.startswith(("https://", "http://")): + print(f"Invalid webhook URL scheme: {webhook_url}", file=sys.stderr) + return {"ok": False, "error": "Invalid URL scheme"} + payload = json.dumps({"blocks": blocks}).encode("utf-8")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/scripts/notify-slack.py around lines 97 - 113, The send_webhook function currently accepts webhook_url from env and passes it directly to urllib.request.Request; add a URL scheme validation step before constructing the Request: parse webhook_url (e.g., via urllib.parse.urlparse), ensure scheme is either "https" (preferably) or "http" and reject any other schemes (file:, javascript:, data:, etc.), and if invalid log/print a clear error and return {"ok": False, "error": "invalid webhook URL scheme"} without making the outbound request; keep this validation localized to send_webhook and include webhook_url in the log only as needed (avoid leaking secrets).docs/agentic-development/architecture/test-iteration-system.md (3)
98-101: Path prefix inconsistency.Lines 98-101 reference paths without the
web/prefix (e.g.,cypress/e2e/incidents/), but the actual file paths in the repository includeweb/(e.g.,web/cypress/e2e/incidents/). This inconsistency could cause confusion when the agent attempts to edit files.📝 Suggested fix
Scope — may only edit: -- `cypress/e2e/incidents/**/*.cy.ts` (test files) -- `cypress/fixtures/incident-scenarios/*.yaml` (fixtures) -- `cypress/views/incidents-page.ts` (page object) -- `cypress/support/incidents_prometheus_query_mocks/**` (mock layer) +- `web/cypress/e2e/incidents/**/*.cy.ts` (test files) +- `web/cypress/fixtures/incident-scenarios/*.yaml` (fixtures) +- `web/cypress/views/incidents-page.ts` (page object) +- `web/cypress/support/incidents_prometheus_query_mocks/**` (mock layer)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/agentic-development/architecture/test-iteration-system.md` around lines 98 - 101, Update the documented file paths to match the repository layout by adding the missing "web/" prefix to each listed path: change cypress/e2e/incidents/**/*.cy.ts to web/cypress/e2e/incidents/**/*.cy.ts, cypress/fixtures/incident-scenarios/*.yaml to web/cypress/fixtures/incident-scenarios/*.yaml, cypress/views/incidents-page.ts to web/cypress/views/incidents-page.ts, and cypress/support/incidents_prometheus_query_mocks/** to web/cypress/support/incidents_prometheus_query_mocks/** so the examples in test-iteration-system.md align with actual repo paths.
156-159: Add language specifier for usage examples.The linter flags this code block. Use
textorbashfor command usage examples.📝 Suggested fix
-``` +```bash /cypress:test-iteration:analyze-ci-results ci-url=https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/.../{RUN_ID}/🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/agentic-development/architecture/test-iteration-system.md` around lines 156 - 159, The fenced code blocks containing the example commands starting with "/cypress:test-iteration:analyze-ci-results ci-url=..." are missing a language specifier; update each block to use a language tag like bash (e.g., replace the opening "```" with "```bash" or "```text") and ensure the closing "```" remains, so the examples render and pass the linter.
14-50: Add language specifier for ASCII diagram.The linter flags this code block. For ASCII diagrams, use
textas the language specifier.📝 Suggested fix
-``` +```text User: /cypress:test-iteration:iterate-incident-tests target=regression max-iterations=3🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/agentic-development/architecture/test-iteration-system.md` around lines 14 - 50, The ASCII diagram block beginning with "User: /cypress:test-iteration:iterate-incident-tests target=regression max-iterations=3" needs a language specifier to satisfy the linter; change the opening fence from ``` to ```text so the block reads ```text ... ``` (leave the content and closing fence unchanged) to mark it as plain text ASCII diagram.docs/agentic-development/setup/docker-sandbox-guide.md (1)
115-117: Add language specifier to error message code block.The linter flags this code block as missing a language specifier. Use
textorplaintextfor error message output.📝 Suggested fix
-``` +```text ERROR: GCP ADC credentials not found at ...</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@docs/agentic-development/setup/docker-sandbox-guide.mdaround lines 115 -
117, Update the fenced code block that contains the GCP ADC error message to
include a language specifier (e.g., usetext orplaintext) so the linter
recognizes it as a plain text code block; locate the block in
docs/agentic-development/setup/docker-sandbox-guide.md that currently contains
"ERROR: GCP ADC credentials not found at ..." and change the opening fence to
include the language specifier.</details> </blockquote></details> <details> <summary>.claude/commands/cypress/test-iteration/iterate-ci-flaky.md (2)</summary><blockquote> `142-148`: **Add language specifiers to code blocks.** Multiple code blocks are flagged by the linter for missing language specifiers. These template/output blocks should use appropriate language identifiers. <details> <summary>📝 Suggested fixes</summary> Line 142: Use `text` for the PR summary template ```diff -``` +```text PR #{pr}: {title} ``` Line 156: Use `text` for the stability context template ```diff -``` +```text Stability context (from {N} previous runs): ``` Line 345: Use `text` for the stability report ```diff -``` +```text Stability Report: ``` Line 354: Use `markdown` for the final report template ```diff -``` +```markdown # CI Flaky Test Iteration Report ``` Line 409: Use `markdown` for the table row template ```diff -``` +```markdown | {next_number} | {YYYY-MM-DD} | ci | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} | ``` </details> Also applies to: 156-165, 345-350, 354-389, 409-411 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/iterate-ci-flaky.md around lines 142
- 148, Several fenced code blocks in the CI flaky iteration template are missing
language specifiers; update the code fence headers for the identified templates
so the linter stops flagging them — specifically, addtext to the "PR #{pr}: {title}" PR summary block and to the "Stability context (from {N} previous runs):" and "Stability Report:" blocks, and addmarkdown to the final report
header "# CI Flaky Test Iteration Report" and the table row template "|
{next_number} | {YYYY-MM-DD} | ci | {branch} | {total_tests} | {passed} |
{failed} | {flaky} | {commit_sha} |"; ensure the same change is applied to the
other occurrences referenced (lines 156-165, 345-350, 354-389, 409-411) so each
triple-backtick fence includes the appropriate language identifier.</details> --- `269-270`: **Path prefix inconsistency with repository structure.** The file paths listed here (e.g., `cypress/e2e/incidents/**`) should include the `web/` prefix to match the actual repository structure (e.g., `web/cypress/e2e/incidents/**`). This inconsistency appears in multiple documentation files in this PR. <details> <summary>📝 Suggested fix</summary> ```diff - May edit: `cypress/e2e/incidents/**`, `cypress/fixtures/incident-scenarios/**`, `cypress/views/incidents-page.ts`, `cypress/support/incidents_prometheus_query_mocks/**` + - May edit: `web/cypress/e2e/incidents/**`, `web/cypress/fixtures/incident-scenarios/**`, `web/cypress/views/incidents-page.ts`, `web/cypress/support/incidents_prometheus_query_mocks/**` ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/iterate-ci-flaky.md around lines 269 - 270, Update the path prefixes in the listed lines of .claude/commands/cypress/test-iteration/iterate-ci-flaky.md so they match the repo layout by prepending "web/" to each path: change "cypress/e2e/incidents/**" to "web/cypress/e2e/incidents/**", "cypress/fixtures/incident-scenarios/**" to "web/cypress/fixtures/incident-scenarios/**", "cypress/views/incidents-page.ts" to "web/cypress/views/incidents-page.ts", "cypress/support/incidents_prometheus_query_mocks/**" to "web/cypress/support/incidents_prometheus_query_mocks/**", and any other listed entries such as "src/**" or cypress config mentions to "web/src/**" or "web/<cypress-config-path>" as appropriate so all documented paths consistently include the web/ prefix. ``` </details> </blockquote></details> <details> <summary>.claude/commands/cypress/test-iteration/scripts/review-github.py (1)</summary><blockquote> `185-186`: **Unused variable `created_at`.** The `created_at` value is unpacked but never used. Prefix with underscore to indicate it's intentionally ignored. <details> <summary>📝 Suggested fix</summary> ```diff - comment_id, created_at = post_comment(args.pr, formatted, args.repo) + comment_id, _created_at = post_comment(args.pr, formatted, args.repo) ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/scripts/review-github.py around lines 185 - 186, The tuple unpacking from post_comment assigns comment_id and created_at but created_at is unused; update the assignment in the post_comment call site (the line assigning comment_id, created_at) to ignore the unused value by renaming it to _created_at (or simply _), e.g., change "comment_id, created_at = post_comment(...)" to "comment_id, _created_at = post_comment(...)" so linters understand the value is intentionally ignored while leaving post_comment and comment_id usage intact. ``` </details> </blockquote></details> <details> <summary>.claude/commands/cypress/test-iteration/analyze-ci-results.md (3)</summary><blockquote> `41-54`: **Specify error handling for parallel metadata fetches.** Step 2 mentions fetching three files in parallel using WebFetch but doesn't specify what should happen if one or more fetches fail. Missing metadata files could indicate an incomplete CI run or incorrect URL. Consider adding error handling guidance: - If `started.json` or `finished.json` is missing → likely incorrect URL or incomplete run, report and stop - If `prowjob.json` is missing → continue but note that PR metadata will be incomplete - Document which fields are critical vs. optional for downstream analysis <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/analyze-ci-results.md around lines 41 - 54, Add explicit error handling when using WebFetch in Step 2 for started.json, finished.json, and prowjob.json: treat started.json and finished.json fetch failures as fatal (log an error that the job root/URL may be wrong and abort analysis), treat prowjob.json failure as non-fatal (continue but mark PR metadata incomplete), and surface which fields are critical (from started.json: timestamp, pull, repos base/PR SHAs; from finished.json: passed, result, revision) versus optional (prowjob.json fields like PR title/author/branch/cluster/duration). Ensure the implementation of the parallel fetch routine records any missing files, returns a clear error for fatal missing files, and annotates downstream data with a flag when prowjob.json is absent. ``` </details> --- `31-33`: **Add URL validation before normalization.** The URL normalization logic assumes well-formed input but doesn't validate the URL pattern before processing. Malformed URLs could cause issues in subsequent WebFetch operations. <details> <summary>💡 Suggested validation step</summary> Add a validation step before normalization: ```markdown ### Step 1: Normalize the URL **First, validate the input URL:** The URL must match one of these patterns: - `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/...` - `https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/...` If the URL doesn't match, report an error: "Invalid CI URL. Expected a gcsweb or Prow URL for an OpenShift monitoring-plugin PR." **Then normalize to the job root:** [existing content continues...] ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/analyze-ci-results.md around lines 31 - 33, Add a validation step before the existing "Normalize the URL" logic: ensure the input matches either the gcsweb pattern (https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/...) or the Prow pattern (https://prow.ci.openshift.org/view/gs/test-platform-results/...), and if not, return/report the error "Invalid CI URL. Expected a gcsweb or Prow URL for an OpenShift monitoring-plugin PR." After validation, continue with the existing normalization (replace the Prow prefix with the gcsweb prefix and append a trailing slash if missing); hook this validation into the same flow that runs the URL normalization step so subsequent WebFetch calls only receive validated, normalized URLs. ``` </details> --- `27-27`: **Add language specifiers to fenced code blocks.** Multiple fenced code blocks lack language specifiers, which reduces syntax highlighting and editor support. Consider adding appropriate language tags (e.g., `text`, `bash`, `json`, `markdown`) to improve readability. <details> <summary>📝 Example fixes for some blocks</summary> Line 27-29 (URL pattern): ```diff -``` +```text https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_monitoring-plugin/{PR}/{JOB}/{RUN_ID}/ ``` ``` Lines 56-66 (summary output): ```diff -``` +```text CI Run Summary: PR: #{PR_NUMBER} - {PR_TITLE} ``` ``` Lines 192-197 (git correlation output): ```diff -``` +```text Commit correlation for {test_name}: ``` ``` Lines 211-261 (final report): ```diff -``` +```markdown # CI Analysis Report ``` ``` </details> Also applies to: 56-56, 77-77, 94-94, 106-106, 117-117, 133-133, 192-192, 200-200, 211-211 <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/analyze-ci-results.md at line 27, Several fenced code blocks in .claude/commands/cypress/test-iteration/analyze-ci-results.md lack language specifiers; add appropriate tags (e.g., text, bash, json, markdown) to each block so editors can syntax-highlight them. Locate the unlabeled blocks referenced in the comment such as the "URL pattern" block (lines showing the GCS URL), the "CI Run Summary" block, the "Commit correlation for {test_name}" block, and the final "# CI Analysis Report" block and update their opening fences to include the correct language (suggest text for plain output blocks and markdown for the report header) and similarly annotate the other listed blocks (56, 77, 94, 106, 117, 133, 192, 200, 211) with appropriate specifiers. ``` </details> </blockquote></details> <details> <summary>.claude/commands/cypress/test-iteration/iterate-incident-tests.md (2)</summary><blockquote> `136-159`: **Consider explicit state tracking for the iteration loop.** The workflow manages complex state across multiple iterations (failure lists, fix attempts, retry counts, stability data) but doesn't specify how this state is persisted between steps. This could lead to state loss if the agent is interrupted or crashes. Consider adding: 1. **A state file** (e.g., `web/cypress/.iteration-state.json`) that tracks: ```json { "iteration": 2, "failures": [...], "fix_attempts": {...}, "retry_counts": {...}, "commit_hashes": [...] } ``` 2. **State checkpoints** after each major step (diagnosis complete, fixes applied, validation complete) 3. **Resume capability**: If the workflow is interrupted, the agent can resume from the last checkpoint instead of starting over This is especially important for the autonomous "run in background" execution model mentioned in Step 5 (line 193). Also applies to: 199-228, 435-484 <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around lines 136 - 159, Add explicit iteration state persistence by writing and checkpointing a JSON state file (e.g., web/cypress/.iteration-state.json) that captures stability_context, iteration number, failures, fix_attempts, retry_counts, and commit_hashes after each major step (diagnosis, fixes applied, validation); implement save_state() and load_state() helpers used wherever stability_context is produced or consumed (Step 2) and around the loop that runs in background (Step 5) so the workflow can resume from the last checkpoint if interrupted, and ensure resume logic checks for an existing state file and restores iteration/failure lists before continuing (used by the Diagnosis Agents in Step 8). ``` </details> --- `142-142`: **Add language specifiers to fenced code blocks.** Multiple fenced code blocks lack language specifiers, reducing readability and editor support. <details> <summary>📝 Suggested language tags</summary> - Line 142: `text` (stability context output) - Line 209: `typescript` or `javascript` (test result object structure) - Line 225: `text` (mochawesome suite structure) - Line 239: `text` (screenshot naming pattern) - Line 280: `markdown` (fix constraints heading) - Line 341: `text` (commit message template) - Line 377: `text` (flakiness report) - Line 401: `markdown` (final report) - Line 448: `text` (table row format) </details> Also applies to: 209-209, 225-225, 239-239, 280-280, 341-341, 377-377, 401-401, 448-448 <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md at line 142, Update the fenced code blocks in .claude/commands/cypress/test-iteration/iterate-incident-tests.md to include proper language specifiers: add `text` to the stability context output block, `typescript` or `javascript` to the test result object structure block, `text` to the mochawesome suite structure and screenshot naming pattern blocks, `markdown` to the fix constraints and final report headings, and `text` to the commit message template, flakiness report, and table row format blocks so editors and linters can correctly highlight and validate those snippets. ``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/commands/cypress/setup.md:
- Around line 23-26: The docs change directory with the line "cd web/cypress"
but later commands still reference "web/cypress/export-env.sh"; update those
occurrences (e.g., the commands at the locations referencing
"web/cypress/export-env.sh" around Lines 49-50 and 79-80) to reference
"export-env.sh" (or "./export-env.sh") so they correctly target the script
relative to the already-changed working directory and avoid sourcing/writing the
wrong path.- Around line 51-53: The exported credential lines in export-env.sh
(CYPRESS_BASE_URL, CYPRESS_LOGIN_IDP, CYPRESS_LOGIN_USERS) persist kubeadmin
credentials in plaintext; update export-env.sh to add an explicit “DO NOT COMMIT
/ secrets” header, enforce restrictive file permissions (e.g., create or
instruct to run chmod 600 on the file), and add a .gitignore entry to ignore
this file by name; also include a short comment explaining the secure workflow
(temporary file only, remove after use) and a one-line check in the script that
exits if the file is tracked in git to prevent accidental commits.In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Around line 90-93: Update the "3. Unsigned Commits" section in
iterate-incident-tests.md to document security implications and prescribe a
mandatory pre-merge validation step: add language in the "final report"
instructions that requires the user to review every generated commit, verify no
malicious changes, and sign the squash-merge with their GPG key; also add a
short threat-model paragraph describing risks if an agent introduces backdoor
code and attribution difficulties for unsigned commits; finally, note
alternative mitigations (use a dedicated agent signing key, require human
approval per commit batch, or run in the Docker sandbox with an automated
commit-verification step before any push) and call out the Docker sandbox as the
recommended place to implement commit verification.- Around line 80-81: Replace the hard-coded absolute paths in the permission and
screenshot commands: change the "Bash(cd
/home/drajnoha/Code/monitoring-plugin:)" and "Bash(find
/home/drajnoha/Code/monitoring-plugin/web/cypress:)" entries to use a
repository-root variable (e.g. "${REPO_ROOT}") or relative paths (e.g. "cd
${REPO_ROOT}" or "find web/cypress/screenshots -name "*.png" -type f" when
running from the repo root or "find ./web/cypress/screenshots ..." from the web/
working dir); also add brief docs noting that REPO_ROOT must be set in the
environment or CI before running the commands.- Around line 99-502: Add explicit execution limits to the orchestrator: define
and check a global wall-clock timeout (e.g., max_wall_clock_time) at the top of
the main orchestration loop (Steps 0–15) and abort with a clear message when
exceeded; enforce a per-test/run timeout by passing/integrating a configured
per_test_timeout into the Cypress invocation (thesource cypress/export-env.sh && npx cypress run ...call in Step 5) and into any retry loops; track total
test executions with a total_run_budget counter (used in Step 12/Step 13) and
stop spawning Diagnosis/Fix agents when exceeded; implement a circuit_breaker
threshold (e.g., circuit_breaker_failure_rate) observed after the first
iteration (Step 5/Step 6) to pause for human review if >50% failures; surface
these config values (max_wall_clock_time, per_test_timeout, total_run_budget,
circuit_breaker_threshold) near the orchestrator entry and check them before
each major loop/agent spawn.In @.claude/commands/cypress/test-iteration/scripts/poll-ci-status.py:
- Around line 89-90: The parsing of attempts and interval from sys.argv in
poll-ci-status.py (variables attempts and interval) lacks validation and can
allow zero/negative or non-integer values; update the code to parse inside
try/except to catch ValueError, coerce to int, and enforce positive bounds
(e.g., attempts >= 1 and interval >= 1), falling back to the existing defaults
(30 and 300) or exiting with a clear error if values are invalid; ensure both
the attempts and interval checks are applied after parsing so negative/zero
values are clamped or rejected before they are used for sleeping/polling.- Around line 54-68: When scanning checks (the checks list) for job_substring,
don't break on the first match; instead filter all entries with
check.get("name","") containing job_substring, pick the most recent run (e.g.,
by check.get("completed_at") or fallback to check.get("created_at") or id) and
evaluate that one; treat terminal_states =
("SUCCESS","FAILURE","CANCELLED","TIMED_OUT","SKIPPED") as CI_COMPLETE (print
CI_COMPLETE state=... url=...), return 0 only for "SUCCESS" and return non-zero
for other terminal states, and remove the current break so the loop continues
polling when no terminal state is found.In
@docs/agentic-development/roadmap/openshell/sandbox-bun-crash-report.md:
- Around line 64-65: Update product name capitalization in the sandbox docs:
change lowercase "claude" to the official "Claude" in the phrase "Create a
claude provider and use-- claudeflag" (and any other occurrences in this
section), and ensure "GitHub" is capitalized where referenced (notably the
occurrences around lines 84–85); make these edits for consistency across the
file while preserving the surrounding text and formatting.- Around line 29-41: The unlabeled fenced code block containing the Bun crash
log should be marked with a language tagtextto satisfy MD040; locate the
triple-backtick block that starts with the Bun crash output (the fenced crash
log block shown in the diff) and change the opening fence fromtotext so
the block is recognized as plain text by the linter.In
@docs/agentic-development/roadmap/test-iteration-ideas.md:
- Line 107: The heading "## Test Stability Ledger" is indented and triggers
markdownlint MD023; remove any leading spaces before the "## Test Stability
Ledger" line so the heading is left-aligned (start the line with "##"
immediately) to satisfy the lint rule and preserve the exact heading text.- Around line 148-150: Several fenced code blocks are missing language labels
(MD040); update each unlabeled triple-backtick block (notably the block
containing "Push fix ──→ [PAUSE: fix_applied] ──→ CI runs (~2h) ──→ [PAUSE:
ci_complete] ──→ Analyze") to include an explicit language: use text for message
examples and bash for shell/command examples, and apply the same labeling to the
other unlabeled blocks called out in the review so all code fences specify
either text or bash as appropriate.In
@sandbox-policy.yaml:
- Around line 11-16: The policy grants broad write access under the read_write
list by including "/home"; narrow this to the minimal required home directory
(e.g. replace "/home" with "/home/sandbox" or another specific user path) so the
read_write entry reduces blast radius; update the read_write array to remove
"/home" and add the exact allowed path (refer to the read_write list and the
"/home" entry).In
@sandbox-start.sh:
- Line 15: Replace the risky global TLS bypass "git config --global
http.sslVerify false" with a scoped change or remove it: either delete that line
entirely, or change it to a repository-scoped command (e.g., use "git -C
config --local http.sslVerify false" or run the config only in a temporary CI
workspace) and add a comment explaining why it's needed and ensure you revert or
limit its scope; reference the exact token "git config --global http.sslVerify
false" when updating the script.- Line 5: Replace the ephemeral install of
@anthropic-ai/claude-code@latest in
the sandbox-start.sh npm install command with a pinned semver (e.g.,
@anthropic-ai/claude-code@) to ensure reproducible sandbox runs;
locate the npm install line that currently contains
"@anthropic-ai/claude-code@latest" and change it to a fixed version (preferably
the same version used elsewhere in the repo, such as in web/package.json), then
commit the updated script.In
@sandbox/Dockerfile:
- Around line 29-30: The RUN that streams openshift-client-linux.tar.gz into tar
should first download and verify the artifact integrity: fetch both the tarball
and its corresponding checksum/signature for openshift-client-linux.tar.gz,
verify the checksum (e.g., SHA256) or validate the signature before extraction,
and only proceed to extract oc and kubectl from openshift-client-linux.tar.gz if
verification succeeds; update the Dockerfile around the RUN that references
openshift-client-linux.tar.gz to perform the download, verification, and
conditional extraction steps so the build fails on checksum/signature mismatch.- Line 33: The Dockerfile RUN that installs
@anthropic-ai/claude-codecurrently
uses the floating tag "@latest" which breaks reproducible image builds; change
the RUN line installing@anthropic-ai/claude-codeto pin a specific version (or
use a build ARG like CLAUDE_CODE_VERSION and reference it in the RUN) so the
installation uses a fixed version instead of "@latest"—update the RUN command
that mentions "@anthropic-ai/claude-code@latest" accordingly.In
@sandbox/run.sh:
- Line 69: Update the printed docs path in sandbox/run.sh: the echo statement
referencing "docs/docker-sandbox-blast-radius.md" should point to the actual
documentation location under
"docs/agentic-development/architecture/security/docker-sandbox-blast-radius.md"
so users are directed to the existing file; locate the echo call in run.sh and
replace the old relative path with the full path above.- Around line 47-52: CURRENT_BRANCH may be empty when the repo is in a detached
HEAD, causing git clone --branch "" to fail; update the script to detect an
empty CURRENT_BRANCH (the variable computed by CURRENT_BRANCH=$(git -C
"$PROJECT_ROOT" branch --show-current)) and provide a fallback: either resolve
the repo's default branch (e.g., query origin's HEAD or use a sensible default
like "main"/"master") and assign that to CURRENT_BRANCH, or exit early with a
clear error message before running git clone --single-branch --branch
"$CURRENT_BRANCH" so the subsequent git clone call never receives an empty
branch name.- Around line 73-84: The docker run line uses an unquoted expansion
${KUBECONFIG_MOUNT:-} which can word-split paths with spaces; replace it with an
arguments array: build a shell array (e.g. DOCKER_OPTS or MOUNTS) and
conditionally append the KUBECONFIG_MOUNT value only when non-empty, then call
docker run with "${DOCKER_OPTS[@]}" between the other fixed flags so the
KUBECONFIG_MOUNT is passed as a single safe argument; update the docker run
invocation that references CONTAINER_NAME and IMAGE_NAME to use the array
element expansion instead of the unquoted ${KUBECONFIG_MOUNT:-}.
Nitpick comments:
In @.claude/commands/cypress/test-development/fixture-schema-reference.md:
- Line 1: Add a single trailing newline character to the end of
fixture-schema-reference.md so the file ends with a newline (POSIX text file
convention); simply ensure the last line is terminated with "\n" and commit the
change.In @.claude/commands/cypress/test-iteration/analyze-ci-results.md:
- Around line 41-54: Add explicit error handling when using WebFetch in Step 2
for started.json, finished.json, and prowjob.json: treat started.json and
finished.json fetch failures as fatal (log an error that the job root/URL may be
wrong and abort analysis), treat prowjob.json failure as non-fatal (continue but
mark PR metadata incomplete), and surface which fields are critical (from
started.json: timestamp, pull, repos base/PR SHAs; from finished.json: passed,
result, revision) versus optional (prowjob.json fields like PR
title/author/branch/cluster/duration). Ensure the implementation of the parallel
fetch routine records any missing files, returns a clear error for fatal missing
files, and annotates downstream data with a flag when prowjob.json is absent.- Around line 31-33: Add a validation step before the existing "Normalize the
URL" logic: ensure the input matches either the gcsweb pattern
(https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/...)
or the Prow pattern
(https://prow.ci.openshift.org/view/gs/test-platform-results/...), and if not,
return/report the error "Invalid CI URL. Expected a gcsweb or Prow URL for an
OpenShift monitoring-plugin PR." After validation, continue with the existing
normalization (replace the Prow prefix with the gcsweb prefix and append a
trailing slash if missing); hook this validation into the same flow that runs
the URL normalization step so subsequent WebFetch calls only receive validated,
normalized URLs.- Line 27: Several fenced code blocks in
.claude/commands/cypress/test-iteration/analyze-ci-results.md lack language
specifiers; add appropriate tags (e.g., text, bash, json, markdown) to each
block so editors can syntax-highlight them. Locate the unlabeled blocks
referenced in the comment such as the "URL pattern" block (lines showing the GCS
URL), the "CI Run Summary" block, the "Commit correlation for {test_name}"
block, and the final "# CI Analysis Report" block and update their opening
fences to include the correct language (suggest text for plain output blocks and
markdown for the report header) and similarly annotate the other listed blocks
(56, 77, 94, 106, 117, 133, 192, 200, 211) with appropriate specifiers.In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md:
- Line 111: Add the missing fenced-code language tag by changing the opening
before the "## Diagnosis" block to include "markdown" (i.e., usemarkdown);
locate the fenced block that contains the "## Diagnosis" heading in
diagnose-test-failure.md and update its opening fence so the linter recognizes
the block language.- Around line 78-81: The documentation has an inconsistent path prefix: earlier
it referenced web/cypress/views/incidents-page.ts but the lines mentioning
cypress/support/incidents_prometheus_query_mocks/ lack the web/ prefix; update
those references (prometheus-mocks.ts, mock-generators.ts, types.ts) to use the
same web/ prefix (i.e., web/cypress/support/incidents_prometheus_query_mocks/)
so all file paths in this guide are consistent and unambiguous.In @.claude/commands/cypress/test-iteration/iterate-ci-flaky.md:
- Around line 142-148: Several fenced code blocks in the CI flaky iteration
template are missing language specifiers; update the code fence headers for the
identified templates so the linter stops flagging them — specifically, addcontext (from {N} previous runs):" and "Stability Report:" blocks, and add ```markdown to the final report header "# CI Flaky Test Iteration Report" and the table row template "| {next_number} | {YYYY-MM-DD} | ci | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} |"; ensure the same change is applied to the other occurrences referenced (lines 156-165, 345-350, 354-389, 409-411) so each triple-backtick fence includes the appropriate language identifier. - Around line 269-270: Update the path prefixes in the listed lines of .claude/commands/cypress/test-iteration/iterate-ci-flaky.md so they match the repo layout by prepending "web/" to each path: change "cypress/e2e/incidents/**" to "web/cypress/e2e/incidents/**", "cypress/fixtures/incident-scenarios/**" to "web/cypress/fixtures/incident-scenarios/**", "cypress/views/incidents-page.ts" to "web/cypress/views/incidents-page.ts", "cypress/support/incidents_prometheus_query_mocks/**" to "web/cypress/support/incidents_prometheus_query_mocks/**", and any other listed entries such as "src/**" or cypress config mentions to "web/src/**" or "web/<cypress-config-path>" as appropriate so all documented paths consistently include the web/ prefix. In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md: - Around line 136-159: Add explicit iteration state persistence by writing and checkpointing a JSON state file (e.g., web/cypress/.iteration-state.json) that captures stability_context, iteration number, failures, fix_attempts, retry_counts, and commit_hashes after each major step (diagnosis, fixes applied, validation); implement save_state() and load_state() helpers used wherever stability_context is produced or consumed (Step 2) and around the loop that runs in background (Step 5) so the workflow can resume from the last checkpoint if interrupted, and ensure resume logic checks for an existing state file and restores iteration/failure lists before continuing (used by the Diagnosis Agents in Step 8). - Line 142: Update the fenced code blocks in .claude/commands/cypress/test-iteration/iterate-incident-tests.md to include proper language specifiers: add `text` to the stability context output block, `typescript` or `javascript` to the test result object structure block, `text` to the mochawesome suite structure and screenshot naming pattern blocks, `markdown` to the fix constraints and final report headings, and `text` to the commit message template, flakiness report, and table row format blocks so editors and linters can correctly highlight and validate those snippets. In @.claude/commands/cypress/test-iteration/scripts/notify-slack.py: - Around line 206-214: The current loop computes remaining = int(deadline - time.time()) and always calls time.sleep(min(poll_interval, max(1, remaining))) which forces a 1s sleep even when remaining <= 0; update the logic around the print and sleep so that you only call time.sleep when remaining > 0 (e.g., after the existing if remaining > 0: print(...) compute sleep_duration = min(poll_interval, remaining) and then time.sleep(sleep_duration)), using the existing variables remaining, poll_interval and deadline to locate the code and avoid the extra 1-second delay after timeout. - Around line 97-113: The send_webhook function currently accepts webhook_url from env and passes it directly to urllib.request.Request; add a URL scheme validation step before constructing the Request: parse webhook_url (e.g., via urllib.parse.urlparse), ensure scheme is either "https" (preferably) or "http" and reject any other schemes (file:, javascript:, data:, etc.), and if invalid log/print a clear error and return {"ok": False, "error": "invalid webhook URL scheme"} without making the outbound request; keep this validation localized to send_webhook and include webhook_url in the log only as needed (avoid leaking secrets). In @.claude/commands/cypress/test-iteration/scripts/review-github.py: - Around line 185-186: The tuple unpacking from post_comment assigns comment_id and created_at but created_at is unused; update the assignment in the post_comment call site (the line assigning comment_id, created_at) to ignore the unused value by renaming it to _created_at (or simply _), e.g., change "comment_id, created_at = post_comment(...)" to "comment_id, _created_at = post_comment(...)" so linters understand the value is intentionally ignored while leaving post_comment and comment_id usage intact. In `@docs/agentic-development/architecture/test-iteration-system.md`: - Around line 98-101: Update the documented file paths to match the repository layout by adding the missing "web/" prefix to each listed path: change cypress/e2e/incidents/**/*.cy.ts to web/cypress/e2e/incidents/**/*.cy.ts, cypress/fixtures/incident-scenarios/*.yaml to web/cypress/fixtures/incident-scenarios/*.yaml, cypress/views/incidents-page.ts to web/cypress/views/incidents-page.ts, and cypress/support/incidents_prometheus_query_mocks/** to web/cypress/support/incidents_prometheus_query_mocks/** so the examples in test-iteration-system.md align with actual repo paths. - Around line 156-159: The fenced code blocks containing the example commands starting with "/cypress:test-iteration:analyze-ci-results ci-url=..." are missing a language specifier; update each block to use a language tag like bash (e.g., replace the opening "```" with "```bash" or "```text") and ensure the closing "```" remains, so the examples render and pass the linter. - Around line 14-50: The ASCII diagram block beginning with "User: /cypress:test-iteration:iterate-incident-tests target=regression max-iterations=3" needs a language specifier to satisfy the linter; change the opening fence from ``` to ```text so the block reads ```text ... ``` (leave the content and closing fence unchanged) to mark it as plain text ASCII diagram. In `@docs/agentic-development/roadmap/openshell/sandbox-setup-guide.md`: - Around line 87-89: Combine the two separate blockquote notes into a single blockquote to remove the blank line linter error: merge the "**Note:** `openshell sandbox upload` treats the destination as a directory..." and the "**Security tradeoff:** The ADC file contains a refresh token..." lines into one contiguous blockquote (remove the empty line between them) so both sentences appear under the same blockquote marker and preserve both messages. In `@docs/agentic-development/setup/docker-sandbox-guide.md`: - Around line 115-117: Update the fenced code block that contains the GCP ADC error message to include a language specifier (e.g., use ```text or ```plaintext) so the linter recognizes it as a plain text code block; locate the block in docs/agentic-development/setup/docker-sandbox-guide.md that currently contains "ERROR: GCP ADC credentials not found at ..." and change the opening fence to include the language specifier. In `@sandbox-policy.yaml`: - Around line 66-73: The policy currently hard-codes ephemeral CI cluster endpoints (the host entries "10.200.0.1", "*.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org", and "console-openshift-console.apps.ci-ln-8bnwfst-76ef8.aws-4.ci.openshift.org") which will expire; change these to stable, parameterized values or well-scoped patterns by removing the specific CI lease names and either use a DNS wildcard that matches the intended domain scope, reference a templated/variable placeholder (e.g., a Helm/Kustomize/ENV variable like {{ .Values.openshift_console }} or ${OPENSHIFT_CONSOLE}) or point to cluster-local service names so the policy is reusable across environments; update the three host entries in sandbox-policy.yaml accordingly and ensure any callers/CI manifests set the variable at runtime. In `@web/scripts/clean-test-artifacts.sh`: - Around line 7-8: The cleanup script clean-test-artifacts.sh currently removes screenshots and videos but omits the Cypress downloads folder; update the script to also remove contents of the downloads directory by adding a removal command that targets "$SCRIPT_DIR/cypress/downloads/"* (use the same pattern as the existing rm -rf lines), referencing the SCRIPT_DIR variable and matching the downloadsFolder from cypress.config.ts so downloaded test artifacts are cleaned up.🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID:
ae39984a-7f8b-449e-9358-5243dbe0f792📒 Files selected for processing (32)
.claude/commands/cypress/cypress-setup.md.claude/commands/cypress/run.md.claude/commands/cypress/setup.md.claude/commands/cypress/test-development/fixture-schema-reference.md.claude/commands/cypress/test-development/generate-incident-fixture.md.claude/commands/cypress/test-development/generate-regression-test.md.claude/commands/cypress/test-development/refactor-regression-test.md.claude/commands/cypress/test-development/validate-incident-fixtures.md.claude/commands/cypress/test-iteration/analyze-ci-results.md.claude/commands/cypress/test-iteration/diagnose-test-failure.md.claude/commands/cypress/test-iteration/iterate-ci-flaky.md.claude/commands/cypress/test-iteration/iterate-incident-tests.md.claude/commands/cypress/test-iteration/scripts/notify-slack.py.claude/commands/cypress/test-iteration/scripts/poll-ci-status.py.claude/commands/cypress/test-iteration/scripts/review-github.py.claude/commands/fixture-schema-reference.md.claude/commands/generate-incident-fixture.md.claude/commands/generate-regression-test.md.claude/commands/refactor-regression-test.md.claude/commands/validate-incident-fixtures.mddocs/agentic-development/architecture/security/docker-sandbox-blast-radius.mddocs/agentic-development/architecture/test-iteration-system.mddocs/agentic-development/roadmap/openshell/sandbox-bun-crash-report.mddocs/agentic-development/roadmap/openshell/sandbox-setup-guide.mddocs/agentic-development/roadmap/test-iteration-ideas.mddocs/agentic-development/setup/docker-sandbox-guide.mdsandbox-policy.yamlsandbox-start.shsandbox/Dockerfilesandbox/run.shweb/cypress/reports/test-stability.mdweb/scripts/clean-test-artifacts.sh💤 Files with no reviewable changes (6)
- .claude/commands/refactor-regression-test.md
- .claude/commands/generate-regression-test.md
- .claude/commands/fixture-schema-reference.md
- .claude/commands/generate-incident-fixture.md
- .claude/commands/validate-incident-fixtures.md
- .claude/commands/cypress/cypress-setup.md
| 1. **Navigate to the cypress directory**: | ||
| ```bash | ||
| cd web/cypress | ||
| ``` |
There was a problem hiding this comment.
Path flow is inconsistent after cd web/cypress
After Line 25, commands in Lines 49 and 79 should target export-env.sh directly, not web/cypress/export-env.sh. Current instructions can write/source the wrong location.
🧩 Proposed fix
-cat > web/cypress/export-env.sh << 'ENVEOF'
+cat > export-env.sh << 'ENVEOF'
...
-source web/cypress/export-env.sh
+source export-env.shAlso applies to: 49-50, 79-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/setup.md around lines 23 - 26, The docs change
directory with the line "cd web/cypress" but later commands still reference
"web/cypress/export-env.sh"; update those occurrences (e.g., the commands at the
locations referencing "web/cypress/export-env.sh" around Lines 49-50 and 79-80)
to reference "export-env.sh" (or "./export-env.sh") so they correctly target the
script relative to the already-changed working directory and avoid
sourcing/writing the wrong path.
| export CYPRESS_BASE_URL='<console-url>' | ||
| export CYPRESS_LOGIN_IDP='kube:admin' | ||
| export CYPRESS_LOGIN_USERS='kubeadmin:<password>' |
There was a problem hiding this comment.
Add explicit secret-handling safeguards for export-env.sh
These lines persist kubeadmin credentials in plaintext. Add guardrails: restrictive permissions, ignore-from-git, and an explicit “do not commit” warning.
🔐 Proposed addition
+chmod 600 export-env.sh
+# IMPORTANT: export-env.sh contains secrets. Do not commit it.
+# Ensure it is listed in .gitignore.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/setup.md around lines 51 - 53, The exported
credential lines in export-env.sh (CYPRESS_BASE_URL, CYPRESS_LOGIN_IDP,
CYPRESS_LOGIN_USERS) persist kubeadmin credentials in plaintext; update
export-env.sh to add an explicit “DO NOT COMMIT / secrets” header, enforce
restrictive file permissions (e.g., create or instruct to run chmod 600 on the
file), and add a .gitignore entry to ignore this file by name; also include a
short comment explaining the secure workflow (temporary file only, remove after
use) and a one-line check in the script that exits if the file is tracked in git
to prevent accidental commits.
| "Bash(cd /home/drajnoha/Code/monitoring-plugin:*)", | ||
| "Bash(find /home/drajnoha/Code/monitoring-plugin/web/cypress:*)", |
There was a problem hiding this comment.
Replace hard-coded absolute paths with variables or relative paths.
The permissions configuration and screenshot find command contain hard-coded absolute paths (/home/drajnoha/Code/monitoring-plugin). This makes the skill non-portable and will fail for other users or execution environments (including the Docker sandbox mentioned in the PR).
🔧 Proposed fix
For the permissions block (lines 80-81), use a placeholder or environment variable:
"Bash(source cypress/export-env.sh:*)",
- "Bash(cd /home/drajnoha/Code/monitoring-plugin:*)",
- "Bash(find /home/drajnoha/Code/monitoring-plugin/web/cypress:*)",
+ "Bash(cd ${REPO_ROOT}:*)",
+ "Bash(find ${REPO_ROOT}/web/cypress:*)",
"Bash(ls:*)"Add documentation explaining that ${REPO_ROOT} should be set to the repository root path.
For the find command (line 235), make it relative to the working directory:
```bash
-find /home/drajnoha/Code/monitoring-plugin/web/cypress/screenshots -name "*.png" -type f
+find web/cypress/screenshots -name "*.png" -type f
Since the instructions specify running from the `web/` directory in Step 4, the path should be relative:
```diff
-find /home/drajnoha/Code/monitoring-plugin/web/cypress/screenshots -name "*.png" -type f
+find cypress/screenshots -name "*.png" -type f
Also applies to: 235-235
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 80 - 81, Replace the hard-coded absolute paths in the permission and
screenshot commands: change the "Bash(cd
/home/drajnoha/Code/monitoring-plugin:*)" and "Bash(find
/home/drajnoha/Code/monitoring-plugin/web/cypress:*)" entries to use a
repository-root variable (e.g. "${REPO_ROOT}") or relative paths (e.g. "cd
${REPO_ROOT}" or "find web/cypress/screenshots -name \"*.png\" -type f" when
running from the repo root or "find ./web/cypress/screenshots ..." from the web/
working dir); also add brief docs noting that REPO_ROOT must be set in the
environment or CI before running the commands.
| ### 3. Unsigned Commits | ||
|
|
||
| All commits in this workflow use `--no-gpg-sign` to avoid GPG passphrase prompts blocking the loop. These unsigned commits live on a working branch and are intended to be **squash-merged** by the user with their own signature when approved. Never push unsigned commits directly to main. | ||
|
|
There was a problem hiding this comment.
Document the security implications and review process for unsigned commits.
The workflow uses --no-gpg-sign to avoid blocking on GPG prompts during autonomous execution. While the text mentions these commits are "intended to be squash-merged by the user with their own signature," this creates a supply-chain risk if the review process isn't followed strictly.
Consider adding explicit safeguards:
-
Add a pre-merge validation step in the final report that reminds the user to:
- Review every generated commit before merging
- Verify no malicious changes were introduced
- Sign the squash-merge commit with their GPG key
-
Document the threat model: What happens if a malicious diagnosis/fix agent introduces backdoor code? The unsigned commits make attribution harder.
-
Consider an alternative approach:
- Use a separate signing key for the autonomous agent
- Or require human approval for each commit batch (defeats the "autonomous" goal but increases security)
- Or run in an isolated environment (Docker sandbox) with commit verification before pushing
The Docker sandbox infrastructure added in this PR might be the right place to implement commit verification before any changes are pushed to the main repository.
Also applies to: 337-338, 483-484
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 90 - 93, Update the "3. Unsigned Commits" section in
iterate-incident-tests.md to document security implications and prescribe a
mandatory pre-merge validation step: add language in the "final report"
instructions that requires the user to review every generated commit, verify no
malicious changes, and sign the squash-merge with their GPG key; also add a
short threat-model paragraph describing risks if an agent introduces backdoor
code and attribution difficulties for unsigned commits; finally, note
alternative mitigations (use a dedicated agent signing key, require human
approval per commit batch, or run in the Docker sandbox with an automated
commit-verification step before any push) and call out the Docker sandbox as the
recommended place to implement commit verification.
| ## Instructions | ||
|
|
||
| Execute the following steps in order. This is the main orchestrator — it coordinates sub-agents and manages the iteration loop. | ||
|
|
||
| ### Step 0: CI Context (optional) | ||
|
|
||
| If `ci-url` is provided, run `/cypress:test-iteration:analyze-ci-results` first to get CI failure context. | ||
|
|
||
| Capture the CI analysis output: | ||
| - If **all failures are INFRA_***: Report the infrastructure issues to the user and **STOP**. No test changes will help. | ||
| - If **mixed infra + test/code**: Note the infra issues for the user, but proceed with the test/code failures only. | ||
| - If **all test/code**: Proceed. Use the CI diagnosis (commit correlation, screenshots) as context for the local iteration. | ||
|
|
||
| Store the CI analysis as `ci_context` for later reference by diagnosis agents. | ||
|
|
||
| ### Step 1: Branch Setup | ||
|
|
||
| First, check the current branch: | ||
| ```bash | ||
| git rev-parse --abbrev-ref HEAD | ||
| ``` | ||
|
|
||
| **Decision logic:** | ||
| - If `skip-branch` is "true": Stay on the current branch, skip to Step 3. | ||
| - If already on a `test/incident-robustness-*` branch: Stay on it, skip to Step 3. | ||
| - If on any other non-main working branch (e.g., `agentic-test-iteration`, a feature branch): Ask the user whether to create a child branch or work on the current one. | ||
| - If on `main`: Create a new branch. | ||
|
|
||
| To create a branch (only when needed): | ||
| ```bash | ||
| git checkout -b test/incident-robustness-$(date +%Y-%m-%d) | ||
| ``` | ||
|
|
||
| If that branch name already exists, append a suffix: `-2`, `-3`, etc. | ||
|
|
||
| **IMPORTANT**: Do NOT combine `cd` and `git` in the same command — compound `cd && git` commands trigger a security approval prompt that blocks autonomous execution. Always use separate Bash calls, or set the working directory before running git. | ||
|
|
||
| ### Step 2: Read Stability Ledger | ||
|
|
||
| Read `web/cypress/reports/test-stability.md` and parse the JSON block between `STABILITY_DATA_START` and `STABILITY_DATA_END`. | ||
|
|
||
| Extract a compressed summary for use during diagnosis: | ||
|
|
||
| ``` | ||
| Stability context (from {N} previous runs): | ||
| Known flaky: | ||
| - "test full title" — passed {X}/{Y} runs, last failure: {date}, reason: {reason} | ||
| - "test full title" — passed {X}/{Y} runs, last failure: {date}, reason: {reason} | ||
| Recently fixed: | ||
| - "test full title" — fixed by {commit}, stable since {date} | ||
| Persistent failures: | ||
| - "test full title" — failed {X}/{Y} runs, never fixed | ||
| ``` | ||
|
|
||
| If the ledger has no data yet (first run), note "No prior stability data" and continue. | ||
|
|
||
| Store this as `stability_context` — it will be passed to Diagnosis Agents in Step 8 to help them: | ||
| - Prioritize known-flaky tests (these are higher value to fix) | ||
| - Avoid re-attempting fixes that already failed in previous runs | ||
| - Distinguish new failures from recurring ones | ||
|
|
||
| ### Step 3: Resolve Target | ||
|
|
||
| Based on the `target` parameter, determine the Cypress run command: | ||
|
|
||
| | Target | Spec | Grep Tags | | ||
| |--------|------|-----------| | ||
| | `all` | `cypress/e2e/incidents/**/*.cy.ts` | `@incidents --@e2e-real --@flaky --@demo` | | ||
| | `regression` | `cypress/e2e/incidents/regression/**/*.cy.ts` | `@incidents --@e2e-real --@flaky` | | ||
| | specific file | `cypress/e2e/incidents/{target}` | (none) | | ||
| | grep pattern | `cypress/e2e/incidents/**/*.cy.ts` | (none, use `--env grepTags="{target}"`) | | ||
|
|
||
| ### Step 4: Clean Previous Results | ||
|
|
||
| From the `web/` directory: | ||
| ```bash | ||
| bash scripts/clean-test-artifacts.sh | ||
| ``` | ||
|
|
||
| ### Step 5: Run Tests | ||
|
|
||
| Execute Cypress inline (NOT in a separate terminal). From the `web/` directory: | ||
|
|
||
| ```bash | ||
| source cypress/export-env.sh && npx cypress run --spec "{SPEC}" --env grepTags="{GREP_TAGS}" | ||
| ``` | ||
|
|
||
| Where `{GREP_TAGS}` comes from the "Grep Tags" column in the target table above. If the target has no grep tags, omit the `--env` flag entirely. | ||
|
|
||
| **IMPORTANT**: Use `grepTags` (not `grep`). The `grep` option searches test names as text, while `grepTags` filters by `@tag` annotations. Using `grep` with tag strings like `@incidents --@e2e-real` will match nothing and cause all specs to run unfiltered. | ||
|
|
||
| Note: `source && npx` is one logical operation (env setup + run) and is acceptable as a single command. | ||
|
|
||
| **IMPORTANT**: Tests can run for a long time, especially e2e tests that wait for alerts to fire. Use `run_in_background` to avoid blocking, and check the output when notified of completion. | ||
|
|
||
| Capture the exit code: | ||
| - `0` = all passed | ||
| - non-zero = failures occurred | ||
|
|
||
| ### Step 6: Parse Results | ||
|
|
||
| Merge mochawesome reports and parse. From the `web/` directory: | ||
|
|
||
| ```bash | ||
| npx mochawesome-merge screenshots/cypress_report_*.json -o screenshots/merged-report.json | ||
| ``` | ||
|
|
||
| Read `screenshots/merged-report.json` and extract: | ||
|
|
||
| For each test: | ||
| ``` | ||
| { | ||
| spec_file: string, // from results[].fullFile | ||
| suite: string, // from suites[].title | ||
| test_name: string, // from tests[].title | ||
| full_title: string, // from tests[].fullTitle | ||
| state: "passed" | "failed" | "skipped", | ||
| error_message: string, // from tests[].err.message (if failed) | ||
| stack_trace: string, // from tests[].err.estack (if failed) | ||
| duration_ms: number // from tests[].duration | ||
| } | ||
| ``` | ||
|
|
||
| Build a failure list and a pass list. | ||
|
|
||
| **Note**: Mochawesome JSON has nested suites. Walk the tree recursively: | ||
| ``` | ||
| results[] -> suites[] -> tests[] | ||
| -> suites[] -> tests[] (nested suites) | ||
| ``` | ||
|
|
||
| ### Step 7: Identify Screenshots | ||
|
|
||
| For each failure, find the corresponding screenshot: | ||
|
|
||
| ```bash | ||
| find /home/drajnoha/Code/monitoring-plugin/web/cypress/screenshots -name "*.png" -type f | ||
| ``` | ||
|
|
||
| Match screenshots to failures using the naming convention: | ||
| ``` | ||
| {Suite Name} -- {Test Title} (failed).png | ||
| {Suite Name} -- {Test Title} -- before all hook (failed).png | ||
| ``` | ||
|
|
||
| ### Step 8: Diagnosis Loop | ||
|
|
||
| **If no failures** (exit code 0): Skip to Step 13 (flakiness probe). | ||
|
|
||
| **If failures exist**: For each failing test, spawn a **Diagnosis Agent** (Explore-type sub-agent). | ||
|
|
||
| Use the `/cypress:test-iteration:diagnose-test-failure` skill prompt. Provide: | ||
| - `test-name`: the full title | ||
| - `spec-file`: the spec file path | ||
| - `error-message`: the error message | ||
| - `screenshot-path`: absolute path to the failure screenshot | ||
| - `stack-trace`: the error stack trace | ||
| - `ci-context`: any relevant context from Step 0 | ||
| - `stability-context`: the compressed stability summary from Step 2 (prior flakiness history, previous failure patterns, previously attempted fixes) | ||
|
|
||
| **Parallelization**: If failures are in **different spec files**, spawn diagnosis agents in parallel. If they're in the **same spec file**, diagnose sequentially (they may share root causes like a broken `before all` hook). | ||
|
|
||
| **Before-all hook failures**: If a `before all` hook failed, all tests in that suite were skipped. Diagnose only the hook failure — fixing it will unblock all skipped tests. | ||
|
|
||
| Collect all diagnoses. Separate into: | ||
| - **Fixable**: `TEST_BUG`, `FIXTURE_ISSUE`, `PAGE_OBJECT_GAP`, `MOCK_ISSUE` | ||
| - **Blocking**: `REAL_REGRESSION`, `INFRA_ISSUE` | ||
|
|
||
| If any **blocking** issues found: Report them to the user. Continue fixing the fixable issues. | ||
|
|
||
| ### Step 9: Fix Loop | ||
|
|
||
| For each fixable failure, spawn a **Fix Agent** (general-purpose sub-agent). | ||
|
|
||
| Provide the Fix Agent with: | ||
| 1. The full diagnosis from Step 8 | ||
| 2. The test file content (read it) | ||
| 3. The page object content (read `cypress/views/incidents-page.ts`) | ||
| 4. The fixture content (if relevant) | ||
| 5. These constraints: | ||
|
|
||
| ``` | ||
| ## Fix Constraints | ||
|
|
||
| You may ONLY edit files in these paths: | ||
| - web/cypress/e2e/incidents/**/*.cy.ts (test files) | ||
| - web/cypress/fixtures/incident-scenarios/*.yaml (fixtures) | ||
| - web/cypress/views/incidents-page.ts (page object) | ||
| - web/cypress/support/incidents_prometheus_query_mocks/** (mock layer) | ||
|
|
||
| You must NOT edit: | ||
| - web/src/** (source code — that's Phase 2) | ||
| - Non-incident test files | ||
| - Cypress config or support infrastructure | ||
| - Any file outside the web/ directory | ||
|
|
||
| ## Fix Guidelines | ||
|
|
||
| - Prefer the minimal change that fixes the issue | ||
| - Don't refactor surrounding code — only fix the failing test | ||
| - If adding a wait/timeout, prefer Cypress retry-ability (.should()) over cy.wait() | ||
| - If fixing a selector, check that the new selector exists in the current DOM | ||
| by reading the relevant React component in src/ (read-only, don't edit) | ||
| - If fixing a fixture, validate it against the fixture schema | ||
| (run /cypress:test-development:validate-incident-fixtures mentally or reference the schema) | ||
| - If adding a page object method, follow existing naming conventions | ||
| ``` | ||
|
|
||
| After the Fix Agent returns, verify the fix makes sense: | ||
| - Does the edit address the diagnosed root cause? | ||
| - Could the edit break other tests? | ||
| - Is it the minimal change needed? | ||
|
|
||
| If the fix looks wrong, re-diagnose with additional context. | ||
|
|
||
| ### Step 10: Validate Fixes | ||
|
|
||
| After applying fixes, re-run **only the previously failing tests**: | ||
|
|
||
| From the `web/` directory: | ||
| ```bash | ||
| source cypress/export-env.sh && npx cypress run --spec "{SPEC}" --env grepTags="{FAILING_TEST_TAGS}" | ||
| ``` | ||
|
|
||
| For each test: | ||
| - **Now passes**: Stage the fix files with `git add` | ||
| - **Still fails**: Re-diagnose (increment retry counter). Max 2 retries per test. | ||
| - **After 2 retries still failing**: Mark as `UNRESOLVED` and report to user | ||
|
|
||
| ### Step 11: Commit Batch | ||
|
|
||
| After all fixable failures are addressed (or max retries reached): | ||
|
|
||
| Stage and commit as separate commands (never chain `cd && git`): | ||
| ```bash | ||
| git add <fixed-files> | ||
| ``` | ||
| ```bash | ||
| git commit --no-gpg-sign -m "<message>" | ||
| ``` | ||
|
|
||
| Commit message format: | ||
| ``` | ||
| fix(tests): <summary of what was fixed> | ||
|
|
||
| - <file>: <change description> | ||
| - <file>: <change description> | ||
|
|
||
| Classifications: N TEST_BUG, N FIXTURE_ISSUE, N PAGE_OBJECT_GAP, N MOCK_ISSUE | ||
| Unresolved: N (if any) | ||
|
|
||
| Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | ||
| ``` | ||
|
|
||
| Track commit count. If commit count reaches **5**: Notify the user that the review threshold has been reached and ask whether to continue or pause for review. | ||
|
|
||
| ### Step 12: Iterate | ||
|
|
||
| If there were failures and `current_iteration < max-iterations`: | ||
| - Increment iteration counter | ||
| - Go back to **Step 4** (clean results and re-run) | ||
|
|
||
| This catches cascading fixes — e.g., fixing a `before all` hook unblocks skipped tests that may have their own issues. | ||
|
|
||
| If all tests pass: Proceed to Step 13. | ||
|
|
||
| ### Step 13: Flakiness Probe | ||
|
|
||
| Run the full target test suite `flakiness-runs` times (default: 3), even if everything is green. | ||
|
|
||
| For each run: | ||
| 1. Clean previous results (Step 4) | ||
| 2. Run tests (Step 5) | ||
| 3. Parse results (Step 6) | ||
| 4. Record per-test pass/fail | ||
|
|
||
| After all runs, compute flakiness: | ||
|
|
||
| ``` | ||
| Flakiness Report: | ||
| Total tests: N | ||
| Stable (all runs passed): N | ||
| Flaky (some runs failed): N | ||
| Broken (all runs failed): N | ||
|
|
||
| Flaky tests: | ||
| - "test name" — passed 2/3 runs | ||
| Error on failure: <error message> | ||
| - "test name" — passed 1/3 runs | ||
| Error on failure: <error message> | ||
| ``` | ||
|
|
||
| For each **flaky** test: | ||
| - Diagnose it using `/cypress:test-iteration:diagnose-test-failure` with the context that it's intermittent | ||
| - Common flaky patterns: race conditions, animation timing, network mock timing, DOM detach/reattach | ||
| - Apply fix if confident (add `.should('exist')` guards, use `{ timeout: N }`, avoid `.eq(N)` on dynamic lists) | ||
| - Re-run flakiness probe on just the fixed tests to verify | ||
|
|
||
| ### Step 14: Final Report | ||
|
|
||
| Output a summary: | ||
|
|
||
| ``` | ||
| # Iteration Complete | ||
|
|
||
| ## Branch: test/incident-robustness-YYYY-MM-DD | ||
| ## Commits: N | ||
| ## Iterations: N | ||
|
|
||
| ## Results | ||
| - Tests run: N | ||
| - Passing: N | ||
| - Fixed in this session: N | ||
| - Unresolved: N (details below) | ||
| - Flaky (stabilized): N | ||
| - Flaky (remaining): N | ||
|
|
||
| ## Fixes Applied | ||
| 1. [commit-sha] fix(tests): <summary> | ||
| - <file>: <change> | ||
|
|
||
| 2. [commit-sha] fix(tests): <summary> | ||
| - <file>: <change> | ||
|
|
||
| ## Unresolved Issues | ||
| - "test name": REAL_REGRESSION — <description>. Source file X was modified in PR #N. | ||
| - "test name": UNRESOLVED after 2 retries — <last error> | ||
|
|
||
| ## Remaining Flakiness | ||
| - "test name": 2/3 passed — timing issue in chart rendering, needs investigation | ||
|
|
||
| ## Recommendations | ||
| - [Next steps for unresolved issues] | ||
| - [Whether to merge current fixes or wait] | ||
| ``` | ||
|
|
||
| ### Step 15: Update Stability Ledger | ||
|
|
||
| After the final report, update `web/cypress/reports/test-stability.md`. | ||
|
|
||
| Read the file and update both sections: | ||
|
|
||
| **1. Current Status table** — for each test in this run: | ||
| - If test already in table: update pass rate (rolling average across all recorded runs), update trend | ||
| - If test is new: add a row | ||
| - Pass rate = total passes / total runs across all recorded iterations | ||
| - Trend: compare last 3 runs — improving / stable / degrading | ||
|
|
||
| **2. Run History log** — append a new row: | ||
| ``` | ||
| | {next_number} | {YYYY-MM-DD} | local | {branch} | {total_tests} | {passed} | {failed} | {flaky} | {commit_sha} | | ||
| ``` | ||
|
|
||
| **3. Machine-readable data** — update the JSON block between `STABILITY_DATA_START` and `STABILITY_DATA_END`: | ||
| ```json | ||
| { | ||
| "tests": { | ||
| "test full title": { | ||
| "results": ["pass", "pass", "fail", "pass"], | ||
| "last_failure_reason": "Timed out...", | ||
| "last_failure_date": "2026-03-23", | ||
| "fixed_by": "abc1234" | ||
| } | ||
| }, | ||
| "runs": [ | ||
| { | ||
| "date": "2026-03-23", | ||
| "type": "local", | ||
| "branch": "test/incident-robustness-2026-03-23", | ||
| "total": 15, | ||
| "passed": 15, | ||
| "failed": 0, | ||
| "flaky": 0, | ||
| "commit": "abc1234" | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| Commit the ledger update together with the final batch of fixes if any, or as a standalone commit: | ||
| ```bash | ||
| git add web/cypress/reports/test-stability.md | ||
| ``` | ||
| ```bash | ||
| git commit --no-gpg-sign -m "docs: update test stability ledger — {passed}/{total} passed, {flaky} flaky" | ||
| ``` | ||
|
|
||
| ### Error Handling | ||
|
|
||
| - **Cypress crashes** (not just test failures): Check if it's an OOM issue (`--max-old-space-size`), a missing dependency, or a config problem. Report to user. | ||
| - **No `export-env.sh`**: Create it directly using the cluster credentials from the conversation. Do NOT use the interactive `/cypress:setup`. | ||
| - **No mochawesome reports generated**: Check if the reporter config is correct. Fall back to parsing Cypress console output. | ||
| - **Git conflicts**: If the working branch has conflicts with changes, report to user and stop. | ||
| - **Sub-agent failure**: If a Diagnosis or Fix agent fails, log the error and skip that test. Don't let one broken agent block the whole loop. | ||
|
|
||
| ### Guardrails | ||
|
|
||
| - **Never edit source code** (`src/`) in Phase 1 | ||
| - **Never disable a test** — if a test can't be fixed, mark it as unresolved, don't add `.skip()` | ||
| - **Never add `@flaky` tag** to a test — that's a human decision | ||
| - **Never change test assertions to match wrong behavior** — if the UI is wrong, it's a REAL_REGRESSION | ||
| - **Max 2 retries per test** to avoid infinite loops | ||
| - **Max 5 commits before pausing** for user review | ||
| - **Always run flakiness probe** before declaring success |
There was a problem hiding this comment.
Add timeout and resource limits to prevent runaway autonomous execution.
The autonomous workflow can potentially run indefinitely with nested loops (max iterations × max retries × flakiness runs). There's no global timeout, memory limit, or total execution time budget specified.
Consider adding:
- Global timeout: Maximum wall-clock time for the entire workflow (e.g., 4 hours)
- Per-test timeout: Maximum time for a single test run (inherited from Cypress config, but should be documented)
- Total resource budget: Maximum number of test executions (e.g., max_iterations × flakiness_runs × 2 should not exceed 50 total runs)
- Circuit breaker: If >50% of tests fail in iteration 1, pause for human review instead of continuing
These limits protect against:
- Infinite loops due to bugs in the orchestration logic
- Resource exhaustion on the execution environment
- Runaway costs in cloud execution scenarios (mentioned in the PR's architecture docs)
🧰 Tools
🪛 LanguageTool
[style] ~125-~125: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ld branch or work on the current one. - If on main: Create a new branch. To cre...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~267-~267: Consider using a different verb for a more formal wording.
Context: ...ound: Report them to the user. Continue fixing the fixable issues. ### Step 9: Fix Lo...
(FIX_RESOLVE)
🪛 markdownlint-cli2 (0.22.0)
[warning] 142-142: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 209-209: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 225-225: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 239-239: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 280-280: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 341-341: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 377-377: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 401-401: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 448-448: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 99 - 502, Add explicit execution limits to the orchestrator: define and
check a global wall-clock timeout (e.g., max_wall_clock_time) at the top of the
main orchestration loop (Steps 0–15) and abort with a clear message when
exceeded; enforce a per-test/run timeout by passing/integrating a configured
per_test_timeout into the Cypress invocation (the `source cypress/export-env.sh
&& npx cypress run ...` call in Step 5) and into any retry loops; track total
test executions with a total_run_budget counter (used in Step 12/Step 13) and
stop spawning Diagnosis/Fix agents when exceeded; implement a circuit_breaker
threshold (e.g., circuit_breaker_failure_rate) observed after the first
iteration (Step 5/Step 6) to pause for human review if >50% failures; surface
these config values (max_wall_clock_time, per_test_timeout, total_run_budget,
circuit_breaker_threshold) near the orchestrator entry and check them before
each major loop/agent spawn.
| RUN curl -fsSL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz \ | ||
| | tar xz -C /usr/local/bin oc kubectl |
There was a problem hiding this comment.
Verify OpenShift client artifact integrity before extracting
Lines 29–30 stream a remote tarball directly into tar without checksum/signature validation. That is a supply-chain gap for a privileged build step.
🛡️ Proposed fix
-RUN curl -fsSL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz \
- | tar xz -C /usr/local/bin oc kubectl
+RUN set -euo pipefail; \
+ OC_URL="https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz"; \
+ curl -fsSL "$OC_URL" -o /tmp/openshift-client-linux.tar.gz; \
+ curl -fsSL "${OC_URL}.sha256" -o /tmp/openshift-client-linux.tar.gz.sha256; \
+ sha256sum -c /tmp/openshift-client-linux.tar.gz.sha256; \
+ tar xzf /tmp/openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl; \
+ rm -f /tmp/openshift-client-linux.tar.gz /tmp/openshift-client-linux.tar.gz.sha256📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| RUN curl -fsSL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz \ | |
| | tar xz -C /usr/local/bin oc kubectl | |
| RUN set -euo pipefail; \ | |
| OC_URL="https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz"; \ | |
| curl -fsSL "$OC_URL" -o /tmp/openshift-client-linux.tar.gz; \ | |
| curl -fsSL "${OC_URL}.sha256" -o /tmp/openshift-client-linux.tar.gz.sha256; \ | |
| sha256sum -c /tmp/openshift-client-linux.tar.gz.sha256; \ | |
| tar xzf /tmp/openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl; \ | |
| rm -f /tmp/openshift-client-linux.tar.gz /tmp/openshift-client-linux.tar.gz.sha256 |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sandbox/Dockerfile` around lines 29 - 30, The RUN that streams
openshift-client-linux.tar.gz into tar should first download and verify the
artifact integrity: fetch both the tarball and its corresponding
checksum/signature for openshift-client-linux.tar.gz, verify the checksum (e.g.,
SHA256) or validate the signature before extraction, and only proceed to extract
oc and kubectl from openshift-client-linux.tar.gz if verification succeeds;
update the Dockerfile around the RUN that references
openshift-client-linux.tar.gz to perform the download, verification, and
conditional extraction steps so the build fails on checksum/signature mismatch.
| | tar xz -C /usr/local/bin oc kubectl | ||
|
|
||
| # Install Claude Code via npm (uses Node.js, not Bun) | ||
| RUN npm install -g @anthropic-ai/claude-code@latest |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "Dockerfile" -type fRepository: openshift/monitoring-plugin
Length of output: 129
🏁 Script executed:
cat -n sandbox/Dockerfile | head -50Repository: openshift/monitoring-plugin
Length of output: 2076
🏁 Script executed:
rg "npm install" --type dockerfileRepository: openshift/monitoring-plugin
Length of output: 105
🏁 Script executed:
rg "npm install" --type jsonRepository: openshift/monitoring-plugin
Length of output: 53
🏁 Script executed:
rg "npm install" --type-list | grep -i dockerRepository: openshift/monitoring-plugin
Length of output: 460
🏁 Script executed:
rg "npm install" sandbox/Repository: openshift/monitoring-plugin
Length of output: 141
🏁 Script executed:
rg "@latest" sandbox/Repository: openshift/monitoring-plugin
Length of output: 141
🏁 Script executed:
find . -name "package.json" -type f -exec cat -n {} + | head -100Repository: openshift/monitoring-plugin
Length of output: 7385
🏁 Script executed:
cat sandbox/Dockerfile | grep -E "(ARG|FROM|npm|RUN)" | head -20Repository: openshift/monitoring-plugin
Length of output: 669
Pin Claude Code version in image builds
Line 33 uses @latest, which prevents reproducible builds and can cause drift in deployed images, leading to unexplained CI/sandbox regressions.
📌 Proposed fix
-RUN npm install -g `@anthropic-ai/claude-code`@latest
+ARG CLAUDE_CODE_VERSION=<pin-known-good-version>
+RUN npm install -g "@anthropic-ai/claude-code@${CLAUDE_CODE_VERSION}"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sandbox/Dockerfile` at line 33, The Dockerfile RUN that installs
`@anthropic-ai/claude-code` currently uses the floating tag "@latest" which breaks
reproducible image builds; change the RUN line installing
`@anthropic-ai/claude-code` to pin a specific version (or use a build ARG like
CLAUDE_CODE_VERSION and reference it in the RUN) so the installation uses a
fixed version instead of "@latest"—update the RUN command that mentions
"@anthropic-ai/claude-code@latest" accordingly.
| CURRENT_BRANCH=$(git -C "$PROJECT_ROOT" branch --show-current) | ||
| REMOTE_URL=$(git -C "$PROJECT_ROOT" remote get-url origin 2>/dev/null || true) | ||
|
|
||
| echo "Cloning branch '$CURRENT_BRANCH' into $SANDBOX_DIR..." | ||
| git clone --single-branch --branch "$CURRENT_BRANCH" --depth 50 "$PROJECT_ROOT" "$SANDBOX_DIR" | ||
|
|
There was a problem hiding this comment.
Handle detached HEAD before cloning by branch
If Line 47 returns an empty branch (detached HEAD), Line 51 fails (--branch ""). Add a fallback path (e.g., clone default branch or fail early with a clear error).
🧭 Proposed fix
CURRENT_BRANCH=$(git -C "$PROJECT_ROOT" branch --show-current)
+if [ -z "$CURRENT_BRANCH" ]; then
+ echo "ERROR: repository is in detached HEAD; cannot determine branch for sandbox clone"
+ echo "Please checkout a branch and retry."
+ exit 1
+fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sandbox/run.sh` around lines 47 - 52, CURRENT_BRANCH may be empty when the
repo is in a detached HEAD, causing git clone --branch "" to fail; update the
script to detect an empty CURRENT_BRANCH (the variable computed by
CURRENT_BRANCH=$(git -C "$PROJECT_ROOT" branch --show-current)) and provide a
fallback: either resolve the repo's default branch (e.g., query origin's HEAD or
use a sensible default like "main"/"master") and assign that to CURRENT_BRANCH,
or exit early with a clear error message before running git clone
--single-branch --branch "$CURRENT_BRANCH" so the subsequent git clone call
never receives an empty branch name.
| echo " Kubeconfig: $([ -n "${KUBECONFIG_MOUNT:-}" ] && echo 'mounted (read-only)' || echo 'NOT SET')" | ||
| echo "" | ||
| echo " Filesystem: Only worktree is writable. Host is isolated." | ||
| echo " See docs/docker-sandbox-blast-radius.md for full security analysis." |
There was a problem hiding this comment.
Fix printed documentation path
Line 69 references docs/docker-sandbox-blast-radius.md, but the documented file path is under docs/agentic-development/architecture/security/. The current output can send users to a non-existent path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sandbox/run.sh` at line 69, Update the printed docs path in sandbox/run.sh:
the echo statement referencing "docs/docker-sandbox-blast-radius.md" should
point to the actual documentation location under
"docs/agentic-development/architecture/security/docker-sandbox-blast-radius.md"
so users are directed to the existing file; locate the echo call in run.sh and
replace the old relative path with the full path above.
| docker run -it --rm \ | ||
| --name "$CONTAINER_NAME" \ | ||
| -v "${SANDBOX_DIR}:/sandbox" \ | ||
| -v "${ADC_PATH}:/tmp/adc.json:ro" \ | ||
| ${KUBECONFIG_MOUNT:-} \ | ||
| -e "GOOGLE_APPLICATION_CREDENTIALS=/tmp/adc.json" \ | ||
| -e "CLAUDE_CODE_USE_VERTEX=1" \ | ||
| -e "ANTHROPIC_VERTEX_PROJECT_ID=$VERTEX_PROJECT" \ | ||
| -e "CLOUD_ML_REGION=$VERTEX_REGION" \ | ||
| -e "KUBECONFIG=/tmp/kubeconfig" \ | ||
| -e "GITHUB_TOKEN=${GITHUB_TOKEN:-}" \ | ||
| "$IMAGE_NAME" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "run.sh" -path "*/sandbox/*" | head -5Repository: openshift/monitoring-plugin
Length of output: 87
🏁 Script executed:
wc -l sandbox/run.sh 2>/dev/null || echo "File not found at that path"Repository: openshift/monitoring-plugin
Length of output: 88
🏁 Script executed:
cat -n sandbox/run.sh | sed -n '70,90p'Repository: openshift/monitoring-plugin
Length of output: 665
🏁 Script executed:
grep -n "KUBECONFIG_MOUNT\|KUBECONFIG_PATH" sandbox/run.shRepository: openshift/monitoring-plugin
Length of output: 486
Use an argument array for optional mounts to prevent word-splitting issues
Line 77 expands ${KUBECONFIG_MOUNT:-} unquoted in the docker command. This is unsafe because if the path contains spaces, word-splitting will occur and break the mount specification. Using an array pattern ensures proper argument handling.
🛠️ Proposed fix
+DOCKER_ARGS=(
+ -it --rm
+ --name "$CONTAINER_NAME"
+ -v "${SANDBOX_DIR}:/sandbox"
+ -v "${ADC_PATH}:/tmp/adc.json:ro"
+ -e "GOOGLE_APPLICATION_CREDENTIALS=/tmp/adc.json"
+ -e "CLAUDE_CODE_USE_VERTEX=1"
+ -e "ANTHROPIC_VERTEX_PROJECT_ID=$VERTEX_PROJECT"
+ -e "CLOUD_ML_REGION=$VERTEX_REGION"
+ -e "KUBECONFIG=/tmp/kubeconfig"
+ -e "GITHUB_TOKEN=${GITHUB_TOKEN:-}"
+)
+if [ -n "${KUBECONFIG_MOUNT:-}" ]; then
+ DOCKER_ARGS+=(-v "${KUBECONFIG_PATH}:/tmp/kubeconfig:ro")
+fi
-docker run -it --rm \
- --name "$CONTAINER_NAME" \
- -v "${SANDBOX_DIR}:/sandbox" \
- -v "${ADC_PATH}:/tmp/adc.json:ro" \
- ${KUBECONFIG_MOUNT:-} \
- -e "GOOGLE_APPLICATION_CREDENTIALS=/tmp/adc.json" \
- -e "CLAUDE_CODE_USE_VERTEX=1" \
- -e "ANTHROPIC_VERTEX_PROJECT_ID=$VERTEX_PROJECT" \
- -e "CLOUD_ML_REGION=$VERTEX_REGION" \
- -e "KUBECONFIG=/tmp/kubeconfig" \
- -e "GITHUB_TOKEN=${GITHUB_TOKEN:-}" \
- "$IMAGE_NAME"
+docker run "${DOCKER_ARGS[@]}" "$IMAGE_NAME"🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 77-77: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sandbox/run.sh` around lines 73 - 84, The docker run line uses an unquoted
expansion ${KUBECONFIG_MOUNT:-} which can word-split paths with spaces; replace
it with an arguments array: build a shell array (e.g. DOCKER_OPTS or MOUNTS) and
conditionally append the KUBECONFIG_MOUNT value only when non-empty, then call
docker run with "${DOCKER_OPTS[@]}" between the other fixed flags so the
KUBECONFIG_MOUNT is passed as a single safe argument; update the docker run
invocation that references CONTAINER_NAME and IMAGE_NAME to use the array
element expansion instead of the unquoted ${KUBECONFIG_MOUNT:-}.
Skill that takes one or more agentic test iteration branches and produces a clean, shippable fix branch. Key steps: 1. Discover and fetch source branches 2. Analyze each branch's changes and intent 3. Evaluate overlaps (complementary/progressive/redundant/conflicting) 4. Present evaluation and wait for user confirmation 5. Apply changes with inline lint fixing (no separate lint commits) 6. Structure commits by logical concern, not by source branch 7. Verify with regression run + e2e-real + flakiness probe 8. Confirm with user before pushing Codifies the workflow from the OBSINTA-1290 consolidation session. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…osis skills Agentic iteration loops lack git history context and can re-introduce patterns that were previously removed for good reasons (e.g. cy.reload() breaking dynamic plugin chunk loading). This caused a real regression during the OBSINTA-1290 consolidation. Added checks to three skills: - productize-iterations: Step 3 now flags REGRESSIVE overlaps where an iteration branch re-introduces a removed pattern - iterate-incident-tests: Fix Guidelines now require checking git log before applying any fix - diagnose-test-failure: Diagnosis output now includes a History Check section that flags previously reverted patterns Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@DavidRajnoha: This pull request references OBSINTA-1287 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
.claude/commands/cypress/test-iteration/iterate-incident-tests.md (2)
80-81:⚠️ Potential issue | 🟠 MajorReplace machine-specific absolute paths with repo-relative or
${REPO_ROOT}paths.Lines 80–81 and Line 235 still hardcode
/home/drajnoha/..., which breaks portability in other developer machines/CI/sandbox environments.Also applies to: 235-235
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around lines 80 - 81, The hardcoded absolute paths ("Bash(cd /home/drajnoha/Code/monitoring-plugin:*)" and "Bash(find /home/drajnoha/Code/monitoring-plugin/web/cypress:*)") must be made portable: replace those literal /home/drajnoha/... strings with repo-relative paths or use the ${REPO_ROOT} variable (e.g., "${REPO_ROOT}/monitoring-plugin" and "${REPO_ROOT}/monitoring-plugin/web/cypress") wherever those two entries and the similar occurrence around line 235 appear so the commands run on other developer machines and CI environments.
99-510:⚠️ Potential issue | 🟠 MajorAdd global timeout/run-budget/circuit-breaker guardrails for autonomous loops.
The workflow still has nested retry/iteration/flakiness loops without an explicit wall-clock cutoff, total execution budget, or first-iteration failure-rate circuit breaker.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around lines 99 - 510, The orchestration flow (Steps 0–15, variables like current_iteration, max-iterations, flakiness-runs, run_in_background and the commit threshold) lacks global timeout/ budget and a first-iteration failure-rate circuit breaker; add configurable guardrails: introduce global config variables (e.g., GLOBAL_WALLCLOCK_TIMEOUT_MS, MAX_TOTAL_RUN_SECONDS, MAX_AUTONOMOUS_COMMITS, and FIRST_ITER_FAIL_RATE_THRESHOLD) and enforce them at the top-level orchestrator so the loop aborts when wall-clock or total-run budget is exceeded, when commits reach MAX_AUTONOMOUS_COMMITS (already tracked at 5), or when the first iteration failure rate > FIRST_ITER_FAIL_RATE_THRESHOLD—return a clear "budget_exhausted" or "circuit_breaker_triggered" result and stop spawning Diagnosis/Fix agents; ensure these checks are evaluated before Step 4 (clean) and between iterations (before incrementing current_iteration) and persisted in the final report.
🧹 Nitpick comments (4)
.claude/commands/cypress/test-iteration/diagnose-test-failure.md (1)
111-145: Specify fenced code block language.At Line 111, the fenced block has no language tag (MD040). Use
textto keep it lint-clean and unambiguous.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md around lines 111 - 145, The fenced code block under the "## Diagnosis" section is missing a language tag (MD040); add a language identifier `text` to the opening triple-backtick fence (the ``` block that contains the Diagnosis template) so it becomes ```text to satisfy the linter and keep the block unambiguous..claude/commands/cypress/test-iteration/productize-iterations.md (2)
36-37: Use named parameters consistently instead of shell positional placeholders.The document declares named params (
branch-name,source-branches, etc.) but operational steps use$1/$2/$3/$4. This mismatch increases execution mistakes when invoked as a command skill.Also applies to: 131-131, 183-183, 218-218, 240-240, 260-260
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/productize-iterations.md around lines 36 - 37, The doc mixes named parameters (branch-name, source-branches, etc.) with shell positional placeholders ($1/$2/$3/$4); update all operational examples and commands to use the declared named parameters consistently (e.g., replace occurrences of $1/$2/$3/$4 with the corresponding named tokens like source-branches, branch-name, target-branch, iteration-number or a consistent ${source-branches} style) so commands and instructions match the declared parameter names; apply this change for the referenced occurrences (including the `$2` glob handling example and the other noted instances) and ensure any example command parsing logic (the glob check for source-branches) references the named param instead of $2.
49-49: Specify fenced code block languages (MD040).Add explicit language tags to these fenced blocks to satisfy markdownlint and improve readability.
Also applies to: 64-64, 104-104, 169-169, 224-224, 237-237
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/productize-iterations.md at line 49, The markdown contains multiple fenced code blocks that use plain ``` with no language tag (MD040); update each bare fence (specifically the blocks currently at lines 49, 64, 104, 169, 224, and 237) to include an appropriate language identifier (e.g., ```bash, ```sh, ```json, ```yaml, or ```text) that matches the block content so markdownlint MD040 is satisfied and readability is improved; after updating the fences, run markdownlint to verify no MD040 violations remain..claude/commands/cypress/test-iteration/iterate-incident-tests.md (1)
142-142: Add language identifiers to fenced code blocks.MD040 warnings are valid on these blocks; add
text,bash, orjsonas appropriate.Also applies to: 209-209, 225-225, 239-239, 280-280, 349-349, 385-385, 409-409, 456-456
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md at line 142, Update the Markdown file's fenced code blocks flagged by MD040: replace bare triple backticks (```) with language-specific fences (e.g., ```bash for shell commands, ```json for JSON snippets, ```text for plain output/examples) for each code block in the iterate-incident-tests markdown so the linter no longer reports MD040; ensure each block's language matches the block content (commands → bash, JSON → json, plain text/example → text).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md:
- Around line 4-22: The parameters block is missing the stability-context
parameter the orchestrator sends; add a new parameter entry named
"stability-context" in the parameters list (same level as test-name, spec-file,
error-message, etc.) so the handler accepts the orchestrator payload, mark it
required: false, and include a short description like "Context about test
stability from the orchestrator"; update any validation that enumerates accepted
parameter names (if present) to include stability-context and ensure functions
that read inputs (e.g., code that references test-name, spec-file,
error-message) can also access stability-context.
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Around line 10-11: The docs are inconsistent about the "target" and test
filtering: the text says target can be a full spec path or a grep pattern for a
test title, but the implementation/documentation prepends
"cypress/e2e/incidents/{target}" (risking double-prefix) and uses grepTags
(which filters tags, not title text). Fix by making "target" semantics explicit
and robust: (1) update the logic/docs so when composing the spec path you check
if target already starts with "cypress/" before prepending
"cypress/e2e/incidents/", and (2) use the correct filter flag—use "grep" when
the intent is to match test titles and "grepTags" only when matching tag
expressions; adjust the relevant mentions/usages of target, grep, and grepTags
in the file so examples and instructions are consistent. Ensure examples show
both a full-spec path and a filename case, and show separate examples for "grep"
(title) vs "grepTags" (tag) usage.
In @.claude/commands/cypress/test-iteration/productize-iterations.md:
- Line 17: The docs for the test-target option are inconsistent and the
specific-file mapping can corrupt full paths: update the description for
"test-target" and the resolution table so they both mention the
grepTags/grep-pattern option (add a "grep-pattern" row to the table), and change
the specific-file resolution rule that prepends "cypress/e2e/incidents/{target}"
so it only prepends when the target is a bare filename (e.g., no '/' and not
already starting with "cypress/"); if the target looks like a full path
(contains '/' or starts with "cypress/") leave it unchanged and document that
behavior in the table and the "test-target" description.
- Around line 1-20: Add a top-level `name` field to the YAML frontmatter for
this command (between the leading and trailing ---) so the command becomes
discoverable; pick a stable identifier like `productize-iterations` (lowercase,
hyphenated) and add it above the `description` field, then run any command
registry/lint that validates frontmatter to ensure it passes discovery checks
and matches other commands’ naming conventions.
- Around line 265-266: Replace the credential-bearing rewrite of the remote URL
(the git remote set-url origin https://$(gh auth
token)@github.com/<owner>/<repo>.git line) with an ephemeral auth header or
credential helper when pushing; for example, remove the set-url step and invoke
push using an in-memory header like using git -c
http.extraheader="AUTHORIZATION: bearer $(gh auth token)" push origin $1 or
configure and rely on the Git credential helper / gh auth login instead of
embedding the token in the remote URL.
---
Duplicate comments:
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Around line 80-81: The hardcoded absolute paths ("Bash(cd
/home/drajnoha/Code/monitoring-plugin:*)" and "Bash(find
/home/drajnoha/Code/monitoring-plugin/web/cypress:*)") must be made portable:
replace those literal /home/drajnoha/... strings with repo-relative paths or use
the ${REPO_ROOT} variable (e.g., "${REPO_ROOT}/monitoring-plugin" and
"${REPO_ROOT}/monitoring-plugin/web/cypress") wherever those two entries and the
similar occurrence around line 235 appear so the commands run on other developer
machines and CI environments.
- Around line 99-510: The orchestration flow (Steps 0–15, variables like
current_iteration, max-iterations, flakiness-runs, run_in_background and the
commit threshold) lacks global timeout/ budget and a first-iteration
failure-rate circuit breaker; add configurable guardrails: introduce global
config variables (e.g., GLOBAL_WALLCLOCK_TIMEOUT_MS, MAX_TOTAL_RUN_SECONDS,
MAX_AUTONOMOUS_COMMITS, and FIRST_ITER_FAIL_RATE_THRESHOLD) and enforce them at
the top-level orchestrator so the loop aborts when wall-clock or total-run
budget is exceeded, when commits reach MAX_AUTONOMOUS_COMMITS (already tracked
at 5), or when the first iteration failure rate >
FIRST_ITER_FAIL_RATE_THRESHOLD—return a clear "budget_exhausted" or
"circuit_breaker_triggered" result and stop spawning Diagnosis/Fix agents;
ensure these checks are evaluated before Step 4 (clean) and between iterations
(before incrementing current_iteration) and persisted in the final report.
---
Nitpick comments:
In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md:
- Around line 111-145: The fenced code block under the "## Diagnosis" section is
missing a language tag (MD040); add a language identifier `text` to the opening
triple-backtick fence (the ``` block that contains the Diagnosis template) so it
becomes ```text to satisfy the linter and keep the block unambiguous.
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md:
- Line 142: Update the Markdown file's fenced code blocks flagged by MD040:
replace bare triple backticks (```) with language-specific fences (e.g., ```bash
for shell commands, ```json for JSON snippets, ```text for plain
output/examples) for each code block in the iterate-incident-tests markdown so
the linter no longer reports MD040; ensure each block's language matches the
block content (commands → bash, JSON → json, plain text/example → text).
In @.claude/commands/cypress/test-iteration/productize-iterations.md:
- Around line 36-37: The doc mixes named parameters (branch-name,
source-branches, etc.) with shell positional placeholders ($1/$2/$3/$4); update
all operational examples and commands to use the declared named parameters
consistently (e.g., replace occurrences of $1/$2/$3/$4 with the corresponding
named tokens like source-branches, branch-name, target-branch, iteration-number
or a consistent ${source-branches} style) so commands and instructions match the
declared parameter names; apply this change for the referenced occurrences
(including the `$2` glob handling example and the other noted instances) and
ensure any example command parsing logic (the glob check for source-branches)
references the named param instead of $2.
- Line 49: The markdown contains multiple fenced code blocks that use plain ```
with no language tag (MD040); update each bare fence (specifically the blocks
currently at lines 49, 64, 104, 169, 224, and 237) to include an appropriate
language identifier (e.g., ```bash, ```sh, ```json, ```yaml, or ```text) that
matches the block content so markdownlint MD040 is satisfied and readability is
improved; after updating the fences, run markdownlint to verify no MD040
violations remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: b44329ab-1bff-4022-b0f8-017b8ec937cf
📒 Files selected for processing (3)
.claude/commands/cypress/test-iteration/diagnose-test-failure.md.claude/commands/cypress/test-iteration/iterate-incident-tests.md.claude/commands/cypress/test-iteration/productize-iterations.md
| parameters: | ||
| - name: test-name | ||
| description: "Full title of the failing test (from mochawesome 'fullTitle' or Cypress output)" | ||
| required: true | ||
| - name: spec-file | ||
| description: "Path to the spec file (e.g., cypress/e2e/incidents/regression/01.reg_filtering.cy.ts)" | ||
| required: true | ||
| - name: error-message | ||
| description: "The error message from the test failure" | ||
| required: true | ||
| - name: screenshot-path | ||
| description: "Absolute path to the failure screenshot (will be read with multimodal vision)" | ||
| required: false | ||
| - name: stack-trace | ||
| description: "The error stack trace (estack from mochawesome)" | ||
| required: false | ||
| - name: ci-context | ||
| description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)" | ||
| required: false |
There was a problem hiding this comment.
Add missing stability-context parameter to match orchestrator contract.
Line 4–22 define accepted inputs, but the orchestrator passes stability-context (see iterate workflow). Without declaring it here, that context can be silently dropped during diagnosis.
🔧 Proposed fix
parameters:
- name: test-name
@@
- name: ci-context
description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)"
required: false
+ - name: stability-context
+ description: "Compressed prior stability history from test-stability ledger for flaky/persistent-failure correlation"
+ required: false📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| parameters: | |
| - name: test-name | |
| description: "Full title of the failing test (from mochawesome 'fullTitle' or Cypress output)" | |
| required: true | |
| - name: spec-file | |
| description: "Path to the spec file (e.g., cypress/e2e/incidents/regression/01.reg_filtering.cy.ts)" | |
| required: true | |
| - name: error-message | |
| description: "The error message from the test failure" | |
| required: true | |
| - name: screenshot-path | |
| description: "Absolute path to the failure screenshot (will be read with multimodal vision)" | |
| required: false | |
| - name: stack-trace | |
| description: "The error stack trace (estack from mochawesome)" | |
| required: false | |
| - name: ci-context | |
| description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)" | |
| required: false | |
| parameters: | |
| - name: test-name | |
| description: "Full title of the failing test (from mochawesome 'fullTitle' or Cypress output)" | |
| required: true | |
| - name: spec-file | |
| description: "Path to the spec file (e.g., cypress/e2e/incidents/regression/01.reg_filtering.cy.ts)" | |
| required: true | |
| - name: error-message | |
| description: "The error message from the test failure" | |
| required: true | |
| - name: screenshot-path | |
| description: "Absolute path to the failure screenshot (will be read with multimodal vision)" | |
| required: false | |
| - name: stack-trace | |
| description: "The error stack trace (estack from mochawesome)" | |
| required: false | |
| - name: ci-context | |
| description: "Optional context from /cypress:test-iteration:analyze-ci-results (commit correlation, infra status)" | |
| required: false | |
| - name: stability-context | |
| description: "Compressed prior stability history from test-stability ledger for flaky/persistent-failure correlation" | |
| required: false |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/diagnose-test-failure.md around
lines 4 - 22, The parameters block is missing the stability-context parameter
the orchestrator sends; add a new parameter entry named "stability-context" in
the parameters list (same level as test-name, spec-file, error-message, etc.) so
the handler accepts the orchestrator payload, mark it required: false, and
include a short description like "Context about test stability from the
orchestrator"; update any validation that enumerates accepted parameter names
(if present) to include stability-context and ensure functions that read inputs
(e.g., code that references test-name, spec-file, error-message) can also access
stability-context.
| - a specific spec file path (e.g., "cypress/e2e/incidents/01.incidents.cy.ts") | ||
| - a grep pattern for a specific test (e.g., "should filter by severity") |
There was a problem hiding this comment.
Target resolution is internally inconsistent (grep vs grepTags, and spec path composition).
Line 10 shows target can be a full spec path, but Line 168 prepends cypress/e2e/incidents/{target} (can double-prefix paths).
Line 11 says grep pattern for test title, while Line 169/Line 188 force grepTags, which filters tags, not title text.
🔧 Proposed fix
-| specific file | `cypress/e2e/incidents/{target}` | (none) |
-| grep pattern | `cypress/e2e/incidents/**/*.cy.ts` | (none, use `--env grepTags="{target}"`) |
+| specific file | `{target}` | (none) |
+| grep pattern (test title) | `cypress/e2e/incidents/**/*.cy.ts` | (none, use `--env grep="{target}"`) |-**IMPORTANT**: Use `grepTags` (not `grep`). The `grep` option searches test names as text, while `grepTags` filters by `@tag` annotations.
+**IMPORTANT**:
+- Use `grepTags` only for `@tag` filtering.
+- Use `grep` for test-title pattern filtering.Also applies to: 168-169, 188-188
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/iterate-incident-tests.md around
lines 10 - 11, The docs are inconsistent about the "target" and test filtering:
the text says target can be a full spec path or a grep pattern for a test title,
but the implementation/documentation prepends "cypress/e2e/incidents/{target}"
(risking double-prefix) and uses grepTags (which filters tags, not title text).
Fix by making "target" semantics explicit and robust: (1) update the logic/docs
so when composing the spec path you check if target already starts with
"cypress/" before prepending "cypress/e2e/incidents/", and (2) use the correct
filter flag—use "grep" when the intent is to match test titles and "grepTags"
only when matching tag expressions; adjust the relevant mentions/usages of
target, grep, and grepTags in the file so examples and instructions are
consistent. Ensure examples show both a full-spec path and a filename case, and
show separate examples for "grep" (title) vs "grepTags" (tag) usage.
| --- | ||
| description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes | ||
| parameters: | ||
| - name: branch-name | ||
| description: "Target branch name (e.g. test/OBSINTA-1290-incident-stability). Created from main." | ||
| required: true | ||
| type: string | ||
| - name: source-branches | ||
| description: "Comma-separated list of iteration branch names or a glob (e.g. 'FIX-PROPOSAL-AGENTIC/*' or 'branch-a,branch-b,branch-c'). Fetched from origin." | ||
| required: true | ||
| type: string | ||
| - name: flakiness-runs | ||
| description: "Number of flakiness probe runs after fixes are applied (default: 3)" | ||
| required: false | ||
| type: string | ||
| - name: test-target | ||
| description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)" | ||
| required: false | ||
| type: string | ||
| --- |
There was a problem hiding this comment.
Frontmatter is missing required command identifier (name).
Other command specs in this PR define name, but this file does not. That can prevent the command from being discoverable/invokable.
🔧 Proposed fix
---
+name: productize-iterations
description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes
parameters:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| --- | |
| description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes | |
| parameters: | |
| - name: branch-name | |
| description: "Target branch name (e.g. test/OBSINTA-1290-incident-stability). Created from main." | |
| required: true | |
| type: string | |
| - name: source-branches | |
| description: "Comma-separated list of iteration branch names or a glob (e.g. 'FIX-PROPOSAL-AGENTIC/*' or 'branch-a,branch-b,branch-c'). Fetched from origin." | |
| required: true | |
| type: string | |
| - name: flakiness-runs | |
| description: "Number of flakiness probe runs after fixes are applied (default: 3)" | |
| required: false | |
| type: string | |
| - name: test-target | |
| description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)" | |
| required: false | |
| type: string | |
| --- | |
| --- | |
| name: productize-iterations | |
| description: Consolidate one or more agentic test iteration branches into a clean, shippable fix branch — analyzes overlaps, merges intelligently, verifies with flakiness probing, and pushes | |
| parameters: | |
| - name: branch-name | |
| description: "Target branch name (e.g. test/OBSINTA-1290-incident-stability). Created from main." | |
| required: true | |
| type: string | |
| - name: source-branches | |
| description: "Comma-separated list of iteration branch names or a glob (e.g. 'FIX-PROPOSAL-AGENTIC/*' or 'branch-a,branch-b,branch-c'). Fetched from origin." | |
| required: true | |
| type: string | |
| - name: flakiness-runs | |
| description: "Number of flakiness probe runs after fixes are applied (default: 3)" | |
| required: false | |
| type: string | |
| - name: test-target | |
| description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)" | |
| required: false | |
| type: string | |
| --- |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/productize-iterations.md around
lines 1 - 20, Add a top-level `name` field to the YAML frontmatter for this
command (between the leading and trailing ---) so the command becomes
discoverable; pick a stable identifier like `productize-iterations` (lowercase,
hyphenated) and add it above the `description` field, then run any command
registry/lint that validates frontmatter to ensure it passes discovery checks
and matches other commands’ naming conventions.
| required: false | ||
| type: string | ||
| - name: test-target | ||
| description: "Cypress test target — 'all', 'regression', a spec file, or grepTags pattern (default: all)" |
There was a problem hiding this comment.
test-target contract and resolution table are inconsistent.
Line 17 says test target can be grepTags pattern, but the table has no grep-pattern row. Also, specific-file mapping prepends cypress/e2e/incidents/{target}, which can corrupt full-path inputs.
Also applies to: 185-190
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/productize-iterations.md at line 17,
The docs for the test-target option are inconsistent and the specific-file
mapping can corrupt full paths: update the description for "test-target" and the
resolution table so they both mention the grepTags/grep-pattern option (add a
"grep-pattern" row to the table), and change the specific-file resolution rule
that prepends "cypress/e2e/incidents/{target}" so it only prepends when the
target is a bare filename (e.g., no '/' and not already starting with
"cypress/"); if the target looks like a full path (contains '/' or starts with
"cypress/") leave it unchanged and document that behavior in the table and the
"test-target" description.
| git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git | ||
| git push origin $1 |
There was a problem hiding this comment.
Avoid embedding gh auth token into git remote URL.
Line 265 persists a credential-bearing remote URL, which can leak via config/logging/history. Prefer ephemeral auth headers or credential helpers instead of rewriting remote with tokenized URL.
🔧 Safer alternative
-git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git
-git push origin $1
+git -c http.https://github.com/.extraheader="AUTHORIZATION: bearer $(gh auth token)" push origin $1📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git | |
| git push origin $1 | |
| git -c http.https://github.com/.extraheader="AUTHORIZATION: bearer $(gh auth token)" push origin $1 |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/commands/cypress/test-iteration/productize-iterations.md around
lines 265 - 266, Replace the credential-bearing rewrite of the remote URL (the
git remote set-url origin https://$(gh auth token)@github.com/<owner>/<repo>.git
line) with an ephemeral auth header or credential helper when pushing; for
example, remove the set-url step and invoke push using an in-memory header like
using git -c http.extraheader="AUTHORIZATION: bearer $(gh auth token)" push
origin $1 or configure and rely on the Git credential helper / gh auth login
instead of embedding the token in the remote URL.
|
@DavidRajnoha: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
/analyze-ci-results,/diagnose-test-failure,/iterate-incident-tests, and/iterate-ci-flakyreview
security policy and blast radius documentation
.claude/commands/directory structure intocypress/test-iteration/andcypress/test-development/subdirectoriesSummary by CodeRabbit
New Features
Documentation