USHIFT-6967: Use deterministic workflow for CI Doctor analysis parallelization by ggiguash · Pull Request #192 · openshift-eng/edge-tooling

ggiguash · 2026-06-13T06:59:55Z

Testing the changes in openshift/release#80503 for Prow.

Summary by CodeRabbit

Refactor
- Doctor now performs per-job analysis using Workflow-based parallel execution, improving throughput and consistency versus launching many concurrent agents.
Documentation
- Updated doctor skill instructions to reflect Workflow orchestration, including stricter “no fallback” behavior when workflow execution fails.
Chores
- Consolidated the shared doctor-analyze implementation and updated CI scripts to use it.
- The prepare step now generates a workflow job list JSON to drive the analysis phase.

openshift-ci · 2026-06-13T06:59:59Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2026-06-13T07:00:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [ggiguash]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-13T07:00:07Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

This PR introduces a shared doctor-analyze.js script and ${WORKDIR}/workflows/analyze-jobs.json generation to orchestrate parallel per-job analysis, replaces plugin-local scripts with symlinks to the shared implementation, and updates lvms-ci and microshift-ci skill definitions to call the shared analyzer via the Workflow tool instead of spawning concurrent Agent instances.

Changes

Workflow-based job analysis and plugin integration

Layer / File(s)	Summary
Generate analyze-jobs.json with job metadata `plugins/shared/scripts/doctor.sh`	Adds logic to `prepare` command to create `${WORKDIR}/workflows/` and write `analyze-jobs.json` containing all release job entries and filtered failure-status PR job entries, each with `artifacts_dir`, deterministic `output_path`, and `label`.
Shared doctor-analyze script for parallel job processing `plugins/shared/scripts/doctor-analyze.js`	New script exports `meta` definition and implements `Analyze` phase: defensively normalizes `args` (object or JSON string), runs per-job parallel `agent(...)` invocations with configured `prow_job_skill`, aggregates results into `{analyzed, failed, total}`.
lvms-ci plugin script symlink and Workflow-based orchestration `plugins/lvms-ci/scripts/doctor-analyze.js`, `plugins/lvms-ci/skills/doctor/SKILL.md`	Plugin script becomes symlink to shared implementation. Skill front-matter switches `allowed-tools` from `Agent` to `Workflow`. Step 2 reads `analyze-jobs.json` and invokes `Workflow(doctor-analyze.js)` with jobs and `prow_job_skill` path, waits for completion, and hard-fails on workflow errors. Notes updated to reflect parallel execution guarantee.
microshift-ci plugin script symlink and Workflow-based orchestration `plugins/microshift-ci/scripts/doctor-analyze.js`, `plugins/microshift-ci/skills/doctor/SKILL.md`	Plugin script becomes symlink to shared implementation. Skill front-matter enables `Workflow`. Step 2 rewritten to read `analyze-jobs.json`, invoke `Workflow(doctor-analyze.js)`, wait for completion, surface `{analyzed, failed, total}`, and introduce critical no-fallback stop-on-failure rule. Step 3 adjusted to launch single `microshift-ci:create-bugs` foreground agent. Related Skills and Notes updated.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

openshift-eng/edge-tooling#67: Both PRs modify plugins/microshift-ci/skills/doctor/SKILL.md control flow across Step 2–Step 4 phases to change how doctor workflow transitions.
openshift-eng/edge-tooling#112: Both PRs update plugins/microshift-ci/skills/doctor/SKILL.md Step 2/3 workflow and agent orchestration details.
openshift-eng/edge-tooling#147: Both PRs wire lvms-ci plugin to use the shared doctor workflow for per-job parallel analysis via Workflow tool.

🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Ai-Attribution	⚠️ Warning	Raw summary indicates AI tool usage ("AI-generated summary of changes"), but commit `b8a335b` lacks required Red Hat attribution trailers (Assisted-by/Generated-by). No proper attribution found.	Add appropriate attribution trailer (Assisted-by or Generated-by) to commit message when AI tools are used, per Red Hat standards.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: converting CI Doctor analysis to use a deterministic workflow for parallelization instead of sequential agent spawning.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto	✅ Passed	No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons detected in any modified files.
Container-Privileges	✅ Passed	The PR contains only JavaScript scripts, bash scripts, and markdown docs—no K8s manifests or container configs where privilege escalation settings could be present.
No-Sensitive-Data-In-Logs	✅ Passed	All logging statements in the modified code log only non-sensitive data: job counts (integers), release versions, PR numbers, and public build IDs. No passwords, tokens, API keys, PII, or sensitive...
No-Hardcoded-Secrets	✅ Passed	No hardcoded secrets detected in any modified files. Code contains only standard configuration parameters, file paths, bot account references, and legitimate string literals.
No-Injection-Vectors	✅ Passed	No injection vectors found. Shell scripts use safe jq --arg passing, JSON.parse only parses workflow config (not executed as code), and all user input is safely concatenated into prompts/paths with...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/microshift-ci/skills/doctor/SKILL.md`:
- Around line 88-126: Add a new "## Edge Case" section in both orchestrator
SKILL docs to explicitly document workflow failure modes and the no-fallback
policy: in plugins/microshift-ci/skills/doctor/SKILL.md (lines 88-126) insert an
"## Edge Case" block that states the Workflow tool call must be treated as
blocking on error (if the Workflow invocation fails for any reason — script
error, timeout, API error — stop the skill and report the error to the user; do
not fall back to sequential agent spawning), documents that individual
job-analysis agents may fail without aborting the workflow (the workflow
continues), and instructs to use the returned {analyzed, failed, total} values
for summaries; in plugins/lvms-ci/skills/doctor/SKILL.md (lines 68-100) add the
same "## Edge Case" block with identical rules (state Workflow failure =
stop+report, per-job failures allowed, use returned summary values, and
explicitly prohibit sequential fallback). Ensure wording is concise and matches
the existing doc style.

In `@plugins/shared/scripts/doctor-analyze.js`:
- Around line 46-50: The top-level illegal `return { analyzed: analyzed, failed:
failed, total: results.length }` must be replaced with a proper module export or
function return; change it to `module.exports = { analyzed, failed, total:
results.length }` (or wrap the surrounding logic in a function like `analyze()`
and return from that function) so the file no longer uses a module-scope return;
reference the variables `analyzed`, `failed`, and `results.length` (or `total`)
when exporting.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2d2b9be2-4b4f-475d-b858-b749df2fe1d1

📥 Commits

Reviewing files that changed from the base of the PR and between 17c108d and 1ac3d17.

📒 Files selected for processing (5)

plugins/lvms-ci/scripts/doctor-analyze.js
plugins/lvms-ci/skills/doctor/SKILL.md
plugins/microshift-ci/scripts/doctor-analyze.js
plugins/microshift-ci/skills/doctor/SKILL.md
plugins/shared/scripts/doctor-analyze.js

coderabbitai

🧹 Nitpick comments (1)

plugins/shared/scripts/doctor.sh (1)

238-244: ⚡ Quick win

Sort the job arrays before deriving index-based filenames.

Lines 242-254 derive output_path and label from .key + 1, so the filenames are only as deterministic as the incoming JSON order. That makes the PR’s deterministic-workflow guarantee depend on download-jobs.sh preserving order after parallel downloads. Sort on a stable key before to_entries[] so reruns keep the same filenames for the same jobs.

♻️ Proposed change

-        workflow_jobs=$(echo "${workflow_jobs}" | jq --arg r "${release}" --arg w "${WORKDIR}" \
+        workflow_jobs=$(echo "${workflow_jobs}" | jq --arg r "${release}" --arg w "${WORKDIR}" \
             --slurpfile jobs "${jobs_file}" \
-            '. + [$jobs[0] | to_entries[] | {
+            '. + [($jobs[0] | sort_by(.build_id)) | to_entries[] | {
                 artifacts_dir: .value.artifacts_dir,
                 output_path:   ($w + "/jobs/release-" + $r + "-job-" + ((.key+1)|tostring) + "-" + .value.build_id + ".txt"),
                 label:         ($r + "-job-" + ((.key+1)|tostring))
             }]')
...
-        workflow_jobs=$(echo "${workflow_jobs}" | jq --arg w "${WORKDIR}" \
+        workflow_jobs=$(echo "${workflow_jobs}" | jq --arg w "${WORKDIR}" \
             --slurpfile jobs "${WORKDIR}/jobs/prs-jobs.json" \
-            '. + [$jobs[0] | to_entries[] | {
+            '. + [($jobs[0] | sort_by(.pr_number, .build_id)) | to_entries[] | {
                 artifacts_dir: .value.artifacts_dir,
                 output_path:   ($w + "/jobs/prs-job-" + ((.key+1)|tostring) + "-pr" + (.value.pr_number|tostring) + "-" + .value.build_id + ".txt"),
                 label:         ("pr" + (.value.pr_number|tostring) + "-job-" + ((.key+1)|tostring)),
                 status:        .value.status
             }]')

Also applies to: 249-256

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/shared/scripts/doctor.sh` around lines 238 - 244, The jq pipeline
building workflow_jobs uses $jobs[0] | to_entries[] and then .key to form
index-based filenames, which is nondeterministic; change the pipeline to sort
the jobs before to_entries (e.g., $jobs[0] | sort_by(.value.build_id) |
to_entries[]) so indices are stable (use a stable key like value.build_id or
another unique stable field), and apply the same change to the other similar jq
expression mentioned in the comment so both places produce deterministic
filenames/labels.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@plugins/shared/scripts/doctor.sh`:
- Around line 238-244: The jq pipeline building workflow_jobs uses $jobs[0] |
to_entries[] and then .key to form index-based filenames, which is
nondeterministic; change the pipeline to sort the jobs before to_entries (e.g.,
$jobs[0] | sort_by(.value.build_id) | to_entries[]) so indices are stable (use a
stable key like value.build_id or another unique stable field), and apply the
same change to the other similar jq expression mentioned in the comment so both
places produce deterministic filenames/labels.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4dadb3fc-dd94-4b5c-8152-f599a4b79abc

📥 Commits

Reviewing files that changed from the base of the PR and between 1ac3d17 and 93c4df0.

📒 Files selected for processing (3)

plugins/lvms-ci/skills/doctor/SKILL.md
plugins/microshift-ci/skills/doctor/SKILL.md
plugins/shared/scripts/doctor.sh

🚧 Files skipped from review as they are similar to previous changes (2)

plugins/lvms-ci/skills/doctor/SKILL.md
plugins/microshift-ci/skills/doctor/SKILL.md

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/shared/scripts/doctor-analyze.js`:
- Around line 14-18: After parsing the args into variable `a`, add validation to
ensure required properties exist before accessing them. Specifically, validate
that `a.jobs` exists and is an array (before accessing `a.jobs.length`), and
that `a.prow_job_skill` exists. If either validation fails, log a clear error
message and exit the script gracefully to prevent runtime crashes from malformed
or missing arguments.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f21d0e26-2329-4871-8095-91290d4a87dd

📥 Commits

Reviewing files that changed from the base of the PR and between 3491f0a and b8a335b.

📒 Files selected for processing (2)

plugins/shared/scripts/doctor-analyze.js
plugins/shared/scripts/doctor.sh

stbenjam · 2026-06-15T10:44:35Z

This essentially locks your skills to Claude Code which may be fine if its significantly improving your results.

This doesn't strike me as a substantial workflow, though -- what is this getting you that driving Claude via -p wouldn't? We do multi-stage workflows in other Prow jobs with just multiple Cluade invocations and it works fine to produce deterministic actions, and has the benefit of being portable to OpenCode or Pi later on.

openshift-eng/ai-helpers#545 should make the unstable version of Claude available in ai-helpers if you'd like to use it (I'd like to keep stable because unstable Claude Code has broken the payload agent several times....)

ggiguash added 2 commits June 13, 2026 09:57

Add analyze workflow scripts

4bdb88b

Update doctor skills to use analyze workflow

1ac3d17

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 13, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2026

coderabbitai Bot requested changes Jun 13, 2026

View reviewed changes

Comment thread plugins/microshift-ci/skills/doctor/SKILL.md

Comment thread plugins/shared/scripts/doctor-analyze.js

Script workflow argument preparation

93c4df0

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

Address Claude review fixes

058246e

coderabbitai Bot approved these changes Jun 13, 2026

View reviewed changes

openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 13, 2026

Fix microshift-ci:create-bugs skill call

3491f0a

coderabbitai Bot removed the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 13, 2026

coderabbitai Bot requested changes Jun 14, 2026

View reviewed changes

Comment thread plugins/shared/scripts/doctor-analyze.js

Move PR failed job filtering to doctor.sh

14c86b4

ggiguash force-pushed the ci-doctor-analyse-workflow branch from b8a335b to 14c86b4 Compare June 14, 2026 09:26

coderabbitai Bot approved these changes Jun 14, 2026

View reviewed changes

openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 14, 2026

ggiguash marked this pull request as ready for review June 15, 2026 05:17

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2026

ggiguash marked this pull request as draft June 15, 2026 06:43

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2026

Make Workflow tool a hard requirement

04c2b97

stbenjam reviewed Jun 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

USHIFT-6967: Use deterministic workflow for CI Doctor analysis parallelization#192

USHIFT-6967: Use deterministic workflow for CI Doctor analysis parallelization#192
ggiguash wants to merge 7 commits into
openshift-eng:mainfrom
ggiguash:ci-doctor-analyse-workflow

ggiguash commented Jun 13, 2026 •

edited

Loading

Uh oh!

openshift-ci Bot commented Jun 13, 2026

Uh oh!

openshift-ci Bot commented Jun 13, 2026

Uh oh!

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

Reviews paused

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

stbenjam Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ggiguash commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented Jun 13, 2026

Uh oh!

openshift-ci Bot commented Jun 13, 2026

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

stbenjam Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ggiguash commented Jun 13, 2026 •

edited

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading