USHIFT-6967: Use deterministic workflow for CI Doctor analysis parallelization#192
USHIFT-6967: Use deterministic workflow for CI Doctor analysis parallelization#192ggiguash wants to merge 7 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ggiguash The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR introduces a shared ChangesWorkflow-based job analysis and plugin integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/microshift-ci/skills/doctor/SKILL.md`:
- Around line 88-126: Add a new "## Edge Case" section in both orchestrator
SKILL docs to explicitly document workflow failure modes and the no-fallback
policy: in plugins/microshift-ci/skills/doctor/SKILL.md (lines 88-126) insert an
"## Edge Case" block that states the Workflow tool call must be treated as
blocking on error (if the Workflow invocation fails for any reason — script
error, timeout, API error — stop the skill and report the error to the user; do
not fall back to sequential agent spawning), documents that individual
job-analysis agents may fail without aborting the workflow (the workflow
continues), and instructs to use the returned {analyzed, failed, total} values
for summaries; in plugins/lvms-ci/skills/doctor/SKILL.md (lines 68-100) add the
same "## Edge Case" block with identical rules (state Workflow failure =
stop+report, per-job failures allowed, use returned summary values, and
explicitly prohibit sequential fallback). Ensure wording is concise and matches
the existing doc style.
In `@plugins/shared/scripts/doctor-analyze.js`:
- Around line 46-50: The top-level illegal `return { analyzed: analyzed, failed:
failed, total: results.length }` must be replaced with a proper module export or
function return; change it to `module.exports = { analyzed, failed, total:
results.length }` (or wrap the surrounding logic in a function like `analyze()`
and return from that function) so the file no longer uses a module-scope return;
reference the variables `analyzed`, `failed`, and `results.length` (or `total`)
when exporting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2d2b9be2-4b4f-475d-b858-b749df2fe1d1
📒 Files selected for processing (5)
plugins/lvms-ci/scripts/doctor-analyze.jsplugins/lvms-ci/skills/doctor/SKILL.mdplugins/microshift-ci/scripts/doctor-analyze.jsplugins/microshift-ci/skills/doctor/SKILL.mdplugins/shared/scripts/doctor-analyze.js
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugins/shared/scripts/doctor.sh (1)
238-244: ⚡ Quick winSort the job arrays before deriving index-based filenames.
Lines 242-254 derive
output_pathandlabelfrom.key + 1, so the filenames are only as deterministic as the incoming JSON order. That makes the PR’s deterministic-workflow guarantee depend ondownload-jobs.shpreserving order after parallel downloads. Sort on a stable key beforeto_entries[]so reruns keep the same filenames for the same jobs.♻️ Proposed change
- workflow_jobs=$(echo "${workflow_jobs}" | jq --arg r "${release}" --arg w "${WORKDIR}" \ + workflow_jobs=$(echo "${workflow_jobs}" | jq --arg r "${release}" --arg w "${WORKDIR}" \ --slurpfile jobs "${jobs_file}" \ - '. + [$jobs[0] | to_entries[] | { + '. + [($jobs[0] | sort_by(.build_id)) | to_entries[] | { artifacts_dir: .value.artifacts_dir, output_path: ($w + "/jobs/release-" + $r + "-job-" + ((.key+1)|tostring) + "-" + .value.build_id + ".txt"), label: ($r + "-job-" + ((.key+1)|tostring)) }]') ... - workflow_jobs=$(echo "${workflow_jobs}" | jq --arg w "${WORKDIR}" \ + workflow_jobs=$(echo "${workflow_jobs}" | jq --arg w "${WORKDIR}" \ --slurpfile jobs "${WORKDIR}/jobs/prs-jobs.json" \ - '. + [$jobs[0] | to_entries[] | { + '. + [($jobs[0] | sort_by(.pr_number, .build_id)) | to_entries[] | { artifacts_dir: .value.artifacts_dir, output_path: ($w + "/jobs/prs-job-" + ((.key+1)|tostring) + "-pr" + (.value.pr_number|tostring) + "-" + .value.build_id + ".txt"), label: ("pr" + (.value.pr_number|tostring) + "-job-" + ((.key+1)|tostring)), status: .value.status }]')Also applies to: 249-256
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/shared/scripts/doctor.sh` around lines 238 - 244, The jq pipeline building workflow_jobs uses $jobs[0] | to_entries[] and then .key to form index-based filenames, which is nondeterministic; change the pipeline to sort the jobs before to_entries (e.g., $jobs[0] | sort_by(.value.build_id) | to_entries[]) so indices are stable (use a stable key like value.build_id or another unique stable field), and apply the same change to the other similar jq expression mentioned in the comment so both places produce deterministic filenames/labels.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@plugins/shared/scripts/doctor.sh`:
- Around line 238-244: The jq pipeline building workflow_jobs uses $jobs[0] |
to_entries[] and then .key to form index-based filenames, which is
nondeterministic; change the pipeline to sort the jobs before to_entries (e.g.,
$jobs[0] | sort_by(.value.build_id) | to_entries[]) so indices are stable (use a
stable key like value.build_id or another unique stable field), and apply the
same change to the other similar jq expression mentioned in the comment so both
places produce deterministic filenames/labels.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4dadb3fc-dd94-4b5c-8152-f599a4b79abc
📒 Files selected for processing (3)
plugins/lvms-ci/skills/doctor/SKILL.mdplugins/microshift-ci/skills/doctor/SKILL.mdplugins/shared/scripts/doctor.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/lvms-ci/skills/doctor/SKILL.md
- plugins/microshift-ci/skills/doctor/SKILL.md
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/shared/scripts/doctor-analyze.js`:
- Around line 14-18: After parsing the args into variable `a`, add validation to
ensure required properties exist before accessing them. Specifically, validate
that `a.jobs` exists and is an array (before accessing `a.jobs.length`), and
that `a.prow_job_skill` exists. If either validation fails, log a clear error
message and exit the script gracefully to prevent runtime crashes from malformed
or missing arguments.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: f21d0e26-2329-4871-8095-91290d4a87dd
📒 Files selected for processing (2)
plugins/shared/scripts/doctor-analyze.jsplugins/shared/scripts/doctor.sh
b8a335b to
14c86b4
Compare
There was a problem hiding this comment.
This essentially locks your skills to Claude Code which may be fine if its significantly improving your results.
This doesn't strike me as a substantial workflow, though -- what is this getting you that driving Claude via -p wouldn't? We do multi-stage workflows in other Prow jobs with just multiple Cluade invocations and it works fine to produce deterministic actions, and has the benefit of being portable to OpenCode or Pi later on.
openshift-eng/ai-helpers#545 should make the unstable version of Claude available in ai-helpers if you'd like to use it (I'd like to keep stable because unstable Claude Code has broken the payload agent several times....)
Testing the changes in openshift/release#80503 for Prow.
Summary by CodeRabbit
Refactor
Documentation
Chores
doctor-analyzeimplementation and updated CI scripts to use it.