OCPEDGE-2753: Add catalog index image info to LVMS CI doctor report#186
OCPEDGE-2753: Add catalog index image info to LVMS CI doctor report#186kasturinarra wants to merge 4 commits into
Conversation
The prow-job skill now extracts the LVMS catalog index image from lvms-catalogsource build logs and resolves the digest via skopeo. create-report.py renders this as an info box per release section, showing image, digest, build date, and source commit. Gated on component=lvm-operator so MicroShift is unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kasturinarra The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR adds LVMS index image metadata extraction and reporting. A new ChangesLVMS Index Image Extraction and Reporting
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 9 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 705-729: extract_index_image currently returns info from the first
file found in sorted(glob_mod.glob(pattern)), which can pick stale/arbitrary
metadata; change the selection to be deterministic by sorting candidate files by
modification time (most recent first) before extracting metadata: get the file
list from glob_mod.glob(pattern), sort it using os.path.getmtime (or
pathlib.Path.stat().st_mtime) in descending order, then iterate that sorted
list, parse each file as done in the function (refer to variables pattern,
filepath and the info extraction block) and return the first info with an
"image".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 50f8bd69-56ce-4b5f-a3ff-4de58f4d3dd9
📒 Files selected for processing (2)
plugins/lvms-ci/skills/prow-job/SKILL.mdplugins/shared/scripts/create-report.py
The extract_index_image function returns None when no ## Index Image section exists in the report files, so the component gate is unnecessary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/shared/scripts/create-report.py (1)
698-730:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPer CONTRIBUTING.md: Parsing logic should be robust and testable, not brittle manual string splitting.
The current implementation uses fragile string operations (
line.split("**Image:**", 1)[1].strip()) that:
- Break silently with format variations (extra whitespace, markdown changes)
- Extract invalid data without validation (e.g., malformed SHA, non-ISO dates, invalid image refs)
- Lack test coverage to prevent regressions
As per coding guidelines: "Parsing/validation logic (e.g., scanning build logs and extracting 'Index Image' fields) should include positive and negative test cases rather than relying on brittle/manual parsing."
🛡️ Recommended improvements
- Use regex patterns with validation:
+import re + def extract_index_image(workdir, version): """Extract index image info from per-job report files. Scans per-job report files for an '## Index Image' section containing Image, Digest, Built, and Source Commit fields. """ + # Patterns with validation + IMAGE_PATTERN = re.compile(r'^\s*-\s*\*\*Image:\*\*\s+(.+\S)\s*$') + DIGEST_PATTERN = re.compile(r'^\s*-\s*\*\*Digest:\*\*\s+(sha256:[a-f0-9]{64})\s*$') + BUILT_PATTERN = re.compile(r'^\s*-\s*\*\*Built:\*\*\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s*$') + COMMIT_PATTERN = re.compile(r'^\s*-\s*\*\*Source Commit:\*\*\s+([a-f0-9]{7,40})\s*$') + pattern = os.path.join(workdir, "jobs", f"release-{version}-job-*.txt") for filepath in sorted(glob_mod.glob(pattern)): try: with open(filepath, "r") as f: content = f.read() except IOError: continue if "## Index Image" not in content: continue info = {} for line in content.split("\n"): - line = line.strip() - if line.startswith("- **Image:**"): - info["image"] = line.split("**Image:**", 1)[1].strip() - elif line.startswith("- **Digest:**"): - info["digest"] = line.split("**Digest:**", 1)[1].strip() - elif line.startswith("- **Built:**"): - info["built"] = line.split("**Built:**", 1)[1].strip() - elif line.startswith("- **Source Commit:**"): - info["commit"] = line.split("**Source Commit:**", 1)[1].strip() + if m := IMAGE_PATTERN.match(line): + info["image"] = m.group(1) + elif m := DIGEST_PATTERN.match(line): + info["digest"] = m.group(1) + elif m := BUILT_PATTERN.match(line): + info["built"] = m.group(1) + elif m := COMMIT_PATTERN.match(line): + info["commit"] = m.group(1) if info.get("image"): return info return None
- Add unit tests covering:
- Valid "## Index Image" section extraction
- Missing sections (returns None)
- Malformed field lines (skipped)
- Invalid digest/date/commit formats (rejected)
- Multiple job files (picks first with valid data)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/shared/scripts/create-report.py` around lines 698 - 730, The parsing in extract_index_image is brittle; replace the manual string splits with robust regex parsing and validation: when scanning each file matched by pattern in extract_index_image, locate the "## Index Image" section and run compiled regexes to capture named groups for Image, Digest, Built, and Source Commit; validate the digest with a sha256 hex pattern, validate Built by parsing as ISO date/time (datetime.fromisoformat or strict format), validate the image ref with a permissive image reference regex, and validate the commit as a 7+ hex SHA before accepting the record; skip files or individual fields that fail validation and continue to the next job file, returning the first fully valid info dict, and add unit tests covering valid extraction, missing section, malformed lines, invalid digest/date/commit, and multiple job files picking the first valid one.Source: Coding guidelines
🧹 Nitpick comments (1)
plugins/shared/scripts/create-report.py (1)
744-750: ⚖️ Poor tradeoffConsider making the repository URL derivable from image metadata.
The GitHub commit link hardcodes
openshift/lvm-operator(line 749), which works for LVMS but assumes the catalog image is always built from that specific repository. If the image metadata includes a source repository label (e.g.,org.opencontainers.image.sourceorvcs-url), consider deriving the repo URL from that instead of hardcoding it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/shared/scripts/create-report.py` around lines 744 - 750, The HTML link for the commit currently hardcodes "openshift/lvm-operator" when building the GitHub URL; update the block that uses index_info, commit and the lines.append call to first look for repository source metadata (e.g., index_info.get("org.opencontainers.image.source") or index_info.get("vcs-url")), normalize the value to a GitHub "owner/repo" path (strip protocol and .git suffix), and use that derived path when formatting the commit link, falling back to "openshift/lvm-operator" only if no source label is present; ensure you still escape values with _e(commit/short) and validate the derived URL before inserting into lines.append.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 698-730: The parsing in extract_index_image is brittle; replace
the manual string splits with robust regex parsing and validation: when scanning
each file matched by pattern in extract_index_image, locate the "## Index Image"
section and run compiled regexes to capture named groups for Image, Digest,
Built, and Source Commit; validate the digest with a sha256 hex pattern,
validate Built by parsing as ISO date/time (datetime.fromisoformat or strict
format), validate the image ref with a permissive image reference regex, and
validate the commit as a 7+ hex SHA before accepting the record; skip files or
individual fields that fail validation and continue to the next job file,
returning the first fully valid info dict, and add unit tests covering valid
extraction, missing section, malformed lines, invalid digest/date/commit, and
multiple job files picking the first valid one.
---
Nitpick comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 744-750: The HTML link for the commit currently hardcodes
"openshift/lvm-operator" when building the GitHub URL; update the block that
uses index_info, commit and the lines.append call to first look for repository
source metadata (e.g., index_info.get("org.opencontainers.image.source") or
index_info.get("vcs-url")), normalize the value to a GitHub "owner/repo" path
(strip protocol and .git suffix), and use that derived path when formatting the
commit link, falling back to "openshift/lvm-operator" only if no source label is
present; ensure you still escape values with _e(commit/short) and validate the
derived URL before inserting into lines.append.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0f73be2a-dfa8-4d32-b28a-43003759c3b6
📒 Files selected for processing (1)
plugins/shared/scripts/create-report.py
This function is LVMS-specific (as noted in the docstring) and only processes lvm-operator catalog index images. The commit URL will always point to openshift/lvm-operator. No need to add complexity for a case that won't arise |
| - **Source Commit:** <commit hash> | ||
| ``` | ||
|
|
||
| If `skopeo inspect` fails (e.g., image no longer exists), still report the image reference from the build log. |
There was a problem hiding this comment.
This needs to be scripted. Please, make sure to store all related files in a dedicated workdir sub-directory.
There was a problem hiding this comment.
I have addressed your comment, When there are no lvms-catalogsource logs, it silently does nothing — no directory created, no files written, create-report.py gets None back and skips the info box.
Move the LVMS catalog index image extraction from LLM instructions in
SKILL.md to a deterministic shell script (extract-index-image.sh). The
script scans downloaded artifacts for lvms-catalogsource build logs,
extracts the image reference, runs skopeo inspect, and writes JSON to
a dedicated ${WORKDIR}/index-image/ subdirectory. doctor.sh calls it
in finalize/refresh before HTML generation, and create-report.py reads
the JSON files directly instead of parsing markdown from agent reports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/shared/scripts/extract-index-image.sh (1)
1-163:⚠️ Potential issue | 🟠 MajorAdd test coverage for extract-index-image.sh parsing/validation
Per CONTRIBUTING.md: “any parsing/validation logic … must include tests with positive and negative cases.” There are no tests for
plugins/shared/scripts/extract-index-image.sh, includingextract_image_ref(),process_release(), andinspect_and_write().Required test coverage:
extract_image_ref(): valid extraction, missingLVM_INDEX_IMAGE, malformed/missingbuild-log.txtcontent, invalid/empty image referenceprocess_release(): release with no jobs file, jobs with zero jobs, no lvms-catalogsource logs, invalid/emptyartifacts_dirvalues, successful pathinspect_and_write():skopeomissing,skopeo inspectfailure, successful inspection producing expected JSON (digest/built/commit) and omission of empty fields🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/shared/scripts/extract-index-image.sh` around lines 1 - 163, Add a test suite that exercises extract_image_ref, process_release, and inspect_and_write with positive and negative cases: for extract_image_ref add tests for a log containing "LVM_INDEX_IMAGE is set to <img>", missing line, malformed/empty content, and whitespace-only image; for process_release simulate (via temporary WORKDIR and crafted jobs JSON) a missing jobs file, zero-length jobs array, artifacts entries with missing/invalid artifacts_dir, no lvms-catalogsource logs, and a successful end-to-end run that writes release-<version>.json; for inspect_and_write test behavior when skopeo is absent (e.g. remove from PATH or stub command), when skopeo inspect returns non-zero, and when it returns valid JSON—assert output JSON contains digest/built/commit only when present and omits empty fields. Use temporary directories/files and cleanup, invoke the functions (sourcing the script) and assert stderr messages and output JSON contents; mock/sketch skopeo by creating a wrapper executable in PATH that emits controlled output to cover success and failure paths.Source: Coding guidelines
🧹 Nitpick comments (1)
plugins/shared/scripts/extract-index-image.sh (1)
130-145: ⚡ Quick winValidate WORKDIR for path traversal.
Per coding guidelines: "Path traversal: canonicalize paths, reject ../"
The
--workdirargument is accepted without validation. WhileWORKDIRitself is controlled by the caller (doctor.sh), defense-in-depth suggests validating it here to prevent misuse if called directly.Recommendation:
- Canonicalize
WORKDIRusingrealpath- Optionally validate it's an absolute path
🛡️ Proposed validation
if [[ -z "${WORKDIR}" ]]; then echo "Error: --workdir is required" >&2 echo "Usage: $(basename "$0") --workdir DIR <release1,release2,...>" >&2 return 1 fi + + # Canonicalize workdir + WORKDIR=$(realpath -m "${WORKDIR}" 2>/dev/null || echo "") + if [[ -z "${WORKDIR}" || "${WORKDIR}" != /* ]]; then + echo "Error: invalid --workdir path" >&2 + return 1 + fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/shared/scripts/extract-index-image.sh` around lines 130 - 145, Validate and canonicalize the WORKDIR in main(): after parsing --workdir and before using it, use realpath to canonicalize WORKDIR (e.g., WORKDIR_REAL=$(realpath -m "${WORKDIR}" 2>/dev/null)), check realpath succeeded and that WORKDIR_REAL is an absolute path and a directory (test -d), and reject/exit with an error if it contains path traversal or is not resolvable; replace subsequent uses of WORKDIR with the canonical WORKDIR_REAL and ensure main() returns a non-zero code on validation failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 698-706: The function extract_index_image accepts an unsanitized
version and constructs a path directly (used by CLI code that passes user
input), enabling path traversal; fix by validating/sanitizing version (allow
only a tight pattern such as semantic-version or a whitelist of [0-9A-Za-z._-],
reject any '../' or path-separator characters), canonicalize the final path with
os.path.normpath/os.path.abspath and verify it stays inside the workdir (compare
commonprefix or check that abspath startswith workdir abspath) before calling
load_json(path); additionally add unit tests for extract_index_image covering
positive case (valid version and existing JSON), negative cases (missing file
returns None or expected behavior, malformed JSON raises/parses appropriately),
and explicit path-traversal attempts (e.g., version containing ../) to assert
that such inputs are rejected.
In `@plugins/shared/scripts/extract-index-image.sh`:
- Around line 79-124: process_release currently uses the untrusted release
parameter and artifacts_dir values directly in filesystem paths; validate inputs
to prevent path traversal by (1) ensuring release matches an allowed pattern
(e.g., regex like ^(main|[0-9]+\.[0-9]+)$) and rejecting any release containing
.. or slashes before constructing jobs_file or output paths, (2) canonicalizing
each artifacts_dir with realpath or readlink -f and verifying the resolved path
is a descendant of "${WORKDIR}/artifacts" (reject and skip if not), and (3)
applying the same release validation before creating
"${WORKDIR}/index-image/release-${release}.json"; on validation failure echo an
error and return. Use the process_release function, the jobs_file/artifacts_dirs
handling, and where inspect_and_write is called to locate and apply these
checks.
- Around line 27-32: The extract_image_ref function currently greps/seds the log
without validating format; replace the brittle pipeline with a robust
regex-based extraction that matches a valid container image reference (e.g.,
optional registry, repository/path, and either :tag or `@sha256`:digest) and
validate the captured value before returning (reject or return empty on
no-match); update extract_image_ref to accept the log path parameter (build_log)
and perform pattern matching using a single grep -Po or awk/Perl-compatible
regex to validate the image ref, and add unit tests covering: valid extraction,
no match (missing line), malformed lines, and invalid image formats to ensure
regressions are caught.
---
Outside diff comments:
In `@plugins/shared/scripts/extract-index-image.sh`:
- Around line 1-163: Add a test suite that exercises extract_image_ref,
process_release, and inspect_and_write with positive and negative cases: for
extract_image_ref add tests for a log containing "LVM_INDEX_IMAGE is set to
<img>", missing line, malformed/empty content, and whitespace-only image; for
process_release simulate (via temporary WORKDIR and crafted jobs JSON) a missing
jobs file, zero-length jobs array, artifacts entries with missing/invalid
artifacts_dir, no lvms-catalogsource logs, and a successful end-to-end run that
writes release-<version>.json; for inspect_and_write test behavior when skopeo
is absent (e.g. remove from PATH or stub command), when skopeo inspect returns
non-zero, and when it returns valid JSON—assert output JSON contains
digest/built/commit only when present and omits empty fields. Use temporary
directories/files and cleanup, invoke the functions (sourcing the script) and
assert stderr messages and output JSON contents; mock/sketch skopeo by creating
a wrapper executable in PATH that emits controlled output to cover success and
failure paths.
---
Nitpick comments:
In `@plugins/shared/scripts/extract-index-image.sh`:
- Around line 130-145: Validate and canonicalize the WORKDIR in main(): after
parsing --workdir and before using it, use realpath to canonicalize WORKDIR
(e.g., WORKDIR_REAL=$(realpath -m "${WORKDIR}" 2>/dev/null)), check realpath
succeeded and that WORKDIR_REAL is an absolute path and a directory (test -d),
and reject/exit with an error if it contains path traversal or is not
resolvable; replace subsequent uses of WORKDIR with the canonical WORKDIR_REAL
and ensure main() returns a non-zero code on validation failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: feba05f3-9443-4b03-99f7-889e1cd8a855
📒 Files selected for processing (4)
plugins/lvms-ci/scripts/extract-index-image.shplugins/shared/scripts/create-report.pyplugins/shared/scripts/doctor.shplugins/shared/scripts/extract-index-image.sh
The actual build log uses 'LVM_INDEX_IMAGE is set to: <image>' (with a colon) which the sed pattern did not handle, causing the colon to be included in the image reference. Update the pattern to strip both 'set to ' and 'set to: ' variants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
lvms-catalogsource/build-log.txtand resolves digest/build date/source commit viaskopeo inspectcreate-report.pyrenders this as a styled info box in each release section, showing image reference, SHA digest, build date, and a linked source commitcomponent == "lvm-operator"— MicroShift reports are unaffectedTest plan
/lvms-ci:doctorand verify the index image info appears in the HTML report🤖 Generated with Claude Code
Summary by CodeRabbit