Skip to content

OCPEDGE-2753: Add catalog index image info to LVMS CI doctor report#186

Open
kasturinarra wants to merge 4 commits into
openshift-eng:mainfrom
kasturinarra:add-catalog-image-to-report
Open

OCPEDGE-2753: Add catalog index image info to LVMS CI doctor report#186
kasturinarra wants to merge 4 commits into
openshift-eng:mainfrom
kasturinarra:add-catalog-image-to-report

Conversation

@kasturinarra

@kasturinarra kasturinarra commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Prow-job skill now extracts the LVMS catalog index image from lvms-catalogsource/build-log.txt and resolves digest/build date/source commit via skopeo inspect
  • create-report.py renders this as a styled info box in each release section, showing image reference, SHA digest, build date, and a linked source commit
  • Gated on component == "lvm-operator" — MicroShift reports are unaffected
Screenshot From 2026-06-11 23-23-16

Test plan

  • Run /lvms-ci:doctor and verify the index image info appears in the HTML report
  • Verify MicroShift reports are unchanged

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Release reports now display LVMS Index Image metadata (image reference, digest, build timestamp, shortened commit link) and show extraction errors when present.
    • Automatic extraction of catalog image metadata integrated into release workflows so per-release index-image data is collected and included in generated reports.

The prow-job skill now extracts the LVMS catalog index image from
lvms-catalogsource build logs and resolves the digest via skopeo.
create-report.py renders this as an info box per release section,
showing image, digest, build date, and source commit. Gated on
component=lvm-operator so MicroShift is unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kasturinarra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2026
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 854ae1cb-3279-47c2-844c-8e1c86485ed8

📥 Commits

Reviewing files that changed from the base of the PR and between 14fd30e and afe6bf8.

📒 Files selected for processing (1)
  • plugins/shared/scripts/extract-index-image.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/shared/scripts/extract-index-image.sh

Walkthrough

This PR adds LVMS index image metadata extraction and reporting. A new extract-index-image.sh utility scans job artifacts for LVM_INDEX_IMAGE build log entries, resolves image digests and metadata using skopeo, and writes JSON results. doctor.sh orchestrates this extraction during finalize and refresh workflows. create-report.py gains helper functions to load and render the extracted metadata as an styled info box in HTML reports.

Changes

LVMS Index Image Extraction and Reporting

Layer / File(s) Summary
Index image extraction utility script
plugins/shared/scripts/extract-index-image.sh, plugins/lvms-ci/scripts/extract-index-image.sh
New extract-index-image.sh scans job artifacts for LVM_INDEX_IMAGE references in build logs, uses skopeo inspect to resolve digests and build metadata, and writes results to index-image/release-<release>.json. Symlink in lvms-ci directory for access.
Orchestration in doctor.sh workflows
plugins/shared/scripts/doctor.sh
cmd_finalize and cmd_refresh now invoke extract-index-image.sh after PR aggregation and before HTML report generation, with warning logging on extraction failure.
HTML report helpers for index image rendering
plugins/shared/scripts/create-report.py
New .index-image-info CSS styling and helper functions extract_index_image() and _render_index_image() load metadata and render it as conditional HTML with catalog image, digest, build timestamp, GitHub commit link, and optional error text.
Report generation wiring for index image data
plugins/shared/scripts/create-report.py
Extended render_release_section(), generate_html(), and main() signatures to thread index metadata through the report pipeline. main() builds an index_data lookup per release version, generate_html() creates _idx and passes it to render_release_section(), which injects the rendered index-image HTML before the breakdown and issues tables.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Ai-Attribution ⚠️ Warning PR commit message includes “Co-Authored-By: Claude Opus 4.6 …” but contains no “Generated-by:” or “Assisted-by:” trailers. Add Red Hat “Generated-by:” or “Assisted-by:” trailers for the AI assistance and remove/avoid using “Co-Authored-By” for the AI tool.
✅ Passed checks (9 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding catalog index image information to the LVMS CI doctor report, referencing the associated ticket OCPEDGE-2753.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto ✅ Passed Searched plugins/shared and plugins/lvms-ci scripts for MD5/SHA1/DES/RC4/3DES/Blowfish/ECB and for compare_digest/token/secret== patterns; no matches found in PR-related files.
Container-Privileges ✅ Passed Repo-wide scan found 0 occurrences of privileged:true, hostPID/hostNetwork/hostIPC:true, allowPrivilegeEscalation:true, SYS_ADMIN, runAsUser:0, or runAsNonRoot:false.
No-Sensitive-Data-In-Logs ✅ Passed Logged messages in PR code only include constant warnings, image/digest metadata, and file/exception paths; no passwords/tokens/API keys/PII are printed.
No-Hardcoded-Secrets ✅ Passed Scanned PR-related scripts (create-report.py, doctor.sh, extract-index-image.sh) for PEM private keys, embedded-credential URLs, Authorization/Bearer literals, secret/token/password variable assign...
No-Injection-Vectors ✅ Passed Scanned PR-relevant files (create-report.py, doctor.sh, extract-index-image.sh and symlink target) for eval/exec/pickle.loads/yaml.load/os.system/dangerouslySetInnerHTML/subprocess shell=True; none...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 11, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 705-729: extract_index_image currently returns info from the first
file found in sorted(glob_mod.glob(pattern)), which can pick stale/arbitrary
metadata; change the selection to be deterministic by sorting candidate files by
modification time (most recent first) before extracting metadata: get the file
list from glob_mod.glob(pattern), sort it using os.path.getmtime (or
pathlib.Path.stat().st_mtime) in descending order, then iterate that sorted
list, parse each file as done in the function (refer to variables pattern,
filepath and the info extraction block) and return the first info with an
"image".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 50f8bd69-56ce-4b5f-a3ff-4de58f4d3dd9

📥 Commits

Reviewing files that changed from the base of the PR and between 8131026 and 38214ce.

📒 Files selected for processing (2)
  • plugins/lvms-ci/skills/prow-job/SKILL.md
  • plugins/shared/scripts/create-report.py

Comment thread plugins/shared/scripts/create-report.py Outdated
@kasturinarra kasturinarra changed the title Add catalog index image info to LVMS CI doctor report OCPEDGE-2753: Add catalog index image info to LVMS CI doctor report Jun 11, 2026
The extract_index_image function returns None when no ## Index Image
section exists in the report files, so the component gate is unnecessary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai Bot removed the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 11, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/shared/scripts/create-report.py (1)

698-730: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Per CONTRIBUTING.md: Parsing logic should be robust and testable, not brittle manual string splitting.

The current implementation uses fragile string operations (line.split("**Image:**", 1)[1].strip()) that:

  • Break silently with format variations (extra whitespace, markdown changes)
  • Extract invalid data without validation (e.g., malformed SHA, non-ISO dates, invalid image refs)
  • Lack test coverage to prevent regressions

As per coding guidelines: "Parsing/validation logic (e.g., scanning build logs and extracting 'Index Image' fields) should include positive and negative test cases rather than relying on brittle/manual parsing."

🛡️ Recommended improvements
  1. Use regex patterns with validation:
+import re
+
 def extract_index_image(workdir, version):
     """Extract index image info from per-job report files.
 
     Scans per-job report files for an '## Index Image' section containing
     Image, Digest, Built, and Source Commit fields.
     """
+    # Patterns with validation
+    IMAGE_PATTERN = re.compile(r'^\s*-\s*\*\*Image:\*\*\s+(.+\S)\s*$')
+    DIGEST_PATTERN = re.compile(r'^\s*-\s*\*\*Digest:\*\*\s+(sha256:[a-f0-9]{64})\s*$')
+    BUILT_PATTERN = re.compile(r'^\s*-\s*\*\*Built:\*\*\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s*$')
+    COMMIT_PATTERN = re.compile(r'^\s*-\s*\*\*Source Commit:\*\*\s+([a-f0-9]{7,40})\s*$')
+
     pattern = os.path.join(workdir, "jobs", f"release-{version}-job-*.txt")
     for filepath in sorted(glob_mod.glob(pattern)):
         try:
             with open(filepath, "r") as f:
                 content = f.read()
         except IOError:
             continue
 
         if "## Index Image" not in content:
             continue
 
         info = {}
         for line in content.split("\n"):
-            line = line.strip()
-            if line.startswith("- **Image:**"):
-                info["image"] = line.split("**Image:**", 1)[1].strip()
-            elif line.startswith("- **Digest:**"):
-                info["digest"] = line.split("**Digest:**", 1)[1].strip()
-            elif line.startswith("- **Built:**"):
-                info["built"] = line.split("**Built:**", 1)[1].strip()
-            elif line.startswith("- **Source Commit:**"):
-                info["commit"] = line.split("**Source Commit:**", 1)[1].strip()
+            if m := IMAGE_PATTERN.match(line):
+                info["image"] = m.group(1)
+            elif m := DIGEST_PATTERN.match(line):
+                info["digest"] = m.group(1)
+            elif m := BUILT_PATTERN.match(line):
+                info["built"] = m.group(1)
+            elif m := COMMIT_PATTERN.match(line):
+                info["commit"] = m.group(1)
 
         if info.get("image"):
             return info
 
     return None
  1. Add unit tests covering:
    • Valid "## Index Image" section extraction
    • Missing sections (returns None)
    • Malformed field lines (skipped)
    • Invalid digest/date/commit formats (rejected)
    • Multiple job files (picks first with valid data)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/shared/scripts/create-report.py` around lines 698 - 730, The parsing
in extract_index_image is brittle; replace the manual string splits with robust
regex parsing and validation: when scanning each file matched by pattern in
extract_index_image, locate the "## Index Image" section and run compiled
regexes to capture named groups for Image, Digest, Built, and Source Commit;
validate the digest with a sha256 hex pattern, validate Built by parsing as ISO
date/time (datetime.fromisoformat or strict format), validate the image ref with
a permissive image reference regex, and validate the commit as a 7+ hex SHA
before accepting the record; skip files or individual fields that fail
validation and continue to the next job file, returning the first fully valid
info dict, and add unit tests covering valid extraction, missing section,
malformed lines, invalid digest/date/commit, and multiple job files picking the
first valid one.

Source: Coding guidelines

🧹 Nitpick comments (1)
plugins/shared/scripts/create-report.py (1)

744-750: ⚖️ Poor tradeoff

Consider making the repository URL derivable from image metadata.

The GitHub commit link hardcodes openshift/lvm-operator (line 749), which works for LVMS but assumes the catalog image is always built from that specific repository. If the image metadata includes a source repository label (e.g., org.opencontainers.image.source or vcs-url), consider deriving the repo URL from that instead of hardcoding it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/shared/scripts/create-report.py` around lines 744 - 750, The HTML
link for the commit currently hardcodes "openshift/lvm-operator" when building
the GitHub URL; update the block that uses index_info, commit and the
lines.append call to first look for repository source metadata (e.g.,
index_info.get("org.opencontainers.image.source") or index_info.get("vcs-url")),
normalize the value to a GitHub "owner/repo" path (strip protocol and .git
suffix), and use that derived path when formatting the commit link, falling back
to "openshift/lvm-operator" only if no source label is present; ensure you still
escape values with _e(commit/short) and validate the derived URL before
inserting into lines.append.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 698-730: The parsing in extract_index_image is brittle; replace
the manual string splits with robust regex parsing and validation: when scanning
each file matched by pattern in extract_index_image, locate the "## Index Image"
section and run compiled regexes to capture named groups for Image, Digest,
Built, and Source Commit; validate the digest with a sha256 hex pattern,
validate Built by parsing as ISO date/time (datetime.fromisoformat or strict
format), validate the image ref with a permissive image reference regex, and
validate the commit as a 7+ hex SHA before accepting the record; skip files or
individual fields that fail validation and continue to the next job file,
returning the first fully valid info dict, and add unit tests covering valid
extraction, missing section, malformed lines, invalid digest/date/commit, and
multiple job files picking the first valid one.

---

Nitpick comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 744-750: The HTML link for the commit currently hardcodes
"openshift/lvm-operator" when building the GitHub URL; update the block that
uses index_info, commit and the lines.append call to first look for repository
source metadata (e.g., index_info.get("org.opencontainers.image.source") or
index_info.get("vcs-url")), normalize the value to a GitHub "owner/repo" path
(strip protocol and .git suffix), and use that derived path when formatting the
commit link, falling back to "openshift/lvm-operator" only if no source label is
present; ensure you still escape values with _e(commit/short) and validate the
derived URL before inserting into lines.append.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0f73be2a-dfa8-4d32-b28a-43003759c3b6

📥 Commits

Reviewing files that changed from the base of the PR and between 38214ce and a7dd0be.

📒 Files selected for processing (1)
  • plugins/shared/scripts/create-report.py

@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 11, 2026
@kasturinarra

Copy link
Copy Markdown
Contributor Author

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/shared/scripts/create-report.py (1)> 698-730: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Per CONTRIBUTING.md: Parsing logic should be robust and testable, not brittle manual string splitting.
The current implementation uses fragile string operations (line.split("**Image:**", 1)[1].strip()) that:

  • Break silently with format variations (extra whitespace, markdown changes)
  • Extract invalid data without validation (e.g., malformed SHA, non-ISO dates, invalid image refs)
  • Lack test coverage to prevent regressions

As per coding guidelines: "Parsing/validation logic (e.g., scanning build logs and extracting 'Index Image' fields) should include positive and negative test cases rather than relying on brittle/manual parsing."

🛡️ Recommended improvements

  1. Use regex patterns with validation:
+import re
+
 def extract_index_image(workdir, version):
     """Extract index image info from per-job report files.
 
     Scans per-job report files for an '## Index Image' section containing
     Image, Digest, Built, and Source Commit fields.
     """
+    # Patterns with validation
+    IMAGE_PATTERN = re.compile(r'^\s*-\s*\*\*Image:\*\*\s+(.+\S)\s*$')
+    DIGEST_PATTERN = re.compile(r'^\s*-\s*\*\*Digest:\*\*\s+(sha256:[a-f0-9]{64})\s*$')
+    BUILT_PATTERN = re.compile(r'^\s*-\s*\*\*Built:\*\*\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s*$')
+    COMMIT_PATTERN = re.compile(r'^\s*-\s*\*\*Source Commit:\*\*\s+([a-f0-9]{7,40})\s*$')
+
     pattern = os.path.join(workdir, "jobs", f"release-{version}-job-*.txt")
     for filepath in sorted(glob_mod.glob(pattern)):
         try:
             with open(filepath, "r") as f:
                 content = f.read()
         except IOError:
             continue
 
         if "## Index Image" not in content:
             continue
 
         info = {}
         for line in content.split("\n"):
-            line = line.strip()
-            if line.startswith("- **Image:**"):
-                info["image"] = line.split("**Image:**", 1)[1].strip()
-            elif line.startswith("- **Digest:**"):
-                info["digest"] = line.split("**Digest:**", 1)[1].strip()
-            elif line.startswith("- **Built:**"):
-                info["built"] = line.split("**Built:**", 1)[1].strip()
-            elif line.startswith("- **Source Commit:**"):
-                info["commit"] = line.split("**Source Commit:**", 1)[1].strip()
+            if m := IMAGE_PATTERN.match(line):
+                info["image"] = m.group(1)
+            elif m := DIGEST_PATTERN.match(line):
+                info["digest"] = m.group(1)
+            elif m := BUILT_PATTERN.match(line):
+                info["built"] = m.group(1)
+            elif m := COMMIT_PATTERN.match(line):
+                info["commit"] = m.group(1)
 
         if info.get("image"):
             return info
 
     return None
  1. Add unit tests covering:

    • Valid "## Index Image" section extraction
    • Missing sections (returns None)
    • Malformed field lines (skipped)
    • Invalid digest/date/commit formats (rejected)
    • Multiple job files (picks first with valid data)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/shared/scripts/create-report.py` around lines 698 - 730, The parsing
in extract_index_image is brittle; replace the manual string splits with robust
regex parsing and validation: when scanning each file matched by pattern in
extract_index_image, locate the "## Index Image" section and run compiled
regexes to capture named groups for Image, Digest, Built, and Source Commit;
validate the digest with a sha256 hex pattern, validate Built by parsing as ISO
date/time (datetime.fromisoformat or strict format), validate the image ref with
a permissive image reference regex, and validate the commit as a 7+ hex SHA
before accepting the record; skip files or individual fields that fail
validation and continue to the next job file, returning the first fully valid
info dict, and add unit tests covering valid extraction, missing section,
malformed lines, invalid digest/date/commit, and multiple job files picking the
first valid one.

Source: Coding guidelines

🧹 Nitpick comments (1)

plugins/shared/scripts/create-report.py (1)> 744-750: ⚖️ Poor tradeoff

Consider making the repository URL derivable from image metadata.
The GitHub commit link hardcodes openshift/lvm-operator (line 749), which works for LVMS but assumes the catalog image is always built from that specific repository. If the image metadata includes a source repository label (e.g., org.opencontainers.image.source or vcs-url), consider deriving the repo URL from that instead of hardcoding it.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/shared/scripts/create-report.py` around lines 744 - 750, The HTML
link for the commit currently hardcodes "openshift/lvm-operator" when building
the GitHub URL; update the block that uses index_info, commit and the
lines.append call to first look for repository source metadata (e.g.,
index_info.get("org.opencontainers.image.source") or index_info.get("vcs-url")),
normalize the value to a GitHub "owner/repo" path (strip protocol and .git
suffix), and use that derived path when formatting the commit link, falling back
to "openshift/lvm-operator" only if no source label is present; ensure you still
escape values with _e(commit/short) and validate the derived URL before
inserting into lines.append.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 698-730: The parsing in extract_index_image is brittle; replace
the manual string splits with robust regex parsing and validation: when scanning
each file matched by pattern in extract_index_image, locate the "## Index Image"
section and run compiled regexes to capture named groups for Image, Digest,
Built, and Source Commit; validate the digest with a sha256 hex pattern,
validate Built by parsing as ISO date/time (datetime.fromisoformat or strict
format), validate the image ref with a permissive image reference regex, and
validate the commit as a 7+ hex SHA before accepting the record; skip files or
individual fields that fail validation and continue to the next job file,
returning the first fully valid info dict, and add unit tests covering valid
extraction, missing section, malformed lines, invalid digest/date/commit, and
multiple job files picking the first valid one.

---

Nitpick comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 744-750: The HTML link for the commit currently hardcodes
"openshift/lvm-operator" when building the GitHub URL; update the block that
uses index_info, commit and the lines.append call to first look for repository
source metadata (e.g., index_info.get("org.opencontainers.image.source") or
index_info.get("vcs-url")), normalize the value to a GitHub "owner/repo" path
(strip protocol and .git suffix), and use that derived path when formatting the
commit link, falling back to "openshift/lvm-operator" only if no source label is
present; ensure you still escape values with _e(commit/short) and validate the
derived URL before inserting into lines.append.

ℹ️ Review info

This function is LVMS-specific (as noted in the docstring) and only processes lvm-operator catalog index images. The commit URL will always point to openshift/lvm-operator. No need to add complexity for a case that won't arise

- **Source Commit:** <commit hash>
```

If `skopeo inspect` fails (e.g., image no longer exists), still report the image reference from the build log.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be scripted. Please, make sure to store all related files in a dedicated workdir sub-directory.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed your comment, When there are no lvms-catalogsource logs, it silently does nothing — no directory created, no files written, create-report.py gets None back and skips the info box.

Move the LVMS catalog index image extraction from LLM instructions in
SKILL.md to a deterministic shell script (extract-index-image.sh). The
script scans downloaded artifacts for lvms-catalogsource build logs,
extracts the image reference, runs skopeo inspect, and writes JSON to
a dedicated ${WORKDIR}/index-image/ subdirectory. doctor.sh calls it
in finalize/refresh before HTML generation, and create-report.py reads
the JSON files directly instead of parsing markdown from agent reports.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai Bot removed the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 12, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/shared/scripts/extract-index-image.sh (1)

1-163: ⚠️ Potential issue | 🟠 Major

Add test coverage for extract-index-image.sh parsing/validation

Per CONTRIBUTING.md: “any parsing/validation logic … must include tests with positive and negative cases.” There are no tests for plugins/shared/scripts/extract-index-image.sh, including extract_image_ref(), process_release(), and inspect_and_write().

Required test coverage:

  • extract_image_ref(): valid extraction, missing LVM_INDEX_IMAGE, malformed/missing build-log.txt content, invalid/empty image reference
  • process_release(): release with no jobs file, jobs with zero jobs, no lvms-catalogsource logs, invalid/empty artifacts_dir values, successful path
  • inspect_and_write(): skopeo missing, skopeo inspect failure, successful inspection producing expected JSON (digest/built/commit) and omission of empty fields
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/shared/scripts/extract-index-image.sh` around lines 1 - 163, Add a
test suite that exercises extract_image_ref, process_release, and
inspect_and_write with positive and negative cases: for extract_image_ref add
tests for a log containing "LVM_INDEX_IMAGE is set to <img>", missing line,
malformed/empty content, and whitespace-only image; for process_release simulate
(via temporary WORKDIR and crafted jobs JSON) a missing jobs file, zero-length
jobs array, artifacts entries with missing/invalid artifacts_dir, no
lvms-catalogsource logs, and a successful end-to-end run that writes
release-<version>.json; for inspect_and_write test behavior when skopeo is
absent (e.g. remove from PATH or stub command), when skopeo inspect returns
non-zero, and when it returns valid JSON—assert output JSON contains
digest/built/commit only when present and omits empty fields. Use temporary
directories/files and cleanup, invoke the functions (sourcing the script) and
assert stderr messages and output JSON contents; mock/sketch skopeo by creating
a wrapper executable in PATH that emits controlled output to cover success and
failure paths.

Source: Coding guidelines

🧹 Nitpick comments (1)
plugins/shared/scripts/extract-index-image.sh (1)

130-145: ⚡ Quick win

Validate WORKDIR for path traversal.

Per coding guidelines: "Path traversal: canonicalize paths, reject ../"

The --workdir argument is accepted without validation. While WORKDIR itself is controlled by the caller (doctor.sh), defense-in-depth suggests validating it here to prevent misuse if called directly.

Recommendation:

  • Canonicalize WORKDIR using realpath
  • Optionally validate it's an absolute path
🛡️ Proposed validation
     if [[ -z "${WORKDIR}" ]]; then
         echo "Error: --workdir is required" >&2
         echo "Usage: $(basename "$0") --workdir DIR <release1,release2,...>" >&2
         return 1
     fi
+    
+    # Canonicalize workdir
+    WORKDIR=$(realpath -m "${WORKDIR}" 2>/dev/null || echo "")
+    if [[ -z "${WORKDIR}" || "${WORKDIR}" != /* ]]; then
+        echo "Error: invalid --workdir path" >&2
+        return 1
+    fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/shared/scripts/extract-index-image.sh` around lines 130 - 145,
Validate and canonicalize the WORKDIR in main(): after parsing --workdir and
before using it, use realpath to canonicalize WORKDIR (e.g.,
WORKDIR_REAL=$(realpath -m "${WORKDIR}" 2>/dev/null)), check realpath succeeded
and that WORKDIR_REAL is an absolute path and a directory (test -d), and
reject/exit with an error if it contains path traversal or is not resolvable;
replace subsequent uses of WORKDIR with the canonical WORKDIR_REAL and ensure
main() returns a non-zero code on validation failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/shared/scripts/create-report.py`:
- Around line 698-706: The function extract_index_image accepts an unsanitized
version and constructs a path directly (used by CLI code that passes user
input), enabling path traversal; fix by validating/sanitizing version (allow
only a tight pattern such as semantic-version or a whitelist of [0-9A-Za-z._-],
reject any '../' or path-separator characters), canonicalize the final path with
os.path.normpath/os.path.abspath and verify it stays inside the workdir (compare
commonprefix or check that abspath startswith workdir abspath) before calling
load_json(path); additionally add unit tests for extract_index_image covering
positive case (valid version and existing JSON), negative cases (missing file
returns None or expected behavior, malformed JSON raises/parses appropriately),
and explicit path-traversal attempts (e.g., version containing ../) to assert
that such inputs are rejected.

In `@plugins/shared/scripts/extract-index-image.sh`:
- Around line 79-124: process_release currently uses the untrusted release
parameter and artifacts_dir values directly in filesystem paths; validate inputs
to prevent path traversal by (1) ensuring release matches an allowed pattern
(e.g., regex like ^(main|[0-9]+\.[0-9]+)$) and rejecting any release containing
.. or slashes before constructing jobs_file or output paths, (2) canonicalizing
each artifacts_dir with realpath or readlink -f and verifying the resolved path
is a descendant of "${WORKDIR}/artifacts" (reject and skip if not), and (3)
applying the same release validation before creating
"${WORKDIR}/index-image/release-${release}.json"; on validation failure echo an
error and return. Use the process_release function, the jobs_file/artifacts_dirs
handling, and where inspect_and_write is called to locate and apply these
checks.
- Around line 27-32: The extract_image_ref function currently greps/seds the log
without validating format; replace the brittle pipeline with a robust
regex-based extraction that matches a valid container image reference (e.g.,
optional registry, repository/path, and either :tag or `@sha256`:digest) and
validate the captured value before returning (reject or return empty on
no-match); update extract_image_ref to accept the log path parameter (build_log)
and perform pattern matching using a single grep -Po or awk/Perl-compatible
regex to validate the image ref, and add unit tests covering: valid extraction,
no match (missing line), malformed lines, and invalid image formats to ensure
regressions are caught.

---

Outside diff comments:
In `@plugins/shared/scripts/extract-index-image.sh`:
- Around line 1-163: Add a test suite that exercises extract_image_ref,
process_release, and inspect_and_write with positive and negative cases: for
extract_image_ref add tests for a log containing "LVM_INDEX_IMAGE is set to
<img>", missing line, malformed/empty content, and whitespace-only image; for
process_release simulate (via temporary WORKDIR and crafted jobs JSON) a missing
jobs file, zero-length jobs array, artifacts entries with missing/invalid
artifacts_dir, no lvms-catalogsource logs, and a successful end-to-end run that
writes release-<version>.json; for inspect_and_write test behavior when skopeo
is absent (e.g. remove from PATH or stub command), when skopeo inspect returns
non-zero, and when it returns valid JSON—assert output JSON contains
digest/built/commit only when present and omits empty fields. Use temporary
directories/files and cleanup, invoke the functions (sourcing the script) and
assert stderr messages and output JSON contents; mock/sketch skopeo by creating
a wrapper executable in PATH that emits controlled output to cover success and
failure paths.

---

Nitpick comments:
In `@plugins/shared/scripts/extract-index-image.sh`:
- Around line 130-145: Validate and canonicalize the WORKDIR in main(): after
parsing --workdir and before using it, use realpath to canonicalize WORKDIR
(e.g., WORKDIR_REAL=$(realpath -m "${WORKDIR}" 2>/dev/null)), check realpath
succeeded and that WORKDIR_REAL is an absolute path and a directory (test -d),
and reject/exit with an error if it contains path traversal or is not
resolvable; replace subsequent uses of WORKDIR with the canonical WORKDIR_REAL
and ensure main() returns a non-zero code on validation failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: feba05f3-9443-4b03-99f7-889e1cd8a855

📥 Commits

Reviewing files that changed from the base of the PR and between a7dd0be and 14fd30e.

📒 Files selected for processing (4)
  • plugins/lvms-ci/scripts/extract-index-image.sh
  • plugins/shared/scripts/create-report.py
  • plugins/shared/scripts/doctor.sh
  • plugins/shared/scripts/extract-index-image.sh

Comment thread plugins/shared/scripts/create-report.py
Comment thread plugins/shared/scripts/extract-index-image.sh
Comment thread plugins/shared/scripts/extract-index-image.sh
The actual build log uses 'LVM_INDEX_IMAGE is set to: <image>'
(with a colon) which the sed pattern did not handle, causing the
colon to be included in the image reference. Update the pattern
to strip both 'set to ' and 'set to: ' variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants