OCPBUGS-94089: Fix helm backend test setup reliability#16687
Conversation
Chartmuseum v0.14.0 (from Jan 2022) crashes silently on startup in the current CI builder image, causing cacertCreate.sh to fail with connection refused on localhost:9443. This blocks all PRs' backend CI. - Upgrade chartmuseum from v0.14.0 to v0.16.5 - Replace blind time.Sleep calls with TCP readiness checks that verify each server is actually listening before proceeding Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@jhadvig: This pull request references Jira Issue OCPBUGS-94089, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughHelm test setup now waits for ChartMuseum and Zot TCP ports to become reachable instead of sleeping. The ChartMuseum fixture scripts validate binaries, create storage directories, and start servers with logging. The download script now fetches the tarball with fail-fast options and marks the binary executable. ChangesHelm test readiness and fixture startup
Estimated code review effort: 2 (Simple) | ~8 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 11❌ Failed checks (1 warning, 10 inconclusive)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Tools execution failed with the following error: Failed to run tools: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error) Comment |
|
@jhadvig: This pull request references Jira Issue OCPBUGS-94089, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
- Fix curl flags conflict (-o/-O) in downloadChartmuseum.sh that may save the archive with wrong filename - Add explicit chmod +x on extracted binary - Create storage directories before starting chartmuseum - Add binary existence check and startup logging to all chartmuseum scripts - Use exec to replace shell with chartmuseum process Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| ls -la "./$GOOS-$GOARCH/" >&2 | ||
| exit 1 | ||
| fi | ||
| echo "Starting chartmuseum TLS on port 9443..." >&2 |
There was a problem hiding this comment.
I think that ChartMuseum already prints this line.
|
/test backend |
v0.16.5 starts but silently exits before binding the port. The root cause was the curl -o/-O flag conflict in the download script, not the chartmuseum version. Revert to v0.14.0 which works correctly with the fixed download script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/lgtm |
|
/test ci/prow/backend |
|
/test backend |
|
/retest |
Chartmuseum starts (logs "Starting ChartMuseum") but silently exits before binding to port 9443. Run it in background with PID file (matching zot's pattern), redirect output to a log file, and verify the process is alive after 1 second. On readiness timeout, print the captured log to reveal the actual exit reason. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/helm/actions/testdata/chartmuseum.sh (1)
17-25: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winBackground start + liveness check pattern not applied to sibling scripts.
This script backgrounds chartmuseum, redirects logs, and does a quick post-start liveness check (
kill -0) — a good diagnostic addition per the PR's goal of surfacing startup failures early. However,chartmuseumWithBasicAuth.shandchartmuseumWithoutTls.shstill use blockingexecwith no log capture or liveness check, so crash-on-start failures for those variants will only surface after the full 30swaitForTCPtimeout instead of failing fast with diagnostics.Consider applying the same background+log+liveness pattern to the other two scripts for consistency and faster CI failure feedback.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/helm/actions/testdata/chartmuseum.sh` around lines 17 - 25, Apply the same background-start diagnostic pattern used in chartmuseum.sh to chartmuseumWithBasicAuth.sh and chartmuseumWithoutTls.sh: replace the blocking exec launch with a background process, redirect stdout/stderr to a log file, store the PID, and add a short sleep plus kill -0 liveness check. If startup fails, print the captured log and exit early so failures surface before waitForTCP times out; use the existing chartmuseum startup script structure and its PID/log handling as the reference.pkg/helm/actions/testdata/chartmuseumWithBasicAuth.sh (1)
4-11: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winNo post-start failure diagnostics, unlike the TLS variant.
chartmuseum.shnow backgrounds the process, captures logs, and performs a quick liveness check to fail fast on startup errors. This script insteadexecs the binary directly with no log file and no liveness check, so an immediate crash (e.g., port conflict) will only be detected via the 30swaitForTCPtimeout insetup_test.go, with less diagnostic output available.Consider aligning with the TLS script's pattern for consistency and quicker failure detection.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/helm/actions/testdata/chartmuseumWithBasicAuth.sh` around lines 4 - 11, The basic-auth ChartMuseum startup script is missing the same post-start failure diagnostics used by the TLS variant, so startup crashes are only caught later with little context. Update chartmuseumWithBasicAuth.sh to follow the chartmuseum.sh pattern by launching the binary in the background, capturing its logs to a file, and performing an immediate liveness check before continuing. Keep the behavior aligned around the chartmuseum startup flow so failures like port conflicts surface quickly with useful diagnostics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/helm/actions/testdata/downloadChartmuseum.sh`:
- Line 10: The curl invocation in downloadChartmuseum.sh uses the
CHARTMUSEUM_ARTIFACT_URL variable unquoted, which can trigger word-splitting or
globbing; update the curl command to quote that variable in the
downloadChartmuseum.sh script so the URL is passed as a single argument.
---
Nitpick comments:
In `@pkg/helm/actions/testdata/chartmuseum.sh`:
- Around line 17-25: Apply the same background-start diagnostic pattern used in
chartmuseum.sh to chartmuseumWithBasicAuth.sh and chartmuseumWithoutTls.sh:
replace the blocking exec launch with a background process, redirect
stdout/stderr to a log file, store the PID, and add a short sleep plus kill -0
liveness check. If startup fails, print the captured log and exit early so
failures surface before waitForTCP times out; use the existing chartmuseum
startup script structure and its PID/log handling as the reference.
In `@pkg/helm/actions/testdata/chartmuseumWithBasicAuth.sh`:
- Around line 4-11: The basic-auth ChartMuseum startup script is missing the
same post-start failure diagnostics used by the TLS variant, so startup crashes
are only caught later with little context. Update chartmuseumWithBasicAuth.sh to
follow the chartmuseum.sh pattern by launching the binary in the background,
capturing its logs to a file, and performing an immediate liveness check before
continuing. Keep the behavior aligned around the chartmuseum startup flow so
failures like port conflicts surface quickly with useful diagnostics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f197ee5c-c97c-4bce-b016-ed915226759f
📒 Files selected for processing (5)
pkg/helm/actions/setup_test.gopkg/helm/actions/testdata/chartmuseum.shpkg/helm/actions/testdata/chartmuseumWithBasicAuth.shpkg/helm/actions/testdata/chartmuseumWithoutTls.shpkg/helm/actions/testdata/downloadChartmuseum.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/helm/actions/setup_test.go
|
|
||
| if [ ! -f "$GOOS-$GOARCH/chartmuseum" ]; then | ||
| curl -o chartmuseum.tar.gz -O $CHARTMUSEUM_ARTIFACT_URL | ||
| curl -fL -o chartmuseum.tar.gz $CHARTMUSEUM_ARTIFACT_URL |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Quote the URL variable to avoid globbing/word-splitting.
Shellcheck (SC2086) flags the unquoted $CHARTMUSEUM_ARTIFACT_URL.
🛠️ Proposed fix
-curl -fL -o chartmuseum.tar.gz $CHARTMUSEUM_ARTIFACT_URL
+curl -fL -o chartmuseum.tar.gz "$CHARTMUSEUM_ARTIFACT_URL"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| curl -fL -o chartmuseum.tar.gz $CHARTMUSEUM_ARTIFACT_URL | |
| curl -fL -o chartmuseum.tar.gz "$CHARTMUSEUM_ARTIFACT_URL" |
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 10-10: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/helm/actions/testdata/downloadChartmuseum.sh` at line 10, The curl
invocation in downloadChartmuseum.sh uses the CHARTMUSEUM_ARTIFACT_URL variable
unquoted, which can trigger word-splitting or globbing; update the curl command
to quote that variable in the downloadChartmuseum.sh script so the URL is passed
as a single argument.
Source: Linters/SAST tools
Chartmuseum is alive but never binds to port 9443. Add ss -tlnp output and /proc PID check on timeout to determine if the process is alive, what ports are listening, and whether chartmuseum is blocking or dead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/test backend |
1 similar comment
|
/test backend |
|
/lgtm |
|
@jhadvig: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| conn.Close() | ||
| return nil | ||
| } | ||
| time.Sleep(time.Second) |
There was a problem hiding this comment.
Not a blocker. Should we increase the time in each loop to 3-5 seconds
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jhadvig, sowmya-sl, stefanonardo The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by CI |
|
@jhadvig: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Analysis / Root cause
The
pkg/helm/actionsbackend test suite was failing consistently in CI with chartmuseum not starting on port 9443. The root cause was a curl flag conflict indownloadChartmuseum.sh: the script used both-o chartmuseum.tar.gzand-O, which in newer curl versions causes the archive to be saved with the remote filename instead ofchartmuseum.tar.gz. The subsequenttar xf chartmuseum.tar.gzsilently extracted nothing, leaving no binary forchartmuseum.shto run.This blocked
ci/prow/backendfor all PRs — including ones with zero backend changes (e.g., #16240, #16351).Solution description
-Oflag so the archive is saved aschartmuseum.tar.gz, add-fLfor proper error handling and redirects, andchmod +xthe extracted binarytime.Sleepcalls with TCP readiness checks (waitForTCP) that verify each server is actually listening before proceeding, with clear error messages on timeoutexecfor cleaner process management in all chartmuseum scriptsTest setup
CI backend job — no cluster needed.
Test cases
ci/prow/backendpassesAdditional info
Reviewers and assignees:
/assign @jhadvig
🤖 Generated with Claude Code
Summary by CodeRabbit