feat: use containerd gc for image-fetcher cleanup#8065
feat: use containerd gc for image-fetcher cleanup#8065awesomenix wants to merge 1 commit intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the VHD image preloading flow to rely on containerd’s native GC mechanics (via GC label filtering) rather than scanning images for cleanup, and adds an explicit GC trigger once the preload batch completes.
Changes:
- Add an
--gcmode toimage-fetcherthat triggers containerd GC via lease create/delete. - Update the
image-fetcherpull path to apply containerd GC child-label filtering for layer blobs. - Invoke
image-fetcher --gconce after all parallel image pulls complete during VHD build.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
vhdbuilder/packer/install-dependencies.sh |
Runs a single post-batch image-fetcher --gc after the image preload jobs finish. |
image-fetcher/main.go |
Adds --gc entrypoint and applies WithChildLabelMap(images.ChildGCLabelsFilterLayers) for the pull/unpack path. |
| @@ -699,6 +699,7 @@ while IFS= read -r imageToBePulled; do | |||
| done <<< "$ContainerImages" | |||
| echo "Waiting for container image pulls to finish. PID: ${image_pids[@]}" | |||
| wait ${image_pids[@]} | |||
There was a problem hiding this comment.
wait ${image_pids[@]} only returns the exit status of the last PID waited for, so failures in earlier background pulls can be missed (and with set -e the script would still proceed to run image-fetcher --gc). Consider iterating over PIDs (or looping wait -n until all jobs finish) and tracking any non-zero exit status so the build fails reliably before triggering GC.
| wait ${image_pids[@]} | |
| overall_status=0 | |
| for pid in "${image_pids[@]}"; do | |
| if ! wait "$pid"; then | |
| status=$? | |
| echo "Container image pull job with PID $pid failed with exit code $status" >&2 | |
| overall_status=$status | |
| fi | |
| done | |
| if [ "$overall_status" -ne 0 ]; then | |
| echo "One or more container image pulls failed; skipping image-fetcher GC and exiting with code $overall_status" >&2 | |
| exit "$overall_status" | |
| fi |
There was a problem hiding this comment.
@awesomenix this is actually a nice fix, there is a bug with the wait logic and causes the retries to overrun each other in the background.
There was a problem hiding this comment.
Yes i have moved this logic to go, since its easier to control and reason about. But it was too much to be part of this PR, in short Coming Soon ™️
632f382 to
87a653f
Compare
87a653f to
47cb3be
Compare
47cb3be to
4641b84
Compare
4641b84 to
ae63f70
Compare
Mark unpacked pulled layers as GC-eligible during VHD image preloading and trigger a single synchronous containerd GC after the preload batch. This avoids scanning all images per pull while preserving fetch-only image blobs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ae63f70 to
5b21f47
Compare
|
Wow! i was debugging a regression in ArtifactStreaming, and indeed this PR broke it |
|
@djsly this is why we need to have blobs around because it can break artifact streaming. |
Summary
Total of 4GB savings
Before
After