
feat: use containerd gc for image-fetcher cleanup #8065

Closed
awesomenix wants to merge 1 commit into main from image-fetcher-gc-cleanup

Conversation

@awesomenix
Contributor

@awesomenix commented Mar 10, 2026

Summary

  • use containerd's native GC label filtering for the image-fetcher pull path instead of scanning all images
  • add an image-fetcher GC trigger entrypoint and invoke it once after the VHD preload batch completes
  • enable discard of unpacked pulled layers during VHD image preloading while preserving fetch-only image blobs
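Conceptually, the pull-path change looks like the following minimal sketch (not the PR's actual code; the socket path, namespace, and image reference are illustrative assumptions). It requires a running containerd daemon, so it is a shape-of-the-API illustration only:

```go
// Sketch: pull an image while writing GC child labels only for non-layer
// blobs, so the layer blobs become eligible for containerd's garbage
// collector once no lease retains them.
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/images"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")

	// WithChildLabelMap controls which containerd.io/gc.ref.content labels
	// are written for an image's child content. ChildGCLabelsFilterLayers
	// keeps references to the manifest/config blobs but filters out layer
	// blobs, so the compressed layer blobs are not pinned by the image and
	// can be reclaimed after unpack.
	img, err := client.Pull(ctx, "mcr.microsoft.com/oss/kubernetes/pause:3.6",
		containerd.WithPullUnpack,
		containerd.WithChildLabelMap(images.ChildGCLabelsFilterLayers),
	)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("pulled %s", img.Name())
}
```

The unpacked snapshots survive because the snapshotter holds its own references; only the now-redundant compressed blobs in the content store become collectible.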

Total savings: roughly 4 GB (the content store shrinks from 6.4 G to 2.7 G; root filesystem usage drops from 22 G to 18 G).

Before

----------------------------------------------
FILESYSTEM USAGE
----------------------------------------------
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G   22G  7.6G  74% /
/dev/sda15      105M  6.1M   99M   6% /boot/efi
/dev/sdb1       590G   32K  560G   1% /mnt

----------------------------------------------
CONTAINER IMAGES (manifest size)
----------------------------------------------
Note: Sizes shown are compressed manifest sizes, not actual disk usage.
Actual unpacked size is in CONTAINERD STORAGE SUMMARY below.

678.6 MiB	mcr.microsoft.com/aks/aks-gpu-cuda:580.126.09-20260126030251
572.7 MiB	mcr.microsoft.com/containernetworking/cilium/cilium-distroless:v1.18.2-251028
303.9 MiB	mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:6.26.0-main-03-05-2026-701eb75f
293.8 KiB	mcr.microsoft.com/oss/kubernetes/pause:3.6
271.3 MiB	mcr.microsoft.com/containernetworking/cilium/cilium:v1.17.9-260304
268.2 MiB	mcr.microsoft.com/containernetworking/cilium/cilium:v1.17.7-250927
225.8 MiB	mcr.microsoft.com/azuremonitor/containerinsights/ciprod:3.1.35
218.3 MiB	mcr.microsoft.com/containernetworking/azure-cns:v1.5.50
198.0 MiB	mcr.microsoft.com/containernetworking/cilium/cilium-distroless-init:v1.18.2-251028
146.5 MiB	mcr.microsoft.com/oss/v2/kubernetes-csi/blob-csi:v1.27.3
146.4 MiB	mcr.microsoft.com/oss/v2/kubernetes-csi/blob-csi:v1.26.10
137.3 MiB	mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:6.26.0-main-03-05-2026-701eb75f-cfg
...
----------------------------------------------
LARGEST DIRECTORIES (over 100MB)
----------------------------------------------
22G	/
16G	/var/lib
16G	/var
**15G	/var/lib/containerd**
8.7G	/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
8.7G	/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
6.4G	/var/lib/containerd/io.containerd.content.v1.content/blobs/sha256
6.4G	/var/lib/containerd/io.containerd.content.v1.content/blobs
6.4G	/var/lib/containerd/io.containerd.content.v1.content
3.2G	/opt
2.4G	/usr
2.0G	/opt/bin
1.2G	/usr/lib

----------------------------------------------
LARGEST FILES (over 100MB)
----------------------------------------------
651M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/9dc5addc5ba7ff4645a0e263c04e6f53918576118e27d0e73a3649d81c7e6af4
250M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/377a845986761f149403e6e67433e26c2271bce2324dcb495e8627ae74bc7bec
233M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/9baad8468e052bec22ddb4b291b6f5d10cf799f24eef17d7513338d6b290dfb3
211M /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/46/fs/opt/promconfigvalidator
190M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/2eb279a8bf6a99fccbb0a3ecf77ad638db9153f542b23521f9891c9f0a10683f
171M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/a3d7e5c55a03c58a62b41d4d6fdf557f3276cf8d5a46db109634ef6b725c4739
149M /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/47/fs/root/main
147M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/dca397c8a3d3c118b7879eaf732dd747094deb45ca7b88aaae909a3447860851
147M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/5ec57f772744b1e8a4339a16519916ea671abe27687e169d00d489c4f57ac282
142M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/34c1266aac112361cc7176c60fccef25ede6b4d663eb4eaa392c0c41f70b4e72
131M /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/9/fs/dropgz
130M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/6000f595976ce302e5113760234d3a218bf238bb84dfa7517e3b5f340c98c785
129M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/f3ef7a71edd860fc98756d146d7986046e96b44545860048e6761e45a24ca859
128M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/aca562877b78fbc76a473a44d71bf62a8d3dda28b2f8e597bcd48723f8a5a3cf
127M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/31e819875a97446da6c75bf5e3085f2839f0c530aa202f39a8fb2b09f2575e5f
123M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/c8a61cbce41ed0ab2b72bf0cef93ccba6f03f88f0557404433fdd4fa6df738f8
108M /opt/bin/kubelet-1.29.101-akslts
108M /opt/bin/kubelet-1.29.100-akslts
106M /opt/bin/kubelet-1.28.103-akslts
106M /opt/bin/kubelet-1.28.102-akslts
105M /usr/lib/x86_64-linux-gnu/libbcc.so.0.29.0
103M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/fc62a4c9b2ee3031d7f3afb3b38009149e2b055040b4c4a4ebbc01ca8615853a
103M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/5dd5cd7c7bf44b7289351b49421a5d0ce7e286a3b66d65d8a2b0320b5e1840d1

----------------------------------------------
/opt BREAKDOWN
----------------------------------------------
2.0G	/opt/bin/
594M	/opt/cni/
112M	/opt/azure/
105M	/opt/overlaybd/
101M	/opt/kubectl/
67M	/opt/acr/
61M	/opt/containerd/
52M	/opt/kubelet/
47M	/opt/credentialprovider/
31M	/opt/azure-acr-credential-provider/
30M	/opt/nvidia-device-plugin/
12M	/opt/dcgm-exporter/
11M	/opt/datacenter-gpu-manager-4-core/
6.9M	/opt/microsoft/
5.4M	/opt/datacenter-gpu-manager-4-proprietary/
172K	/opt/bpftrace/
48K	/opt/scripts/
8.0K	/opt/kubernetes/
8.0K	/opt/azure-network/
4.0K	/opt/oras/
4.0K	/opt/gpu/
4.0K	/opt/certs/
4.0K	/opt/aks-secure-tls-bootstrap-client/
4.0K	/opt/actions/

----------------------------------------------
CONTAINERD STORAGE SUMMARY
----------------------------------------------
Content store (compressed blobs): 6.4G
Snapshotter (unpacked layers):    8.7G

After

----------------------------------------------
FILESYSTEM USAGE
----------------------------------------------
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G   18G   12G  62% /
/dev/sda15      105M  6.1M   99M   6% /boot/efi
/dev/sdb1       590G   32K  560G   1% /mnt

----------------------------------------------
CONTAINER IMAGES (manifest size)
----------------------------------------------
Note: Sizes shown are compressed manifest sizes, not actual disk usage.

678.6 MiB	mcr.microsoft.com/aks/aks-gpu-cuda:580.126.09-20260126030251
572.7 MiB	mcr.microsoft.com/containernetworking/cilium/cilium-distroless:v1.18.2-251028
303.9 MiB	mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:6.26.0-main-03-05-2026-701eb75f
271.3 MiB	mcr.microsoft.com/containernetworking/cilium/cilium:v1.17.9-260304
268.2 MiB	mcr.microsoft.com/containernetworking/cilium/cilium:v1.17.7-250927
225.8 MiB	mcr.microsoft.com/azuremonitor/containerinsights/ciprod:3.1.35
218.3 MiB	mcr.microsoft.com/containernetworking/azure-cns:v1.5.50
198.0 MiB	mcr.microsoft.com/containernetworking/cilium/cilium-distroless-init:v1.18.2-251028

----------------------------------------------
LARGEST DIRECTORIES (over 100MB)
----------------------------------------------
18G	/
13G	/var
**12G	/var/lib/containerd**
12G	/var/lib
8.7G	/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots
8.7G	/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
3.2G	/opt
2.7G	/var/lib/containerd/io.containerd.content.v1.content/blobs/sha256
2.7G	/var/lib/containerd/io.containerd.content.v1.content/blobs
2.7G	/var/lib/containerd/io.containerd.content.v1.content
2.4G	/usr
2.0G	/opt/bin
1.2G	/usr/lib

----------------------------------------------
LARGEST FILES (over 100MB)
----------------------------------------------
651M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/9dc5addc5ba7ff4645a0e263c04e6f53918576118e27d0e73a3649d81c7e6af4
250M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/377a845986761f149403e6e67433e26c2271bce2324dcb495e8627ae74bc7bec
233M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/9baad8468e052bec22ddb4b291b6f5d10cf799f24eef17d7513338d6b290dfb3
211M /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/47/fs/opt/promconfigvalidator
190M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/2eb279a8bf6a99fccbb0a3ecf77ad638db9153f542b23521f9891c9f0a10683f
171M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/a3d7e5c55a03c58a62b41d4d6fdf557f3276cf8d5a46db109634ef6b725c4739
149M /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/50/fs/root/main
142M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/34c1266aac112361cc7176c60fccef25ede6b4d663eb4eaa392c0c41f70b4e72
131M /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/10/fs/dropgz
130M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/6000f595976ce302e5113760234d3a218bf238bb84dfa7517e3b5f340c98c785
108M /opt/bin/kubelet-1.29.101-akslts
108M /opt/bin/kubelet-1.29.100-akslts
106M /opt/bin/kubelet-1.28.103-akslts
106M /opt/bin/kubelet-1.28.102-akslts
105M /usr/lib/x86_64-linux-gnu/libbcc.so.0.29.0
103M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/fc62a4c9b2ee3031d7f3afb3b38009149e2b055040b4c4a4ebbc01ca8615853a
103M /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/5dd5cd7c7bf44b7289351b49421a5d0ce7e286a3b66d65d8a2b0320b5e1840d1

----------------------------------------------
/opt BREAKDOWN
----------------------------------------------
2.0G	/opt/bin/
594M	/opt/cni/
112M	/opt/azure/
105M	/opt/overlaybd/
101M	/opt/kubectl/
67M	/opt/acr/
61M	/opt/containerd/
52M	/opt/kubelet/
47M	/opt/credentialprovider/
31M	/opt/azure-acr-credential-provider/
30M	/opt/nvidia-device-plugin/
12M	/opt/dcgm-exporter/
11M	/opt/datacenter-gpu-manager-4-core/
6.9M	/opt/microsoft/
5.4M	/opt/datacenter-gpu-manager-4-proprietary/
172K	/opt/bpftrace/
48K	/opt/scripts/
8.0K	/opt/kubernetes/
8.0K	/opt/azure-network/
4.0K	/opt/oras/
4.0K	/opt/gpu/
4.0K	/opt/certs/
4.0K	/opt/aks-secure-tls-bootstrap-client/
4.0K	/opt/actions/

----------------------------------------------
CONTAINERD STORAGE SUMMARY
----------------------------------------------
Content store (compressed blobs): 2.7G
Snapshotter (unpacked layers):    8.7G

Copilot AI review requested due to automatic review settings March 10, 2026 23:06
@awesomenix changed the title Use containerd GC for image-fetcher cleanup → feat: use containerd gc for image-fetcher cleanup Mar 10, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the VHD image preloading flow to rely on containerd’s native GC mechanics (via GC label filtering) rather than scanning images for cleanup, and adds an explicit GC trigger once the preload batch completes.

Changes:

  • Add an --gc mode to image-fetcher that triggers containerd GC via lease create/delete.
  • Update the image-fetcher pull path to apply containerd GC child-label filtering for layer blobs.
  • Invoke image-fetcher --gc once after all parallel image pulls complete during VHD build.
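The lease create/delete trick works because containerd exposes no "run GC now" RPC; deleting a lease emits a deletion event that schedules a collection pass, and the synchronous option blocks until that pass completes. A minimal sketch (socket path and namespace are assumptions; it needs a live daemon):

```go
// Sketch of an explicit GC trigger: create a throwaway lease, then delete
// it with leases.SynchronousDelete so the call returns only after the
// garbage-collection pass triggered by the deletion has finished.
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/leases"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")
	ls := client.LeasesService()

	// The lease itself retains nothing; it exists only so that deleting it
	// produces the event that kicks off GC.
	l, err := ls.Create(ctx, leases.WithRandomID())
	if err != nil {
		log.Fatal(err)
	}
	if err := ls.Delete(ctx, l, leases.SynchronousDelete); err != nil {
		log.Fatal(err)
	}
	log.Println("containerd GC pass completed")
}
```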

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • vhdbuilder/packer/install-dependencies.sh: Runs a single post-batch image-fetcher --gc after the image preload jobs finish.
  • image-fetcher/main.go: Adds --gc entrypoint and applies WithChildLabelMap(images.ChildGCLabelsFilterLayers) for the pull/unpack path.

@@ -699,6 +699,7 @@ while IFS= read -r imageToBePulled; do
done <<< "$ContainerImages"
echo "Waiting for container image pulls to finish. PID: ${image_pids[@]}"
wait ${image_pids[@]}

Copilot AI Mar 10, 2026


wait ${image_pids[@]} only returns the exit status of the last PID waited for, so failures in earlier background pulls can be missed (and with set -e the script would still proceed to run image-fetcher --gc). Consider iterating over PIDs (or looping wait -n until all jobs finish) and tracking any non-zero exit status so the build fails reliably before triggering GC.

Suggested change (replacing `wait ${image_pids[@]}`):

    overall_status=0
    for pid in "${image_pids[@]}"; do
        wait "$pid" || {
            status=$?
            echo "Container image pull job with PID $pid failed with exit code $status" >&2
            overall_status=$status
        }
    done
    if [ "$overall_status" -ne 0 ]; then
        echo "One or more container image pulls failed; skipping image-fetcher GC and exiting with code $overall_status" >&2
        exit "$overall_status"
    fi

(Note: the status is captured via `wait "$pid" || { status=$?; ... }` rather than inside an `if ! wait` branch, because after `! wait` the value of `$?` reflects the negation, not wait's exit code.)

Collaborator


@awesomenix this is actually a nice fix; there is a bug in the wait logic that causes the retries to overrun each other in the background.

Contributor Author


Yes, I have moved this logic to Go, since it's easier to control and reason about. But it was too much to be part of this PR; in short, coming soon ™️

@awesomenix force-pushed the image-fetcher-gc-cleanup branch from 632f382 to 87a653f on March 11, 2026 00:07
Copilot AI review requested due to automatic review settings March 11, 2026 06:35
@awesomenix force-pushed the image-fetcher-gc-cleanup branch from 87a653f to 47cb3be on March 11, 2026 06:35
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

@awesomenix force-pushed the image-fetcher-gc-cleanup branch from 47cb3be to 4641b84 on March 11, 2026 07:29
Mark unpacked pulled layers as GC-eligible during VHD image preloading and trigger a single synchronous containerd GC after the preload batch. This avoids scanning all images per pull while preserving fetch-only image blobs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

@awesomenix
Contributor Author

Wow! I was debugging a regression in ArtifactStreaming, and indeed this PR broke it.

Yep — those containerd logs fit the PR #8065 regression.

The key signal is RunPodSandbox plus "apply failure, attempting cleanup": that's snapshot unpack/apply failing, not kubelet itself. With ArtifactStreaming, your config switches the sandbox image to use snapshotter = "overlaybd" globally. PR #8065 changed VHD preloads to pull small images like pause:3.6 with WithChildLabelMap(images.ChildGCLabelsFilterLayers) and then explicitly runs image-fetcher --gc, which makes the layer blobs GC-eligible.

So the VHD keeps an old unpack for the build-time snapshotter, but the overlaybd runtime later needs those blobs and cannot re-apply them.

Best fix: don't GC/filter layers for pause on ArtifactStreaming-capable VHDs.
  

@awesomenix
Copy link
Contributor Author

@djsly this is why we need to keep the blobs around: removing them can break artifact streaming.

@awesomenix awesomenix closed this Mar 12, 2026