ci: layered Python base images for cross-matrix dedup by richiejp · Pull Request #9672 · mudler/LocalAI

richiejp · 2026-05-05T10:52:45Z

Experiment to see whether OS+vendor+language base images can substantially reduce both CI time and local build time.

The 234-entry backend matrix runs the same apt-update + GPU SDK install +
Python toolchain bootstrap into N independent registry-cache tags. Factor
that shared work out into a tier-1+2 base image (lang × accel × ubuntu ×
cuda) built once per workflow run, then consumed by every backend that
matches its tuple via BASE_IMAGE_PREBUILT.

The matrix data moves to .github/backend-matrix.yaml so backend.yml can
switch to fromJSON without duplicating the matrix. scripts/changed-backends.js
reads the data file, derives the deduplicated bases-matrix, annotates each
Python entry with the right base-image-prebuilt ref, and runs a collision
check that fails loudly if a future matrix change makes two consumers want
incompatible bases under the same tag-stem.
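
To illustrate the dedup and collision check described above, here is a minimal sketch. It is not the code from scripts/changed-backends.js: the entry field names (`name`, `lang`, `accel`, `ubuntu`, `cuda`), the tag-stem format, and the function names are all assumptions made for the example.

```javascript
// Sketch only: field names and the tag-stem format are hypothetical,
// not taken from .github/backend-matrix.yaml or changed-backends.js.
function baseKey(entry) {
  // The full tuple that identifies one base image.
  return [entry.lang, entry.accel, entry.ubuntu, entry.cuda].join("|");
}

// Hypothetical tag-stem format used only for this illustration.
function defaultTagStem(entry) {
  return `${entry.lang}-${entry.accel}-ubuntu${entry.ubuntu}-cuda${entry.cuda}`;
}

function deriveBases(entries, tagStem = defaultTagStem) {
  const byStem = new Map();
  for (const entry of entries) {
    const stem = tagStem(entry);
    const key = baseKey(entry);
    const seen = byStem.get(stem);
    if (seen === undefined) {
      // First consumer of this stem: record a new base.
      byStem.set(stem, { stem, key, consumers: [entry.name] });
    } else if (seen.key === key) {
      // Same tuple: this entry reuses the already-recorded base.
      seen.consumers.push(entry.name);
    } else {
      // Same tag, different tuple: two consumers want incompatible bases.
      throw new Error(`base tag-stem collision on "${stem}": ${seen.key} vs ${key}`);
    }
  }
  return [...byStem.values()];
}

// Illustrative entries; names and values are made up.
const exampleBases = deriveBases([
  { name: "backend-a", lang: "python", accel: "cublas", ubuntu: "24.04", cuda: "12" },
  { name: "backend-b", lang: "python", accel: "cublas", ubuntu: "24.04", cuda: "12" },
  { name: "backend-c", lang: "python", accel: "cpu", ubuntu: "24.04", cuda: "" },
]);
console.log(exampleBases.map((b) => `${b.stem} <- ${b.consumers.join(", ")}`));
```

Keying the map on the tag-stem rather than the tuple is what lets the check fail loudly: a second tuple arriving under an existing stem raises an error instead of being silently merged.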

PR builds tag with -pr<N> so end-to-end validation lives within one PR;
master builds tag without the suffix. The base-images registry cache
parallels the existing per-matrix-entry caches.
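
The tagging rule above could look roughly like the following. The function name, the repository string, and the tag-stem are placeholders for illustration; only the `-pr<N>` suffix convention comes from this description.

```javascript
// Sketch only: baseImageRef, the repository name, and the stem are hypothetical.
function baseImageRef(repository, stem, prNumber) {
  // PR builds get a -pr<N> suffix; master builds (no PR number) get none.
  const suffix = prNumber == null ? "" : `-pr${prNumber}`;
  return `${repository}:${stem}${suffix}`;
}

console.log(baseImageRef("registry.example/base-images", "python-cpu", 9672));
console.log(baseImageRef("registry.example/base-images", "python-cpu", null));
```

Because the suffix is part of the tag, a PR's base images and the backends that consume them resolve to the same PR-scoped refs and never overwrite the master tags.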

Adding a new (accel, cuda) flavour is a backend-matrix.yaml edit; adding
a new language tier is a Dockerfile.<lang> recipe + a slim of the
consumer Dockerfile (script auto-detects via .docker/bases/).
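
One plausible way to do the auto-detection mentioned above is to treat every `Dockerfile.<lang>` file name under .docker/bases/ as a known language tier. The sketch below assumes that approach and takes the directory listing as an argument; the function name is hypothetical and the real script may detect recipes differently.

```javascript
// Sketch only: detectBaseLangs is a hypothetical helper, not the script's API.
function detectBaseLangs(fileNames) {
  const prefix = "Dockerfile.";
  return fileNames
    // Keep only recipe files that have a non-empty language suffix.
    .filter((name) => name.startsWith(prefix) && name.length > prefix.length)
    .map((name) => name.slice(prefix.length))
    .sort();
}

// In a real script the list might come from fs.readdirSync(".docker/bases").
console.log(detectBaseLangs(["Dockerfile.python", "README.md", "Dockerfile.golang"]));
```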

10 distinct bases derive from the current 234 entries, replacing the
inline bootstrap that previously ran into ~10 separate cache tags.

Assisted-by: Claude:opus-4-7-1m [Claude Code]


Python's tier-1+2 base image (apt + GPU SDK + lang toolchain) was the only lang previously factored. The remaining 82 matrix entries (62 golang + 9 llama-cpp + 9 turboquant + 1 ik-llama-cpp + 1 rust) still inlined the same bootstrap into per-backend cache tags.

Add .docker/bases/Dockerfile.{golang,cpp,rust} mirroring Dockerfile.python's GPU stack, with the lang-specific tail at the bottom (Go + protoc + grpc tooling; protoc + cmake + GRPC; rustup + audio dev libs respectively). Slim the five consumer Dockerfiles to FROM ${BASE_IMAGE_PREBUILT} + the per-backend COPY/make.

The C++ trio (llama-cpp, ik-llama-cpp, turboquant) only differ in their make targets, so langOf() in scripts/changed-backends.js remaps all three Dockerfile suffixes to the shared 'cpp' base. That collapses 17 would-be distinct bases to 8.

langTriggerSelector and baseTriggerFiles are extended so PRs touching the new recipes fan out canaries; the .docker/bases/ auto-detection picks up the new langs without further script changes.

Makefile: add docker-build-{python,golang,cpp,rust}-base targets and a local-base-tag/local-base-target macro pair so each backend's docker-build-X chains through the right base. The previous python-only prereq is now a generic per-lang dispatch.

Total distinct bases for the full 234-entry matrix: 29 (was 9 with only python factored). The C++ base also absorbs the previously per-consumer GRPC build stage, removing the dominant cost from the llama-cpp / ik-llama-cpp / turboquant rebuild paths.

Assisted-by: Claude:opus-4-7-1m [Claude Code]
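
The langOf() remap described in this commit message can be sketched as follows. Only the function name and the three C++ suffixes come from the text; the argument shape (a Dockerfile path) and the parsing are assumptions, so the real implementation in scripts/changed-backends.js may differ.

```javascript
// Sketch only: assumes langOf receives a Dockerfile path such as
// "backend/Dockerfile.llama-cpp" (an illustrative path); the real signature is not shown.
const CPP_FAMILY = new Set(["llama-cpp", "ik-llama-cpp", "turboquant"]);

function langOf(dockerfilePath) {
  const fileName = dockerfilePath.split("/").pop();
  const dot = fileName.indexOf(".");
  // Everything after the first dot is the Dockerfile suffix.
  const suffix = dot === -1 ? "" : fileName.slice(dot + 1);
  // The C++ trio shares one base because they differ only in make targets.
  return CPP_FAMILY.has(suffix) ? "cpp" : suffix;
}

console.log(langOf("backend/Dockerfile.turboquant"));
console.log(langOf("backend/Dockerfile.golang"));
```

Mapping three suffixes onto one language key means the base-derivation step sees a single `cpp` tier, which is how the number of distinct bases shrinks.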

richiejp added 2 commits May 5, 2026 10:34

richiejp wants to merge 2 commits into master from ci/layered-base-images
