diff --git a/.agents/ci-caching.md b/.agents/ci-caching.md
index c1127b65c9d0..1c8c65470ede 100644
--- a/.agents/ci-caching.md
+++ b/.agents/ci-caching.md
@@ -101,6 +101,161 @@ For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=cca
 GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.
+## Layered base images (`localai-base`)
+
+The registry-backed BuildKit cache deduplicates **within** a matrix entry's
+cache tag, but each matrix entry has its own tag — so the same `apt-get`,
+GPU SDK install, and language toolchain bootstrap is repeated under N
+different cache tags across the backend matrix. The `localai-base` images
+factor that shared work out of the per-backend builds.
+
+### How it fits together
+
+```
+.github/backend-matrix.yaml        # raw matrix data (linux + darwin)
+  │
+  ▼
+backend.yml / backend_pr.yml
+  ├── derive-bases / generate-matrix
+  │     scripts/changed-backends.js
+  │     reads .github/backend-matrix.yaml
+  │     (PR mode also reads changed files)
+  │     emits:
+  │       - matrix (annotated with base-image-prebuilt)
+  │       - matrix-darwin
+  │       - bases-matrix (deduplicated by tag-stem)
+  │
+  ├── build-bases (matrix: bases-matrix)
+  │     uses base_images.yml
+  │     FROM .docker/bases/Dockerfile.<lang>
+  │     pushes quay.io/go-skynet/localai-base:<stem>[-pr]
+  │
+  └── backend-jobs (matrix: matrix; needs build-bases)
+        uses backend_build.yml
+        FROM ${BASE_IMAGE_PREBUILT}
+        i.e. quay.io/go-skynet/localai-base:<stem>[-pr]
+        only the backend source COPY + `make` remain.
+```
+
+The base image is **always** built before backends consume it, in the same
+workflow run.
+There is no cross-workflow dependency, no chicken-and-egg
+on first push, and no manual matrix to keep in sync — adding a backend
+matrix entry is just an edit to `.github/backend-matrix.yaml`.
+
+### Tag scheme
+
+`<stem>` is computed by `tagStem()` in `scripts/changed-backends.js` from
+the (lang, build-type, ubuntu, cuda, base-image) tuple. Arch is
+intentionally NOT in the stem — bases are built multi-arch when any
+consumer needs multi-arch, and single-arch otherwise (the `platforms`
+field on each base entry is the union of its consumers' platforms).
+
+| Build-type | Stem template |
+|---|---|
+| `''` (CPU) | `<lang>-cpu-<ubuntu>[-<base-slug>]` |
+| `cublas` / `l4t` | `<lang>-<build-type>-<ubuntu>-cuda<major>.<minor>[-<base-slug>]` |
+| anything else (vulkan, hipblas, intel, sycl_*) | `<lang>-<build-type>-<ubuntu>[-<base-slug>]` |
+
+The base-image slug is empty for the default `ubuntu:24.04` and a short
+parseable suffix otherwise (`jetpack-r36.4.0`, `rocm-7.2.1`,
+`oneapi-2025.3.2`, etc.).
+
+| Event | Pushed tag |
+|---|---|
+| `push` (master/tag) | `:<stem>` |
+| `pull_request` | `:<stem>-pr` |
+
+The cache for the base build itself lives at
+`quay.io/go-skynet/ci-cache:base-<stem>` (`mode=max,ignore-error=true`),
+parallel to the per-matrix-entry caches.
+
+The script also runs a collision check across consumers of each stem: if
+two consumers map to the same stem but disagree on `base-image` or
+`skip-drivers` (and skip-drivers is meaningful for that build-type), the
+script fails loudly. Resolve by encoding the differing input in
+`tagStem()` rather than letting the dedup silently pick a winner.
+
+### PR testability
+
+PRs run the same pipeline as master: derive bases → build bases (tagged
+`-pr`) → run filtered backend matrix consuming those `-pr` tags.
+End-to-end validation always lives within the PR.
+
+For PRs that only change `.docker/bases/Dockerfile.<lang>` (no backend
+source touched), `changed-backends.js` adds one canary backend matrix
+entry per (lang × build-type × arch × cuda × ubuntu) tuple to the filtered
+matrix so each base flavour gets exercised.
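The stem scheme described above can be sketched in code. This is a minimal illustration only, with hypothetical helper and field names (`baseSlug`, the entry object shape); the real `tagStem()` in `scripts/changed-backends.js` is the source of truth.

```javascript
// Illustrative sketch of the stem computation; NOT the real implementation.
function tagStem(entry) {
  const lang = entry.lang;                 // e.g. "python"
  const ubuntu = entry["ubuntu-version"];  // e.g. "2404"
  const bt = entry["build-type"];
  // Base-image slug: empty for the default ubuntu:24.04, a short
  // parseable suffix otherwise (e.g. "jetpack-r36.4.0").
  const slug =
    entry["base-image"] === "ubuntu:24.04" ? "" : baseSlug(entry["base-image"]);
  const tail = slug ? `-${slug}` : "";
  if (bt === "") return `${lang}-cpu-${ubuntu}${tail}`;
  if (bt === "cublas" || bt === "l4t")
    return (
      `${lang}-${bt}-${ubuntu}` +
      `-cuda${entry["cuda-major-version"]}.${entry["cuda-minor-version"]}${tail}`
    );
  return `${lang}-${bt}-${ubuntu}${tail}`;
}

// Hypothetical slug helper, matching the examples in the table above:
// "nvcr.io/nvidia/l4t-jetpack:r36.4.0" -> "jetpack-r36.4.0"
function baseSlug(image) {
  const [name, tag] = image.split("/").pop().split(":");
  return `${name.split("-").pop()}-${tag}`;
}
```

Note that because arch is deliberately absent from the tuple, two consumers that differ only in `platforms` collapse to one stem, which is exactly what the dedup and the collision check rely on.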
+
+### Existing language tiers
+
+| Tier (lang) | Recipe | Consumer Dockerfile(s) | Distinct stems |
+|---|---|---|---|
+| `python` | `.docker/bases/Dockerfile.python` | `backend/Dockerfile.python` | 9 |
+| `golang` | `.docker/bases/Dockerfile.golang` | `backend/Dockerfile.golang` | 8 |
+| `cpp` | `.docker/bases/Dockerfile.cpp` (apt + GPU + protoc + cmake + GRPC) | `backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` | 8 |
+| `rust` | `.docker/bases/Dockerfile.rust` | `backend/Dockerfile.rust` | 1 |
+
+The C++ trio share a single `cpp` base because they only differ in their
+per-backend `make` targets. `langOf()` in `scripts/changed-backends.js`
+remaps `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` → `cpp` so dedup
+works across the trio. If a future C++ consumer needs a *different* base
+(e.g. without GRPC, or with a different protoc version), give it its own
+`Dockerfile.<lang>` recipe and remove it from the cpp remap.
+
+### Adding a new (accel × arch × cuda × lang) flavour
+
+Just add the matrix entry to `.github/backend-matrix.yaml` for the new
+flavour. The bases matrix and the per-entry `base-image-prebuilt` are
+derived automatically by `scripts/changed-backends.js`. Nothing else to
+change.
+
+### Adding a new language tier
+
+1. Create `.docker/bases/Dockerfile.<lang>` mirroring an existing tier
+   (apt + accel install + lang-specific toolchain).
+2. Slim `backend/Dockerfile.<lang>` to `FROM ${BASE_IMAGE_PREBUILT}` plus
+   the per-backend source COPY + build (no inline accel install).
+3. Add the new recipe to `baseTriggerFiles` in
+   `scripts/changed-backends.js` so PRs touching it fan out to canaries.
+4. Add `<lang>: (item) => item.dockerfile.endsWith("<lang>")` to
+   `langTriggerSelector` in the same file.
+5. Add a `LOCAL_BASE_<LANG>_TAG`, a `docker-build-<lang>-base` target,
+   and a clause in `local-base-tag` / `local-base-target` in `Makefile`.
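The script-side hooks from steps 3–4 might look like the sketch below. This is a hypothetical shape, using `zig` as a stand-in for the new language tier; the actual structure of `baseTriggerFiles` and `langTriggerSelector` in `scripts/changed-backends.js` is authoritative.

```javascript
// Hypothetical shapes of the two hooks from steps 3-4 (illustrative only).

// Step 3: recipes whose modification fans out to canary matrix entries.
const baseTriggerFiles = [
  ".docker/bases/Dockerfile.python",
  ".docker/bases/Dockerfile.golang",
  ".docker/bases/Dockerfile.cpp",
  ".docker/bases/Dockerfile.rust",
  ".docker/bases/Dockerfile.zig", // new tier's recipe
];

// Step 4: maps a lang to a predicate selecting its consumer matrix entries.
const langTriggerSelector = {
  zig: (item) => item.dockerfile.endsWith("zig"),
};
```

An `endsWith`-style predicate matches the one-consumer-Dockerfile-per-tier convention; a tier with several consumer Dockerfiles (like `cpp`) would need a broader predicate plus the `langOf()` remap described above.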
+
+The `langsWithBase` set in `scripts/changed-backends.js` is auto-detected
+from the `.docker/bases/` directory at script startup, so step 1 alone is
+enough for the script to start emitting bases (and annotating matrix
+entries with `base-image-prebuilt`) for that lang. Steps 3–5 plug it
+into the canary fan-out and the local-build path.
+
+### Why not just rely on `mode=max` cache?
+
+`mode=max` deduplicates at the layer level, but each matrix entry has its
+own cache tag (namespaced by `tag-suffix`). A change that invalidates the
+GPU SDK layer in one backend does not invalidate it in any other; each
+entry pays the full cost on its next rebuild. The shared base image is
+built once per (accel × arch × cuda × lang), then pulled by every backend
+that consumes it — that's the actual cross-matrix dedup.
+
+### Local builds
+
+All `backend/Dockerfile.{python,golang,cpp,rust}` consumers require
+`BASE_IMAGE_PREBUILT` (no inline fallback). The Makefile wires the right
+`docker-build-<lang>-base` as a prerequisite for each backend's
+`docker-build-<backend>` target, so:
+
+```bash
+# Build any backend; the matching base is built first if needed.
+make docker-build-vllm BUILD_TYPE=cublas CUDA_MAJOR_VERSION=12 CUDA_MINOR_VERSION=8
+make docker-build-llama-cpp BUILD_TYPE=cublas CUDA_MAJOR_VERSION=13 CUDA_MINOR_VERSION=0
+make docker-build-rerankers  # python
+make docker-build-kokoros    # rust
+```
+
+Or build a base directly: `make docker-build-{python,golang,cpp,rust}-base
+BUILD_TYPE=...`. Or pull a pre-built one from quay if it exists for your
+target tuple.
+
 ## Touching the cache pipeline
 
 When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:
@@ -109,3 +264,4 @@ When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Doc
 2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
 3. 
**Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write. 4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds. +5. **`tagStem()` in `scripts/changed-backends.js` is the single source of truth for base image tags.** The matrix entries are annotated with `base-image-prebuilt` in the same script run; backend-jobs reads the value as-is. There's no parallel YAML expression to keep in sync. Adding a new dimension to the stem (e.g. a slug for a new base-image variant) is a script change only. diff --git a/.docker/bases/Dockerfile.cpp b/.docker/bases/Dockerfile.cpp new file mode 100644 index 000000000000..e7ab763bb3d5 --- /dev/null +++ b/.docker/bases/Dockerfile.cpp @@ -0,0 +1,259 @@ +# Shared C++ + accelerator base image for the llama-cpp / ik-llama-cpp / +# turboquant trio. They differ only in their Makefile targets at build +# time; the apt + GPU SDK + protoc + cmake + GRPC install is identical. +# +# Built once per (build-type, arch, ubuntu-version, cuda-version) combination +# by .github/workflows/base_images.yml and pushed to +# quay.io/go-skynet/localai-base:[-pr]. Consumed by +# backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant} via the +# BASE_IMAGE_PREBUILT build-arg. See .agents/ci-caching.md. 
+ +ARG BASE_IMAGE=ubuntu:24.04 +ARG APT_MIRROR="" +ARG APT_PORTS_MIRROR="" + +FROM ${BASE_IMAGE} AS grpc + +ARG GRPC_MAKEFLAGS="-j4 -Otarget" +ARG GRPC_VERSION=v1.65.0 +ARG CMAKE_FROM_SOURCE=false +# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues +ARG CMAKE_VERSION=3.31.10 +ARG APT_MIRROR +ARG APT_PORTS_MIRROR + +ENV MAKEFLAGS=${GRPC_MAKEFLAGS} + +WORKDIR /build + +RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \ + APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \ + apt-get update && \ + apt-get install -y --no-install-recommends \ + ca-certificates \ + build-essential curl libssl-dev \ + git wget && \ + apt-get clean && \ + rm -rf /var/lib/apt/lists/* + +RUN </dev/null || ls /opt/rocm*/lib64/rocblas/library/Kernels* 2>/dev/null) | grep -oP 'gfx[0-9a-z+-]+' | sort -u || \ + echo "WARNING: No rocBLAS kernel data found" \ + ; fi + +# Install protoc (the version in 22.04 is too old, and grpc's bundled protoc +# would pull in a newer absl that breaks stablediffusion). +RUN <[-pr]. Consumed by +# backend/Dockerfile.golang via the BASE_IMAGE_PREBUILT build-arg. +# +# Mirrors the GPU stack stanzas in Dockerfile.python; the language-specific +# tail at the bottom installs Go + grpc tooling. See .agents/ci-caching.md. 
+ +ARG BASE_IMAGE=ubuntu:24.04 +ARG APT_MIRROR="" +ARG APT_PORTS_MIRROR="" + +FROM ${BASE_IMAGE} + +ARG BUILD_TYPE +ENV BUILD_TYPE=${BUILD_TYPE} +ARG CUDA_MAJOR_VERSION +ARG CUDA_MINOR_VERSION +ARG SKIP_DRIVERS=false +ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} +ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} +ENV DEBIAN_FRONTEND=noninteractive +ARG TARGETARCH +ARG TARGETVARIANT +ARG GO_VERSION=1.25.4 +ARG UBUNTU_VERSION=2404 +ARG APT_MIRROR +ARG APT_PORTS_MIRROR + +LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI" +LABEL org.opencontainers.image.description="LocalAI Go+accelerator base image" +LABEL org.localai.base.lang="golang" + +RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \ + APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \ + apt-get update && \ + apt-get install -y --no-install-recommends \ + build-essential \ + gcc-14 g++-14 \ + git ccache \ + ca-certificates \ + make cmake wget libopenblas-dev \ + curl unzip \ + libssl-dev && \ + update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 100 \ + --slave /usr/bin/g++ g++ /usr/bin/g++-14 \ + --slave /usr/bin/gcov gcov /usr/bin/gcov-14 && \ + apt-get clean && \ + rm -rf /var/lib/apt/lists/* + +# Cuda +ENV PATH=/usr/local/cuda/bin:${PATH} + +# HipBLAS requirements +ENV PATH=/opt/rocm/bin:${PATH} + +# Vulkan requirements +RUN <[-pr]. Consumed by +# backend/Dockerfile.python via the BASE_IMAGE_PREBUILT build-arg. +# +# Keep the install steps below in lock-step with backend/Dockerfile.python's +# accel-inline stage until the inline fallback is removed. See +# .agents/ci-caching.md for the migration plan. 
+ +ARG BASE_IMAGE=ubuntu:24.04 +ARG APT_MIRROR="" +ARG APT_PORTS_MIRROR="" + +FROM ${BASE_IMAGE} + +ARG BUILD_TYPE +ENV BUILD_TYPE=${BUILD_TYPE} +ARG CUDA_MAJOR_VERSION +ARG CUDA_MINOR_VERSION +ARG SKIP_DRIVERS=false +ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} +ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} +ENV DEBIAN_FRONTEND=noninteractive +ARG TARGETARCH +ARG TARGETVARIANT +ARG UBUNTU_VERSION=2404 +ARG APT_MIRROR +ARG APT_PORTS_MIRROR + +LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI" +LABEL org.opencontainers.image.description="LocalAI Python+accelerator base image" +LABEL org.localai.base.lang="python" + +RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \ + APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \ + apt-get update && \ + apt-get install -y --no-install-recommends \ + build-essential \ + ccache \ + ca-certificates \ + espeak-ng \ + curl \ + libssl-dev \ + git wget \ + git-lfs \ + unzip clang \ + upx-ucl \ + curl python3-pip \ + python-is-python3 \ + python3-dev llvm \ + libnuma1 libgomp1 \ + python3-venv make cmake && \ + apt-get clean && \ + rm -rf /var/lib/apt/lists/* + +RUN <[-pr]. The current +# rust matrix is CPU-only, so this base skips the GPU SDK stanzas; if a +# future rust backend needs cublas/rocm/etc., promote this recipe to mirror +# Dockerfile.python's GPU stack. See .agents/ci-caching.md. 
+ +ARG BASE_IMAGE=ubuntu:24.04 +ARG APT_MIRROR="" +ARG APT_PORTS_MIRROR="" + +FROM ${BASE_IMAGE} + +ENV DEBIAN_FRONTEND=noninteractive +ARG TARGETARCH +ARG TARGETVARIANT +ARG UBUNTU_VERSION=2404 +ARG APT_MIRROR +ARG APT_PORTS_MIRROR + +LABEL org.opencontainers.image.source="https://github.com/mudler/LocalAI" +LABEL org.opencontainers.image.description="LocalAI Rust base image" +LABEL org.localai.base.lang="rust" + +RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \ + APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \ + apt-get update && \ + apt-get install -y --no-install-recommends \ + build-essential \ + git ccache \ + ca-certificates \ + make cmake wget \ + curl unzip \ + clang \ + pkg-config \ + libssl-dev \ + espeak-ng libespeak-ng-dev \ + libsonic-dev libpcaudio-dev \ + libopus-dev \ + protobuf-compiler && \ + apt-get clean && \ + rm -rf /var/lib/apt/lists/* + +# Install Rust +RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y +ENV PATH="/root/.cargo/bin:${PATH}" diff --git a/.github/backend-matrix.yaml b/.github/backend-matrix.yaml new file mode 100644 index 000000000000..07de4d55b495 --- /dev/null +++ b/.github/backend-matrix.yaml @@ -0,0 +1,3164 @@ +# Backend build matrix data, consumed by: +# - .github/workflows/backend.yml (master push) +# - .github/workflows/backend_pr.yml (PR filtering) +# - scripts/changed-backends.js (matrix derivation) +# Edit this file to add/remove/modify backend matrix entries; the rest of +# the build pipeline (base image derivation, build-bases job, fromJSON +# wiring) picks up the change automatically. 
+ +linux: + - build-type: l4t + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + tag-latest: auto + tag-suffix: "-nvidia-l4t-diffusers" + runs-on: ubuntu-24.04-arm + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + skip-drivers: "true" + backend: diffusers + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2204" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-vllm" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: vllm + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-sglang" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: sglang + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-diffusers" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: diffusers + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-chatterbox" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: chatterbox + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-moonshine" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: moonshine + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: 
"" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-tinygrad" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: tinygrad + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-whisperx" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: whisperx + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-faster-whisper" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: faster-whisper + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-ace-step" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: ace-step + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-trl" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: trl + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-llama-cpp-quantization" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: llama-cpp-quantization + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + 
cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-mlx" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: mlx + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-mlx-vlm" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: mlx-vlm + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-mlx-audio" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: mlx-audio + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-mlx-distributed" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: mlx-distributed + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-vibevoice" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: vibevoice + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-qwen-asr" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: qwen-asr + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + 
platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-nemo" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: nemo + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-qwen-tts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: qwen-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-fish-speech" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: fish-speech + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-faster-qwen3-tts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: faster-qwen3-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-voxcpm" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: voxcpm + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-pocket-tts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: pocket-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - 
build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-rerankers" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: rerankers + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-llama-cpp" + runs-on: bigger-runner + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: llama-cpp + dockerfile: ./backend/Dockerfile.llama-cpp + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-turboquant" + runs-on: bigger-runner + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: turboquant + dockerfile: ./backend/Dockerfile.turboquant + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-vllm" + runs-on: arc-runner-set + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: vllm + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-vllm-omni" + runs-on: arc-runner-set + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: vllm-omni + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-sglang" + runs-on: arc-runner-set + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: sglang + dockerfile: 
./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-transformers" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: transformers + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-diffusers" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: diffusers + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-ace-step" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: ace-step + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-trl" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: trl + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-kokoro" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: kokoro + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-faster-whisper" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + 
skip-drivers: "false" + backend: faster-whisper + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-whisperx" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: whisperx + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "9" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-coqui" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: coqui + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-outetts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: outetts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-chatterbox" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: chatterbox + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-moonshine" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: moonshine + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-mlx" + 
runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: mlx + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-mlx-vlm" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: mlx-vlm + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-mlx-audio" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: mlx-audio + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-mlx-distributed" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: mlx-distributed + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-stablediffusion-ggml" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: stablediffusion-ggml + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-sam3-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: sam3-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + 
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-whisper"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: whisper
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-acestep-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: acestep-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-qwen3-tts-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: qwen3-tts-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-vibevoice-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: vibevoice-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-rfdetr"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: rfdetr
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-insightface"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: insightface
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-speaker-recognition"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: speaker-recognition
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "12"
+    cuda-minor-version: "8"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-12-neutts"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: neutts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-rerankers"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: rerankers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-vibevoice"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: vibevoice
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-qwen-asr"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: qwen-asr
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-nemo"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: nemo
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-qwen-tts"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: qwen-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-fish-speech"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: fish-speech
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-faster-qwen3-tts"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: faster-qwen3-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-voxcpm"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: voxcpm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-pocket-tts"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: pocket-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-llama-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: llama-cpp
+    dockerfile: ./backend/Dockerfile.llama-cpp
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-turboquant"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: turboquant
+    dockerfile: ./backend/Dockerfile.turboquant
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-llama-cpp"
+    base-image: ubuntu:24.04
+    runs-on: ubuntu-24.04-arm
+    ubuntu-version: "2404"
+    backend: llama-cpp
+    dockerfile: ./backend/Dockerfile.llama-cpp
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-turboquant"
+    base-image: ubuntu:24.04
+    runs-on: ubuntu-24.04-arm
+    ubuntu-version: "2404"
+    backend: turboquant
+    dockerfile: ./backend/Dockerfile.turboquant
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-vllm"
+    runs-on: arc-runner-set
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: vllm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-vllm-omni"
+    runs-on: arc-runner-set
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: vllm-omni
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-transformers"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: transformers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-diffusers"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: diffusers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-ace-step"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: ace-step
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-trl"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: trl
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-vibevoice"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: vibevoice
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-qwen-asr"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: qwen-asr
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-qwen-tts"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: qwen-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-fish-speech"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: fish-speech
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: faster-qwen3-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-pocket-tts"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: pocket-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-chatterbox"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: chatterbox
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-diffusers"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: diffusers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-vllm"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: vllm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-vllm-omni"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: vllm-omni
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-sglang"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: sglang
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-mlx"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: mlx
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-mlx-vlm"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: mlx-vlm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-mlx-audio"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: mlx-audio
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-mlx-distributed"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: mlx-distributed
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-whisperx"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: whisperx
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: l4t
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-faster-whisper"
+    runs-on: ubuntu-24.04-arm
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    ubuntu-version: "2404"
+    backend: faster-whisper
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-kokoro"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: kokoro
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-faster-whisper"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: faster-whisper
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-whisperx"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: whisperx
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-chatterbox"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: chatterbox
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-moonshine"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: moonshine
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-mlx"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: mlx
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-mlx-vlm"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: mlx-vlm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-mlx-audio"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: mlx-audio
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-mlx-distributed"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: mlx-distributed
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-stablediffusion-ggml"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: stablediffusion-ggml
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-stablediffusion-ggml"
+    base-image: ubuntu:24.04
+    ubuntu-version: "2404"
+    runs-on: ubuntu-24.04-arm
+    backend: stablediffusion-ggml
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-sam3-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: sam3-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-sam3-cpp"
+    base-image: ubuntu:24.04
+    ubuntu-version: "2404"
+    runs-on: ubuntu-24.04-arm
+    backend: sam3-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-whisper"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: whisper
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-whisper"
+    base-image: ubuntu:24.04
+    ubuntu-version: "2404"
+    runs-on: ubuntu-24.04-arm
+    backend: whisper
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-acestep-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: acestep-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-qwen3-tts-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: qwen3-tts-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-vibevoice-cpp"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: vibevoice-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-acestep-cpp"
+    base-image: ubuntu:24.04
+    ubuntu-version: "2404"
+    runs-on: ubuntu-24.04-arm
+    backend: acestep-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-qwen3-tts-cpp"
+    base-image: ubuntu:24.04
+    ubuntu-version: "2404"
+    runs-on: ubuntu-24.04-arm
+    backend: qwen3-tts-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    skip-drivers: "false"
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-cuda-13-arm64-vibevoice-cpp"
+    base-image: ubuntu:24.04
+    ubuntu-version: "2404"
+    runs-on: ubuntu-24.04-arm
+    backend: vibevoice-cpp
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+  - build-type: cublas
+    cuda-major-version: "13"
+    cuda-minor-version: "0"
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-nvidia-cuda-13-rfdetr"
+    runs-on: ubuntu-latest
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: rfdetr
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-rerankers"
+    runs-on: ubuntu-latest
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: rerankers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-llama-cpp"
+    runs-on: ubuntu-latest
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: llama-cpp
+    dockerfile: ./backend/Dockerfile.llama-cpp
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-turboquant"
+    runs-on: ubuntu-latest
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: turboquant
+    dockerfile: ./backend/Dockerfile.turboquant
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-vllm"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: vllm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-vllm-omni"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: vllm-omni
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-sglang"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: sglang
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-transformers"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: transformers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-diffusers"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: diffusers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-ace-step"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: ace-step
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-kokoro"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: kokoro
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-vibevoice"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: vibevoice
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-qwen-asr"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: qwen-asr
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-nemo"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: nemo
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-qwen-tts"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: qwen-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-fish-speech"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: fish-speech
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-voxcpm"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: voxcpm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-pocket-tts"
+    runs-on: arc-runner-set
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: pocket-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-faster-whisper"
+    runs-on: bigger-runner
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: faster-whisper
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: hipblas
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-rocm-hipblas-coqui"
+    runs-on: bigger-runner
+    base-image: rocm/dev-ubuntu-24.04:7.2.1
+    skip-drivers: "false"
+    backend: coqui
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: intel
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-rerankers"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: rerankers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: sycl_f32
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-sycl-f32-llama-cpp"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: llama-cpp
+    dockerfile: ./backend/Dockerfile.llama-cpp
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: sycl_f32
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-sycl-f32-turboquant"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: turboquant
+    dockerfile: ./backend/Dockerfile.turboquant
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: sycl_f16
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-sycl-f16-llama-cpp"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: llama-cpp
+    dockerfile: ./backend/Dockerfile.llama-cpp
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: sycl_f16
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-sycl-f16-turboquant"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: turboquant
+    dockerfile: ./backend/Dockerfile.turboquant
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: intel
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-vllm"
+    runs-on: arc-runner-set
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: vllm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: intel
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-sglang"
+    runs-on: arc-runner-set
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: sglang
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: intel
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-transformers"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: transformers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: intel
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-diffusers"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: diffusers
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: intel
+    cuda-major-version: ""
+    cuda-minor-version: ""
+    platforms: linux/amd64
+    tag-latest: auto
+    tag-suffix: "-gpu-intel-ace-step"
+    runs-on: ubuntu-latest
+    base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
+    skip-drivers: "false"
+    backend: ace-step
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2404"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-vibevoice"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: vibevoice
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-qwen-asr"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: qwen-asr
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-qwen-tts"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: qwen-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-fish-speech"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: fish-speech
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-faster-qwen3-tts"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: faster-qwen3-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-pocket-tts"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: pocket-tts
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-kokoro"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: kokoro
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-mlx"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: mlx
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-mlx-vlm"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: mlx-vlm
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-mlx-audio"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: mlx-audio
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-mlx-distributed"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: mlx-distributed
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-whisperx"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
+    backend: whisperx
+    dockerfile: ./backend/Dockerfile.python
+    context: ./
+    ubuntu-version: "2204"
+  - build-type: l4t
+    cuda-major-version: "12"
+    cuda-minor-version: "0"
+    platforms: linux/arm64
+    tag-latest: auto
+    tag-suffix: "-nvidia-l4t-faster-whisper"
+    runs-on: ubuntu-24.04-arm
+    base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0
+    skip-drivers: "true"
backend: faster-whisper + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2204" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-kokoro" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: kokoro + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-faster-whisper" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: faster-whisper + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-vibevoice" + runs-on: arc-runner-set + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: vibevoice + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-qwen-asr" + runs-on: arc-runner-set + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: qwen-asr + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-nemo" + runs-on: arc-runner-set + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: nemo + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + 
cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-qwen-tts" + runs-on: arc-runner-set + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: qwen-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-fish-speech" + runs-on: arc-runner-set + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: fish-speech + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-voxcpm" + runs-on: arc-runner-set + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: voxcpm + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-pocket-tts" + runs-on: arc-runner-set + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: pocket-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-coqui" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: coqui + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-piper" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + 
skip-drivers: "false" + backend: piper + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-llama-cpp" + runs-on: bigger-runner + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: llama-cpp + dockerfile: ./backend/Dockerfile.llama-cpp + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-turboquant" + runs-on: bigger-runner + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: turboquant + dockerfile: ./backend/Dockerfile.turboquant + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-ik-llama-cpp" + runs-on: bigger-runner + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: ik-llama-cpp + dockerfile: ./backend/Dockerfile.ik-llama-cpp + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "false" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-llama-cpp" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: llama-cpp + dockerfile: ./backend/Dockerfile.llama-cpp + context: ./ + ubuntu-version: "2204" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "false" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-turboquant" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: turboquant + dockerfile: ./backend/Dockerfile.turboquant + context: ./ + ubuntu-version: "2204" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: 
auto + tag-suffix: "-gpu-vulkan-llama-cpp" + runs-on: bigger-runner + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: llama-cpp + dockerfile: ./backend/Dockerfile.llama-cpp + context: ./ + ubuntu-version: "2404" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-turboquant" + runs-on: bigger-runner + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: turboquant + dockerfile: ./backend/Dockerfile.turboquant + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-stablediffusion-ggml" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: stablediffusion-ggml + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-sam3-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: sam3-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f32 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f32-sam3-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: sam3-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f16 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f16-sam3-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: sam3-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - 
build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-sam3-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: sam3-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f32 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f32-stablediffusion-ggml" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: stablediffusion-ggml + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f16 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f16-stablediffusion-ggml" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: stablediffusion-ggml + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-stablediffusion-ggml" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: stablediffusion-ggml + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "false" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-stablediffusion-ggml" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: stablediffusion-ggml + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2204" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 
+ skip-drivers: "false" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-sam3-cpp" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: sam3-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2204" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-whisper" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: whisper + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f32 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f32-whisper" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: whisper + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f16 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f16-whisper" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: whisper + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-whisper" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: whisper + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "false" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-whisper" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: whisper + dockerfile: 
./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2204" + - build-type: hipblas + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-rocm-hipblas-whisper" + base-image: rocm/dev-ubuntu-24.04:7.2.1 + runs-on: ubuntu-latest + skip-drivers: "false" + backend: whisper + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-acestep-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: acestep-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f32 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f32-acestep-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: acestep-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f16 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f16-acestep-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: acestep-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-acestep-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: acestep-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "false" + 
tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-acestep-cpp" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: acestep-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2204" + - build-type: hipblas + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-rocm-hipblas-acestep-cpp" + base-image: rocm/dev-ubuntu-24.04:7.2.1 + runs-on: ubuntu-latest + skip-drivers: "false" + backend: acestep-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-qwen3-tts-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: qwen3-tts-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f32 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f32-qwen3-tts-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: qwen3-tts-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f16 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f16-qwen3-tts-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: qwen3-tts-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-qwen3-tts-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: qwen3-tts-cpp + 
dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "false" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-qwen3-tts-cpp" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: qwen3-tts-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2204" + - build-type: hipblas + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-rocm-hipblas-qwen3-tts-cpp" + base-image: rocm/dev-ubuntu-24.04:6.4.4 + runs-on: ubuntu-latest + skip-drivers: "false" + backend: qwen3-tts-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-vibevoice-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: vibevoice-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-localvqe" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: localvqe + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f32 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-sycl-f32-vibevoice-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: vibevoice-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: sycl_f16 + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + 
tag-suffix: "-gpu-intel-sycl-f16-vibevoice-cpp" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: vibevoice-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-vibevoice-cpp" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: vibevoice-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: vulkan + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-gpu-vulkan-localvqe" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: localvqe + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "false" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-vibevoice-cpp" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: vibevoice-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2204" + - build-type: hipblas + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-rocm-hipblas-vibevoice-cpp" + base-image: rocm/dev-ubuntu-24.04:6.4.4 + runs-on: ubuntu-latest + skip-drivers: "false" + backend: vibevoice-cpp + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-voxtral" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: voxtral + dockerfile: ./backend/Dockerfile.golang + 
context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-opus" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: opus + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-silero-vad" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: silero-vad + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-kokoros" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: kokoros + dockerfile: ./backend/Dockerfile.rust + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-local-store" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: local-store + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-rfdetr" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: rfdetr + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-insightface" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: insightface + dockerfile: ./backend/Dockerfile.python + context: ./ + 
ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-speaker-recognition" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: speaker-recognition + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: intel + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-intel-rfdetr" + runs-on: ubuntu-latest + base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 + skip-drivers: "false" + backend: rfdetr + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: l4t + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "true" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-rfdetr" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: rfdetr + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2204" + - build-type: l4t + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: linux/arm64 + skip-drivers: "true" + tag-latest: auto + tag-suffix: "-nvidia-l4t-arm64-chatterbox" + base-image: nvcr.io/nvidia/l4t-jetpack:r36.4.0 + runs-on: ubuntu-24.04-arm + backend: chatterbox + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2204" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-kitten-tts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: kitten-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-neutts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + 
skip-drivers: "false" + backend: neutts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: hipblas + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-rocm-hipblas-neutts" + runs-on: arc-runner-set + base-image: rocm/dev-ubuntu-24.04:7.2.1 + skip-drivers: "false" + backend: neutts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-vibevoice" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: vibevoice + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-qwen-asr" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: qwen-asr + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-nemo" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: nemo + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-qwen-tts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: qwen-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-fish-speech" + runs-on: ubuntu-latest + base-image: 
ubuntu:24.04 + skip-drivers: "false" + backend: fish-speech + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-voxcpm" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: voxcpm + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-pocket-tts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: pocket-tts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-cpu-outetts" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "true" + backend: outetts + dockerfile: ./backend/Dockerfile.python + context: ./ + ubuntu-version: "2404" + - build-type: "" + cuda-major-version: "" + cuda-minor-version: "" + platforms: linux/amd64,linux/arm64 + tag-latest: auto + tag-suffix: "-cpu-sherpa-onnx" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: sherpa-onnx + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-12-sherpa-onnx" + runs-on: ubuntu-latest + base-image: ubuntu:24.04 + skip-drivers: "false" + backend: sherpa-onnx + dockerfile: ./backend/Dockerfile.golang + context: ./ + ubuntu-version: "2404" + - build-type: cublas + cuda-major-version: "13" + cuda-minor-version: "0" + platforms: linux/amd64 + tag-latest: auto + tag-suffix: "-gpu-nvidia-cuda-13-sherpa-onnx" + runs-on: ubuntu-latest + 
+    base-image: ubuntu:24.04
+    skip-drivers: "false"
+    backend: sherpa-onnx
+    dockerfile: ./backend/Dockerfile.golang
+    context: ./
+    ubuntu-version: "2404"
+darwin:
+  - backend: diffusers
+    tag-suffix: "-metal-darwin-arm64-diffusers"
+    build-type: mps
+  - backend: ace-step
+    tag-suffix: "-metal-darwin-arm64-ace-step"
+    build-type: mps
+  - backend: mlx
+    tag-suffix: "-metal-darwin-arm64-mlx"
+    build-type: mps
+  - backend: chatterbox
+    tag-suffix: "-metal-darwin-arm64-chatterbox"
+    build-type: mps
+  - backend: mlx-vlm
+    tag-suffix: "-metal-darwin-arm64-mlx-vlm"
+    build-type: mps
+  - backend: mlx-audio
+    tag-suffix: "-metal-darwin-arm64-mlx-audio"
+    build-type: mps
+  - backend: mlx-distributed
+    tag-suffix: "-metal-darwin-arm64-mlx-distributed"
+    build-type: mps
+  - backend: stablediffusion-ggml
+    tag-suffix: "-metal-darwin-arm64-stablediffusion-ggml"
+    build-type: metal
+    lang: go
+  - backend: whisper
+    tag-suffix: "-metal-darwin-arm64-whisper"
+    build-type: metal
+    lang: go
+  - backend: acestep-cpp
+    tag-suffix: "-metal-darwin-arm64-acestep-cpp"
+    build-type: metal
+    lang: go
+  - backend: qwen3-tts-cpp
+    tag-suffix: "-metal-darwin-arm64-qwen3-tts-cpp"
+    build-type: metal
+    lang: go
+  - backend: vibevoice-cpp
+    tag-suffix: "-metal-darwin-arm64-vibevoice-cpp"
+    build-type: metal
+    lang: go
+  - backend: voxtral
+    tag-suffix: "-metal-darwin-arm64-voxtral"
+    build-type: metal
+    lang: go
+  - backend: vibevoice
+    tag-suffix: "-metal-darwin-arm64-vibevoice"
+    build-type: mps
+  - backend: qwen-asr
+    tag-suffix: "-metal-darwin-arm64-qwen-asr"
+    build-type: mps
+  - backend: nemo
+    tag-suffix: "-metal-darwin-arm64-nemo"
+    build-type: mps
+  - backend: qwen-tts
+    tag-suffix: "-metal-darwin-arm64-qwen-tts"
+    build-type: mps
+  - backend: fish-speech
+    tag-suffix: "-metal-darwin-arm64-fish-speech"
+    build-type: mps
+  - backend: voxcpm
+    tag-suffix: "-metal-darwin-arm64-voxcpm"
+    build-type: mps
+  - backend: pocket-tts
+    tag-suffix: "-metal-darwin-arm64-pocket-tts"
+    build-type: mps
+  - backend: moonshine
+    tag-suffix: "-metal-darwin-arm64-moonshine"
+    build-type: mps
+  - backend: whisperx
+    tag-suffix: "-metal-darwin-arm64-whisperx"
+    build-type: mps
+  - backend: rerankers
+    tag-suffix: "-metal-darwin-arm64-rerankers"
+    build-type: mps
+  - backend: transformers
+    tag-suffix: "-metal-darwin-arm64-transformers"
+    build-type: mps
+  - backend: kokoro
+    tag-suffix: "-metal-darwin-arm64-kokoro"
+    build-type: mps
+  - backend: faster-whisper
+    tag-suffix: "-metal-darwin-arm64-faster-whisper"
+    build-type: mps
+  - backend: coqui
+    tag-suffix: "-metal-darwin-arm64-coqui"
+    build-type: mps
+  - backend: rfdetr
+    tag-suffix: "-metal-darwin-arm64-rfdetr"
+    build-type: mps
+  - backend: kitten-tts
+    tag-suffix: "-metal-darwin-arm64-kitten-tts"
+    build-type: mps
+  - backend: piper
+    tag-suffix: "-metal-darwin-arm64-piper"
+    build-type: metal
+    lang: go
+  - backend: opus
+    tag-suffix: "-metal-darwin-arm64-opus"
+    build-type: metal
+    lang: go
+  - backend: silero-vad
+    tag-suffix: "-metal-darwin-arm64-silero-vad"
+    build-type: metal
+    lang: go
+  - backend: local-store
+    tag-suffix: "-metal-darwin-arm64-local-store"
+    build-type: metal
+    lang: go
+  - backend: llama-cpp-quantization
+    tag-suffix: "-metal-darwin-arm64-llama-cpp-quantization"
+    build-type: mps
diff --git a/.github/workflows/backend.yml b/.github/workflows/backend.yml
index 2242be0f79ba..d15604ea6312 100644
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
@@ -13,8 +13,55 @@ concurrency:
   cancel-in-progress: true
 jobs:
-  backend-jobs:
+  derive-bases:
     if: github.repository == 'mudler/LocalAI'
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.derive.outputs.matrix }}
+      matrix-darwin: ${{ steps.derive.outputs.matrix-darwin }}
+      bases-matrix: ${{ steps.derive.outputs.bases-matrix }}
+      has-backends: ${{ steps.derive.outputs.has-backends }}
+      has-backends-darwin: ${{ steps.derive.outputs.has-backends-darwin }}
+      has-bases: ${{ steps.derive.outputs.has-bases }}
+    steps:
+      - uses: actions/checkout@v6
+      - uses: oven-sh/setup-bun@v2
+      - run: |
+          bun add js-yaml
+          bun add @octokit/core
+      - id: derive
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_EVENT_PATH: ${{ github.event_path }}
+        run: bun run scripts/changed-backends.js
+
+  build-bases:
+    needs: derive-bases
+    if: needs.derive-bases.outputs.has-bases == 'true'
+    strategy:
+      fail-fast: false
+      matrix: ${{ fromJSON(needs.derive-bases.outputs.bases-matrix) }}
+    uses: ./.github/workflows/base_images.yml
+    with:
+      lang: ${{ matrix.lang }}
+      base-image: ${{ matrix.base-image }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      ubuntu-version: ${{ matrix.ubuntu-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      tag-stem: ${{ matrix.tag-stem }}
+      skip-drivers: ${{ matrix.skip-drivers }}
+    secrets:
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+
+  backend-jobs:
+    if: |
+      always() && github.repository == 'mudler/LocalAI' &&
+      (needs.build-bases.result == 'success' || needs.build-bases.result == 'skipped')
+    needs: [derive-bases, build-bases]
     uses: ./.github/workflows/backend_build.yml
     with:
       tag-latest: ${{ matrix.tag-latest }}
@@ -31,6 +78,10 @@ jobs:
       context: ${{ matrix.context }}
       ubuntu-version: ${{ matrix.ubuntu-version }}
       amdgpu-targets: ${{ matrix.amdgpu-targets || 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201' }}
+      # Set by scripts/changed-backends.js for langs that have a
+      # .docker/bases/Dockerfile. recipe; '' otherwise (those run
+      # the inline bootstrap in their own Dockerfile).
+ base-image-prebuilt: ${{ matrix.base-image-prebuilt || '' }} secrets: dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }} dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }} @@ -38,3214 +89,14 @@ jobs: quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }} strategy: fail-fast: false - #max-parallel: ${{ github.event_name != 'pull_request' && 6 || 4 }} - matrix: - include: - - build-type: 'l4t' - cuda-major-version: "12" - cuda-minor-version: "0" - platforms: 'linux/arm64' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-diffusers' - runs-on: 'ubuntu-24.04-arm' - base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" - skip-drivers: 'true' - backend: "diffusers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2204' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-vllm' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "vllm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-sglang' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "sglang" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-diffusers' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "diffusers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-chatterbox' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "chatterbox" - 
dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-moonshine' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "moonshine" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - # tinygrad ships a single image — its CPU device uses bundled - # libLLVM, and its CUDA / HIP / Metal devices dlopen the host - # driver libraries at runtime via tinygrad's ctypes autogen - # wrappers. There is no toolkit-version split because tinygrad - # generates kernels itself (PTX renderer for CUDA) and never - # links against cuDNN/cuBLAS/torch. - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-tinygrad' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "tinygrad" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-whisperx' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "whisperx" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-faster-whisper' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "faster-whisper" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-ace-step' - runs-on: 'ubuntu-latest' - base-image: 
"ubuntu:24.04" - skip-drivers: 'true' - backend: "ace-step" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-trl' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "trl" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-llama-cpp-quantization' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "llama-cpp-quantization" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-mlx' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "mlx" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-mlx-vlm' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "mlx-vlm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-mlx-audio' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "mlx-audio" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-mlx-distributed' - runs-on: 
'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "mlx-distributed" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - # CUDA 12 builds - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-vibevoice' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vibevoice" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-qwen-asr' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen-asr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-nemo' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "nemo" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-qwen-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-fish-speech' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "fish-speech" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - 
build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-faster-qwen3-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "faster-qwen3-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-voxcpm' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "voxcpm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-pocket-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "pocket-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-rerankers' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "rerankers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-llama-cpp' - runs-on: 'bigger-runner' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "llama-cpp" - dockerfile: "./backend/Dockerfile.llama-cpp" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-turboquant' - runs-on: 
'bigger-runner' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "turboquant" - dockerfile: "./backend/Dockerfile.turboquant" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-vllm' - runs-on: 'arc-runner-set' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vllm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-vllm-omni' - runs-on: 'arc-runner-set' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vllm-omni" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-sglang' - runs-on: 'arc-runner-set' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "sglang" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-transformers' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "transformers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-diffusers' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "diffusers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' 
- cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-ace-step' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "ace-step" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-trl' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "trl" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-kokoro' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "kokoro" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-faster-whisper' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "faster-whisper" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-whisperx' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "whisperx" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "9" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-coqui' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - 
skip-drivers: 'false' - backend: "coqui" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-outetts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "outetts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-chatterbox' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "chatterbox" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-moonshine' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "moonshine" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-mlx' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "mlx" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-mlx-vlm' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "mlx-vlm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 
'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-mlx-audio' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "mlx-audio" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-mlx-distributed' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "mlx-distributed" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-stablediffusion-ggml' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "stablediffusion-ggml" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-sam3-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "sam3-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-whisper' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "whisper" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-acestep-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - 
backend: "acestep-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-qwen3-tts-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen3-tts-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-vibevoice-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vibevoice-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-rfdetr' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "rfdetr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-insightface' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "insightface" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-speaker-recognition' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "speaker-recognition" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - 
cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-neutts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "neutts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - # cuda 13 - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-rerankers' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "rerankers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-vibevoice' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vibevoice" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-qwen-asr' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen-asr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-nemo' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "nemo" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-qwen-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - 
backend: "qwen-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-fish-speech' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "fish-speech" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-faster-qwen3-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "faster-qwen3-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-voxcpm' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "voxcpm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-pocket-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "pocket-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-llama-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "llama-cpp" - dockerfile: "./backend/Dockerfile.llama-cpp" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" 
- platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-turboquant' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "turboquant" - dockerfile: "./backend/Dockerfile.turboquant" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/arm64' - skip-drivers: 'false' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-cuda-13-arm64-llama-cpp' - base-image: "ubuntu:24.04" - runs-on: 'ubuntu-24.04-arm' - ubuntu-version: '2404' - backend: "llama-cpp" - dockerfile: "./backend/Dockerfile.llama-cpp" - context: "./" - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/arm64' - skip-drivers: 'false' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-cuda-13-arm64-turboquant' - base-image: "ubuntu:24.04" - runs-on: 'ubuntu-24.04-arm' - ubuntu-version: '2404' - backend: "turboquant" - dockerfile: "./backend/Dockerfile.turboquant" - context: "./" - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-vllm' - runs-on: 'arc-runner-set' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vllm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-vllm-omni' - runs-on: 'arc-runner-set' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vllm-omni" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-transformers' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 
'false' - backend: "transformers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-diffusers' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "diffusers" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-ace-step' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "ace-step" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-13-trl' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "trl" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'l4t' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/arm64' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-cuda-13-arm64-vibevoice' - runs-on: 'ubuntu-24.04-arm' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - ubuntu-version: '2404' - backend: "vibevoice" - dockerfile: "./backend/Dockerfile.python" - context: "./" - - build-type: 'l4t' - cuda-major-version: "13" - cuda-minor-version: "0" - platforms: 'linux/arm64' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-cuda-13-arm64-qwen-asr' - runs-on: 'ubuntu-24.04-arm' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - ubuntu-version: '2404' - backend: "qwen-asr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - - build-type: 'l4t' - cuda-major-version: "13" - cuda-minor-version: "0" - 
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-qwen-tts'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "qwen-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-fish-speech'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "fish-speech"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "faster-qwen3-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-pocket-tts'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "pocket-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-chatterbox'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "chatterbox"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-diffusers'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "diffusers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "vllm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm-omni'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "vllm-omni"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-sglang'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "sglang"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "mlx"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx-vlm'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "mlx-vlm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx-audio'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "mlx-audio"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx-distributed'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "mlx-distributed"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-whisperx'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "whisperx"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'l4t'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-faster-whisper'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  ubuntu-version: '2404'
  backend: "faster-whisper"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-kokoro'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "kokoro"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-faster-whisper'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "faster-whisper"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-whisperx'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "whisperx"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-chatterbox'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "chatterbox"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-moonshine'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "moonshine"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-mlx'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "mlx"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-mlx-vlm'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "mlx-vlm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-mlx-audio'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "mlx-audio"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-mlx-distributed'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "mlx-distributed"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-stablediffusion-ggml'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "stablediffusion-ggml"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-stablediffusion-ggml'
  base-image: "ubuntu:24.04"
  ubuntu-version: '2404'
  runs-on: 'ubuntu-24.04-arm'
  backend: "stablediffusion-ggml"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-sam3-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "sam3-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-sam3-cpp'
  base-image: "ubuntu:24.04"
  ubuntu-version: '2404'
  runs-on: 'ubuntu-24.04-arm'
  backend: "sam3-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-whisper'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "whisper"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-whisper'
  base-image: "ubuntu:24.04"
  ubuntu-version: '2404'
  runs-on: 'ubuntu-24.04-arm'
  backend: "whisper"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-acestep-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "acestep-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-qwen3-tts-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "qwen3-tts-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-vibevoice-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "vibevoice-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-acestep-cpp'
  base-image: "ubuntu:24.04"
  ubuntu-version: '2404'
  runs-on: 'ubuntu-24.04-arm'
  backend: "acestep-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-qwen3-tts-cpp'
  base-image: "ubuntu:24.04"
  ubuntu-version: '2404'
  runs-on: 'ubuntu-24.04-arm'
  backend: "qwen3-tts-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-cuda-13-arm64-vibevoice-cpp'
  base-image: "ubuntu:24.04"
  ubuntu-version: '2404'
  runs-on: 'ubuntu-24.04-arm'
  backend: "vibevoice-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
- build-type: 'cublas'
  cuda-major-version: "13"
  cuda-minor-version: "0"
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-nvidia-cuda-13-rfdetr'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "rfdetr"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
# hipblas builds
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-rerankers'
  runs-on: 'ubuntu-latest'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "rerankers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-llama-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "llama-cpp"
  dockerfile: "./backend/Dockerfile.llama-cpp"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-turboquant'
  runs-on: 'ubuntu-latest'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "turboquant"
  dockerfile: "./backend/Dockerfile.turboquant"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-vllm'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "vllm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-vllm-omni'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "vllm-omni"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-sglang'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "sglang"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-transformers'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "transformers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-diffusers'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "diffusers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-ace-step'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "ace-step"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
# ROCm additional backends
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-kokoro'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "kokoro"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-vibevoice'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "vibevoice"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-qwen-asr'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "qwen-asr"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-nemo'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "nemo"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-qwen-tts'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "qwen-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-fish-speech'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "fish-speech"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-voxcpm'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "voxcpm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-pocket-tts'
  runs-on: 'arc-runner-set'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "pocket-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-faster-whisper'
  runs-on: 'bigger-runner'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "faster-whisper"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'hipblas'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-rocm-hipblas-coqui'
  runs-on: 'bigger-runner'
  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
  skip-drivers: 'false'
  backend: "coqui"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
# sycl builds
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-rerankers'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "rerankers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f32'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f32-llama-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "llama-cpp"
  dockerfile: "./backend/Dockerfile.llama-cpp"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f32'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f32-turboquant'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "turboquant"
  dockerfile: "./backend/Dockerfile.turboquant"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f16'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f16-llama-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "llama-cpp"
  dockerfile: "./backend/Dockerfile.llama-cpp"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f16'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f16-turboquant'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "turboquant"
  dockerfile: "./backend/Dockerfile.turboquant"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-vllm'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "vllm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sglang'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "sglang"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-transformers'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "transformers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-diffusers'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "diffusers"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-ace-step'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "ace-step"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-vibevoice'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "vibevoice"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-qwen-asr'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "qwen-asr"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-qwen-tts'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "qwen-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-fish-speech'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "fish-speech"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-faster-qwen3-tts'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "faster-qwen3-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-pocket-tts'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "pocket-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-kokoro'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "kokoro"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-mlx'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "mlx"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-mlx-vlm'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "mlx-vlm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-mlx-audio'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "mlx-audio"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-mlx-distributed'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "mlx-distributed"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-whisperx'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "whisperx"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'l4t'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-faster-whisper'
  runs-on: 'ubuntu-24.04-arm'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  skip-drivers: 'true'
  backend: "faster-whisper"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2204'
# SYCL additional backends
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-kokoro'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "kokoro"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-faster-whisper'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "faster-whisper"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-vibevoice'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "vibevoice"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-qwen-asr'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "qwen-asr"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-nemo'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "nemo"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-qwen-tts'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "qwen-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-fish-speech'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "fish-speech"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-voxcpm'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "voxcpm"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-pocket-tts'
  runs-on: 'arc-runner-set'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "pocket-tts"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'intel'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-coqui'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "coqui"
  dockerfile: "./backend/Dockerfile.python"
  context: "./"
  ubuntu-version: '2404'
# piper
- build-type: ''
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-piper'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "piper"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: ''
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-cpu-llama-cpp'
  runs-on: 'bigger-runner'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "llama-cpp"
  dockerfile: "./backend/Dockerfile.llama-cpp"
  context: "./"
  ubuntu-version: '2404'
- build-type: ''
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-cpu-turboquant'
  runs-on: 'bigger-runner'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "turboquant"
  dockerfile: "./backend/Dockerfile.turboquant"
  context: "./"
  ubuntu-version: '2404'
- build-type: ''
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-cpu-ik-llama-cpp'
  runs-on: 'bigger-runner'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "ik-llama-cpp"
  dockerfile: "./backend/Dockerfile.ik-llama-cpp"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-arm64-llama-cpp'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  runs-on: 'ubuntu-24.04-arm'
  backend: "llama-cpp"
  dockerfile: "./backend/Dockerfile.llama-cpp"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'cublas'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-arm64-turboquant'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  runs-on: 'ubuntu-24.04-arm'
  backend: "turboquant"
  dockerfile: "./backend/Dockerfile.turboquant"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'vulkan'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-vulkan-llama-cpp'
  runs-on: 'bigger-runner'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "llama-cpp"
  dockerfile: "./backend/Dockerfile.llama-cpp"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'vulkan'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-vulkan-turboquant'
  runs-on: 'bigger-runner'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "turboquant"
  dockerfile: "./backend/Dockerfile.turboquant"
  context: "./"
  ubuntu-version: '2404'
# Stablediffusion-ggml
- build-type: ''
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-cpu-stablediffusion-ggml'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "stablediffusion-ggml"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
# sam3-cpp
- build-type: ''
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-cpu-sam3-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "sam3-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f32'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f32-sam3-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "sam3-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f16'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f16-sam3-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "sam3-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'vulkan'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-vulkan-sam3-cpp'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "sam3-cpp"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f32'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f32-stablediffusion-ggml'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "stablediffusion-ggml"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'sycl_f16'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-intel-sycl-f16-stablediffusion-ggml'
  runs-on: 'ubuntu-latest'
  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
  skip-drivers: 'false'
  backend: "stablediffusion-ggml"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'vulkan'
  cuda-major-version: ""
  cuda-minor-version: ""
  platforms: 'linux/amd64,linux/arm64'
  tag-latest: 'auto'
  tag-suffix: '-gpu-vulkan-stablediffusion-ggml'
  runs-on: 'ubuntu-latest'
  base-image: "ubuntu:24.04"
  skip-drivers: 'false'
  backend: "stablediffusion-ggml"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2404'
- build-type: 'cublas'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-arm64-stablediffusion-ggml'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
  runs-on: 'ubuntu-24.04-arm'
  backend: "stablediffusion-ggml"
  dockerfile: "./backend/Dockerfile.golang"
  context: "./"
  ubuntu-version: '2204'
- build-type: 'cublas'
  cuda-major-version: "12"
  cuda-minor-version: "0"
  platforms: 'linux/arm64'
  skip-drivers: 'false'
  tag-latest: 'auto'
  tag-suffix: '-nvidia-l4t-arm64-sam3-cpp'
  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
runs-on: 'ubuntu-24.04-arm' - backend: "sam3-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2204' - # whisper - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-whisper' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "whisper" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f32' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f32-whisper' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "whisper" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f16' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f16-whisper' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "whisper" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'vulkan' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-gpu-vulkan-whisper' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "whisper" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "0" - platforms: 'linux/arm64' - skip-drivers: 'false' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-arm64-whisper' - base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" - runs-on: 'ubuntu-24.04-arm' - backend: "whisper" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - 
ubuntu-version: '2204' - - build-type: 'hipblas' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-rocm-hipblas-whisper' - base-image: "rocm/dev-ubuntu-24.04:7.2.1" - runs-on: 'ubuntu-latest' - skip-drivers: 'false' - backend: "whisper" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - # acestep-cpp - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-acestep-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "acestep-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f32' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f32-acestep-cpp' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "acestep-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f16' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f16-acestep-cpp' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "acestep-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'vulkan' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-gpu-vulkan-acestep-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "acestep-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: 
"0" - platforms: 'linux/arm64' - skip-drivers: 'false' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-arm64-acestep-cpp' - base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" - runs-on: 'ubuntu-24.04-arm' - backend: "acestep-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2204' - - build-type: 'hipblas' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-rocm-hipblas-acestep-cpp' - base-image: "rocm/dev-ubuntu-24.04:7.2.1" - runs-on: 'ubuntu-latest' - skip-drivers: 'false' - backend: "acestep-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - # qwen3-tts-cpp - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-qwen3-tts-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen3-tts-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f32' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f32-qwen3-tts-cpp' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "qwen3-tts-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f16' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f16-qwen3-tts-cpp' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "qwen3-tts-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'vulkan' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 
'auto' - tag-suffix: '-gpu-vulkan-qwen3-tts-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen3-tts-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "0" - platforms: 'linux/arm64' - skip-drivers: 'false' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-arm64-qwen3-tts-cpp' - base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" - runs-on: 'ubuntu-24.04-arm' - backend: "qwen3-tts-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2204' - - build-type: 'hipblas' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-rocm-hipblas-qwen3-tts-cpp' - base-image: "rocm/dev-ubuntu-24.04:6.4.4" - runs-on: 'ubuntu-latest' - skip-drivers: 'false' - backend: "qwen3-tts-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - # vibevoice-cpp - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-vibevoice-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vibevoice-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-localvqe' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "localvqe" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f32' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f32-vibevoice-cpp' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" 
- skip-drivers: 'false' - backend: "vibevoice-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'sycl_f16' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-sycl-f16-vibevoice-cpp' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "vibevoice-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'vulkan' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-gpu-vulkan-vibevoice-cpp' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vibevoice-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'vulkan' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-gpu-vulkan-localvqe' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "localvqe" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "0" - platforms: 'linux/arm64' - skip-drivers: 'false' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-arm64-vibevoice-cpp' - base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" - runs-on: 'ubuntu-24.04-arm' - backend: "vibevoice-cpp" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2204' - - build-type: 'hipblas' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-rocm-hipblas-vibevoice-cpp' - base-image: "rocm/dev-ubuntu-24.04:6.4.4" - runs-on: 'ubuntu-latest' - skip-drivers: 'false' - backend: "vibevoice-cpp" - dockerfile: "./backend/Dockerfile.golang" - 
context: "./" - ubuntu-version: '2404' - # voxtral - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-voxtral' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "voxtral" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - #opus - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-opus' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "opus" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - #silero-vad - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-silero-vad' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "silero-vad" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - # kokoros (Rust TTS) - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-kokoros' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "kokoros" - dockerfile: "./backend/Dockerfile.rust" - context: "./" - ubuntu-version: '2404' - # local-store - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-local-store' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "local-store" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - # rfdetr - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-rfdetr' - 
runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "rfdetr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - # insightface (face recognition) - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-insightface' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "insightface" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - # speaker-recognition (voice/speaker biometrics) - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-speaker-recognition' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "speaker-recognition" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'intel' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-intel-rfdetr' - runs-on: 'ubuntu-latest' - base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" - skip-drivers: 'false' - backend: "rfdetr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'l4t' - cuda-major-version: "12" - cuda-minor-version: "0" - platforms: 'linux/arm64' - skip-drivers: 'true' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-arm64-rfdetr' - base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" - runs-on: 'ubuntu-24.04-arm' - backend: "rfdetr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2204' - - build-type: 'l4t' - cuda-major-version: "12" - cuda-minor-version: "0" - platforms: 'linux/arm64' - skip-drivers: 'true' - tag-latest: 'auto' - tag-suffix: '-nvidia-l4t-arm64-chatterbox' - base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" - runs-on: 
'ubuntu-24.04-arm' - backend: "chatterbox" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2204' - # runs out of space on the runner - # - build-type: 'hipblas' - # cuda-major-version: "" - # cuda-minor-version: "" - # platforms: 'linux/amd64' - # tag-latest: 'auto' - # tag-suffix: '-gpu-hipblas-rfdetr' - # base-image: "rocm/dev-ubuntu-24.04:7.2.1" - # runs-on: 'ubuntu-latest' - # skip-drivers: 'false' - # backend: "rfdetr" - # dockerfile: "./backend/Dockerfile.python" - # context: "./" - # kitten-tts - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-kitten-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "kitten-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - # neutts - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-neutts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "neutts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: 'hipblas' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-rocm-hipblas-neutts' - runs-on: 'arc-runner-set' - base-image: "rocm/dev-ubuntu-24.04:7.2.1" - skip-drivers: 'false' - backend: "neutts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-vibevoice' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "vibevoice" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - 
cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-qwen-asr' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen-asr" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-nemo' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "nemo" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-qwen-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "qwen-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-fish-speech' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "fish-speech" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-voxcpm' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "voxcpm" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-pocket-tts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "pocket-tts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - 
ubuntu-version: '2404' - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-cpu-outetts' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'true' - backend: "outetts" - dockerfile: "./backend/Dockerfile.python" - context: "./" - ubuntu-version: '2404' - # sherpa-onnx CPU - - build-type: '' - cuda-major-version: "" - cuda-minor-version: "" - platforms: 'linux/amd64,linux/arm64' - tag-latest: 'auto' - tag-suffix: '-cpu-sherpa-onnx' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "sherpa-onnx" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - # sherpa-onnx CUDA 12 - - build-type: 'cublas' - cuda-major-version: "12" - cuda-minor-version: "8" - platforms: 'linux/amd64' - tag-latest: 'auto' - tag-suffix: '-gpu-nvidia-cuda-12-sherpa-onnx' - runs-on: 'ubuntu-latest' - base-image: "ubuntu:24.04" - skip-drivers: 'false' - backend: "sherpa-onnx" - dockerfile: "./backend/Dockerfile.golang" - context: "./" - ubuntu-version: '2404' - # sherpa-onnx CUDA 13 — requires onnxruntime 1.24.x+ for the - # gpu_cuda13 tarball; sherpa-onnx SHERPA_COMMIT pins to v1.12.39. 
-        - build-type: 'cublas'
-          cuda-major-version: "13"
-          cuda-minor-version: "0"
-          platforms: 'linux/amd64'
-          tag-latest: 'auto'
-          tag-suffix: '-gpu-nvidia-cuda-13-sherpa-onnx'
-          runs-on: 'ubuntu-latest'
-          base-image: "ubuntu:24.04"
-          skip-drivers: 'false'
-          backend: "sherpa-onnx"
-          dockerfile: "./backend/Dockerfile.golang"
-          context: "./"
-          ubuntu-version: '2404'
+      matrix: ${{ fromJSON(needs.derive-bases.outputs.matrix) }}

   backend-jobs-darwin:
+    if: github.repository == 'mudler/LocalAI'
+    needs: derive-bases
     uses: ./.github/workflows/backend_build_darwin.yml
     strategy:
-      matrix:
-        include:
-          - backend: "diffusers"
-            tag-suffix: "-metal-darwin-arm64-diffusers"
-            build-type: "mps"
-          - backend: "ace-step"
-            tag-suffix: "-metal-darwin-arm64-ace-step"
-            build-type: "mps"
-          - backend: "mlx"
-            tag-suffix: "-metal-darwin-arm64-mlx"
-            build-type: "mps"
-          - backend: "chatterbox"
-            tag-suffix: "-metal-darwin-arm64-chatterbox"
-            build-type: "mps"
-          - backend: "mlx-vlm"
-            tag-suffix: "-metal-darwin-arm64-mlx-vlm"
-            build-type: "mps"
-          - backend: "mlx-audio"
-            tag-suffix: "-metal-darwin-arm64-mlx-audio"
-            build-type: "mps"
-          - backend: "mlx-distributed"
-            tag-suffix: "-metal-darwin-arm64-mlx-distributed"
-            build-type: "mps"
-          - backend: "stablediffusion-ggml"
-            tag-suffix: "-metal-darwin-arm64-stablediffusion-ggml"
-            build-type: "metal"
-            lang: "go"
-          - backend: "whisper"
-            tag-suffix: "-metal-darwin-arm64-whisper"
-            build-type: "metal"
-            lang: "go"
-          - backend: "acestep-cpp"
-            tag-suffix: "-metal-darwin-arm64-acestep-cpp"
-            build-type: "metal"
-            lang: "go"
-          - backend: "qwen3-tts-cpp"
-            tag-suffix: "-metal-darwin-arm64-qwen3-tts-cpp"
-            build-type: "metal"
-            lang: "go"
-          - backend: "vibevoice-cpp"
-            tag-suffix: "-metal-darwin-arm64-vibevoice-cpp"
-            build-type: "metal"
-            lang: "go"
-          - backend: "voxtral"
-            tag-suffix: "-metal-darwin-arm64-voxtral"
-            build-type: "metal"
-            lang: "go"
-          - backend: "vibevoice"
-            tag-suffix: "-metal-darwin-arm64-vibevoice"
-            build-type: "mps"
-          - backend: "qwen-asr"
-            tag-suffix: "-metal-darwin-arm64-qwen-asr"
-            build-type: "mps"
-          - backend: "nemo"
-            tag-suffix: "-metal-darwin-arm64-nemo"
-            build-type: "mps"
-          - backend: "qwen-tts"
-            tag-suffix: "-metal-darwin-arm64-qwen-tts"
-            build-type: "mps"
-          - backend: "fish-speech"
-            tag-suffix: "-metal-darwin-arm64-fish-speech"
-            build-type: "mps"
-          - backend: "voxcpm"
-            tag-suffix: "-metal-darwin-arm64-voxcpm"
-            build-type: "mps"
-          - backend: "pocket-tts"
-            tag-suffix: "-metal-darwin-arm64-pocket-tts"
-            build-type: "mps"
-          - backend: "moonshine"
-            tag-suffix: "-metal-darwin-arm64-moonshine"
-            build-type: "mps"
-          - backend: "whisperx"
-            tag-suffix: "-metal-darwin-arm64-whisperx"
-            build-type: "mps"
-          - backend: "rerankers"
-            tag-suffix: "-metal-darwin-arm64-rerankers"
-            build-type: "mps"
-          - backend: "transformers"
-            tag-suffix: "-metal-darwin-arm64-transformers"
-            build-type: "mps"
-          - backend: "kokoro"
-            tag-suffix: "-metal-darwin-arm64-kokoro"
-            build-type: "mps"
-          - backend: "faster-whisper"
-            tag-suffix: "-metal-darwin-arm64-faster-whisper"
-            build-type: "mps"
-          - backend: "coqui"
-            tag-suffix: "-metal-darwin-arm64-coqui"
-            build-type: "mps"
-          - backend: "rfdetr"
-            tag-suffix: "-metal-darwin-arm64-rfdetr"
-            build-type: "mps"
-          - backend: "kitten-tts"
-            tag-suffix: "-metal-darwin-arm64-kitten-tts"
-            build-type: "mps"
-          - backend: "piper"
-            tag-suffix: "-metal-darwin-arm64-piper"
-            build-type: "metal"
-            lang: "go"
-          - backend: "opus"
-            tag-suffix: "-metal-darwin-arm64-opus"
-            build-type: "metal"
-            lang: "go"
-          - backend: "silero-vad"
-            tag-suffix: "-metal-darwin-arm64-silero-vad"
-            build-type: "metal"
-            lang: "go"
-          - backend: "local-store"
-            tag-suffix: "-metal-darwin-arm64-local-store"
-            build-type: "metal"
-            lang: "go"
-          - backend: "llama-cpp-quantization"
-            tag-suffix: "-metal-darwin-arm64-llama-cpp-quantization"
-            build-type: "mps"
+      fail-fast: false
+      matrix: ${{ fromJSON(needs.derive-bases.outputs.matrix-darwin) }}
     with:
       backend: ${{ matrix.backend }}
       build-type: ${{ matrix.build-type }}
diff --git a/.github/workflows/backend_build.yml b/.github/workflows/backend_build.yml
index a7f6a8a5efd1..5f68af3b0f95 100644
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -63,6 +63,16 @@ on:
         required: false
         default: ''
         type: string
+      base-image-prebuilt:
+        description: |
+          Optional reference to a prebuilt accel/lang base image
+          (quay.io/go-skynet/localai-base:<tag-stem>). When set, the backend
+          Dockerfile FROMs this image instead of running the inline
+          bootstrap. See .github/workflows/base_images.yml and
+          .agents/ci-caching.md.
+        required: false
+        default: ''
+        type: string
     secrets:
       dockerUsername:
         required: false
@@ -228,6 +238,7 @@ jobs:
             APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
             APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
             DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
+            BASE_IMAGE_PREBUILT=${{ inputs.base-image-prebuilt }}
           context: ${{ inputs.context }}
           file: ${{ inputs.dockerfile }}
           cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
@@ -254,6 +265,7 @@ jobs:
             APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
             APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
            DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
+            BASE_IMAGE_PREBUILT=${{ inputs.base-image-prebuilt }}
           context: ${{ inputs.context }}
           file: ${{ inputs.dockerfile }}
           cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
diff --git a/.github/workflows/backend_pr.yml b/.github/workflows/backend_pr.yml
index 5a557b38bbb2..5ad3fb7ad079 100644
--- a/.github/workflows/backend_pr.yml
+++ b/.github/workflows/backend_pr.yml
@@ -13,8 +13,10 @@ jobs:
     outputs:
       matrix: ${{ steps.set-matrix.outputs.matrix }}
       matrix-darwin: ${{ steps.set-matrix.outputs.matrix-darwin }}
+      bases-matrix: ${{ steps.set-matrix.outputs.bases-matrix }}
       has-backends: ${{ steps.set-matrix.outputs.has-backends }}
       has-backends-darwin: ${{ steps.set-matrix.outputs.has-backends-darwin }}
+      has-bases: ${{ steps.set-matrix.outputs.has-bases }}
     steps:
       - name: Checkout repository
         uses: actions/checkout@v6
@@ -27,7 +29,8 @@ jobs:
           bun add js-yaml
           bun add @octokit/core

-      # filters the matrix in backend.yml
+      # Filters the matrix from backend.yml against this PR's changed files
+      # AND derives the deduplicated bases-matrix consumed by build-bases.
       - name: Filter matrix for changed backends
         id: set-matrix
         env:
@@ -35,10 +38,34 @@ jobs:
           GITHUB_EVENT_PATH: ${{ github.event_path }}
         run: bun run scripts/changed-backends.js

-  backend-jobs:
+  build-bases:
     needs: generate-matrix
+    if: needs.generate-matrix.outputs.has-bases == 'true'
+    strategy:
+      fail-fast: false
+      matrix: ${{ fromJSON(needs.generate-matrix.outputs.bases-matrix) }}
+    uses: ./.github/workflows/base_images.yml
+    with:
+      lang: ${{ matrix.lang }}
+      base-image: ${{ matrix.base-image }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      ubuntu-version: ${{ matrix.ubuntu-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      tag-stem: ${{ matrix.tag-stem }}
+      skip-drivers: ${{ matrix.skip-drivers }}
+    secrets:
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+
+  backend-jobs:
+    needs: [generate-matrix, build-bases]
     uses: ./.github/workflows/backend_build.yml
-    if: needs.generate-matrix.outputs.has-backends == 'true'
+    if: |
+      always() && needs.generate-matrix.outputs.has-backends == 'true' &&
+      (needs.build-bases.result == 'success' || needs.build-bases.result == 'skipped')
     with:
       tag-latest: ${{ matrix.tag-latest }}
       tag-suffix: ${{ matrix.tag-suffix }}
@@ -54,12 +81,17 @@ jobs:
       context: ${{ matrix.context }}
       ubuntu-version: ${{ matrix.ubuntu-version }}
       amdgpu-targets: ${{ matrix.amdgpu-targets || 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201' }}
+      # The script annotates each filtered Python entry with the prebuilt
+      # base ref it should consume; non-Python entries get '' and run their
+      # own inline bootstrap.
+      base-image-prebuilt: ${{ matrix.base-image-prebuilt || '' }}
     secrets:
       quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
       quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
     strategy:
       fail-fast: true
       matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
+
   backend-jobs-darwin:
     needs: generate-matrix
     uses: ./.github/workflows/backend_build_darwin.yml
diff --git a/.github/workflows/base_images.yml b/.github/workflows/base_images.yml
new file mode 100644
index 000000000000..141c7411b875
--- /dev/null
+++ b/.github/workflows/base_images.yml
@@ -0,0 +1,145 @@
+---
+name: 'build base image (reusable)'
+
+# Builds and pushes one (lang, accel, arch, ubuntu, cuda) base image flavour
+# to quay.io/go-skynet/localai-base. Consumed by backend builds via the
+# BASE_IMAGE_PREBUILT build-arg. PR builds tag with `-pr${PR_NUMBER}` so the
+# same PR's backend matrix can opt-in to the freshly-built base; master
+# builds overwrite the unsuffixed tag for downstream consumption. See
+# .agents/ci-caching.md for the full tagging scheme.
+
+on:
+  workflow_call:
+    inputs:
+      lang:
+        description: 'Language toolchain (matches .docker/bases/Dockerfile.<lang>)'
+        required: true
+        type: string
+      base-image:
+        description: 'Upstream base image (ubuntu:24.04, rocm/dev-ubuntu-24.04:..., etc.)'
+        required: true
+        type: string
+      build-type:
+        description: 'BUILD_TYPE: empty for CPU, cublas, hipblas, vulkan, l4t, ...'
+ default: '' + type: string + cuda-major-version: + description: 'CUDA major version (only meaningful for cublas/l4t)' + default: '12' + type: string + cuda-minor-version: + description: 'CUDA minor version' + default: '9' + type: string + ubuntu-version: + description: 'Ubuntu version code (2204, 2404)' + default: '2404' + type: string + platforms: + description: 'Single platform per call (linux/amd64 or linux/arm64)' + required: true + type: string + runs-on: + description: 'Runner label' + required: true + type: string + tag-stem: + description: 'Stable portion of the image tag (e.g. python-cpu-amd64-2404)' + required: true + type: string + skip-drivers: + description: 'Pass-through to the base Dockerfile' + default: 'false' + type: string + secrets: + quayUsername: + required: false + quayPassword: + required: false + outputs: + image-ref: + description: 'Full image reference of the built base' + value: ${{ jobs.base-build.outputs.image-ref }} + +jobs: + base-build: + runs-on: ${{ inputs.runs-on }} + env: + quay_username: ${{ secrets.quayUsername }} + outputs: + image-ref: ${{ steps.compute_ref.outputs.ref }} + steps: + - name: Checkout + uses: actions/checkout@v6 + + - name: Configure apt mirror on runner + id: apt_mirror + uses: ./.github/actions/configure-apt-mirror + + - name: Free Disk Space (Ubuntu) + if: inputs.runs-on == 'ubuntu-latest' + uses: jlumbroso/free-disk-space@main + with: + tool-cache: true + android: true + dotnet: true + haskell: true + large-packages: true + docker-images: true + swap-storage: true + + - name: Compute image ref + id: compute_ref + run: | + stem='${{ inputs.tag-stem }}' + if [ "${{ github.event_name }}" = "pull_request" ]; then + tag="${stem}-pr${{ github.event.number }}" + else + tag="${stem}" + fi + echo "tag=${tag}" >> "$GITHUB_OUTPUT" + echo "ref=quay.io/go-skynet/localai-base:${tag}" >> "$GITHUB_OUTPUT" + + - name: Set up QEMU + uses: docker/setup-qemu-action@master + with: + platforms: all + + - name: Set up Docker 
Buildx + id: buildx + uses: docker/setup-buildx-action@master + + - name: Login to Quay.io + if: ${{ env.quay_username != '' }} + uses: docker/login-action@v4 + with: + registry: quay.io + username: ${{ secrets.quayUsername }} + password: ${{ secrets.quayPassword }} + + - name: Build and push base image + uses: docker/build-push-action@v7 + with: + builder: ${{ steps.buildx.outputs.name }} + context: . + file: ./.docker/bases/Dockerfile.${{ inputs.lang }} + build-args: | + BUILD_TYPE=${{ inputs.build-type }} + CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }} + CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }} + BASE_IMAGE=${{ inputs.base-image }} + UBUNTU_VERSION=${{ inputs.ubuntu-version }} + SKIP_DRIVERS=${{ inputs.skip-drivers }} + APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }} + APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }} + platforms: ${{ inputs.platforms }} + # Push on PRs as well (if creds present) so the PR's backend matrix + # can opt-in to the freshly-built base via -pr${N} tag. + push: ${{ env.quay_username != '' }} + tags: ${{ steps.compute_ref.outputs.ref }} + cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:base-${{ inputs.tag-stem }} + cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:base-${{ inputs.tag-stem }},mode=max,ignore-error=true + + - name: job summary + run: | + echo "Built base image: ${{ steps.compute_ref.outputs.ref }}" >> "$GITHUB_STEP_SUMMARY" diff --git a/Makefile b/Makefile index 1de03999d555..ac9cd09491ac 100644 --- a/Makefile +++ b/Makefile @@ -1072,6 +1072,90 @@ BACKEND_KOKOROS = kokoros|rust|.|false|true # C++ backends (Go wrapper with purego) BACKEND_SAM3_CPP = sam3-cpp|golang|.|false|true +# Tag stem for the local prebuilt base images. Mirrors tagStem() in +# scripts/changed-backends.js and the inline expression in +# .github/workflows/backend.yml, so a `make docker-build-X` produces the +# same FROM ref shape that CI uses. 
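The workflow's `Compute image ref` step and the Makefile's local tags both reproduce one ref shape. The CI-side half of that shape can be sketched as follows (the function name is mine, for illustration; the tag rule is taken from the `compute_ref` step above):

```javascript
// Sketch of the ref computed by base_images.yml's "Compute image ref"
// step: bare stem on push, stem + "-pr<N>" on pull_request.
// Function name is illustrative, not from the repo.
function ciBaseRef(stem, eventName, prNumber) {
  const suffix = eventName === "pull_request" ? `-pr${prNumber}` : "";
  return `quay.io/go-skynet/localai-base:${stem}${suffix}`;
}
```

So a PR #123 consuming the python CPU base resolves `quay.io/go-skynet/localai-base:python-cpu-2404-pr123`, while a push to master overwrites the unsuffixed tag.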
+LOCAL_BASE_BUILD_TYPE := $(or $(BUILD_TYPE),cpu) +LOCAL_BASE_UBUNTU_VERSION := $(or $(UBUNTU_VERSION),2404) +LOCAL_BASE_CUDA_SUFFIX := $(if $(filter cublas l4t,$(BUILD_TYPE)),-cuda$(CUDA_MAJOR_VERSION).$(CUDA_MINOR_VERSION)) +LOCAL_BASE_PYTHON_TAG := localai-base:python-$(LOCAL_BASE_BUILD_TYPE)-$(LOCAL_BASE_UBUNTU_VERSION)$(LOCAL_BASE_CUDA_SUFFIX) +LOCAL_BASE_GOLANG_TAG := localai-base:golang-$(LOCAL_BASE_BUILD_TYPE)-$(LOCAL_BASE_UBUNTU_VERSION)$(LOCAL_BASE_CUDA_SUFFIX) +LOCAL_BASE_CPP_TAG := localai-base:cpp-$(LOCAL_BASE_BUILD_TYPE)-$(LOCAL_BASE_UBUNTU_VERSION)$(LOCAL_BASE_CUDA_SUFFIX) +LOCAL_BASE_RUST_TAG := localai-base:rust-$(LOCAL_BASE_BUILD_TYPE)-$(LOCAL_BASE_UBUNTU_VERSION) + +# Per-(lang) base image build targets. Each backend's docker-build-X target +# depends on the matching base via generate-docker-build-target below. +# PHONY so docker handles its own layer caching. +.PHONY: docker-build-python-base docker-build-golang-base docker-build-cpp-base docker-build-rust-base + +docker-build-python-base: + docker build \ + --build-arg BUILD_TYPE=$(BUILD_TYPE) \ + --build-arg BASE_IMAGE=$(or $(BASE_IMAGE),ubuntu:24.04) \ + --build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \ + --build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \ + --build-arg UBUNTU_VERSION=$(LOCAL_BASE_UBUNTU_VERSION) \ + --build-arg APT_MIRROR=$(APT_MIRROR) \ + --build-arg APT_PORTS_MIRROR=$(APT_PORTS_MIRROR) \ + $(if $(SKIP_DRIVERS),--build-arg SKIP_DRIVERS=$(SKIP_DRIVERS)) \ + -t $(LOCAL_BASE_PYTHON_TAG) \ + -f .docker/bases/Dockerfile.python \ + . 
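The `LOCAL_BASE_*` variables above compose tags like `localai-base:python-cublas-2404-cuda12.9`. A condensed sketch of the same composition (mirroring the make variables, with an illustrative function name; note the rust tag omits the cuda suffix just as `LOCAL_BASE_RUST_TAG` does):

```javascript
// Mirror of the LOCAL_BASE_*_TAG composition in the Makefile:
// localai-base:<lang>-<buildType|cpu>-<ubuntu>[-cuda<maj>.<min>]
// Function name is illustrative, not a repo identifier.
function localBaseTag(lang, buildType, ubuntu, cudaMajor, cudaMinor) {
  const bt = buildType || "cpu"; // $(or $(BUILD_TYPE),cpu)
  // Only cublas/l4t carry a cuda suffix; the rust tag never does.
  const cuda =
    lang !== "rust" && ["cublas", "l4t"].includes(buildType)
      ? `-cuda${cudaMajor}.${cudaMinor}`
      : "";
  return `localai-base:${lang}-${bt}-${ubuntu}${cuda}`;
}
```

For example, an unset `BUILD_TYPE` yields `localai-base:python-cpu-2404`, matching the CI stem shape minus the registry prefix.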
+ +docker-build-golang-base: + docker build \ + --build-arg BUILD_TYPE=$(BUILD_TYPE) \ + --build-arg BASE_IMAGE=$(or $(BASE_IMAGE),ubuntu:24.04) \ + --build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \ + --build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \ + --build-arg UBUNTU_VERSION=$(LOCAL_BASE_UBUNTU_VERSION) \ + --build-arg APT_MIRROR=$(APT_MIRROR) \ + --build-arg APT_PORTS_MIRROR=$(APT_PORTS_MIRROR) \ + $(if $(SKIP_DRIVERS),--build-arg SKIP_DRIVERS=$(SKIP_DRIVERS)) \ + -t $(LOCAL_BASE_GOLANG_TAG) \ + -f .docker/bases/Dockerfile.golang \ + . + +docker-build-cpp-base: + docker build \ + --build-arg BUILD_TYPE=$(BUILD_TYPE) \ + --build-arg BASE_IMAGE=$(or $(BASE_IMAGE),ubuntu:24.04) \ + --build-arg CUDA_MAJOR_VERSION=$(CUDA_MAJOR_VERSION) \ + --build-arg CUDA_MINOR_VERSION=$(CUDA_MINOR_VERSION) \ + --build-arg UBUNTU_VERSION=$(LOCAL_BASE_UBUNTU_VERSION) \ + --build-arg APT_MIRROR=$(APT_MIRROR) \ + --build-arg APT_PORTS_MIRROR=$(APT_PORTS_MIRROR) \ + $(if $(SKIP_DRIVERS),--build-arg SKIP_DRIVERS=$(SKIP_DRIVERS)) \ + -t $(LOCAL_BASE_CPP_TAG) \ + -f .docker/bases/Dockerfile.cpp \ + . + +docker-build-rust-base: + docker build \ + --build-arg BASE_IMAGE=$(or $(BASE_IMAGE),ubuntu:24.04) \ + --build-arg UBUNTU_VERSION=$(LOCAL_BASE_UBUNTU_VERSION) \ + --build-arg APT_MIRROR=$(APT_MIRROR) \ + --build-arg APT_PORTS_MIRROR=$(APT_PORTS_MIRROR) \ + -t $(LOCAL_BASE_RUST_TAG) \ + -f .docker/bases/Dockerfile.rust \ + . + +# Map a consumer dockerfile-type to the base-image tag it should consume. +# Mirrors langOf() in scripts/changed-backends.js: the C++ trio +# (llama-cpp/ik-llama-cpp/turboquant) all consume the shared cpp base. 
+local-base-tag = $(strip \ + $(if $(filter python,$(1)),$(LOCAL_BASE_PYTHON_TAG), \ + $(if $(filter golang,$(1)),$(LOCAL_BASE_GOLANG_TAG), \ + $(if $(filter llama-cpp ik-llama-cpp turboquant,$(1)),$(LOCAL_BASE_CPP_TAG), \ + $(if $(filter rust,$(1)),$(LOCAL_BASE_RUST_TAG)))))) + +local-base-target = $(strip \ + $(if $(filter python,$(1)),docker-build-python-base, \ + $(if $(filter golang,$(1)),docker-build-golang-base, \ + $(if $(filter llama-cpp ik-llama-cpp turboquant,$(1)),docker-build-cpp-base, \ + $(if $(filter rust,$(1)),docker-build-rust-base))))) + # Helper function to build docker image for a backend # Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG) define docker-build-backend @@ -1084,15 +1168,18 @@ define docker-build-backend --build-arg UBUNTU_CODENAME=$(UBUNTU_CODENAME) \ --build-arg APT_MIRROR=$(APT_MIRROR) \ --build-arg APT_PORTS_MIRROR=$(APT_PORTS_MIRROR) \ + $(if $(call local-base-tag,$(2)),--build-arg BASE_IMAGE_PREBUILT=$(call local-base-tag,$(2))) \ $(if $(FROM_SOURCE),--build-arg FROM_SOURCE=$(FROM_SOURCE)) \ $(if $(AMDGPU_TARGETS),--build-arg AMDGPU_TARGETS=$(AMDGPU_TARGETS)) \ $(if $(filter true,$(5)),--build-arg BACKEND=$(1)) \ -t local-ai-backend:$(1) -f backend/Dockerfile.$(2) $(3) endef -# Generate docker-build targets from backend definitions +# Generate docker-build targets from backend definitions. Each consumer +# gets the matching layered base as a prerequisite so the FROM in the +# slimmed Dockerfile resolves locally. The map lives in local-base-target. 
define generate-docker-build-target -docker-build-$(word 1,$(subst |, ,$(1))): +docker-build-$(word 1,$(subst |, ,$(1))): $(call local-base-target,$(word 2,$(subst |, ,$(1)))) $$(call docker-build-backend,$(word 1,$(subst |, ,$(1))),$(word 2,$(subst |, ,$(1))),$(word 3,$(subst |, ,$(1))),$(word 4,$(subst |, ,$(1))),$(word 5,$(subst |, ,$(1)))) endef diff --git a/backend/Dockerfile.golang b/backend/Dockerfile.golang index 4d0980a81e37..f01e86e56608 100644 --- a/backend/Dockerfile.golang +++ b/backend/Dockerfile.golang @@ -1,202 +1,37 @@ -ARG BASE_IMAGE=ubuntu:24.04 -ARG APT_MIRROR="" -ARG APT_PORTS_MIRROR="" +# Builds a single Go backend on top of the shared +# .docker/bases/Dockerfile.golang base. The base bakes in apt + GPU SDK + +# Go toolchain + protoc + grpc tooling, so this stage only carries the +# per-backend opus-dev install + COPY + `make build`. +# +# CI orchestration (.github/workflows/backend.yml + backend_pr.yml) builds +# the right base flavour automatically via scripts/changed-backends.js +# and passes BASE_IMAGE_PREBUILT here. For local builds, run: +# make backend-image-base LANG=golang BUILD_TYPE=<...> +# make backend-image BACKEND=<...> BUILD_TYPE=<...> +# See .agents/ci-caching.md. 
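The effect of `generate-docker-build-target` plus `local-base-target` on a `BACKEND_* = name|dockerfile|context|progress|needs-backend` tuple can be sketched outside make (helper names are mine; the tuple shapes are taken from the `BACKEND_KOKOROS` / `BACKEND_SAM3_CPP` definitions above):

```javascript
// Sketch of what generate-docker-build-target expands a BACKEND_* tuple
// into: a docker-build-<name> target whose prerequisite is the matching
// docker-build-<lang>-base target. Helper names are illustrative.
const CPP_TRIO = new Set(["llama-cpp", "ik-llama-cpp", "turboquant"]);

function baseTargetFor(dockerfileType) {
  if (dockerfileType === "python") return "docker-build-python-base";
  if (dockerfileType === "golang") return "docker-build-golang-base";
  if (CPP_TRIO.has(dockerfileType)) return "docker-build-cpp-base";
  if (dockerfileType === "rust") return "docker-build-rust-base";
  return ""; // no prebuilt base; the consumer Dockerfile bootstraps inline
}

// Split on "|" the way the Makefile's $(word n,$(subst |, ,...)) does.
function expandBackendTuple(tuple) {
  const [name, dockerfileType] = tuple.split("|");
  return { target: `docker-build-${name}`, prereq: baseTargetFor(dockerfileType) };
}
```

So `sam3-cpp|golang|.|false|true` expands to a `docker-build-sam3-cpp` target depending on `docker-build-golang-base`, which is why a bare `make docker-build-sam3-cpp` builds the base first.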
+ +ARG BASE_IMAGE_PREBUILT + +FROM ${BASE_IMAGE_PREBUILT} AS builder -FROM ${BASE_IMAGE} AS builder ARG BACKEND=rerankers ARG BUILD_TYPE ENV BUILD_TYPE=${BUILD_TYPE} ARG CUDA_MAJOR_VERSION ARG CUDA_MINOR_VERSION -ARG SKIP_DRIVERS=false ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} -ENV DEBIAN_FRONTEND=noninteractive ARG TARGETARCH ARG TARGETVARIANT -ARG GO_VERSION=1.25.4 -ARG UBUNTU_VERSION=2404 ARG AMDGPU_TARGETS ENV AMDGPU_TARGETS=${AMDGPU_TARGETS} -ARG APT_MIRROR -ARG APT_PORTS_MIRROR - -RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \ - APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \ - apt-get update && \ - apt-get install -y --no-install-recommends \ - build-essential \ - gcc-14 g++-14 \ - git ccache \ - ca-certificates \ - make cmake wget libopenblas-dev \ - curl unzip \ - libssl-dev && \ - update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 100 \ - --slave /usr/bin/g++ g++ /usr/bin/g++-14 \ - --slave /usr/bin/gcov gcov /usr/bin/gcov-14 && \ - apt-get clean && \ - rm -rf /var/lib/apt/lists/* - - -# Cuda -ENV PATH=/usr/local/cuda/bin:${PATH} - -# HipBLAS requirements -ENV PATH=/opt/rocm/bin:${PATH} - - -# Vulkan requirements -RUN </dev/null || ls /opt/rocm*/lib64/rocblas/library/Kernels* 2>/dev/null) | grep -oP 'gfx[0-9a-z+-]+' | sort -u || \ - echo "WARNING: No rocBLAS kernel data found" \ - ; fi - -RUN echo "TARGETARCH: $TARGETARCH" - -# We need protoc installed, and the version in 22.04 is too old. We will create one as part installing the GRPC build below -# but that will also being in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only -# here so that we can generate the grpc code for the stablediffusion build -RUN < # build the base +# make backend-image BACKEND=<...> BUILD_TYPE=<...> +# See .agents/ci-caching.md. 
+ +ARG BASE_IMAGE_PREBUILT + +FROM ${BASE_IMAGE_PREBUILT} AS builder -FROM ${BASE_IMAGE} AS builder ARG BACKEND=rerankers ARG BUILD_TYPE ENV BUILD_TYPE=${BUILD_TYPE} ARG CUDA_MAJOR_VERSION ARG CUDA_MINOR_VERSION -ARG SKIP_DRIVERS=false ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION} ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION} -ENV DEBIAN_FRONTEND=noninteractive -ARG TARGETARCH -ARG TARGETVARIANT -ARG UBUNTU_VERSION=2404 -ARG APT_MIRROR -ARG APT_PORTS_MIRROR - -RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \ - APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \ - apt-get update && \ - apt-get install -y --no-install-recommends \ - build-essential \ - ccache \ - ca-certificates \ - espeak-ng \ - curl \ - libssl-dev \ - git wget \ - git-lfs \ - unzip clang \ - upx-ucl \ - curl python3-pip \ - python-is-python3 \ - python3-dev llvm \ - libnuma1 libgomp1 \ - python3-venv make cmake && \ - apt-get clean && \ - rm -rf /var/lib/apt/lists/* - -RUN </dev/null || ls /opt/rocm*/lib64/rocblas/library/Kernels* 2>/dev/null) | grep -oP 'gfx[0-9a-z+-]+' | sort -u || \ - echo "WARNING: No rocBLAS kernel data found" \ - ; fi - -RUN echo "TARGETARCH: $TARGETARCH" - -# We need protoc installed, and the version in 22.04 is too old. We will create one as part installing the GRPC build below -# but that will also being in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only -# here so that we can generate the grpc code for the stablediffusion build -RUN <=true/false: per-backend booleans for test-extra.yml. +// +// On PR events the matrix is filtered to backends whose source dirs +// changed; if .docker/bases/Dockerfile. (or its workflow scaffolding) +// changed, a canary entry per (lang × build-type × arch × cuda × ubuntu) +// is added so the prebuilt-base path gets exercised end-to-end before +// merge. See .agents/ci-caching.md. 
+ import fs from "fs"; import yaml from "js-yaml"; import { Octokit } from "@octokit/core"; -// Load backend.yml and parse matrix.include -const backendYml = yaml.load(fs.readFileSync(".github/workflows/backend.yml", "utf8")); -const jobs = backendYml.jobs; -const backendJobs = jobs["backend-jobs"]; -const backendJobsDarwin = jobs["backend-jobs-darwin"]; -const includes = backendJobs.strategy.matrix.include; -const includesDarwin = backendJobsDarwin.strategy.matrix.include; +// Backend matrix lives in a sibling data file so the workflow can switch +// to fromJSON without needing two copies of the same matrix. See +// .github/backend-matrix.yaml. +const matrixData = yaml.load(fs.readFileSync(".github/backend-matrix.yaml", "utf8")); +const includes = matrixData.linux; +const includesDarwin = matrixData.darwin; const eventPath = process.env.GITHUB_EVENT_PATH; const event = JSON.parse(fs.readFileSync(eventPath, "utf8")); +const isPR = !!event.pull_request; +const prNumber = isPR ? event.pull_request.number : null; + +// Langs with a prebuilt base recipe under .docker/bases/Dockerfile.. +// Discovered at runtime so adding a new language tier (e.g. golang) only +// requires creating that file + slimming the consumer Dockerfile; no +// orchestration changes needed. +const baseRecipeDir = ".docker/bases"; +const langsWithBase = new Set( + fs.existsSync(baseRecipeDir) + ? fs.readdirSync(baseRecipeDir) + .filter(f => f.startsWith("Dockerfile.")) + .map(f => f.slice("Dockerfile.".length)) + : [] +); + +// Files that, when changed in a PR, should fan out to canary backend +// matrix entries for the affected lang. Keeps PR validation honest when a +// PR only touches base scaffolding. Per-lang recipe paths +// (.docker/bases/Dockerfile.) trigger only their own lang; the +// shared scaffolding entries trigger every lang. 
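The trigger rule described above — a per-lang recipe fans out only to its own lang, shared scaffolding fans out to every lang — can be condensed into a small sketch (a reimplementation of the rule, not the script's exact code; the file lists mirror `baseTriggerFiles`):

```javascript
// Which base langs does a changed file fan canaries out to?
// Per-lang recipes (.docker/bases/Dockerfile.<lang>) trigger only that
// lang; shared scaffolding triggers every lang; anything else is left
// to the ordinary source-dir filter. Condensed sketch of the rule in
// scripts/changed-backends.js, not its exact code.
const ALL_LANGS = ["cpp", "golang", "python", "rust"];
const RECIPE_PREFIX = ".docker/bases/Dockerfile.";
const SHARED_SCAFFOLDING = new Set([
  ".docker/apt-mirror.sh",
  ".github/workflows/base_images.yml",
  ".github/actions/configure-apt-mirror/action.yml",
  "scripts/changed-backends.js",
]);

function langsTriggeredBy(changedFiles) {
  const langs = new Set();
  for (const f of changedFiles) {
    if (f.startsWith(RECIPE_PREFIX)) {
      const lang = f.slice(RECIPE_PREFIX.length);
      if (ALL_LANGS.includes(lang)) langs.add(lang); // per-lang recipe
    } else if (SHARED_SCAFFOLDING.has(f)) {
      for (const l of ALL_LANGS) langs.add(l); // scaffolding: all langs
    }
  }
  return [...langs].sort();
}
```

A PR touching only `.docker/bases/Dockerfile.python` therefore gets python canaries only, while a PR touching `.docker/apt-mirror.sh` gets canaries for all four base langs.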
+const baseTriggerFiles = new Set([ + ".docker/bases/Dockerfile.python", + ".docker/bases/Dockerfile.golang", + ".docker/bases/Dockerfile.cpp", + ".docker/bases/Dockerfile.rust", + ".docker/apt-mirror.sh", + ".github/workflows/base_images.yml", + ".github/actions/configure-apt-mirror/action.yml", + "scripts/changed-backends.js", +]); +// Maps a base lang back to the consumer Dockerfiles that build on top of +// it. The cpp base is shared by the llama-cpp / ik-llama-cpp / turboquant +// trio; everything else is 1:1 with the file suffix. +const langTriggerSelector = { + python: (item) => item.dockerfile && item.dockerfile.endsWith("python"), + golang: (item) => item.dockerfile && item.dockerfile.endsWith("golang"), + rust: (item) => item.dockerfile && item.dockerfile.endsWith("rust"), + cpp: (item) => + !!item.dockerfile && /Dockerfile\.(llama-cpp|ik-llama-cpp|turboquant)$/.test(item.dockerfile), +}; + +// ---------- helpers ---------- + +function langOf(item) { + if (!item.dockerfile) return null; + // dockerfile is like "./backend/Dockerfile.python" + const m = item.dockerfile.match(/Dockerfile\.([\w-]+)$/); + if (!m) return null; + // The C++ trio (llama-cpp, ik-llama-cpp, turboquant) consume a shared + // cpp base image — they only differ in their per-backend make targets. + if (m[1] === "llama-cpp" || m[1] === "ik-llama-cpp" || m[1] === "turboquant") { + return "cpp"; + } + return m[1]; +} -// Infer backend path function inferBackendPath(item) { if (item.dockerfile.endsWith("python")) { return `backend/python/${item.backend}/`; @@ -42,61 +114,196 @@ function inferBackendPathDarwin(item) { if (!item.lang) { return `backend/python/${item.backend}/`; } - return `backend/${item.lang}/${item.backend}/`; } -// Build a deduplicated map of backend name -> path prefix from all matrix entries +function platformsOf(item) { + // matrix.platforms can be "linux/amd64", "linux/arm64", or + // "linux/amd64,linux/arm64". Always return a normalized array. 
+ if (!item.platforms) return ["linux/amd64"]; + return item.platforms.split(",").map(p => p.trim()).filter(Boolean); +} + +// Slug a base image reference for inclusion in a tag-stem. Returns "" for +// the default ubuntu:24.04 (which is the implicit BASE_IMAGE) so that case +// keeps a clean stem. Other base images get a short, parseable suffix. +function baseImageSlug(img) { + if (!img || img === "ubuntu:24.04") return ""; + if (img.includes("l4t-jetpack")) { + const m = img.match(/r\d+(?:\.\d+)+/); + return `jetpack-${m ? m[0] : "x"}`; + } + if (img.includes("rocm/dev-ubuntu")) { + const m = img.match(/:([\d.]+)/); + return `rocm-${m ? m[1] : "x"}`; + } + if (img.includes("intel/oneapi-basekit")) { + const m = img.match(/:([\d.]+)/); + return `oneapi-${m ? m[1] : "x"}`; + } + return img.replace(/.*\//, "").replace(/:/g, "-").replace(/[^A-Za-z0-9.-]/g, ""); +} + +// Tag stem for the prebuilt base. Arch is intentionally NOT in the stem: +// the base is built multi-arch when any consumer needs multi-arch, and +// single-arch otherwise. +function tagStem(item) { + const lang = langOf(item); + if (!lang || !langsWithBase.has(lang)) return null; + const ubuntu = item["ubuntu-version"] || "2404"; + const buildType = item["build-type"] || "cpu"; + let stem = `${lang}-${buildType}-${ubuntu}`; + if (buildType === "cublas" || buildType === "l4t") { + stem += `-cuda${item["cuda-major-version"]}.${item["cuda-minor-version"]}`; + } + const slug = baseImageSlug(item["base-image"]); + if (slug) stem += `-${slug}`; + return stem; +} + +function prebuiltRef(stem) { + if (!stem) return ""; + const suffix = isPR ? `-pr${prNumber}` : ""; + return `quay.io/go-skynet/localai-base:${stem}${suffix}`; +} + +// Build-types that actually exercise the SKIP_DRIVERS branch in the base +// Dockerfile. For everything else (cpu, intel, sycl_*, mps, metal), +// skip-drivers is a no-op and disagreeing values across consumers are +// safe to merge. 
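A few worked stems help make the scheme concrete. This is a condensed mirror of `baseImageSlug()` + `tagStem()` with illustrative names; the entry shapes are assumptions about what `.github/backend-matrix.yaml` contains, not copied matrix data:

```javascript
// Condensed mirror of baseImageSlug()/tagStem() for worked examples.
// Names and entry shapes are illustrative.
function slug(img) {
  if (!img || img === "ubuntu:24.04") return ""; // default base: clean stem
  if (img.includes("l4t-jetpack")) {
    const m = img.match(/r\d+(?:\.\d+)+/);
    return `jetpack-${m ? m[0] : "x"}`;
  }
  const m = img.match(/:([\d.]+)/);
  const ver = m ? m[1] : "x";
  if (img.includes("rocm/dev-ubuntu")) return `rocm-${ver}`;
  if (img.includes("intel/oneapi-basekit")) return `oneapi-${ver}`;
  return img.replace(/.*\//, "").replace(/:/g, "-");
}

function stem(lang, entry) {
  const bt = entry["build-type"] || "cpu";
  let s = `${lang}-${bt}-${entry["ubuntu-version"] || "2404"}`;
  if (bt === "cublas" || bt === "l4t") {
    s += `-cuda${entry["cuda-major-version"]}.${entry["cuda-minor-version"]}`;
  }
  const sl = slug(entry["base-image"]);
  return sl ? `${s}-${sl}` : s;
}
```

So a python cublas 12.9 entry stems to `python-cublas-2404-cuda12.9`, a cpp hipblas entry on a rocm base to `cpp-hipblas-2404-rocm-7.2.1`, and a plain golang CPU entry to `golang-cpu-2404` — with no arch component in any of them.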
+const driverBuildTypes = new Set(["vulkan", "cublas", "l4t", "clblas", "hipblas"]); + +function effectiveSkipDrivers(item) { + if (!driverBuildTypes.has(item["build-type"] || "")) return "false"; + return String(item["skip-drivers"] ?? "false"); +} + +// Build a base entry consumed by base_images.yml. Platforms is the union +// across all consumers of this stem (multi-arch when any consumer needs +// it). runs-on is derived from the platforms: arm-native when arm64 is +// the only arch, ubuntu-latest (with QEMU) otherwise. +function baseEntryFor(stem, items) { + const first = items[0]; + const platformSet = new Set(); + for (const it of items) for (const p of platformsOf(it)) platformSet.add(p); + const platforms = [...platformSet].sort().join(","); + const armOnly = platforms === "linux/arm64"; + return { + "tag-stem": stem, + lang: langOf(first), + "base-image": first["base-image"], + "build-type": first["build-type"] || "", + "cuda-major-version": String(first["cuda-major-version"] ?? ""), + "cuda-minor-version": String(first["cuda-minor-version"] ?? ""), + "ubuntu-version": String(first["ubuntu-version"] ?? "2404"), + platforms, + "runs-on": armOnly ? "ubuntu-24.04-arm" : "ubuntu-latest", + "skip-drivers": effectiveSkipDrivers(first), + }; +} + +function dedupBases(items) { + // Group consumers by tag-stem. + const groups = new Map(); + for (const item of items) { + const stem = tagStem(item); + if (!stem) continue; + if (!groups.has(stem)) groups.set(stem, []); + groups.get(stem).push(item); + } + // Inputs that MUST agree across all consumers of a stem. If they don't, + // the script picks one arbitrarily and the others get a wrong base — fail + // loudly so the matrix is reconciled. 
+ const collisionChecks = [ + ["base-image", (it) => it["base-image"]], + ["skip-drivers", effectiveSkipDrivers], + ]; + const out = []; + for (const [stem, consumers] of groups) { + for (const [name, getter] of collisionChecks) { + const v0 = getter(consumers[0]); + for (const c of consumers.slice(1)) { + const v = getter(c); + if (v !== v0) { + throw new Error( + `Tag-stem collision for ${stem}: ${name} differs ` + + `(${JSON.stringify(v0)} for ${consumers[0]["tag-suffix"]} vs ` + + `${JSON.stringify(v)} for ${c["tag-suffix"]}). ` + + `Disambiguate by encoding ${name} in tagStem(), or reconcile the matrix entries.`, + ); + } + } + } + out.push(baseEntryFor(stem, consumers)); + } + return out; +} + +// Annotate a backend matrix entry with `base-image-prebuilt` for langs +// with a prebuilt base recipe; leave others untouched (their Dockerfile +// runs the inline bootstrap). +function annotate(item) { + const stem = tagStem(item); + if (!stem) return item; + return { ...item, "base-image-prebuilt": prebuiltRef(stem) }; +} + +// Build the deduplicated list of backend names → path prefixes from all +// matrix entries (linux + darwin). Used for per-backend boolean outputs +// consumed by test-extra.yml. 
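The dedup-plus-collision behaviour can be illustrated on a toy pair of consumers sharing a stem (a condensed sketch under assumed entry shapes; the real code groups full matrix entries and also checks `base-image`):

```javascript
// Toy illustration of stem dedup: platforms union across consumers,
// runner derived from the union, and a loud failure when consumers of
// one stem disagree on skip-drivers. Condensed sketch, not repo code.
function dedupToy(consumers) {
  const platforms = new Set();
  const skip = new Set();
  for (const c of consumers) {
    for (const p of c.platforms.split(",")) platforms.add(p.trim());
    skip.add(c["skip-drivers"]);
  }
  if (skip.size > 1) {
    throw new Error("Tag-stem collision: skip-drivers differs across consumers");
  }
  const joined = [...platforms].sort().join(",");
  return {
    platforms: joined,
    // arm-native runner only when arm64 is the sole arch; QEMU otherwise
    "runs-on": joined === "linux/arm64" ? "ubuntu-24.04-arm" : "ubuntu-latest",
  };
}
```

Two consumers on `linux/amd64` and `linux/arm64` yield one multi-arch base on `ubuntu-latest`; a disagreement on `skip-drivers` throws instead of silently picking a winner.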
function getAllBackendPaths() { const paths = new Map(); for (const item of includes) { const p = inferBackendPath(item); - if (p && !paths.has(item.backend)) { - paths.set(item.backend, p); - } + if (p && !paths.has(item.backend)) paths.set(item.backend, p); } for (const item of includesDarwin) { const p = inferBackendPathDarwin(item); - if (p && !paths.has(item.backend)) { - paths.set(item.backend, p); - } + if (p && !paths.has(item.backend)) paths.set(item.backend, p); } return paths; } const allBackendPaths = getAllBackendPaths(); -// Non-PR events: output run-all=true and all backends as true -if (!event.pull_request) { - fs.appendFileSync(process.env.GITHUB_OUTPUT, `run-all=true\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends=true\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-darwin=true\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix=${JSON.stringify({ include: includes })}\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-darwin=${JSON.stringify({ include: includesDarwin })}\n`); +function writeOutput(key, value) { + fs.appendFileSync(process.env.GITHUB_OUTPUT, `${key}=${value}\n`); +} + +function emit(filtered, filteredDarwin, runAll) { + const annotated = filtered.map(annotate); + const bases = dedupBases(filtered); + writeOutput("run-all", runAll); + writeOutput("has-backends", annotated.length > 0 ? "true" : "false"); + writeOutput("has-backends-darwin", filteredDarwin.length > 0 ? "true" : "false"); + writeOutput("has-bases", bases.length > 0 ? 
"true" : "false"); + writeOutput("matrix", JSON.stringify({ include: annotated })); + writeOutput("matrix-darwin", JSON.stringify({ include: filteredDarwin })); + writeOutput("bases-matrix", JSON.stringify({ include: bases })); +} + +// ---------- master mode (push events) ---------- + +if (!isPR) { + emit(includes, includesDarwin, "true"); for (const backend of allBackendPaths.keys()) { - fs.appendFileSync(process.env.GITHUB_OUTPUT, `${backend}=true\n`); + writeOutput(backend, "true"); } process.exit(0); } -// PR context -const prNumber = event.pull_request.number; +// ---------- PR mode ---------- + const repo = event.repository.name; const owner = event.repository.owner.login; - -const token = process.env.GITHUB_TOKEN; -const octokit = new Octokit({ auth: token }); +const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN }); async function getChangedFiles() { let files = []; let page = 1; while (true) { - const res = await octokit.request('GET /repos/{owner}/{repo}/pulls/{pull_number}/files', { - owner, - repo, - pull_number: prNumber, - per_page: 100, - page + const res = await octokit.request("GET /repos/{owner}/{repo}/pulls/{pull_number}/files", { + owner, repo, pull_number: prNumber, per_page: 100, page, }); files = files.concat(res.data.map(f => f.filename)); if (res.data.length < 100) break; @@ -107,35 +314,55 @@ async function getChangedFiles() { (async () => { const changedFiles = await getChangedFiles(); - console.log("Changed files:", changedFiles); - const filtered = includes.filter(item => { - const backendPath = inferBackendPath(item); - if (!backendPath) return false; - return changedFiles.some(file => file.startsWith(backendPath)); - }); + // Source-driven filter: backend dir touched. 
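`emit()` lands in `$GITHUB_OUTPUT` as plain `key=value` lines, one per output, with the matrices serialized to single-line JSON. A sketch of the resulting lines for a one-entry filter (shape inferred from the code above, not captured from a real run):

```javascript
// Sketch of the key=value lines emit() appends to $GITHUB_OUTPUT.
// Shape inferred from the code, not captured from a real run; the
// single-line JSON.stringify output is what makes the flat key=value
// form of GITHUB_OUTPUT safe here.
function emitLines(filtered, filteredDarwin, runAll) {
  return [
    `run-all=${runAll}`,
    `has-backends=${filtered.length > 0 ? "true" : "false"}`,
    `has-backends-darwin=${filteredDarwin.length > 0 ? "true" : "false"}`,
    `matrix=${JSON.stringify({ include: filtered })}`,
    `matrix-darwin=${JSON.stringify({ include: filteredDarwin })}`,
  ];
}
```

The workflow side then round-trips the `matrix` line through `fromJson(needs.generate-matrix.outputs.matrix)`.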
+ const sourceTriggered = new Set(); + for (const item of includes) { + const p = inferBackendPath(item); + if (p && changedFiles.some(f => f.startsWith(p))) { + sourceTriggered.add(item); + } + } - const filteredDarwin = includesDarwin.filter(item => { - const backendPath = inferBackendPathDarwin(item); - return changedFiles.some(file => file.startsWith(backendPath)); - }) + // Base-driven filter: any matrix entry whose lang has a prebuilt base + // recipe AND that recipe (or its scaffolding) was touched. We want one + // canary per (lang × build-type × arch × cuda × ubuntu) so all bases get + // exercised, not 234 entries. + const baseTriggered = new Set(); + const baseTriggerHits = new Set(changedFiles.filter(f => baseTriggerFiles.has(f))); + if (baseTriggerHits.size > 0) { + const seenStems = new Set(); + for (const item of includes) { + const stem = tagStem(item); + if (!stem) continue; + const select = langTriggerSelector[langOf(item)]; + if (select && !select(item)) continue; + // Only canary entries for langs whose recipe/scaffolding actually changed. + const hits = [...baseTriggerHits]; + const recipePath = `.docker/bases/Dockerfile.${langOf(item)}`; + const langTouched = + hits.includes(recipePath) || + // any non-recipe trigger touches all langs + hits.some(h => h !== recipePath && !h.startsWith(".docker/bases/Dockerfile.")); + if (!langTouched) continue; + if (seenStems.has(stem)) continue; + seenStems.add(stem); + baseTriggered.add(item); + } + } - console.log("Filtered files:", filtered); - console.log("Filtered files Darwin:", filteredDarwin); + const filtered = includes.filter(item => sourceTriggered.has(item) || baseTriggered.has(item)); + const filteredDarwin = includesDarwin.filter(item => { + const p = inferBackendPathDarwin(item); + return changedFiles.some(f => f.startsWith(p)); + }); - const hasBackends = filtered.length > 0 ? 'true' : 'false'; - const hasBackendsDarwin = filteredDarwin.length > 0 ? 
'true' : 'false'; - console.log("Has backends?:", hasBackends); - console.log("Has Darwin backends?:", hasBackendsDarwin); + console.log("Filtered linux:", filtered.length, "(source:", sourceTriggered.size, "base canaries:", baseTriggered.size, ")"); + console.log("Filtered darwin:", filteredDarwin.length); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `run-all=false\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends=${hasBackends}\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-darwin=${hasBackendsDarwin}\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix=${JSON.stringify({ include: filtered })}\n`); - fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-darwin=${JSON.stringify({ include: filteredDarwin })}\n`); + emit(filtered, filteredDarwin, "false"); - // Per-backend boolean outputs for (const [backend, pathPrefix] of allBackendPaths) { let changed = changedFiles.some(file => file.startsWith(pathPrefix)); // turboquant reuses backend/cpp/llama-cpp sources via a thin wrapper; @@ -143,6 +370,6 @@ async function getChangedFiles() { if (backend === "turboquant" && !changed) { changed = changedFiles.some(file => file.startsWith("backend/cpp/llama-cpp/")); } - fs.appendFileSync(process.env.GITHUB_OUTPUT, `${backend}=${changed ? 'true' : 'false'}\n`); + writeOutput(backend, changed ? "true" : "false"); } })();
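The per-backend booleans at the tail of the script, including the turboquant fallback onto the llama-cpp sources it wraps, reduce to a prefix test. A condensed sketch of that final rule:

```javascript
// Condensed sketch of the per-backend changed flag: prefix match on the
// backend's source dir, with turboquant also watching the
// backend/cpp/llama-cpp sources its thin wrapper reuses. Mirrors the
// tail of scripts/changed-backends.js.
function backendChanged(backend, pathPrefix, changedFiles) {
  let changed = changedFiles.some(f => f.startsWith(pathPrefix));
  if (backend === "turboquant" && !changed) {
    changed = changedFiles.some(f => f.startsWith("backend/cpp/llama-cpp/"));
  }
  return changed;
}
```

This is why a PR that only edits `backend/cpp/llama-cpp/` still flips the `turboquant=true` output for test-extra.yml.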