ci: per-board build reusables, host-side test split, path-filter unification #2759

Open
vpetersson wants to merge 25 commits into master from audit-ci

Conversation

@vpetersson
Contributor

@vpetersson vpetersson commented Apr 27, 2026

Summary

Single big-bang CI overhaul. Replaces the flat 7×4 buildx matrix + sequential publish-latest gate with per-board build-and-publish units, consolidates the small standalone lint workflows, and unifies the path filter into a single source of truth. Net workflow count stays at 11 (5 deleted, 5 added, 1 was already the merged-test refactor in this branch).

What changes

New entry workflow + reusables

  • .github/workflows/ci.yaml — single entry: triggers, path filter (one source of truth instead of duplicated across five files), concurrency, and the plan→lint→test→build→balena fan-out. PRs get an x86-only smoke build (no push); master push gets all 7 boards with push.
  • .github/workflows/_lint.yaml — ruff, mypy, eslint/prettier, ansible-lint, actionlint as parallel jobs.
  • .github/workflows/_test-host.yaml — typescript, python unit, python integration tests; preserves anthias.conf seeding, native build deps, and the nick-fields/retry wrapper around integration tests. Codecov upload guarded by if: env.CODECOV_TOKEN != '' for fork PRs.
  • .github/workflows/_build-board.yaml — per-board build + inline mirror-latest. PR/master cache isolation comes for free from GHA's type=gha cache backend, which scopes by branch — PR smoke builds cannot read or overwrite master's cache. Mirror retag adds small randomized jitter to the existing 5-attempt exponential backoff so 7 boards retagging in parallel don't synchronize their 429s against Docker Hub.
  • .github/workflows/_balena-board.yaml — per-board balena deploy. Preflights the board's <short-hash> tag in GHCR; cleanly skips if missing. The registry is the source of truth used to gate per-matrix-leg dependency that GHA can't natively express.

Deletions

  • docker-build.yaml, python-checks.yaml, javascript-lint.yaml, ansible-lint.yaml, lint-workflows.yml — all absorbed into the new entry + reusables.

Earlier in this branch (still part of this PR)

  • Tests no longer run twice. docker-test.yaml previously triggered standalone on push/PR AND was reused via workflow_call from docker-build.yaml's run-tests job — every push paid for two full TS + Python suites. Collapsed into top-level jobs.
  • TS + Python unit tests now run directly on the runner (host bun, host uv) for fail-fast feedback. Selenium integration tests stay in Docker.
  • Path filters switched from '**' + drift-prone !exclusions to whitelists.
  • concurrency: cancel-in-progress on PRs only (never on master) added to every push/PR workflow.
  • Merged python-lint.yaml + python-mypy.yaml → python-checks.yaml with a new setup-python-uv composite action; that action is reused by all the new reusables.

SonarCloud cleanup

  • Initial round used secrets: inherit on every reusable call site; SonarCloud rule S7635 ("only pass required secrets") flagged seven MEDIUM hotspots. Replaced with explicit per-secret pass-through (CODECOV_TOKEN, DOCKER_USERNAME/DOCKER_PASSWORD, BALENA_TOKEN) — least-privilege.
  • S5443 (publicly writable directory) flagged the buildx local cache under /tmp/.buildx-cache. In CI we use type=gha so that path was unused dead code anyway; for local dev runs the cache moved to ~/.cache/anthias-buildx/<board_label> to avoid sharing a world-writable path with other users on multi-user systems.

Trade-off (per-board independence)

Each board's latest-* advances independently. A flaky pi3 doesn't block pi5. A mixed-board fleet may briefly run different SHAs across boards. Within a single board, the four services (server/celery/redis/viewer) still advance coherently because mirror-latest needs the entire 4-service matrix to pass before any of that board's latest-* tags get retagged.
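
The gating rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the logic, not the workflow's actual implementation (which lives in GitHub Actions `needs:` wiring); the function name `boards_to_retag` and the shape of the `results` mapping are assumptions made for the example.

```python
from typing import Dict, List

# The four services named in the PR description.
SERVICES = ("server", "celery", "redis", "viewer")


def boards_to_retag(results: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the boards whose latest-* tags may advance.

    A board qualifies only if every one of its four service builds passed.
    Boards are evaluated independently, so one failing board never blocks
    another. (Hypothetical helper, for illustration only.)
    """
    return sorted(
        board
        for board, services in results.items()
        if all(services.get(service, False) for service in SERVICES)
    )


results = {
    "pi3": {"server": True, "celery": False, "redis": True, "viewer": True},
    "pi5": {"server": True, "celery": True, "redis": True, "viewer": True},
}
print(boards_to_retag(results))  # ['pi5']
```

Here the pi3 celery failure holds back all of pi3's tags while pi5 advances, which is the mixed-SHA situation the trade-off describes.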

Risks

  • Docker Hub 429s with parallel retag: today's sequential retag (56 inspects + 56 creates serialized) fans out across 7 runners doing 8 ops each. Mitigations: GHCR-first dual-registry order is unchanged; jitter added to the 5-attempt backoff (+ RANDOM % 3); monitor first 5 master runs. If 429s recur, widen jitter (e.g. RANDOM % 8) or stagger per-board mirror jobs.
  • PR x86 smoke cache poisoning master: not a concern with type=gha — GHA's cache service scopes entries by branch ref, so PR cache writes are invisible to master and vice versa. The on-disk cache_scope (<board_label>-<service>) does not need an explicit pr|push segment.
  • Forks on PR: secrets aren't forwarded for forked PRs. Codecov upload guarded by if: env.CODECOV_TOKEN != ''. Docker Hub login + --push already gated by if: inputs.push (false on PRs).
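
The jittered backoff from the first risk above could look roughly like the following. The real retag step is a shell loop using `RANDOM % 3`; this Python version is a sketch of the same idea, and the 2-second base delay, the doubling factor, and the names `backoff_delays` and `retry` are assumptions, since the PR only states "5-attempt exponential backoff".

```python
import random
from typing import Callable, List, Optional


def backoff_delays(
    attempts: int = 5,
    base: float = 2.0,  # assumed base delay; not stated in the PR
    jitter_max: int = 3,  # mirrors the shell expression RANDOM % 3
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Delays to sleep between attempts: exponential growth plus 0..jitter_max-1 s."""
    rng = rng or random.Random()
    # N attempts need N-1 waits between them.
    return [base * 2**n + rng.randrange(jitter_max) for n in range(attempts - 1)]


def retry(op: Callable[[], bool], delays: List[float],
          sleep: Callable[[float], None]) -> bool:
    """Run op up to len(delays)+1 times, sleeping between failed attempts."""
    for delay in delays:
        if op():
            return True
        sleep(delay)
    return op()
```

Widening the jitter, as suggested if 429s recur, would amount to calling `backoff_delays(jitter_max=8)`.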

⚠️ Before merging — branch-protection rename

Required-check names change. Run gh api repos/Screenly/Anthias/branches/master/protection to list current names, then update:

| Old | New |
| --- | --- |
| Run Unit Tests / run-typescript-tests | CI / Test / typescript-tests |
| Run Unit Tests / run-python-tests | CI / Test / python-unit-tests and CI / Test / python-integration-tests |
| Run Python Linter / Run mypy | CI / Lint / ruff and CI / Lint / mypy |
| Run JavaScript Linter and Formatter / lint | CI / Lint / js-lint |
| Ansible Lint / build | CI / Lint / ansible-lint |
| Lint GitHub Workflows / Run Linter (1.7.7) | CI / Lint / actionlint |
| Docker Image Build / buildx (...) | CI / Build / Build <board> <service> (matrix expansion) |
| Docker Image Build / publish-latest | (gone; replaced by per-board mirror-latest) |
| Docker Image Build / balena (...) | CI / Balena <board> |

If branch protection isn't updated, required checks will be missing post-merge. Recommend doing this in the same window as the merge.

Test plan

  • PR run: CI / plan, CI / Lint / *, CI / Test / *, and CI / Build / Build x86 server (and the other 3 services) all pass on this PR. (PR-side build is x86-only, push: false smoke build.)
  • PR run: confirm only ONE pipeline fires now (the old Docker Image Build and friends are deleted).
  • PR run: codecov upload either fires (this is a same-repo PR with CODECOV_TOKEN available) or is cleanly skipped.
  • Force-push a no-op change to confirm the concurrency block cancels the in-flight run.
  • After merge: branch protection updated (see table above).
  • After merge: first master push triggers the full 7-board build, all 4 services per board, and mirror-latest per board pushes both <short-hash>-<board> and latest-<board> to both ghcr.io/screenly/anthias-* and screenly/anthias-*.
  • After merge: balena-pi1..balena-pi5 deploy succeeds.
  • After merge: confirm packages: write permission is scoped only to build and mirror-latest jobs in run metadata.
  • After merge: monitor first 5 master runs for Docker Hub 429s in the mirror-latest step. If observed twice, increase jitter or stagger the per-board mirror jobs.
  • After merge: deliberately introduce a single-board failure on a branch and verify the other 6 boards still advance their latest-* independently — validates the per-board independence design.
  • Wall time comparable to or better than the old pipeline (no >10% regression).

🤖 Generated with Claude Code

The .github/workflows/ directory had grown to 14 workflows with three
overlapping problems:

  1. Tests ran twice on every push: docker-test.yaml triggered
     standalone on push/PR AND was reused via workflow_call from
     docker-build.yaml's run-tests job, so the full TS + Python
     suite executed twice.
  2. Tests ran inside Docker even for cheap unit checks, so the
     first failure signal lagged the image build by several
     minutes. Devs want fail-fast feedback.
  3. docker-build / docker-test / generate-openapi-schema all used
     the '**' + long !exclusion path-filter pattern and had drifted
     against each other. docker-build also excluded
     '!.github/workflows/docker-test.yml' (file is .yaml — no-op).

Changes:

  - Collapse docker-test.yaml + test-runner.yml into docker-build.yaml
    as three top-level jobs: typescript-tests + python-unit-tests run
    on the host runner (apt-install the cec/netifaces native deps,
    seed ~/.anthias/anthias.conf, run bun test / manage.py test
    --exclude-tag=integration directly); python-integration-tests
    keeps the Docker flow for the Selenium WebTest class. buildx +
    publish-latest + balena gate on all three test jobs and on
    push-to-master only, so PRs stop after the test stage.
  - Add a setup-python-uv composite action wrapping
    setup-python + setup-uv + uv venv + uv pip install. Used by 5
    workflows so version bumps land in one place.
  - Merge python-lint.yaml + python-mypy.yaml into python-checks.yaml
    (two parallel jobs sharing the composite action).
  - Replace '**' + exclusions with whitelist path filters in
    docker-build and generate-openapi-schema. The two halves of each
    push/pull_request block are inlined verbatim because actionlint
    doesn't follow YAML anchors.
  - Add concurrency: cancel-in-progress on PRs only, never on master,
    to every push/PR workflow. Biggest cost win on the 35-job buildx
    matrix.
  - Narrow lint-workflows.yml from '**/*.yml' to .github/workflows/**
    + .github/actions/**. Add source-extension path filter to
    codeql-analysis.yaml (weekly schedule remains the catchall).
  - Drop the run-tests gate from sbom.yaml — SBOMs come from
    lockfiles; tests don't validate lockfile resolution.
  - Pin ubuntu-24.04 on the two stragglers using ubuntu-latest.
  - Update README badges to point at the new docker-build.yaml and
    python-checks.yaml workflows.

Net 14 → 12 workflow files; +307 / -376 lines.

Branch protection required-check names will change once this lands;
update them before merging or required checks will go missing
(Run Unit Tests / * → Docker Image Build / *; Run Python Linter +
Run mypy → Python Checks / ruff + Python Checks / mypy).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vpetersson vpetersson requested a review from a team as a code owner April 27, 2026 16:07
vpetersson and others added 2 commits April 28, 2026 12:00
# Conflicts:
#	.github/workflows/docker-build.yaml
Adds five new workflow files that together replace the flat 7×4
buildx matrix + sequential publish-latest gate with per-board
build-and-publish units, and consolidate the small standalone lint
workflows into one reusable.

- ci.yaml — single entry: triggers, path filter (one source of
  truth instead of duplicated across five files), concurrency, and
  the plan→lint→test→build→balena fan-out. PRs get an x86-only
  smoke build (no push); master push gets all 7 boards with push.
- _lint.yaml — ruff, mypy, eslint/prettier, ansible-lint,
  actionlint as parallel jobs.
- _test-host.yaml — typescript, python unit, python integration
  tests; preserves anthias.conf seeding, native build deps, and
  the nick-fields/retry wrapper around integration tests.
- _build-board.yaml — per-board build + inline mirror-latest.
  Cache keys split by pr|push so PR smoke builds can't poison
  master cache. Mirror retag adds small randomized jitter to the
  existing 5-attempt exponential backoff so 7 boards retagging in
  parallel don't synchronize their 429s.
- _balena-board.yaml — per-board balena deploy. Preflights the
  board's <short-hash> tag in GHCR; cleanly skips if missing. The
  registry is the source of truth used to gate per-matrix-leg
  dependency that GHA can't natively express.

The old workflows (docker-build.yaml, python-checks.yaml,
javascript-lint.yaml, ansible-lint.yaml, lint-workflows.yml) stay
in place. ci.yaml runs alongside them — both pipelines will fire
on master push and PR. Validate the new layout via
workflow_dispatch from master and via this PR's own runs, then a
follow-up PR deletes the old files once we're confident.

Trade-off vs. the old global publish-latest gate: per-board
independence means a flaky pi3 doesn't block pi5 from advancing
its latest-pi5 tag. A mixed-board fleet may briefly run different
SHAs across boards. Within a single board, the four services
(server/celery/redis/viewer) still advance coherently because
mirror-latest needs the entire 4-service matrix to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vpetersson vpetersson requested a review from Copilot April 28, 2026 12:42
@vpetersson vpetersson changed the title from "ci: dedupe test runs, move unit tests to host, unify path filters" to "ci: per-board build reusables + host-side test split + path-filter unification" Apr 28, 2026
Two follow-ups on top of the per-board reusables commit:

1. Replace `secrets: inherit` on every reusable workflow call site
   in ci.yaml with explicit secret pass-through. SonarCloud rule
   S7635 (githubactions:S7635, "only pass required secrets") was
   flagging the seven `inherit` lines as MEDIUM hotspots — the
   reusable workflows already declare exactly which secrets they
   accept (CODECOV_TOKEN for _test-host, DOCKER_USERNAME/PASSWORD
   for _build-board, BALENA_TOKEN for _balena-board), so least-
   privilege pass-through is straightforward and quiets Sonar.

2. Delete the five workflow files now superseded by ci.yaml +
   reusables: docker-build.yaml, python-checks.yaml,
   javascript-lint.yaml, ansible-lint.yaml, lint-workflows.yml.
   Going single-PR rather than the originally-planned two-phase
   rollout — validation happens via this PR's own runs and via
   workflow_dispatch on master post-merge.

Branch protection required-check names will need updating after
merge. Old `Docker Image Build / *` and `Python Checks / *` check
names become `CI / *`. See PR description for the rename table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vpetersson vpetersson changed the title from "ci: per-board build reusables + host-side test split + path-filter unification" to "ci: per-board build reusables, host-side test split, path-filter unification" Apr 28, 2026

Copilot AI left a comment

Pull request overview

This PR refactors the GitHub Actions CI setup to reduce duplicated test execution, speed up feedback by moving unit tests to the host runner, and standardize workflow path filtering/concurrency behavior across pipelines.

Changes:

  • Removes the standalone/reusable Docker test workflows and introduces host-run TypeScript + Python unit tests with Docker-based integration tests.
  • Adds a shared composite action (setup-python-uv) and consolidates Python lint+mypy into a single workflow, plus introduces reusable _lint/_test-host/_build-board/_balena-board workflows.
  • Reworks workflow triggers (path whitelists) and adds PR-only cancel-in-progress concurrency across multiple workflows.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| README.md | Updates CI badges to new workflow names/URLs. |
| .github/workflows/test-runner.yml | Deleted reusable test runner workflow (Docker-based). |
| .github/workflows/docker-test.yaml | Deleted standalone “Run Unit Tests” workflow. |
| .github/workflows/docker-build.yaml | Adds host test jobs + gates build/publish on master push; replaces prior reusable test invocation. |
| .github/workflows/python-lint.yaml | Deleted (merged into python-checks). |
| .github/workflows/python-mypy.yaml | Deleted (merged into python-checks). |
| .github/workflows/python-checks.yaml | New combined Ruff + mypy workflow using setup-python-uv. |
| .github/workflows/lint-workflows.yml | Narrows path filters, adds concurrency, and pins runner version. |
| .github/workflows/javascript-lint.yaml | Adds PR-only cancel-in-progress concurrency. |
| .github/workflows/ansible-lint.yaml | Adopts setup-python-uv, adds concurrency, updates path filters. |
| .github/workflows/generate-openapi-schema.yml | Replaces broad path filters with a whitelist; adopts setup-python-uv; adds concurrency. |
| .github/workflows/codeql-analysis.yaml | Adds language-extension path filters + concurrency (schedule remains catch-all). |
| .github/workflows/sbom.yaml | Removes dependency on tests; adds workflow self-trigger and concurrency. |
| .github/workflows/deploy-website.yaml | Updates runner to ubuntu-24.04. |
| .github/workflows/build-webview.yaml | Adds PR-only cancel-in-progress concurrency. |
| .github/workflows/ci.yaml | Adds new umbrella CI entry point orchestrating reusable lint/test/build/balena workflows. |
| .github/workflows/_lint.yaml | New reusable workflow consolidating all lint passes. |
| .github/workflows/_test-host.yaml | New reusable workflow for host tests + Docker integration tests. |
| .github/workflows/_build-board.yaml | New reusable per-board build+publish workflow. |
| .github/workflows/_balena-board.yaml | New reusable per-board balena deploy workflow with registry preflight. |
| .github/actions/setup-python-uv/action.yml | New composite action to standardize Python+uv setup and dependency installation. |

Comments suppressed due to low confidence (2)

.github/workflows/docker-build.yaml:138

  • This job copies ansible/roles/anthias/files/anthias.conf, but the workflow’s path whitelist does not include ansible/**. A PR that only touches that file (or moves it) would not trigger this workflow, and could break CI/config seeding once other changes later cause CI to run. Add the relevant ansible/** path(s) to the whitelist or avoid depending on an Ansible-managed file here (e.g., write a minimal config inline as in the mypy job).

.github/workflows/docker-build.yaml:148

  • The Codecov upload here is unconditional. For PRs from forks, secrets.CODECOV_TOKEN will be unavailable and this can cause the upload step to fail or produce noisy errors. Consider guarding this step the same way as _test-host.yaml (e.g., only run when the token is present) to keep fork PR CI green.

Comment thread .github/workflows/_test-host.yaml
Comment thread .github/workflows/ci.yaml Outdated
Comment thread .github/workflows/_build-board.yaml
Comment thread .github/workflows/_balena-board.yaml Outdated
vpetersson and others added 4 commits April 28, 2026 12:51
…e ENRICH

- sbomify/github-action → sbomify/sbomify-action. The repo was
  renamed by upstream; GitHub redirects, so the old name still
  works, but pinning to the canonical name avoids the redirect
  hop and stops the URL from looking like 404 fodder when
  spot-checked.
- Replace `# master` with `# v26.2.0` next to the SHA pin. The
  pinned SHA (ac8b0d4...) is the v26.2.0 release tag, so this is a
  comment fix only — no behavior change.
- Add ENRICH: true. The README's quick-start example explicitly
  recommends it; pulls package metadata (license, supplier, full
  version info) from PyPI, deps.dev, etc. so the published SBOM
  has full provenance instead of just lockfile entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a leading `version` job that resolves the version once and
both SBOM jobs consume via `needs:`. The resolution is `git
describe --tags --exact-match HEAD || git rev-parse --short HEAD`
— so a tag-push (e.g. v0.20.5) lands in the SBOM as the tag,
and a regular master push lands as the 7-char short SHA. The
checkout uses `fetch-tags: true` so tags are visible without
paying for full history.

Both SBOMs now share the same resolved version string per run, so
the JS and Python halves of the same build always agree on which
revision they describe.
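
For readers who prefer code to shell, the resolution rule described above (exact tag if HEAD is tagged, otherwise the short SHA) can be sketched in Python. The workflow itself uses the shell one-liner quoted in this commit message; `resolve_version` and the injectable `run` parameter are hypothetical names introduced so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List, Optional


def _git(args: List[str]) -> Optional[str]:
    """Run a git subcommand; return stripped stdout, or None on non-zero exit."""
    proc = subprocess.run(["git", *args], capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else None


def resolve_version(run: Callable[[List[str]], Optional[str]] = _git) -> str:
    """Mirror `git describe --tags --exact-match HEAD || git rev-parse --short HEAD`."""
    tag = run(["describe", "--tags", "--exact-match", "HEAD"])
    if tag:
        return tag  # tag push: the SBOM version is the tag itself
    sha = run(["rev-parse", "--short", "HEAD"])
    if sha is None:
        raise RuntimeError("could not resolve HEAD; is this a git checkout?")
    return sha  # regular push: fall back to the abbreviated commit hash
```

Resolving once and passing the result to both SBOM jobs is what keeps the JS and Python halves in agreement.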

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SonarCloud (8 BLOCKER, rule githubactions:S7630):
Reusable-workflow inputs are user-controllable, so SonarCloud
flags any `${{ inputs.X }}` interpolated directly into a `run:`
block as a script-injection vector. Standard fix is to hop the
value through a step-level `env:` and reference it as a shell
variable.

  - _balena-board.yaml: `inputs.board` in the "Set Docker tag"
    and "Prepare Balena file" run blocks now go through
    `INPUT_BOARD`.
  - _build-board.yaml: `inputs.board`, `matrix.service`, and
    `inputs.push` in the "Inspect cache before build", "Build
    container", and "Inspect cache after build" run blocks now
    go through `INPUT_BOARD` / `MATRIX_SERVICE` / `INPUT_PUSH`.
  - setup-python-uv/action.yml: `inputs.group` in the install
    step now goes through `INPUT_GROUP`.
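
The env-hop fix listed above has a direct analogue outside GitHub Actions. The sketch below, which assumes a POSIX `sh` is available, passes an untrusted value through an environment variable instead of splicing it into the script text; `run_with_env_hop` is a hypothetical name, and `INPUT_BOARD` is reused from the commit message purely for familiarity.

```python
import os
import subprocess


def run_with_env_hop(board: str) -> str:
    """Echo a caller-supplied value without letting the shell parse it as code."""
    # The script text is a constant; the untrusted value never becomes part of it.
    script = 'printf "%s" "board=$INPUT_BOARD"'
    env = {**os.environ, "INPUT_BOARD": board}
    result = subprocess.run(
        ["sh", "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# A hostile value is printed verbatim rather than executed.
print(run_with_env_hop('pi5"; echo pwned; "'))
```

Had the value been interpolated into `script` directly, the embedded quote and semicolon would have terminated the string and run a second command.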

Copilot review feedback:

  - _test-host.yaml: the `if: env.CODECOV_TOKEN != ''` guard on
    the codecov upload step was reading from job/workflow env,
    not the step-level env where CODECOV_TOKEN was defined — so
    it always evaluated false and coverage never uploaded. Lift
    CODECOV_TOKEN to the job env (the step-level env was redundant
    once it's at job level — codecov-action picks it up from there
    too) so the guard works. Note: the `secrets` context isn't
    available in step `if:` per actionlint, so the alternative
    `if: secrets.CODECOV_TOKEN != ''` shape doesn't compile.

  - ci.yaml: the header comment claimed it "replaces" the five
    deleted workflows in present tense, but those workflows are
    already deleted in this PR. Reworded to past tense to match
    reality.

  - _balena-board.yaml: added an explicit `docker/setup-buildx-
    action` step before the preflight `docker buildx imagetools
    inspect` so the workflow doesn't rely on the runner image's
    preinstalled buildx — matches what every other docker-using
    job in this PR already does.

  - _build-board.yaml: copilot suggested
    `packages: ${{ inputs.push && 'write' || 'read' }}` to drop
    permission scope on PR smoke builds. GHA's permissions block
    doesn't accept expressions for permission values (actionlint
    rejects it: "available values are read, write or none"), so
    this isn't viable as-is. Documented the trade-off in a comment
    instead — `packages: write` stays at job level, but no login
    or push fires on PR builds (gated by `if: inputs.push`), and
    fork-PR tokens are stripped to read by GHA regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new `CI` workflow has been failing at startup since it landed
— five 1-second `startup_failure` runs, no jobs registered, no
logs. The trigger is matrix-of-uses with a JSON boolean: `plan`
emits `[{"board":"x86","push":false}]`, the `build` job fans out
via `matrix.include: ${{ fromJSON(...) }}`, and passes
`push: ${{ matrix.push }}` to `_build-board.yaml`. The reusable's
`inputs.push` was declared `type: boolean`, and that boolean
arriving via the matrix→template→with: chain trips GHA's input
type validation.

Switching `inputs.push` to `type: string` and threading
`'true'`/`'false'` end-to-end avoids the coercion entirely:

- `_build-board.yaml`: `inputs.push` type boolean → string;
  `if: inputs.push` → `if: inputs.push == 'true'` (3 sites);
  cache key ternary `inputs.push && 'push' || 'pr'` →
  `inputs.push == 'true' && 'push' || 'pr'`. Note that with a
  string input both 'true' and 'false' are truthy, so the explicit
  comparison is required.
- `ci.yaml`: `plan` job's emitted JSON now has `"push":"true"`
  and `"push":"false"` (string-quoted). Comment block records the
  trade-off so the next person isn't tempted to change it back.
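
The truthiness pitfall noted above is easy to reproduce. The following is a simplified Python model of how GitHub Actions expressions coerce values to booleans (it covers only the cases relevant here), alongside the explicit comparison the commit switches to. Both function names are made up for this sketch.

```python
def gha_truthy(value: object) -> bool:
    """Simplified model: false, 0, the empty string, and null are falsy; all else truthy."""
    if value is None or value is False:
        return False
    if isinstance(value, (int, float)) and value == 0:
        return False
    return value != ""


def should_push(push_input: str) -> bool:
    """The explicit gate: only the exact string 'true' enables push."""
    return push_input == "true"


print(gha_truthy("false"))   # True: a bare `if: inputs.push` would fire
print(should_push("false"))  # False: the comparison gates correctly
```

Python's own `bool("false")` is also `True`, so the same mistake is available in both languages.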

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vpetersson vpetersson requested a review from Copilot April 28, 2026 14:49
vpetersson and others added 15 commits April 28, 2026 14:58
Replaces five near-identical balena-pi{1..5} reusable-workflow calls
with a single matrix job over [pi1..pi5]. fail-fast: false preserves
per-board independence — a flaky pi3 deploy doesn't stop pi5.

Required-check name shape changes (e.g. "Balena pi1" → "Balena /
balena (pi1)"); branch-protection rename table needs to reflect the
new pattern before flipping protection post-merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the registry-preflight workaround with a real GHA needs
chain. The old top-level `balena` matrix in ci.yaml waited for the
entire `build` job (all 7 boards) to finish, then relied on each
leg inspecting GHCR for its own <short-hash> tag to decide whether
to deploy — the registry was acting as a fanned-out source of
truth for "did my matching build push succeed?". That's racy
against registry latency and only papers over GHA's lack of
cross-matrix-leg dependencies.

Move balena into `_build-board.yaml` instead, where it can use a
plain `needs: build`: GHA skips the deploy automatically if any
of the four services in that board's build matrix failed. Each
board is its own reusable-workflow invocation, so per-board
independence is preserved (pi3 failure doesn't touch pi5).

Also fix the `mirror-latest` `if: inputs.push` bug noted in the
last review — `inputs.push` is a string, and a non-empty string
is always truthy in GHA expressions, so 'false' would have fired
the job. Now `if: inputs.push == 'true'` matches the gating used
on the build job's login/push steps.

Drop the buildx setup + preflight from `_balena-board.yaml`; the
dependency is structural now and the registry probe is dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous setup paired buildkit's `type=local` cache with
`actions/cache`, but the two were keyed on different paths:

  workflow path:  /tmp/.buildx-cache/<inputs.board>-<service>
  builder path:   /tmp/.buildx-cache/<board>      (or <board>-64
                  for pi4 arm64) — the service is never appended

The paths never overlapped, so every restore got an empty dir,
every save uploaded an empty dir, and every build paid the full
cold-cache cost. Diagnostic `Inspect cache before/after build`
steps would have shown "No such file or directory" on every run
but the failures were swallowed by `|| true`.

Switch to buildkit's GHA-native cache backend (`type=gha`):

  - setup-buildx-action injects ACTIONS_CACHE_URL +
    ACTIONS_RUNTIME_TOKEN into the builder daemon, which is
    enough for `type=gha` to authenticate against GHA's cache
    service end-to-end.
  - Name the builder `multiarch-builder` directly via
    setup-buildx-action so the Python builder still finds it
    under that name, and drop the explicit `docker buildx
    create` step that was building a second, auth-less builder.
  - Drop the `actions/cache` step and both inspect probes; the
    cache transport is now buildkit ↔ GHA directly.

`tools/image_builder/__main__.py` picks the backend at runtime:
`type=gha` when GITHUB_ACTIONS=true, `type=local` otherwise.
Local dev builds keep their previous behaviour and do still
benefit from on-disk cache reuse across invocations.
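
A minimal sketch of the runtime backend choice described above. This is not the code in `tools/image_builder/__main__.py`; the function name `cache_flags`, its parameters, and the exact flag layout are assumptions, although `type=gha,scope=...` and `type=local,src=/dest=` are standard buildx cache specifiers.

```python
import os
from typing import List, Mapping


def cache_flags(
    scope: str,
    cache_dir: str,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Pick buildx cache flags: GHA backend in CI, on-disk cache elsewhere."""
    if env.get("GITHUB_ACTIONS") == "true":
        # GitHub sets GITHUB_ACTIONS=true on its runners.
        spec = f"type=gha,scope={scope}"
        return [f"--cache-from={spec}", f"--cache-to={spec}"]
    return [
        f"--cache-from=type=local,src={cache_dir}",
        f"--cache-to=type=local,dest={cache_dir}",
    ]


print(cache_flags("pi5-server", "/var/cache/buildx", env={"GITHUB_ACTIONS": "true"}))
print(cache_flags("pi5-server", "/var/cache/buildx", env={}))
```

Taking `env` as a parameter keeps the choice testable without mutating the process environment.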

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every audit-ci CI run since the workflow landed has been a
1-second startup_failure with no jobs and no logs. The most likely
remaining cause is that matrix-of-uses + dynamic matrix only
reliably works in the form GitHub's own dynamic-matrix docs use:

    strategy:
      matrix: ${{ fromJSON(needs.plan.outputs.matrix) }}

…where the `plan` job emits a JSON *object* (with an `include`
key) and the consumer replaces the whole strategy matrix from
that object. We were using the variant that assigns to
`matrix.include` only, with `plan` emitting a bare JSON list:

    matrix:
      include: ${{ fromJSON(needs.plan.outputs.matrix) }}

That shape isn't anywhere in GitHub's dynamic-matrix
documentation. Switching to the documented form is the cheapest
change that could explain the startup_failure.
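
To make the two shapes concrete, here is a hedged Python sketch of a `plan` step emitting the object form (a top-level `include` key) rather than a bare list. The real job presumably builds this JSON in shell; `plan_matrix` and its arguments are hypothetical, and the board list is passed in because the PR does not enumerate all seven boards in one place.

```python
import json
from typing import List


def plan_matrix(event: str, ref: str, all_boards: List[str]) -> str:
    """Emit the matrix as a JSON object with an `include` key.

    PRs get an x86-only smoke build with push disabled; a push to master
    gets every board with push enabled. `push` is a string, matching the
    string-typed reusable-workflow input.
    """
    on_master_push = event == "push" and ref == "refs/heads/master"
    boards = all_boards if on_master_push else ["x86"]
    push = "true" if on_master_push else "false"
    return json.dumps({"include": [{"board": b, "push": push} for b in boards]})


print(plan_matrix("pull_request", "refs/pull/1/merge", ["pi4", "pi5", "x86"]))
```

The consumer then assigns the whole object with `matrix: ${{ fromJSON(needs.plan.outputs.matrix) }}` instead of assigning only `matrix.include`.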

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every audit-ci CI run since the workflow landed has been a 1-second
startup_failure. The actual error (only visible on the run's HTML
page, not the API) was:

  The workflow is not valid.
  .github/workflows/ci.yaml (Line: 154, Col: 3):
  The nested job 'build' is requesting 'packages: write',
  but is only allowed 'packages: none'.

  .github/workflows/ci.yaml (Line: 154, Col: 3):
  The nested job 'mirror-latest' is requesting 'packages: write',
  but is only allowed 'packages: none'.

ci.yaml declared `permissions: { contents: read }` at the
workflow level. A reusable workflow can only narrow — never
widen — the permissions of its caller, so the nested `build`
and `mirror-latest` jobs (which need `packages: write` for
GHCR login + push) were forced down to `packages: none` and
GHA rejected the whole workflow at startup.

Grant `packages: write` on the `build` job in ci.yaml only,
which keeps plan/lint/test on the tighter `contents: read`
default.
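
The narrowing rule behind this failure can be modelled in a few lines. This is a simplified illustration, not GitHub's validator: it assumes that once a caller declares a `permissions:` block, any scope it omits is treated as `none`, and `check_nested` is a hypothetical helper.

```python
from typing import Dict, List

LEVELS = {"none": 0, "read": 1, "write": 2}


def check_nested(caller: Dict[str, str], requested: Dict[str, str]) -> List[str]:
    """Return one error per scope where the nested job asks for more than the caller grants."""
    errors = []
    for scope, level in requested.items():
        allowed = caller.get(scope, "none")  # omitted scopes default to none
        if LEVELS[level] > LEVELS[allowed]:
            errors.append(
                f"requesting '{scope}: {level}', but is only allowed '{scope}: {allowed}'"
            )
    return errors


# Before the fix: the caller grants only contents: read.
print(check_nested({"contents": "read"}, {"contents": "read", "packages": "write"}))
# After the fix: the calling job also grants packages: write.
print(check_nested({"contents": "read", "packages": "write"},
                   {"contents": "read", "packages": "write"}))
```

The first call reports the same mismatch quoted in the error above; the second returns an empty list.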

The two preceding "fixes" on this branch (matrix→input boolean
coercion, dynamic-matrix shape) were misdiagnoses; the actual
error message had been hidden behind a thin "Startup failure"
header on the run page the whole time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Host-side unit tests fail to import the viewer package because
viewer/__init__.py imports pydbus, and pydbus does
`from gi.repository import GLib` at import time. PyGObject
(`gi`) ships as the apt package `python3-gi` inside the
Anthias container images but isn't installed on the GitHub
Actions runner where unit tests now run after the host-side
test split. Compiling PyGObject from source via pip on every
CI run just to satisfy an import we never exercise (no host
unit test talks to D-Bus) isn't worth the time.

Stub the gi submodules pydbus references in tests/__init__.py,
guarded by a real `import gi` first: when gi is genuinely
available (Docker integration tests), nothing is stubbed and
the real package stays in use.

The five submodules (`gi`, `gi.repository`,
`gi.repository.GLib/Gio/GObject`) are registered explicitly
in sys.modules. A single mock at `gi.repository` isn't enough
because Python treats a MagicMock as a leaf module, not a
package, and pydbus's
`from gi.repository.GLib import Variant` fails on the package
lookup.
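
The stubbing approach described above might look like the sketch below. It is a reconstruction from this commit message, not a copy of tests/__init__.py; the helper name `install_gi_stubs` and the parent/child attribute wiring are assumptions.

```python
import sys
from unittest.mock import MagicMock

# The five module names listed in the commit message.
GI_MODULES = (
    "gi",
    "gi.repository",
    "gi.repository.GLib",
    "gi.repository.Gio",
    "gi.repository.GObject",
)


def install_gi_stubs() -> bool:
    """Register MagicMock stand-ins for gi, unless the real package imports.

    Returns True if stubs were installed, False if real gi is available.
    """
    try:
        import gi  # noqa: F401  (real PyGObject present: leave it alone)
        return False
    except ImportError:
        pass
    stubs = {name: MagicMock(name=name) for name in GI_MODULES}
    # Wire children onto parents so attribute access and sys.modules agree.
    stubs["gi"].repository = stubs["gi.repository"]
    for leaf in ("GLib", "Gio", "GObject"):
        setattr(stubs["gi.repository"], leaf, stubs[f"gi.repository.{leaf}"])
    # Every dotted name is registered, so the import system never has to
    # treat a mock as a package and look for its __path__.
    sys.modules.update(stubs)
    return True


install_gi_stubs()
from gi.repository.GLib import Variant  # resolves against real gi or the stub
```

Because each dotted name has its own `sys.modules` entry, both `from gi.repository import GLib` and `from gi.repository.GLib import Variant` succeed on a runner without PyGObject.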

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the gi-stub commit:

1. mypy was failing on `import gi` in tests/__init__.py with
   "Cannot find implementation or library stub for module named
   'gi'". Add `gi` and `gi.repository` to the existing
   ignore_missing_imports list in pyproject.toml — that list
   already covers `pydbus` etc. for the same reason (no upstream
   stubs).

2. Host unit tests fail in tearDown of
   api/tests/test_v1_endpoints.py with FileNotFoundError on
   `~/anthias_assets/`. AnthiasSettings resolves `assetdir` to
   $HOME/anthias_assets; container images have that path
   pre-created, the host runner doesn't. Materialize it in the
   same step that seeds anthias.conf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`lib.utils.connect_to_redis()` hardcodes `host='redis'` and is
called at module-load time inside `lib.github`, which
`tests.test_updates` imports. Inside container images redis is a
sibling compose service resolved via Docker DNS; on the host
runner that hostname doesn't exist, so the test fails with
"Error -3 connecting to redis:6379".

Install `redis-server` via apt in the same step that installs
the cec / netifaces build deps, start it, and add a
`127.0.0.1 redis` entry to /etc/hosts so the unmodified
`connect_to_redis()` resolves to the local instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first audit-ci CI run after switching to `cache-to=type=gha`
finished without a single cache entry written. The
`[cache] type=gha scope=...` debug print confirmed our Python
code had selected the right backend, but `gh api
repos/.../actions/caches` showed zero buildx-related entries.

Root cause: the buildkit daemon container that
setup-buildx-action launches needs `ACTIONS_CACHE_URL` and
`ACTIONS_RUNTIME_TOKEN` to authenticate against GHA's cache
service. GHA exposes these to actions via `process.env` but
strips them from shell-step env. setup-buildx-action's
auto-forward of these vars (via `--driver-opt env.ACTIONS_*`)
only fires when they're already present in the step env, which
they weren't.

Re-export them via `actions/github-script` (an official
GitHub action) before `Set up Docker Buildx`. From there
setup-buildx-action picks them up and the buildkit container
gets them via driver-opts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first audit-ci run with working type=gha auth pushed cache
data to GHA but the cache service returned a transient 400 with
HTML body ("Our services aren't available right now") at
`#34 sending cache export`, which buildkit propagated as a
build failure. Cache export is an optimization; a flake in
GHA's cache backend should never fail the build itself.

Add `ignore-error=true` to cache_to so buildkit downgrades
the export error to a warning. Cache import (cache_from) doesn't
have an equivalent flag — it already warns and continues on
miss in buildkit's normal path, so no change needed there.

Also drop the [cache] debug print added in the previous commit;
the GHA-cache plumbing is verified end-to-end now (a successful
`#34 preparing build cache for export` step appeared in the
last run logs before the 400).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SonarCloud S5443 (publicly writable directory) flagged the buildx
cache path under /tmp/.buildx-cache. In CI we use the type=gha
backend so cache_dir was unused there anyway; for local dev runs,
moving it to ~/.cache/anthias-buildx/<board_label> avoids sharing a
world-writable path with other users on multi-user systems.
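
The relocated path can be expressed as a small helper. This is a sketch only: the function name `local_cache_dir` and the optional `home` override are assumptions added for testability, and the builder's actual code may construct the path differently.

```python
from pathlib import Path
from typing import Optional


def local_cache_dir(board_label: str, home: Optional[Path] = None) -> Path:
    """Per-user buildx cache location for local dev runs.

    Lives under the user's home directory rather than the shared,
    world-writable /tmp tree.
    """
    base = home if home is not None else Path.home()
    return base / ".cache" / "anthias-buildx" / board_label


print(local_cache_dir("pi4-64", home=Path("/home/dev")))
```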

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
That file was removed in a9be1d3 ("chore: drop pyzmq + libzmq,
finalize ZMQ→Redis migration"); the whitelist entries in
generate-openapi-schema.yml no longer match anything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vpetersson and others added 2 commits April 28, 2026 21:45
Cuts the duplicate webview build that was running on every
webview/** change to master. The master-push run produced only
ephemeral GHA artifacts (no release attached), while the WebView-v*
tag push that follows shortly after compiled the same code again
and was the only run that actually shipped artifacts. PR runs cover
pre-merge validation, tag pushes cover release builds — master
pushes had nothing to add.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

2 participants