ci: per-board build reusables, host-side test split, path-filter unification#2759
vpetersson wants to merge 25 commits into master from
Conversation
The .github/workflows/ directory had grown to 14 workflows with three
overlapping problems:
1. Tests ran twice on every push: docker-test.yaml triggered
standalone on push/PR AND was reused via workflow_call from
docker-build.yaml's run-tests job, so the full TS + Python
suite executed twice.
2. Tests ran inside Docker even for cheap unit checks, so the
first failure signal lagged the image build by several
minutes. Devs want fail-fast feedback.
3. docker-build / docker-test / generate-openapi-schema all used
the '**' + long !exclusion path-filter pattern and had drifted
against each other. docker-build also excluded
'!.github/workflows/docker-test.yml' (file is .yaml — no-op).
Changes:
- Collapse docker-test.yaml + test-runner.yml into docker-build.yaml
as three top-level jobs: typescript-tests + python-unit-tests run
on the host runner (apt-install the cec/netifaces native deps,
seed ~/.anthias/anthias.conf, run bun test / manage.py test
--exclude-tag=integration directly); python-integration-tests
keeps the Docker flow for the Selenium WebTest class. buildx +
publish-latest + balena gate on all three test jobs and on
push-to-master only, so PRs stop after the test stage.
- Add a setup-python-uv composite action wrapping
setup-python + setup-uv + uv venv + uv pip install. Used by 5
workflows so version bumps land in one place.
- Merge python-lint.yaml + python-mypy.yaml into python-checks.yaml
(two parallel jobs sharing the composite action).
- Replace '**' + exclusions with whitelist path filters in
docker-build and generate-openapi-schema. The two halves of each
push/pull_request block are inlined verbatim because actionlint
doesn't follow YAML anchors.
- Add concurrency: cancel-in-progress on PRs only, never on master,
to every push/PR workflow. Biggest cost win on the 35-job buildx
matrix.
- Narrow lint-workflows.yml from '**/*.yml' to .github/workflows/**
+ .github/actions/**. Add source-extension path filter to
codeql-analysis.yaml (weekly schedule remains the catchall).
- Drop the run-tests gate from sbom.yaml — SBOMs come from
lockfiles; tests don't validate lockfile resolution.
- Pin ubuntu-24.04 on the two stragglers using ubuntu-latest.
- Update README badges to point at the new docker-build.yaml and
python-checks.yaml workflows.
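The whitelist shape described above, sketched for one workflow — the concrete path entries here are assumptions, not the actual filter; the two trigger blocks carry the same list inlined verbatim because actionlint doesn't follow YAML anchors:

```yaml
on:
  push:
    branches: [master]
    paths:
      # Illustrative whitelist entries only — the real filter differs.
      - 'anthias_app/**'
      - 'api/**'
      - 'pyproject.toml'
  pull_request:
    paths:
      # Same list, inlined again rather than anchored.
      - 'anthias_app/**'
      - 'api/**'
      - 'pyproject.toml'
```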
Net 14 → 12 workflow files; +307 / -376 lines.
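The composite action bullet above could be sketched roughly like this — the input names, pinned action versions, and requirements path are assumptions, not the actual file:

```yaml
# .github/actions/setup-python-uv/action.yml — illustrative sketch only
name: Setup Python + uv
description: Python toolchain + uv venv + dependency install in one place
inputs:
  python-version:
    description: Python version to install
    default: '3.11'          # assumed default
runs:
  using: composite
  steps:
    - uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}
    - uses: astral-sh/setup-uv@v5
    - shell: bash            # composite run steps must declare a shell
      run: |
        uv venv
        uv pip install -r requirements/requirements.txt  # path is an assumption
```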
Branch protection required-check names will change once this lands;
update them before merging or required checks will go missing
(Run Unit Tests / * → Docker Image Build / *; Run Python Linter +
Run mypy → Python Checks / ruff + Python Checks / mypy).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#   .github/workflows/docker-build.yaml
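The PR-only cancellation described above reduces to a single concurrency block per workflow; a minimal sketch:

```yaml
concurrency:
  # One group per workflow + ref. On pull_request events superseded runs
  # are cancelled; on master pushes the expression is false, so in-flight
  # runs (including the expensive buildx matrix) always complete.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```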
Adds five new workflow files that together replace the flat 7×4 buildx matrix + sequential publish-latest gate with per-board build-and-publish units, and consolidate the small standalone lint workflows into one reusable.

- ci.yaml — single entry: triggers, path filter (one source of truth instead of duplicated across five files), concurrency, and the plan→lint→test→build→balena fan-out. PRs get an x86-only smoke build (no push); master push gets all 7 boards with push.
- _lint.yaml — ruff, mypy, eslint/prettier, ansible-lint, actionlint as parallel jobs.
- _test-host.yaml — typescript, python unit, python integration tests; preserves anthias.conf seeding, native build deps, and the nick-fields/retry wrapper around integration tests.
- _build-board.yaml — per-board build + inline mirror-latest. Cache keys split by pr|push so PR smoke builds can't poison master cache. Mirror retag adds small randomized jitter to the existing 5-attempt exponential backoff so 7 boards retagging in parallel don't synchronize their 429s.
- _balena-board.yaml — per-board balena deploy. Preflights the board's <short-hash> tag in GHCR; cleanly skips if missing. The registry is the source of truth used to gate a per-matrix-leg dependency that GHA can't natively express.

The old workflows (docker-build.yaml, python-checks.yaml, javascript-lint.yaml, ansible-lint.yaml, lint-workflows.yml) stay in place. ci.yaml runs alongside them — both pipelines will fire on master push and PR. Validate the new layout via workflow_dispatch from master and via this PR's own runs, then a follow-up PR deletes the old files once we're confident.

Trade-off vs. the old global publish-latest gate: per-board independence means a flaky pi3 doesn't block pi5 from advancing its latest-pi5 tag. A mixed-board fleet may briefly run different SHAs across boards. Within a single board, the four services (server/celery/redis/viewer) still advance coherently because mirror-latest needs the entire 4-service matrix to pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
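The GHCR preflight in _balena-board.yaml might be shaped like the following sketch — the image name, tag variables, and skip plumbing are assumptions:

```yaml
- name: Check board image exists in GHCR
  id: preflight
  run: |
    # Probe the registry for this board's <short-hash> tag; the registry
    # stands in for a per-matrix-leg dependency GHA can't express natively.
    if docker buildx imagetools inspect \
         "ghcr.io/screenly/anthias-server:${SHORT_HASH}-${INPUT_BOARD}" >/dev/null 2>&1; then
      echo "found=true" >> "$GITHUB_OUTPUT"
    else
      echo "found=false" >> "$GITHUB_OUTPUT"
    fi

- name: Deploy to balena
  if: steps.preflight.outputs.found == 'true'
  run: |
    echo "balena deploy for ${INPUT_BOARD} would run here"   # placeholder
```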
Two follow-ups on top of the per-board reusables commit:

1. Replace `secrets: inherit` on every reusable workflow call site in ci.yaml with explicit secret pass-through. SonarCloud rule S7635 (githubactions:S7635, "only pass required secrets") was flagging the seven `inherit` lines as MEDIUM hotspots — the reusable workflows already declare exactly which secrets they accept (CODECOV_TOKEN for _test-host, DOCKER_USERNAME/PASSWORD for _build-board, BALENA_TOKEN for _balena-board), so least-privilege pass-through is straightforward and quiets Sonar.

2. Delete the five workflow files now superseded by ci.yaml + reusables: docker-build.yaml, python-checks.yaml, javascript-lint.yaml, ansible-lint.yaml, lint-workflows.yml. Going single-PR rather than the originally-planned two-phase rollout — validation happens via this PR's own runs and via workflow_dispatch on master post-merge.

Branch protection required-check names will need updating after merge. Old `Docker Image Build / *` and `Python Checks / *` check names become `CI / *`. See PR description for the rename table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR refactors the GitHub Actions CI setup to reduce duplicated test execution, speed up feedback by moving unit tests to the host runner, and standardize workflow path filtering/concurrency behavior across pipelines.
Changes:
- Removes the standalone/reusable Docker test workflows and introduces host-run TypeScript + Python unit tests with Docker-based integration tests.
- Adds a shared composite action (`setup-python-uv`) and consolidates Python lint+mypy into a single workflow, plus introduces reusable `_lint`/`_test-host`/`_build-board`/`_balena-board` workflows.
- Reworks workflow triggers (path whitelists) and adds PR-only `cancel-in-progress` concurrency across multiple workflows.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Updates CI badges to new workflow names/URLs. |
| .github/workflows/test-runner.yml | Deleted reusable test runner workflow (Docker-based). |
| .github/workflows/docker-test.yaml | Deleted standalone “Run Unit Tests” workflow. |
| .github/workflows/docker-build.yaml | Adds host test jobs + gates build/publish on master push; replaces prior reusable test invocation. |
| .github/workflows/python-lint.yaml | Deleted (merged into python-checks). |
| .github/workflows/python-mypy.yaml | Deleted (merged into python-checks). |
| .github/workflows/python-checks.yaml | New combined Ruff + mypy workflow using setup-python-uv. |
| .github/workflows/lint-workflows.yml | Narrows path filters, adds concurrency, and pins runner version. |
| .github/workflows/javascript-lint.yaml | Adds PR-only cancel-in-progress concurrency. |
| .github/workflows/ansible-lint.yaml | Adopts setup-python-uv, adds concurrency, updates path filters. |
| .github/workflows/generate-openapi-schema.yml | Replaces broad path filters with a whitelist; adopts setup-python-uv; adds concurrency. |
| .github/workflows/codeql-analysis.yaml | Adds language-extension path filters + concurrency (schedule remains catch-all). |
| .github/workflows/sbom.yaml | Removes dependency on tests; adds workflow self-trigger and concurrency. |
| .github/workflows/deploy-website.yaml | Updates runner to ubuntu-24.04. |
| .github/workflows/build-webview.yaml | Adds PR-only cancel-in-progress concurrency. |
| .github/workflows/ci.yaml | Adds new umbrella CI entry point orchestrating reusable lint/test/build/balena workflows. |
| .github/workflows/_lint.yaml | New reusable workflow consolidating all lint passes. |
| .github/workflows/_test-host.yaml | New reusable workflow for host tests + Docker integration tests. |
| .github/workflows/_build-board.yaml | New reusable per-board build+publish workflow. |
| .github/workflows/_balena-board.yaml | New reusable per-board balena deploy workflow with registry preflight. |
| .github/actions/setup-python-uv/action.yml | New composite action to standardize Python+uv setup and dependency installation. |
Comments suppressed due to low confidence (2)
.github/workflows/docker-build.yaml:138
- This job copies `ansible/roles/anthias/files/anthias.conf`, but the workflow's path whitelist does not include `ansible/**`. A PR that only touches that file (or moves it) would not trigger this workflow, and could break CI/config seeding once other changes later cause CI to run. Add the relevant `ansible/**` path(s) to the whitelist or avoid depending on an Ansible-managed file here (e.g., write a minimal config inline as in the mypy job).
.github/workflows/docker-build.yaml:148
- The Codecov upload here is unconditional. For PRs from forks, `secrets.CODECOV_TOKEN` will be unavailable and this can cause the upload step to fail or produce noisy errors. Consider guarding this step the same way as `_test-host.yaml` (e.g., only run when the token is present) to keep fork PR CI green.
…e ENRICH
- sbomify/github-action → sbomify/sbomify-action. The repo was renamed by upstream; GitHub redirects, so the old name still works, but pinning to the canonical name avoids the redirect hop and stops the URL from looking like 404 fodder when spot-checked.
- Replace `# master` with `# v26.2.0` next to the SHA pin. The pinned SHA (ac8b0d4...) is the v26.2.0 release tag, so this is a comment fix only — no behavior change.
- Add ENRICH: true. The README's quick-start example explicitly recommends it; pulls package metadata (license, supplier, full version info) from PyPI, deps.dev, etc. so the published SBOM has full provenance instead of just lockfile entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a leading `version` job that resolves the version once; both SBOM jobs consume it via `needs:`. The resolution is `git describe --tags --exact-match HEAD || git rev-parse --short HEAD` — so a tag-push (e.g. v0.20.5) lands in the SBOM as the tag, and a regular master push lands as the 7-char short SHA. The checkout uses `fetch-tags: true` so tags are visible without paying for full history.

Both SBOMs now share the same resolved version string per run, so the JS and Python halves of the same build always agree on which revision they describe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
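The resolution described above could be a small leading job along these lines (a sketch; job and step ids are assumptions):

```yaml
version:
  runs-on: ubuntu-24.04
  outputs:
    version: ${{ steps.resolve.outputs.version }}
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-tags: true              # tags visible without full history
    - id: resolve
      run: |
        # An exact tag on HEAD wins (e.g. v0.20.5); otherwise fall back
        # to the 7-char short SHA.
        VERSION=$(git describe --tags --exact-match HEAD 2>/dev/null \
                  || git rev-parse --short HEAD)
        echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
```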
SonarCloud (8 BLOCKER, rule githubactions:S7630):
Reusable-workflow inputs are user-controllable, so SonarCloud
flags any `${{ inputs.X }}` interpolated directly into a `run:`
block as a script-injection vector. Standard fix is to hop the
value through a step-level `env:` and reference it as a shell
variable.
- _balena-board.yaml: `inputs.board` in the "Set Docker tag"
and "Prepare Balena file" run blocks now go through
`INPUT_BOARD`.
- _build-board.yaml: `inputs.board`, `matrix.service`, and
`inputs.push` in the "Inspect cache before build", "Build
container", and "Inspect cache after build" run blocks now
go through `INPUT_BOARD` / `MATRIX_SERVICE` / `INPUT_PUSH`.
- setup-python-uv/action.yml: `inputs.group` in the install
step now goes through `INPUT_GROUP`.
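The hop looks like this in practice (minimal before/after sketch; the step body is illustrative):

```yaml
# Before — flagged by S7630: the input is interpolated into the script text,
# so a crafted value becomes shell code.
- name: Set Docker tag
  id: tag
  run: echo "tag=latest-${{ inputs.board }}" >> "$GITHUB_OUTPUT"

# After — the value enters via step-level env and is expanded by the shell,
# never templated into the script.
- name: Set Docker tag
  id: tag
  env:
    INPUT_BOARD: ${{ inputs.board }}
  run: echo "tag=latest-${INPUT_BOARD}" >> "$GITHUB_OUTPUT"
```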
Copilot review feedback:
- _test-host.yaml: the `if: env.CODECOV_TOKEN != ''` guard on
the codecov upload step was reading from job/workflow env,
not the step-level env where CODECOV_TOKEN was defined — so
it always evaluated false and coverage never uploaded. Lift
CODECOV_TOKEN to the job env (the step-level env was redundant
once it's at job level — codecov-action picks it up from there
too) so the guard works. Note: the `secrets` context isn't
available in step `if:` per actionlint, so the alternative
`if: secrets.CODECOV_TOKEN != ''` shape doesn't compile.
- ci.yaml: the header comment claimed it "replaces" the five
deleted workflows in present tense, but those workflows are
already deleted in this PR. Reworded to past tense to match
reality.
- _balena-board.yaml: added an explicit `docker/setup-buildx-
action` step before the preflight `docker buildx imagetools
inspect` so the workflow doesn't rely on the runner image's
preinstalled buildx — matches what every other docker-using
job in this PR already does.
- _build-board.yaml: copilot suggested
`packages: ${{ inputs.push && 'write' || 'read' }}` to drop
permission scope on PR smoke builds. GHA's permissions block
doesn't accept expressions for permission values (actionlint
rejects it: "available values are read, write or none"), so
this isn't viable as-is. Documented the trade-off in a comment
instead — `packages: write` stays at job level, but no login
or push fires on PR builds (gated by `if: inputs.push`), and
fork-PR tokens are stripped to read by GHA regardless.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
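The working shape of the CODECOV_TOKEN guard described above, as a sketch (job name assumed):

```yaml
jobs:
  python-unit-tests:
    runs-on: ubuntu-24.04
    env:
      # Job-level so it is visible to step-level `if:` expressions;
      # codecov-action also reads it from here, so no step-level copy.
      CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
    steps:
      # ... test steps ...
      - name: Upload coverage
        if: env.CODECOV_TOKEN != ''   # empty on fork PRs → step skipped
        uses: codecov/codecov-action@v4
```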
The new `CI` workflow has been failing at startup since it landed
— five 1-second `startup_failure` runs, no jobs registered, no
logs. The trigger is matrix-of-uses with a JSON boolean: `plan`
emits `[{"board":"x86","push":false}]`, the `build` job fans out
via `matrix.include: ${{ fromJSON(...) }}`, and passes
`push: ${{ matrix.push }}` to `_build-board.yaml`. The reusable's
`inputs.push` was declared `type: boolean`, and that boolean
arriving via the matrix→template→with: chain trips GHA's input
type validation.
Switching `inputs.push` to `type: string` and threading
`'true'`/`'false'` end-to-end avoids the coercion entirely:
- `_build-board.yaml`: `inputs.push` type boolean → string;
`if: inputs.push` → `if: inputs.push == 'true'` (3 sites);
cache key ternary `inputs.push && 'push' || 'pr'` →
`inputs.push == 'true' && 'push' || 'pr'`. Note that with a
string input both 'true' and 'false' are truthy, so the explicit
comparison is required.
- `ci.yaml`: `plan` job's emitted JSON now has `"push":"true"`
and `"push":"false"` (string-quoted). Comment block records the
trade-off so the next person isn't tempted to change it back.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
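The string-typed shape, as a sketch:

```yaml
# _build-board.yaml (sketch)
on:
  workflow_call:
    inputs:
      push:
        type: string        # 'true' / 'false', threaded end-to-end as strings
        required: true

jobs:
  build:
    runs-on: ubuntu-24.04
    steps:
      - name: Log in to GHCR
        # Both 'true' and 'false' are truthy as non-empty strings, so the
        # explicit comparison is required at every gate.
        if: inputs.push == 'true'
        uses: docker/login-action@v3
```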
Replaces five near-identical balena-pi{1..5} reusable-workflow calls
with a single matrix job over [pi1..pi5]. fail-fast: false preserves
per-board independence — a flaky pi3 deploy doesn't stop pi5.
Required-check name shape changes (e.g. "Balena pi1" → "Balena /
balena (pi1)"); branch-protection rename table needs to reflect the
new pattern before flipping protection post-merge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
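In ci.yaml the five calls collapse to something like this sketch (secret and job names follow earlier commits; exact shape is an assumption):

```yaml
balena:
  needs: build
  strategy:
    fail-fast: false          # a flaky pi3 deploy must not cancel pi5
    matrix:
      board: [pi1, pi2, pi3, pi4, pi5]
  uses: ./.github/workflows/_balena-board.yaml
  with:
    board: ${{ matrix.board }}
  secrets:
    BALENA_TOKEN: ${{ secrets.BALENA_TOKEN }}
```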
Replace the registry-preflight workaround with a real GHA needs chain.

The old top-level `balena` matrix in ci.yaml waited for the entire `build` job (all 7 boards) to finish, then relied on each leg inspecting GHCR for its own <short-hash> tag to decide whether to deploy — the registry was acting as a fanned-out source of truth for "did my matching build push succeed?". That's racy against registry latency and only papers over GHA's lack of cross-matrix-leg dependencies.

Move balena into `_build-board.yaml` instead, where it can use a plain `needs: build`: GHA skips the deploy automatically if any of the four services in that board's build matrix failed. Each board is its own reusable-workflow invocation, so per-board independence is preserved (pi3 failure doesn't touch pi5).

Also fix the `mirror-latest` `if: inputs.push` bug noted in the last review — `inputs.push` is a string, and a non-empty string is always truthy in GHA expressions, so 'false' would have fired the job. Now `if: inputs.push == 'true'` matches the gating used on the build job's login/push steps.

Drop the buildx setup + preflight from `_balena-board.yaml`; the dependency is structural now and the registry probe is dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
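Structurally, inside _build-board.yaml this becomes (a sketch with placeholder steps):

```yaml
# _build-board.yaml (sketch): the deploy depends on the build matrix
# directly, so GHA skips it when any of the four service legs fails.
jobs:
  build:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        service: [server, celery, redis, viewer]
    steps:
      - run: echo "build + push one service"      # placeholder
  balena:
    needs: build              # replaces the GHCR preflight probe
    if: inputs.push == 'true'
    runs-on: ubuntu-24.04
    steps:
      - run: echo "balena deploy"                 # placeholder
```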
The previous setup paired buildkit's `type=local` cache with
`actions/cache`, but the two were keyed on different paths:
workflow path: /tmp/.buildx-cache/<inputs.board>-<service>
builder path: /tmp/.buildx-cache/<board> (or <board>-64
for pi4 arm64) — the service is never appended
The paths never overlapped, so every restore got an empty dir,
every save uploaded an empty dir, and every build paid the full
cold-cache cost. Diagnostic `Inspect cache before/after build`
steps would have shown "No such file or directory" on every run
but the failures were swallowed by `|| true`.
Switch to buildkit's GHA-native cache backend (`type=gha`):
- setup-buildx-action injects ACTIONS_CACHE_URL +
ACTIONS_RUNTIME_TOKEN into the builder daemon, which is
enough for `type=gha` to authenticate against GHA's cache
service end-to-end.
- Name the builder `multiarch-builder` directly via
setup-buildx-action so the Python builder still finds it
under that name, and drop the explicit `docker buildx
create` step that was building a second, auth-less builder.
- Drop the `actions/cache` step and both inspect probes; the
cache transport is now buildkit ↔ GHA directly.
`tools/image_builder/__main__.py` picks the backend at runtime:
`type=gha` when GITHUB_ACTIONS=true, `type=local` otherwise.
Local dev builds keep their previous behaviour and do still
benefit from on-disk cache reuse across invocations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
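Under the `type=gha` backend the build invocation reduces to roughly the following — a sketch; the scope naming and the `name` input on setup-buildx-action are assumptions:

```yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    name: multiarch-builder   # keep the name the Python builder looks up

- name: Build container
  run: |
    # BuildKit talks to GHA's cache service directly; no actions/cache step
    # and no on-disk cache directory to key correctly.
    docker buildx build \
      --builder multiarch-builder \
      --cache-from "type=gha,scope=${BOARD}-${SERVICE}" \
      --cache-to "type=gha,mode=max,scope=${BOARD}-${SERVICE}" \
      -t "$IMAGE_TAG" .
```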
Every audit-ci CI run since the workflow landed has been a
1-second startup_failure with no jobs and no logs. The most likely
remaining cause is that matrix-of-uses + dynamic matrix only
reliably works in the form GitHub's own dynamic-matrix docs use:
strategy:
matrix: ${{ fromJSON(needs.plan.outputs.matrix) }}
…where the `plan` job emits a JSON *object* (with an `include`
key) and the consumer replaces the whole strategy matrix from
that object. We were using the variant that assigns to
`matrix.include` only, with `plan` emitting a bare JSON list:
matrix:
include: ${{ fromJSON(needs.plan.outputs.matrix) }}
That shape isn't anywhere in GitHub's dynamic-matrix
documentation. Switching to the documented form is the cheapest
change that could explain the startup_failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
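The documented form, sketched end-to-end:

```yaml
jobs:
  plan:
    runs-on: ubuntu-24.04
    outputs:
      matrix: ${{ steps.plan.outputs.matrix }}
    steps:
      - id: plan
        run: |
          # Emit a JSON *object* with an `include` key, not a bare list.
          echo 'matrix={"include":[{"board":"x86","push":"false"}]}' >> "$GITHUB_OUTPUT"

  build:
    needs: plan
    strategy:
      matrix: ${{ fromJSON(needs.plan.outputs.matrix) }}  # whole-matrix form
    uses: ./.github/workflows/_build-board.yaml
    with:
      board: ${{ matrix.board }}
      push: ${{ matrix.push }}
```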
Every audit-ci CI run since the workflow landed has been a 1-second
startup_failure. The actual error (only visible on the run's HTML
page, not the API) was:
The workflow is not valid.
.github/workflows/ci.yaml (Line: 154, Col: 3):
The nested job 'build' is requesting 'packages: write',
but is only allowed 'packages: none'.
.github/workflows/ci.yaml (Line: 154, Col: 3):
The nested job 'mirror-latest' is requesting 'packages: write',
but is only allowed 'packages: none'.
ci.yaml declared `permissions: { contents: read }` at the
workflow level. A reusable workflow can only narrow — never
widen — the permissions of its caller, so the nested `build`
and `mirror-latest` jobs (which need `packages: write` for
GHCR login + push) were forced down to `packages: none` and
GHA rejected the whole workflow at startup.
Grant `packages: write` on the `build` job in ci.yaml only,
which keeps plan/lint/test on the tighter `contents: read`
default.
The two preceding "fixes" on this branch (matrix→input boolean
coercion, dynamic-matrix shape) were misdiagnoses; the actual
error message had been hidden behind a thin "Startup failure"
header on the run page the whole time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
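The resulting shape (sketch):

```yaml
# ci.yaml (sketch)
permissions:
  contents: read             # workflow-level default stays tight

jobs:
  build:
    # A reusable workflow can only narrow its caller's permissions, so the
    # GHCR-writing jobs nested inside _build-board.yaml need the grant here.
    permissions:
      contents: read
      packages: write
    uses: ./.github/workflows/_build-board.yaml
```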
Host-side unit tests fail to import the viewer package because viewer/__init__.py imports pydbus, and pydbus does `from gi.repository import GLib` at import time. PyGObject (`gi`) ships as the apt package `python3-gi` inside the Anthias container images but isn't installed on the GitHub Actions runner where unit tests now run after the host-side test split. Compiling PyGObject from source via pip on every CI run just to satisfy an import we never exercise (no host unit test talks to D-Bus) isn't worth the time.

Stub the gi submodules pydbus references in tests/__init__.py, guarded by a real `import gi` first — when gi is genuinely available (Docker integration tests), nothing is stubbed and the real package stays in use. The five submodules (`gi`, `gi.repository`, `gi.repository.GLib/Gio/GObject`) are registered explicitly in sys.modules. A single mock at `gi.repository` isn't enough because Python treats a MagicMock as a leaf module, not a package, and pydbus's `from gi.repository.GLib import Variant` fails on the package lookup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two follow-ups from the gi-stub commit:

1. mypy was failing on `import gi` in tests/__init__.py with "Cannot find implementation or library stub for module named 'gi'". Add `gi` and `gi.repository` to the existing ignore_missing_imports list in pyproject.toml — that list already covers `pydbus` etc. for the same reason (no upstream stubs).

2. Host unit tests fail in tearDown of api/tests/test_v1_endpoints.py with FileNotFoundError on `~/anthias_assets/`. AnthiasSettings resolves `assetdir` to $HOME/anthias_assets; container images have that path pre-created, the host runner doesn't. Materialize it in the same step that seeds anthias.conf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`lib.utils.connect_to_redis()` hardcodes `host='redis'` and is called at module-load time inside `lib.github` — which `tests.test_updates` imports. Inside container images redis is a sibling compose service resolved via Docker DNS; on the host runner that hostname doesn't exist, so the test fails with "Error -3 connecting to redis:6379".

Install `redis-server` via apt in the same step that installs the cec / netifaces build deps, start it, and add a `127.0.0.1 redis` entry to /etc/hosts so the unmodified `connect_to_redis()` resolves to the local instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
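The host-runner step might look like this sketch (step name assumed, folded into the existing native-deps install):

```yaml
- name: Install native deps + local redis for host tests
  run: |
    sudo apt-get update
    sudo apt-get install -y redis-server   # alongside the cec/netifaces build deps
    sudo systemctl start redis-server
    # connect_to_redis() hardcodes host='redis'; resolve it locally
    echo "127.0.0.1 redis" | sudo tee -a /etc/hosts
```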
The first audit-ci CI run after switching to `cache-to=type=gha` finished without a single cache entry written. The `[cache] type=gha scope=...` debug print confirmed our Python code had selected the right backend, but `gh api repos/.../actions/caches` showed zero buildx-related entries.

Root cause: the buildkit daemon container that setup-buildx-action launches needs `ACTIONS_CACHE_URL` and `ACTIONS_RUNTIME_TOKEN` to authenticate against GHA's cache service. GHA exposes these to actions via `process.env` but strips them from shell-step env. setup-buildx-action's auto-forward of these vars (via `--driver-opt env.ACTIONS_*`) only fires when they're already present in the step env — which they weren't.

Re-export them via `actions/github-script` (an official GitHub action) before `Set up Docker Buildx`. From there setup-buildx-action picks them up and the buildkit container gets them via driver-opts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
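The re-export step, sketched:

```yaml
- name: Expose GHA cache service env to the buildx setup step
  uses: actions/github-script@v7
  with:
    script: |
      // Actions see these in process.env; shell steps don't. Exporting them
      // puts them in the step env where setup-buildx-action's auto-forward
      // (--driver-opt env.ACTIONS_*) can pick them up.
      core.exportVariable('ACTIONS_CACHE_URL', process.env.ACTIONS_CACHE_URL || '');
      core.exportVariable('ACTIONS_RUNTIME_TOKEN', process.env.ACTIONS_RUNTIME_TOKEN || '');

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
```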
The first audit-ci run with working type=gha auth pushed cache
data to GHA but the cache service returned a transient 400 with
HTML body ("Our services aren't available right now") at
`#34 sending cache export`, which buildkit propagated as a
build failure. Cache export is an optimization; a flake in
GHA's cache backend should never fail the build itself.
Add `ignore-error=true` to cache_to so buildkit downgrades
the export error to a warning. Cache import (cache_from) doesn't
have an equivalent flag — it already warns and continues on
miss in buildkit's normal path, so no change needed there.
Also drop the [cache] debug print added in the previous commit;
the GHA-cache plumbing is verified end-to-end now (a successful
`#34 preparing build cache for export` step appeared in the
last run logs before the 400).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SonarCloud S5443 (publicly writable directory) flagged the buildx cache path under /tmp/.buildx-cache. In CI we use the type=gha backend so cache_dir was unused there anyway; for local dev runs, moving it to ~/.cache/anthias-buildx/<board_label> avoids sharing a world-writable path with other users on multi-user systems. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
That file was removed in a9be1d3 ("chore: drop pyzmq + libzmq, finalize ZMQ→Redis migration"); the whitelist entries in generate-openapi-schema.yml no longer match anything. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cuts the duplicate webview build that was running on every webview/** change to master. The master-push run produced only ephemeral GHA artifacts (no release attached), while the WebView-v* tag push that follows shortly after compiled the same code again and was the only run that actually shipped artifacts. PR runs cover pre-merge validation, tag pushes cover release builds — master pushes had nothing to add. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>



Summary
Single big-bang CI overhaul. Replaces the flat 7×4 buildx matrix + sequential `publish-latest` gate with per-board build-and-publish units, consolidates the small standalone lint workflows, and unifies the path filter into a single source of truth. Net workflow count stays at 11 (5 deleted, 5 added, 1 was already the merged-test refactor in this branch).

What changes
New entry workflow + reusables
- `.github/workflows/ci.yaml` — single entry: triggers, path filter (one source of truth instead of duplicated across five files), concurrency, and the plan→lint→test→build→balena fan-out. PRs get an x86-only smoke build (no push); master push gets all 7 boards with push.
- `.github/workflows/_lint.yaml` — ruff, mypy, eslint/prettier, ansible-lint, actionlint as parallel jobs.
- `.github/workflows/_test-host.yaml` — typescript, python unit, python integration tests; preserves anthias.conf seeding, native build deps, and the `nick-fields/retry` wrapper around integration tests. Codecov upload guarded by `if: env.CODECOV_TOKEN != ''` for fork PRs.
- `.github/workflows/_build-board.yaml` — per-board build + inline `mirror-latest`. PR/master cache isolation comes for free from GHA's `type=gha` cache backend, which scopes by branch — PR smoke builds cannot read or overwrite master's cache. Mirror retag adds small randomized jitter to the existing 5-attempt exponential backoff so 7 boards retagging in parallel don't synchronize their 429s against Docker Hub.
- `.github/workflows/_balena-board.yaml` — per-board balena deploy. Preflights the board's `<short-hash>` tag in GHCR; cleanly skips if missing. The registry is the source of truth used to gate a per-matrix-leg dependency that GHA can't natively express.

Deletions
`docker-build.yaml`, `python-checks.yaml`, `javascript-lint.yaml`, `ansible-lint.yaml`, `lint-workflows.yml` — all absorbed into the new entry + reusables.

Earlier in this branch (still part of this PR)
- `docker-test.yaml` previously triggered standalone on push/PR AND was reused via `workflow_call` from `docker-build.yaml`'s `run-tests` job — every push paid for two full TS + Python suites. Collapsed into top-level jobs.
- `'**'` + drift-prone `!`-exclusions → whitelists.
- `concurrency: cancel-in-progress` on PRs only (never on master) added to every push/PR workflow.
- `python-lint.yaml` + `python-mypy.yaml` → `python-checks.yaml` with a new `setup-python-uv` composite action — that action is reused by all the new reusables.

SonarCloud cleanup
- `secrets: inherit` on every reusable call site; SonarCloud rule S7635 ("only pass required secrets") flagged seven MEDIUM hotspots. Replaced with explicit per-secret pass-through (`CODECOV_TOKEN`, `DOCKER_USERNAME`/`DOCKER_PASSWORD`, `BALENA_TOKEN`) — least-privilege.
- `/tmp/.buildx-cache` (SonarCloud S5443, publicly writable directory). In CI we use `type=gha` so that path was unused dead code anyway; for local dev runs the cache moved to `~/.cache/anthias-buildx/<board_label>` to avoid sharing a world-writable path with other users on multi-user systems.

Trade-off (per-board independence)
Each board's `latest-*` advances independently. A flaky pi3 doesn't block pi5. A mixed-board fleet may briefly run different SHAs across boards. Within a single board, the four services (server/celery/redis/viewer) still advance coherently because `mirror-latest` needs the entire 4-service matrix to pass before any of that board's `latest-*` tags get retagged.

Risks
- … (`+ RANDOM % 3`); monitor first 5 master runs. If 429s recur, widen jitter (e.g. `RANDOM % 8`) or stagger per-board mirror jobs.
- `type=gha` — GHA's cache service scopes entries by branch ref, so PR cache writes are invisible to master and vice versa. The on-disk `cache_scope` (`<board_label>-<service>`) does not need an explicit `pr|push` segment.
- `if: env.CODECOV_TOKEN != ''`. Docker Hub login + `--push` already gated by `if: inputs.push` (false on PRs).
gh api repos/Screenly/Anthias/branches/master/protectionto list current names, then update:Run Unit Tests / run-typescript-testsCI / Test / typescript-testsRun Unit Tests / run-python-testsCI / Test / python-unit-testsandCI / Test / python-integration-testsRun Python Linter/Run mypyCI / Lint / ruffandCI / Lint / mypyRun JavaScript Linter and Formatter / lintCI / Lint / js-lintAnsible Lint / buildCI / Lint / ansible-lintLint GitHub Workflows / Run Linter (1.7.7)CI / Lint / actionlintDocker Image Build / buildx (...)CI / Build / Build <board> <service>(matrix expansion)Docker Image Build / publish-latestmirror-latest)Docker Image Build / balena (...)CI / Balena <board>If branch protection isn't updated, required checks will be missing post-merge. Recommend doing this in the same window as the merge.
Test plan
- `CI / plan`, `CI / Lint / *`, `CI / Test / *`, and `CI / Build / Build x86 server` (and the other 3 services) all pass on this PR. (PR-side `build` is an x86-only, `push: false` smoke build.)
- … (`Docker Image Build` and friends are deleted).
- … (`CODECOV_TOKEN` available) or is cleanly skipped.
- `mirror-latest` per board pushes both `<short-hash>-<board>` and `latest-<board>` to both `ghcr.io/screenly/anthias-*` and `screenly/anthias-*`.
- `balena-pi1`..`balena-pi5` deploy succeeds.
- `packages: write` permission is scoped only to `build` and `mirror-latest` jobs in run metadata.
- … `mirror-latest` step. If observed twice, increase jitter or stagger the per-board mirror jobs.
- … `latest-*` independently — validates the per-board independence design.

🤖 Generated with Claude Code