[META] fbuild build cache must survive CI tar-extract / cross-runner restore #147

@zackees


Why this exists

#112 (FastLED examples CI benchmark) landed iteration 3 and measured a warm-cache run with zero cross-run compile speedup, despite actions/cache correctly restoring 304 MB of fbuild + project state. All 83 FastLED examples recompiled at the cold rate (examples 2–83 at ~1.5s each × 82 ≈ 124s of avoidable work).

Root cause is not one bug but a cluster of cache-key and cache-scope bugs, each of which independently breaks cross-run reuse:

  1. fbuild's fast-path fingerprint mixes mtime into the hash, so the tar-extract on cache restore invalidates every entry (#146).
  2. fbuild bakes absolute filesystem paths into cache keys, so cross-workspace / cross-runner runs always miss (#148).
  3. zccache (the per-TU object store) isn't discoverable by consumer workflows, so it can't be cached at all (#149).
  4. zccache's own keys may bake in absolute paths via the command line passed to it; this needs an audit (#150).
  5. The benchmark workflow doesn't yet include zccache's store in its actions/cache path list (#151).
  6. No consumer-facing doc explains the full caching recipe (#152).
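
Items 1 and 2 share a shape: the key encodes filesystem metadata (mtime, absolute path) rather than content. A minimal sketch of the two key styles, with hypothetical function names — not fbuild's actual code:

```python
# Hypothetical sketch of the two cache-key styles at issue in #146/#148;
# function names are illustrative, not fbuild's real API.
import hashlib
import os

def mtime_key(path: str) -> str:
    # Fragile: tar-extract on cache restore rewrites mtime, so identical
    # content still produces a different key (the #146 failure mode).
    st = os.stat(path)
    return f"{path}:{st.st_mtime_ns}:{st.st_size}"

def content_key(workspace: str, path: str) -> str:
    # Stable across runners: hash the bytes and key on the
    # workspace-relative path, never the absolute one (#148).
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return f"{os.path.relpath(path, workspace)}:{digest}"
```

Two checkouts of identical content at different paths, or with different mtimes, produce equal `content_key` values but different `mtime_key` values.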

Fix all of them and the cross-run warm regime moves from "caches 304 MB and saves 9 seconds" to "caches ~500 MB and saves ~2 minutes."
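
Item 4's audit target has the same shape at the command-line level: if zccache hashes argv verbatim, the absolute workspace prefix inside `-I` flags and source paths ties every key to one checkout location. A sketch of the normalization #150 would need to confirm or add — names here are hypothetical, in the spirit of ccache's `base_dir`:

```python
# Hypothetical sketch of command-line normalization before hashing;
# not zccache's actual code.
import hashlib

def normalize_argv(argv: list[str], workspace: str) -> list[str]:
    # Rewrite any argument that embeds the absolute workspace prefix
    # so two runners with different checkout paths hash identically.
    prefix = workspace.rstrip("/") + "/"
    return [arg.replace(prefix, "") for arg in argv]

def tu_key(argv: list[str], workspace: str) -> str:
    # Key the per-TU object on the normalized command line.
    canon = "\0".join(normalize_argv(argv, workspace))
    return hashlib.sha256(canon.encode()).hexdigest()
```

With this normalization, the same compile invocation issued from `/home/r1/ws` and `/home/r2/job` yields the same key.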

Evidence

Benchmark on bench/fastled-examples, board uno, fbuild 2.1.21, Ubuntu 24.04:

| run | run ID | cache-hit | compile | first example | examples 2–83 |
| --- | --- | --- | --- | --- | --- |
| iter3 cold | 24626421386 | false | 140s | 20.5s | 1.4–2.0s |
| iter3 warm | 24626590831 | true (304 MB restored) | 142s | 11.8s | 1.4–2.0s |

The 9s saved on example #1 is the ~/.fbuild toolchain-materialization cache paying off. Per-TU reuse across runs: zero.

Within a single run, the fbuild daemon is already warm after example #1 (examples 2–83 run at ~1.5s each). That "warm-within-run" rate is what we want to reproduce cross-run. Nothing in this meta is about speeding up that rate further — we just want to stop throwing it away between runs.

Sub-issues

fbuild code changes: #146, #148, #150

Setup / distribution: #149

Benchmark + validation: #151

Documentation: #152

Ordering / critical path

#146  ─┬──────┐
#148  ─┘      │
              ├─► validate via #151 ─► #152 docs
#149 ─► #150 ─┘

The biggest single lever is #146. On its own it should recover most of the ~124s currently spent recompiling unchanged TUs (assuming zccache's own keying is content-based, which #150 will confirm). #148 protects against regressions when the runner image or workspace path shifts. #149 + #151 turn the warm number from "cheap per-TU reuse because daemon is warm within-run" into "zero compile work because the prior objects are on disk already."
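
Once #149 exposes the store location, the consumer-side wiring for #151 is a single actions/cache entry. A hedged sketch — the zccache store path and key recipe below are assumptions, not documented fbuild behavior:

```yaml
# Hypothetical workflow fragment; the zccache path is a guess pending
# #149, and the key inputs are illustrative only.
- name: Restore fbuild + zccache state
  uses: actions/cache@v4
  with:
    path: |
      ~/.fbuild
      ~/.cache/zccache
    key: fbuild-${{ runner.os }}-${{ github.sha }}
    restore-keys: |
      fbuild-${{ runner.os }}-
```

The `restore-keys` prefix fallback is what makes a run on new source content still restore the previous run's objects.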

Definition of done

On the bench/fastled-examples benchmark (board=uno, examples=all, FastLED master, fbuild ≥ the first release containing #146 and #148):

  1. Same-runner warm regime: a warm run after a cold run on the same runner image drops compile phase from ~142s to < 10s (only the first example pays a measurable cost; examples 2–83 finish sub-second each).
  2. Cross-runner warm regime: a warm run on a different runner of the same image (i.e., workspace paths differ, toolchain paths differ, but content is identical) also hits the cache and completes in the same range. This is what #148 and #150 exist to enable.
  3. Numbers published as a final iteration comment on #112 ([META] Fastest possible FastLED examples CI rebuild — profile + benchmark).
  4. docs/CI_CACHE.md in tree describing the full recipe.

Until (1)+(2) hold, #112 cannot measurably improve beyond its current iter3 numbers, because the compile phase dominates total wall-clock and nothing else we could optimize (parallel, venv cache, uv sync) adds up to the 124s of avoidable recompile.

Non-goals

Related

Metadata

Labels: priority: p1 (important follow-up after p0 foundations), tracking (umbrella or tracking issue)
