fix(node-update): never run hardhat compile on node hosts#415
fix(node-update): never run hardhat compile on node hosts#415branarakic wants to merge 2 commits intomainfrom
Conversation
Node updates routinely failed on small / ARM64 VPS because the auto-updater ran `hardhat compile` whenever a release touched `.sol` sources. The compile is the heaviest and most fragile step in the slot build (cold solc OOMs, hardhat-deploy quirks), and any failure aborted the entire slot swap. Nothing on the runtime path actually needs hardhat: `packages/chain` loads contract ABIs from the COMMITTED `packages/evm-module/abi/*.json` via `require()`. Treat those committed JSONs as the runtime contract surface and move the freshness check to CI, where it runs once per PR on a beefy runner instead of on every node, every update. Changes ------- - install.sh: switch both blue-green slots from `pnpm build` to `pnpm build:runtime`. Mirrors what the auto-updater already picks via `dkgBuild.releaseRuntimeBuildScript`, and skips the ~2 min hardhat compile per slot (so first install is ~4 min faster on small hosts). - packages/cli/src/daemon/auto-update.ts: remove the conditional `pnpm --filter dkg-evm-module build` block and the now-unused `shouldRebuildContracts` helper. Replace with a comment documenting the new contract: committed ABIs are authoritative, CI enforces freshness. - packages/cli/src/config.ts: mark `AutoUpdateBuildTimeouts.contracts` as `@deprecated`. Field is preserved on the type so existing user configs don't fail to parse, but the value is no longer read. - .github/workflows/ci.yml: add an `abi-freshness` job (TORNADO tier). Runs only when `paths-filter` reports `contracts == true`, executes `pnpm --filter dkg-evm-module build`, then `git diff --exit-code -- packages/evm-module/abi/`. On drift it fails with a copy-pasteable remediation message so the contributor can fix and push without guessing. - packages/cli/test/auto-update.test.ts: drop the four `shouldRebuildContracts`-era tests (timeout, fails-closed, hardhat-clean ordering, fetch-retry) and replace with one regression guard asserting the auto-updater never invokes any `hardhat` / evm-module command, even when the diff would have shown changed `.sol` files. - RELEASE_PROCESS.md, packages/evm-module/README.md: document the new contract — contributors regenerate ABIs alongside `.sol` changes, CI enforces it, nodes never recompile. Operational impact ------------------ - Node updates: zero hardhat invocations on the host. The slowest, most failure-prone step is gone. - Releases: unchanged. Release workflows still run unfiltered `pnpm build` (CI lane uses `DKG_SKIP_EVM_BUILD=1`). - Contributors: if you change `.sol`, also commit regenerated ABIs. CI prints the exact `pnpm --filter ... build && git add packages/evm-module/abi/` command on failure. Test plan --------- - `pnpm vitest run test/auto-update.test.ts` (cli): 73/73 passing. - `pnpm vitest run test/migration.test.ts test/node-ui-static.test.ts` (cli): 38/38 passing. - `DKG_SKIP_EVM_BUILD=1 pnpm build` at repo root: 20/20 turbo tasks OK. - `bash -n install.sh`: clean. - ci.yml YAML parses; new `abi-freshness` job listed. Co-authored-by: Cursor <cursoragent@cursor.com>
| restore-keys: | | ||
| hardhat-${{ runner.os }}- | ||
| - name: Compile contracts (regenerates abi/*.json) | ||
| run: pnpm --filter @origintrail-official/dkg-evm-module build |
There was a problem hiding this comment.
🔴 Bug: This gate will not reliably catch ABI drift as written. @origintrail-official/dkg-evm-module's build script runs hardhat compile --config hardhat.node.config.ts, but the ABI exporter is only configured in hardhat.config.ts. That means this step can succeed without rewriting packages/evm-module/abi/*.json, so stale committed ABIs would still pass. Run Hardhat with the config that enables abiExporter, or add an explicit ABI-export step before diffing abi/.
There was a problem hiding this comment.
Good catch — confirmed the gate was a no-op. pnpm --filter dkg-evm-module build runs hardhat compile --config hardhat.node.config.ts, but hardhat-abi-exporter is only imported and configured in the default hardhat.config.ts (with runOnCompile: true). The node config never touches abi/, so the diff check was guaranteed to pass even with stale ABIs.
Fixed in 9234261 by switching the step to npx hardhat compile (no --config, picks up the abi-exporting default), expanding the cache key to include hardhat.config.ts, and updating the README + auto-update.ts comment + remediation message to point at the same command.
Verified locally:
- positive smoke: regenerate on clean main → no diff under
abi/ - negative smoke: corrupt one ABI file →
git diff --exit-code -- abi/returns 1
| # Solidity sources, so nodes never need to invoke `hardhat compile`. | ||
| # Mirrors what the auto-updater does (see node-ui-static.ts: | ||
| # runtimeBuildCommandFromPackageJson). | ||
| (cd "$SLOT_A" && pnpm build:runtime) |
There was a problem hiding this comment.
🔴 Bug: install.sh now assumes every target ref exposes build:runtime, but this script still supports arbitrary DKG_BRANCH/DKG_REPO values and already carries compatibility logic for older checkouts. Installing an older tag/branch that only has build will now fail on first install instead of falling back like the auto-updater does. Reuse the same runtime-build detection/fallback here (build:runtime -> build:runtime:packages -> build).
There was a problem hiding this comment.
Agreed — install.sh would regress for older DKG_BRANCH/DKG_REPO targets that predate build:runtime.
Fixed in 9234261 by adding runtime_build_script(), a small node helper (node is already required earlier in install.sh) that mirrors runtimeBuildCommandFromPackageJson from node-ui-static.ts exactly: honours dkgBuild.releaseRuntimeBuildScript first, then falls through build:runtime → build:runtime:packages → build. Validates the dkgBuild override against [A-Za-z0-9:_-]+ so a malicious package.json can't inject shell into pnpm run.
Smoke-tested all 6 fallback paths:
- current main →
build:runtime:packages(viareleaseRuntimeBuildScript) - only
build:runtime→ picks it - only
build:runtime:packages→ picks it - legacy
buildonly → falls all the way down - malformed package.json → safe fallback to
build - unsafe
dkgBuildoverride → ignored, falls through
Two bugs from automated review on the previous commit: 1. abi-freshness gate was a no-op `pnpm --filter dkg-evm-module build` shells out to `hardhat compile --config hardhat.node.config.ts`, but the `hardhat-abi-exporter` plugin (with `runOnCompile: true`) is only wired into the *default* `hardhat.config.ts`. The node config never touches `abi/`, so the diff check would always pass even when committed ABIs were stale — silently activating the wrong contract surface on every node. Switched the CI step to `npx hardhat compile` (no `--config`, picks up the abi-exporting default), expanded the cache key to cover `hardhat.config.ts` too, and updated the remediation message + README + auto-update.ts comment to point at the same command. Verified locally: regenerating with the fixed command yields no diff on a clean main checkout (positive smoke), and corrupting one ABI file makes `git diff --exit-code -- abi/` return 1 (negative smoke). 2. install.sh hard-coded `pnpm build:runtime` The script supports arbitrary `DKG_BRANCH` / `DKG_REPO`, so installing an older tag that predates `build:runtime` would now fail on first install. Added `runtime_build_script()` that mirrors the JS `runtimeBuildCommandFromPackageJson` fallback chain (`dkgBuild .releaseRuntimeBuildScript` -> `build:runtime` -> `build:runtime :packages` -> `build`) using the node binary that the script already requires. Validates the dkgBuild override against `[A-Za-z0-9:_-]+` so a malicious package.json can't inject shell. Smoke-tested all 6 fallback cases including malformed JSON and unsafe override values. Tests: cli/auto-update.test.ts still 73/73 green, ci.yml still parses. Co-authored-by: Cursor <cursoragent@cursor.com>
| needs: changes | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 10 | ||
| if: needs.changes.outputs.contracts == 'true' |
There was a problem hiding this comment.
🔴 Bug: This new gate only runs when changes.outputs.contracts is true, but that filter does not include packages/evm-module/abi/**. Because the updater now trusts the committed ABI JSONs at runtime, a PR can modify abi/*.json directly and completely bypass this freshness check. Include abi/** in the contracts paths filter (or make this job self-detect ABI changes) so runtime ABI drift cannot merge unchecked.
|
|
||
| Git-based blue-green updates run runtime packages and the Node UI static bundle as separate timed build steps, then verify `packages/node-ui/dist-ui/index.html` before activation. `build:runtime` remains a UI-inclusive compatibility wrapper so nodes updating from an older updater still prepare the UI through the target ref's build script. | ||
|
|
||
| > **Note on EVM contracts**: nodes never run `hardhat compile` during install or auto-update. The committed `packages/evm-module/abi/*.json` files are the runtime contract surface (consumed by `packages/chain` via `require()`). The `abi-freshness` CI job (`.github/workflows/ci.yml`) blocks any PR that changes Solidity sources without committing the regenerated ABIs, so by the time a tag exists on `main` the committed ABIs are guaranteed to match. **Release implication**: when contract source changes are part of a release, the contributor MUST regenerate ABIs (`pnpm --filter @origintrail-official/dkg-evm-module build && git add packages/evm-module/abi/`) and commit them with the source change. CI enforces this; releases cut from `main` cannot ship with stale ABIs. |
There was a problem hiding this comment.
🟡 Issue: The command in this note will not regenerate packages/evm-module/abi/*.json. @origintrail-official/dkg-evm-module's build script compiles with hardhat.node.config.ts, which does not load hardhat-abi-exporter, so contributors following this release doc will do the wrong thing. Point the remediation here at cd packages/evm-module && npx hardhat compile instead.
Why
Node updates routinely fail on small/ARM64 VPS because the auto-updater runs
hardhat compilewhenever a release touched.solsources. The compile is the heaviest, most failure-prone step in the slot build (cold solc OOMs, hardhat-deploy quirks, the 1m54s "single biggest chunk of the shared build" already documented inci.yml), and any failure aborts the entire slot swap.Nothing on the runtime path actually needs hardhat:
packages/chain/src/evm-adapter.tsloads contract ABIs from the committedpackages/evm-module/abi/*.jsonviarequire(). So this PR treats those committed JSONs as the runtime contract surface and moves the freshness check to CI, where it runs once per PR on a beefy runner instead of once per node per update.The auto-updater already had most of the right machinery (
runtimeBuildCommandFromPackageJson->pnpm build:runtime:packagesexcludes evm-module;DKG_SKIP_EVM_BUILD=1short-circuit inmaybe-compile.mjs;dkgBuild.releaseRuntimeBuildScriptpointer). This PR completes that story by removing the only remaining path where a node host runshardhat, and by patching the one place —install.sh— that was still doing the unfilteredpnpm build.What
install.sh— both blue-green slots switch frompnpm buildtopnpm build:runtime. Mirrors what the auto-updater already picks viadkgBuild.releaseRuntimeBuildScript. Saves ~4 min on first install (~2 min × 2 slots).packages/cli/src/daemon/auto-update.ts— remove the conditionalpnpm --filter @origintrail-official/dkg-evm-module buildblock and the now-unusedshouldRebuildContractshelper. Replace with a comment documenting the new contract.packages/cli/src/config.ts— markAutoUpdateBuildTimeouts.contractsas@deprecated. Field is preserved on the type so existing user configs don't fail to parse, but the value is no longer read..github/workflows/ci.yml— add anabi-freshnessjob (TORNADO tier). Runs only whenpaths-filterreportscontracts == true, executespnpm --filter dkg-evm-module build, thengit diff --exit-code -- packages/evm-module/abi/. On drift, fails with a copy-pasteable remediation message.packages/cli/test/auto-update.test.ts— drop the fourshouldRebuildContracts-era tests (timeout honouring, fails-closed, hardhat-clean ordering, fetch-retry) and replace with one regression guard asserting the auto-updater never invokes anyhardhat/ evm-module command, even when the diff would have shown changed.solfiles.RELEASE_PROCESS.md+packages/evm-module/README.md— document the new contract: contributors regenerate ABIs alongside.solchanges, CI enforces it, nodes never recompile.Operational impact
release.ymlandnpm-continuous-publish.ymlstill run unfilteredpnpm build(they need contract artifacts to ship).abi-freshness) on top of the existingsoliditylane..sol, also commit regenerated ABIs in the same change. CI prints the exact remediation command on failure:```
pnpm --filter @origintrail-official/dkg-evm-module build
git add packages/evm-module/abi/
```
Why CI freshness vs auto-commit
We deliberately picked the safer "fail the PR with a clear message" path over "auto-commit regenerated ABIs back to the PR". Auto-commit needs write permissions on PR branches (cross-fork PRs would silently skip), and a successful auto-commit on a malicious PR could land bytecode the human reviewer never looked at. Failing fast with an explicit remediation keeps the human in the loop.
Out of scope (follow-up PRs)
packages/chain/abi/vspackages/evm-module/abi/) — bigger refactor, deserves its own review.evm-moduleout of the runtime install graph entirely — depends on the snapshot collapse.build:packagesper a comment on PR fix(build): include node ui in root build #412 — out of scope here.Test plan
Related
Made with Cursor