Skip to content

fix(node-update): never run hardhat compile on node hosts#415

Open
branarakic wants to merge 2 commits intomainfrom
fix/skip-evm-build-on-node-update
Open

fix(node-update): never run hardhat compile on node hosts#415
branarakic wants to merge 2 commits intomainfrom
fix/skip-evm-build-on-node-update

Conversation

@branarakic
Copy link
Copy Markdown
Contributor

Why

Node updates routinely fail on small/ARM64 VPS because the auto-updater runs hardhat compile whenever a release touched .sol sources. The compile is the heaviest, most failure-prone step in the slot build (cold solc OOMs, hardhat-deploy quirks, the 1m54s "single biggest chunk of the shared build" already documented in ci.yml), and any failure aborts the entire slot swap.

Nothing on the runtime path actually needs hardhat: packages/chain/src/evm-adapter.ts loads contract ABIs from the committed packages/evm-module/abi/*.json via require(). So this PR treats those committed JSONs as the runtime contract surface and moves the freshness check to CI, where it runs once per PR on a beefy runner instead of once per node per update.

The auto-updater already had most of the right machinery (runtimeBuildCommandFromPackageJson -> pnpm build:runtime:packages excludes evm-module; DKG_SKIP_EVM_BUILD=1 short-circuit in maybe-compile.mjs; dkgBuild.releaseRuntimeBuildScript pointer). This PR completes that story by removing the only remaining path where a node host runs hardhat, and by patching the one place — install.sh — that was still doing the unfiltered pnpm build.

What

  • install.sh — both blue-green slots switch from pnpm build to pnpm build:runtime. Mirrors what the auto-updater already picks via dkgBuild.releaseRuntimeBuildScript. Saves ~4 min on first install (~2 min × 2 slots).
  • packages/cli/src/daemon/auto-update.ts — remove the conditional pnpm --filter @origintrail-official/dkg-evm-module build block and the now-unused shouldRebuildContracts helper. Replace with a comment documenting the new contract.
  • packages/cli/src/config.ts — mark AutoUpdateBuildTimeouts.contracts as @deprecated. Field is preserved on the type so existing user configs don't fail to parse, but the value is no longer read.
  • .github/workflows/ci.yml — add an abi-freshness job (TORNADO tier). Runs only when paths-filter reports contracts == true, executes pnpm --filter dkg-evm-module build, then git diff --exit-code -- packages/evm-module/abi/. On drift, fails with a copy-pasteable remediation message.
  • packages/cli/test/auto-update.test.ts — drop the four shouldRebuildContracts-era tests (timeout honouring, fails-closed, hardhat-clean ordering, fetch-retry) and replace with one regression guard asserting the auto-updater never invokes any hardhat / evm-module command, even when the diff would have shown changed .sol files.
  • RELEASE_PROCESS.md + packages/evm-module/README.md — document the new contract: contributors regenerate ABIs alongside .sol changes, CI enforces it, nodes never recompile.

Operational impact

  • Node updates: zero hardhat invocations on the host. The slowest, most failure-prone step is gone. Updates that previously failed on resource-constrained nodes when contracts changed should now succeed.
  • Releases: unchanged. release.yml and npm-continuous-publish.yml still run unfiltered pnpm build (they need contract artifacts to ship).
  • CI lanes: unchanged for non-contract PRs (zero new cost). Contract-touching PRs get one extra ~1m job (abi-freshness) on top of the existing solidity lane.
  • Contributors: if you change .sol, also commit regenerated ABIs in the same change. CI prints the exact remediation command on failure:
    ```
    pnpm --filter @origintrail-official/dkg-evm-module build
    git add packages/evm-module/abi/
    ```

Why CI freshness vs auto-commit

We deliberately picked the safer "fail the PR with a clear message" path over "auto-commit regenerated ABIs back to the PR". Auto-commit needs write permissions on PR branches (cross-fork PRs would silently skip), and a successful auto-commit on a malicious PR could land bytecode the human reviewer never looked at. Failing fast with an explicit remediation keeps the human in the loop.

Out of scope (follow-up PRs)

  • Collapsing to a single ABI snapshot (packages/chain/abi/ vs packages/evm-module/abi/) — bigger refactor, deserves its own review.
  • Pruning evm-module out of the runtime install graph entirely — depends on the snapshot collapse.
  • Renaming build:packages per a comment on PR fix(build): include node ui in root build #412 — out of scope here.

Test plan

  • `pnpm vitest run test/auto-update.test.ts` (cli): 73/73 passing.
  • `pnpm vitest run test/migration.test.ts test/node-ui-static.test.ts` (cli): 38/38 passing.
  • `DKG_SKIP_EVM_BUILD=1 pnpm build` at repo root: 20/20 turbo tasks OK.
  • `bash -n install.sh`: clean.
  • `.github/workflows/ci.yml` parses as YAML; new `abi-freshness` job is registered.
  • CI on this PR — full matrix.
  • Manual smoke: deliberately tweak a `.sol` file without regenerating ABIs and confirm `abi-freshness` fails with the expected remediation message.
  • Manual smoke: end-to-end `dkg update` on a low-RAM test VM with a release that touches contracts; confirm no hardhat invocation in the slot logs.

Related

  • Removes the failure surface that's been the recurring "update broke my node" report.
  • Complementary to fix(build): include node ui in root build #412 (different surface — that PR fixes what `pnpm build` means at the root; this PR is about what nodes invoke). No conflict.

Made with Cursor

Node updates routinely failed on small / ARM64 VPS because the auto-updater
ran `hardhat compile` whenever a release touched `.sol` sources. The compile
is the heaviest and most fragile step in the slot build (cold solc OOMs,
hardhat-deploy quirks), and any failure aborted the entire slot swap.

Nothing on the runtime path actually needs hardhat: `packages/chain` loads
contract ABIs from the COMMITTED `packages/evm-module/abi/*.json` via
`require()`. Treat those committed JSONs as the runtime contract surface and
move the freshness check to CI, where it runs once per PR on a beefy runner
instead of on every node, every update.

Changes
-------
- install.sh: switch both blue-green slots from `pnpm build` to
  `pnpm build:runtime`. Mirrors what the auto-updater already picks via
  `dkgBuild.releaseRuntimeBuildScript`, and skips the ~2 min hardhat compile
  per slot (so first install is ~4 min faster on small hosts).
- packages/cli/src/daemon/auto-update.ts: remove the conditional
  `pnpm --filter dkg-evm-module build` block and the now-unused
  `shouldRebuildContracts` helper. Replace with a comment documenting the
  new contract: committed ABIs are authoritative, CI enforces freshness.
- packages/cli/src/config.ts: mark `AutoUpdateBuildTimeouts.contracts` as
  `@deprecated`. Field is preserved on the type so existing user configs
  don't fail to parse, but the value is no longer read.
- .github/workflows/ci.yml: add an `abi-freshness` job (TORNADO tier).
  Runs only when `paths-filter` reports `contracts == true`, executes
  `pnpm --filter dkg-evm-module build`, then
  `git diff --exit-code -- packages/evm-module/abi/`. On drift it fails
  with a copy-pasteable remediation message so the contributor can fix and
  push without guessing.
- packages/cli/test/auto-update.test.ts: drop the four
  `shouldRebuildContracts`-era tests (timeout, fails-closed, hardhat-clean
  ordering, fetch-retry) and replace with one regression guard asserting
  the auto-updater never invokes any `hardhat` / evm-module command, even
  when the diff would have shown changed `.sol` files.
- RELEASE_PROCESS.md, packages/evm-module/README.md: document the new
  contract — contributors regenerate ABIs alongside `.sol` changes, CI
  enforces it, nodes never recompile.

Operational impact
------------------
- Node updates: zero hardhat invocations on the host. The slowest, most
  failure-prone step is gone.
- Releases: unchanged. Release workflows still run unfiltered `pnpm build`
  (CI lane uses `DKG_SKIP_EVM_BUILD=1`).
- Contributors: if you change `.sol`, also commit regenerated ABIs. CI
  prints the exact `pnpm --filter ... build && git add packages/evm-module/abi/`
  command on failure.

Test plan
---------
- `pnpm vitest run test/auto-update.test.ts` (cli): 73/73 passing.
- `pnpm vitest run test/migration.test.ts test/node-ui-static.test.ts`
  (cli): 38/38 passing.
- `DKG_SKIP_EVM_BUILD=1 pnpm build` at repo root: 20/20 turbo tasks OK.
- `bash -n install.sh`: clean.
- ci.yml YAML parses; new `abi-freshness` job listed.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread .github/workflows/ci.yml Outdated
restore-keys: |
hardhat-${{ runner.os }}-
- name: Compile contracts (regenerates abi/*.json)
run: pnpm --filter @origintrail-official/dkg-evm-module build
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This gate will not reliably catch ABI drift as written. @origintrail-official/dkg-evm-module's build script runs hardhat compile --config hardhat.node.config.ts, but the ABI exporter is only configured in hardhat.config.ts. That means this step can succeed without rewriting packages/evm-module/abi/*.json, so stale committed ABIs would still pass. Run Hardhat with the config that enables abiExporter, or add an explicit ABI-export step before diffing abi/.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed the gate was a no-op. pnpm --filter dkg-evm-module build runs hardhat compile --config hardhat.node.config.ts, but hardhat-abi-exporter is only imported and configured in the default hardhat.config.ts (with runOnCompile: true). The node config never touches abi/, so the diff check was guaranteed to pass even with stale ABIs.

Fixed in 9234261 by switching the step to npx hardhat compile (no --config, picks up the abi-exporting default), expanding the cache key to include hardhat.config.ts, and updating the README + auto-update.ts comment + remediation message to point at the same command.

Verified locally:

  • positive smoke: regenerate on clean main → no diff under abi/
  • negative smoke: corrupt one ABI file → git diff --exit-code -- abi/ returns 1

Comment thread install.sh Outdated
# Solidity sources, so nodes never need to invoke `hardhat compile`.
# Mirrors what the auto-updater does (see node-ui-static.ts:
# runtimeBuildCommandFromPackageJson).
(cd "$SLOT_A" && pnpm build:runtime)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: install.sh now assumes every target ref exposes build:runtime, but this script still supports arbitrary DKG_BRANCH/DKG_REPO values and already carries compatibility logic for older checkouts. Installing an older tag/branch that only has build will now fail on first install instead of falling back like the auto-updater does. Reuse the same runtime-build detection/fallback here (build:runtime -> build:runtime:packages -> build).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — install.sh would regress for older DKG_BRANCH/DKG_REPO targets that predate build:runtime.

Fixed in 9234261 by adding runtime_build_script(), a small node helper (node is already required earlier in install.sh) that mirrors runtimeBuildCommandFromPackageJson from node-ui-static.ts exactly: honours dkgBuild.releaseRuntimeBuildScript first, then falls through build:runtimebuild:runtime:packagesbuild. Validates the dkgBuild override against [A-Za-z0-9:_-]+ so a malicious package.json can't inject shell into pnpm run.

Smoke-tested all 6 fallback paths:

  • current main → build:runtime:packages (via releaseRuntimeBuildScript)
  • only build:runtime → picks it
  • only build:runtime:packages → picks it
  • legacy build only → falls all the way down
  • malformed package.json → safe fallback to build
  • unsafe dkgBuild override → ignored, falls through

Two bugs from automated review on the previous commit:

1. abi-freshness gate was a no-op
   `pnpm --filter dkg-evm-module build` shells out to
   `hardhat compile --config hardhat.node.config.ts`, but the
   `hardhat-abi-exporter` plugin (with `runOnCompile: true`) is only wired
   into the *default* `hardhat.config.ts`. The node config never touches
   `abi/`, so the diff check would always pass even when committed ABIs
   were stale — silently activating the wrong contract surface on every
   node. Switched the CI step to `npx hardhat compile` (no `--config`,
   picks up the abi-exporting default), expanded the cache key to cover
   `hardhat.config.ts` too, and updated the remediation message + README +
   auto-update.ts comment to point at the same command.

   Verified locally: regenerating with the fixed command yields no diff
   on a clean main checkout (positive smoke), and corrupting one ABI
   file makes `git diff --exit-code -- abi/` return 1 (negative smoke).

2. install.sh hard-coded `pnpm build:runtime`
   The script supports arbitrary `DKG_BRANCH` / `DKG_REPO`, so installing
   an older tag that predates `build:runtime` would now fail on first
   install. Added `runtime_build_script()` that mirrors the JS
   `runtimeBuildCommandFromPackageJson` fallback chain (`dkgBuild
   .releaseRuntimeBuildScript` -> `build:runtime` -> `build:runtime
   :packages` -> `build`) using the node binary that the script already
   requires. Validates the dkgBuild override against `[A-Za-z0-9:_-]+`
   so a malicious package.json can't inject shell.

   Smoke-tested all 6 fallback cases including malformed JSON and unsafe
   override values.

Tests: cli/auto-update.test.ts still 73/73 green, ci.yml still parses.
Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread .github/workflows/ci.yml
needs: changes
runs-on: ubuntu-latest
timeout-minutes: 10
if: needs.changes.outputs.contracts == 'true'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Bug: This new gate only runs when changes.outputs.contracts is true, but that filter does not include packages/evm-module/abi/**. Because the updater now trusts the committed ABI JSONs at runtime, a PR can modify abi/*.json directly and completely bypass this freshness check. Include abi/** in the contracts paths filter (or make this job self-detect ABI changes) so runtime ABI drift cannot merge unchecked.

Comment thread RELEASE_PROCESS.md

Git-based blue-green updates run runtime packages and the Node UI static bundle as separate timed build steps, then verify `packages/node-ui/dist-ui/index.html` before activation. `build:runtime` remains a UI-inclusive compatibility wrapper so nodes updating from an older updater still prepare the UI through the target ref's build script.

> **Note on EVM contracts**: nodes never run `hardhat compile` during install or auto-update. The committed `packages/evm-module/abi/*.json` files are the runtime contract surface (consumed by `packages/chain` via `require()`). The `abi-freshness` CI job (`.github/workflows/ci.yml`) blocks any PR that changes Solidity sources without committing the regenerated ABIs, so by the time a tag exists on `main` the committed ABIs are guaranteed to match. **Release implication**: when contract source changes are part of a release, the contributor MUST regenerate ABIs (`pnpm --filter @origintrail-official/dkg-evm-module build && git add packages/evm-module/abi/`) and commit them with the source change. CI enforces this; releases cut from `main` cannot ship with stale ABIs.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Issue: The command in this note will not regenerate packages/evm-module/abi/*.json. @origintrail-official/dkg-evm-module's build script compiles with hardhat.node.config.ts, which does not load hardhat-abi-exporter, so contributors following this release doc will do the wrong thing. Point the remediation here at cd packages/evm-module && npx hardhat compile instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant