We observe recurring stalls in reth's Engine API that cause op-node forkchoiceUpdated/newPayload calls to time out and l2_unsafe to grow. CPU, RAM, and IO look normal. The stall persists until we restart the containers; after the restart, op-node re-syncs.
Version: ghcr.io/base/node-reth:v0.14.3
Symptoms (op-node log):
- Repeated Post "http://reth:8551": context deadline exceeded while inserting payloads.
- "Failed to share forkchoice-updated signal" and "Engine temporary error" messages.
- op-node eventually starts EL sync again after restart.
Reth log evidence
- engine::tree::payload_processor::multiproof → "read transaction has been timed out" (DatabaseErrorInfo, code -96000).
- Followed by engine::tree "Failed to send internal event: SendError" spam.
- Also Invalid block on new payload with blob gas used mismatch: got 0, expected …
- After restart, reth logs "waiting for first Flashblock" and "could not process Flashblock ... recently restarted or syncing".
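
To gauge how often each failure occurs and in what order, the log signatures listed above can be tallied with a small script. This is a minimal sketch, not part of reth or op-node: the substrings are copied from the excerpts in this report, real log lines may be formatted differently, and the function and key names are hypothetical.

```python
import re
import sys
from collections import Counter

# Substrings taken from the log excerpts in this report. Matching on
# substrings is an assumption; adjust if your log format differs.
SIGNATURES = {
    "engine_api_timeout": re.compile(r"context deadline exceeded"),
    "db_read_tx_timeout": re.compile(r"read transaction has been timed out"),
    "internal_event_send_error": re.compile(r"Failed to send internal event"),
    "blob_gas_mismatch": re.compile(r"blob gas used mismatch"),
}


def count_signatures(lines):
    """Count how many lines match each known failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from stdin and prints one "name: count" row per signature seen.
    for name, n in count_signatures(sys.stdin).most_common():
        print(f"{name}: {n}")
```

Pipe the combined op-node and reth container logs into the script on stdin. Comparing the counts before and after a stall would show whether the DB read transaction timeout consistently precedes the SendError spam.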
Impact
- op_node_default_refs_time alert fires (>200).
- l2_unsafe grows until we restart containers.
Expected
- Engine API remains responsive; no DB read transaction timeouts or internal event channel failures under normal load.