Skip to content

fix: raise RLIMIT_NOFILE soft limit at startup#422

Merged
MegaRedHand merged 1 commit into
mainfrom
fix/raise-fd-limit
Jun 8, 2026
Merged

fix: raise RLIMIT_NOFILE soft limit at startup#422
MegaRedHand merged 1 commit into
mainfrom
fix/raise-fd-limit

Conversation

@MegaRedHand

Copy link
Copy Markdown
Collaborator

Summary

Bump the soft open-file-descriptor limit to the hard limit at process start, mirroring ethrex's raise_fd_limit.

Why

The storage layer opens RocksDB with set_max_open_files(-1) (crates/storage/src/backend/rocksdb.rs:38), so the table cache holds every SST file open. On containerized hosts the soft ulimit -n is 1024 by default. After a few days the cache exceeds that, the next open(2) returns EMFILE, RocksDB surfaces it as

commit: Error { message: "IO error: While open a file for random read: /data/157678.sst: Too many open files" }

and the .expect("commit") in Store (crates/storage/src/store.rs:686 and :1046) panics on the tokio worker. spawned_concurrency logs Panic in message handler, the BlockChain actor stops, P2P keeps receiving messages it can't forward (Failed to forward ... err=Actor stopped), and the container still reports Up while the chain stalls.

Observed in a 4-day, 16-node Kurtosis devnet where every node panicked within minutes of each other once the cache grew past 1024 FDs:

node-0 : panicked at crates/storage/src/store.rs:686:24
node-1 : panicked at crates/storage/src/store.rs:686:24
...
node-15: panicked at crates/storage/src/store.rs:686:24

ethrex uses the same RocksDB -1 policy and never hits this because it calls raise_fd_limit() from cmd/ethrex/initializers.rs:543 at startup. This PR ports the same module to ethlambda.

Changes

  • bin/ethlambda/src/fd_limit.rs — new, ported from ethrex (Apache-2.0, copyright header preserved). Linux: setrlimit(NOFILE, soft = hard). macOS/iOS: bump to min(kern.maxfilesperproc, hard). Other platforms: no-op.
  • bin/ethlambda/src/main.rs — call fd_limit::raise_fd_limit() right after the startup banner, log the from→to values, warn on failure instead of aborting.
  • Cargo.toml, bin/ethlambda/Cargo.toml — add libc dep.

Test plan

  • cargo build --bin ethlambda
  • cargo clippy --bin ethlambda -- -D warnings
  • cargo fmt --all -- --check
  • Boot a node and confirm the log line Raised RLIMIT_NOFILE soft limit from=1024 to=524288 appears
  • Run a multi-day devnet without nodes panicking on Too many open files

Follow-ups (not in this PR)

  • The .expect("commit") calls in Store are still load-bearing for any transient IO error — worth converting to Result and letting the actor retry/escalate cleanly rather than panic.
  • A bounded set_max_open_files(N) would also work but trades cache-miss latency for FD safety; raising the limit is the change with no perf cost.

@MegaRedHand MegaRedHand force-pushed the fix/raise-fd-limit branch from 98abc4a to 1f2611d Compare June 8, 2026 14:55
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

🤖 Kimi Code Review

The code is generally correct and follows the expected pattern for raising file descriptor limits in systems programming. This is a straightforward port of well-tested code from Parity/ethrex.

Minor issues:

  1. Missing SAFETY comments (fd_limit.rs, lines 56, 117, 120, 129):
    The unsafe blocks lack // SAFETY: documentation explaining why the raw pointer operations and libc calls are sound. While the usage is correct (properly sized buffers, valid pointers), Rust best practice requires documenting the safety invariants, e.g.:

    // SAFETY: `rlim` is a valid, properly aligned stack-allocated struct.
    if libc::getrlimit(libc::RLIMIT_NOFILE, &mut rlim) != 0 {
  2. Redundant clippy allow (fd_limit.rs, line 24):
    #[allow(clippy::enum_variant_names)] appears unnecessary—the variants don't end with "Error" (they use "FailedTo..." prefixes), so this is likely copy-paste residue.

  3. macOS integer conversion edge case (fd_limit.rs, line 70):
    maxfiles is libc::c_int (signed). While sysctl should never return negative values, casting a negative c_int to libc::rlim_t (unsigned) would produce a massive limit value. Consider adding debug_assert!(maxfiles > 0) after the sysctl call for defense-in-depth.

Correctness acknowledgments:

  • The platform-specific #[cfg] gates are correct: macOS/iOS use KERN_MAXFILESPERPROC sysctl, Linux uses hard limit.
  • Error handling properly uses io::Error::last_os_error() after checking syscall return values.
  • The Outcome/Error enums properly abstract platform differences.
  • main.rs correctly treats fd limit failures as non-fatal (warning only), which is appropriate for a consensus client that should attempt to start even with constrained resources.
  • Integer conversions from rlim_t to u64 use .into(), which is safe (widening conversion on all platforms).

Security/Performance:
No issues. The code executes once at startup, uses stack-allocated buffers, and doesn't introduce new attack surfaces. The RocksDB justification comment in main.rs (lines 147-149) accurately describes the motivation—unbounded table cache with default ulimits causes EMFILE panics.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

🤖 Codex Code Review

No findings.

The PR is operational only; it does not touch fork choice, attestation handling, justification/finalization, state transition, XMSS, or SSZ paths. The startup change in main.rs is consistent with the existing RocksDB configuration in rocksdb.rs: since the backend already uses set_max_open_files(-1), raising RLIMIT_NOFILE at boot is the right mitigation for EMFILE failures on low-ulimit hosts. The fd_limit helper in fd_limit.rs handles failures by logging and continuing, which is the correct failure mode for availability.

Residual risk is low and mostly around validation coverage, not design. I could not complete cargo check in this environment because the pinned toolchain/dependency flow needs writable rustup/cargo state and a git fetch for leanVM, both blocked here. The macOS-specific branch in fd_limit.rs is therefore unverified in practice from this sandbox.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

🤖 Claude Code Review

Here is the review output:


PR #422fix: raise RLIMIT_NOFILE soft limit at startup

Author: Tomás Grüner | +186 / -0 | 4 files changed


Overview

Ports raise_fd_limit() from ethrex to prevent the node from panicking with EMFILE after a few days of uptime. The root cause is set_max_open_files(-1) in crates/storage/src/backend/rocksdb.rs:38 keeping every SST file open indefinitely against a container default soft limit of 1024. The fix is correct, well-motivated, and focused.


Correctness

RLIM_INFINITY on Linux (minor)

bin/ethlambda/src/fd_limit.rs:144:

rlim.rlim_cur = rlim.rlim_max;

If the hard limit is RLIM_INFINITY (e.g. a container launched with --ulimit nofile=-1:-1), this sets rlim_cur to u64::MAX. setrlimit will then either:

  • Succeed (with CAP_SYS_RESOURCE) → the to field logged is 18446744073709551615, which looks alarming.
  • Fail with EPERM → the warning path fires, which is fine.

In practice, Kurtosis/Docker containers cap the hard limit at 524288 or similar, so this won't bite in the target environment. The ethrex original has the same behavior. Still worth a brief note, or capping with a practical sentinel like 1 << 20 the way the macOS branch caps at kern.maxfilesperproc.

from == to produces a misleading log (cosmetic)

If the process already runs with soft == hard (e.g. a privileged container or a dev machine that already bumped the limit), the code still logs:

Raised RLIMIT_NOFILE soft limit from=524288 to=524288

from and to being identical suggests the limit was raised when nothing changed. Simple fix at the call site in main.rs:

Ok(fd_limit::Outcome::LimitRaised { from, to }) if from == to => {
    debug!(from, "RLIMIT_NOFILE soft limit already at hard limit, no change needed");
}
Ok(fd_limit::Outcome::LimitRaised { from, to }) => {
    info!(from, to, "Raised RLIMIT_NOFILE soft limit");
}

Or handle it inside raise_fd_limit itself by returning Outcome::Unsupported / a new NoChangeNeeded variant when already at max.


Code Quality

  • The #[allow(clippy::enum_variant_names)] on Error is correct — all three variants share the Failed prefix, which is inherent to the error domain, not a naming mistake.
  • #[allow(dead_code)] on Outcome::Unsupported is appropriate: the variant is only reachable on platforms other than Linux/macOS/iOS, so it would be dead in a typical CI build.
  • thiserror is correctly pulled from the workspace (bin/ethlambda/Cargo.toml:31), no new dep entry required.
  • The Apache-2.0 attribution header from Parity Technologies is preserved — correct.
  • #[allow(clippy::useless_conversion, non_camel_case_types)] inherited from ethrex; both are needed for the libc FFI types and .into() conversions on platforms where the underlying integer widths already match.

Convention Adherence

  • info!(from, to, "Raised RLIMIT_NOFILE soft limit") uses tracing shorthand correctly per project style.
  • warn!(%err, "Failed to raise ...") matches the inspect_err/warn! pattern used elsewhere.
  • Module placement (mod fd_limit; before mod version;) follows alphabetical ordering.
  • The startup comment explaining why the limit matters (unbounded table cache, EMFILE, panic chain) is the right level of context — the "why is non-obvious" bar from the project guide is clearly met here.

Security / Safety

The unsafe blocks are minimal and correct:

  • Both getrlimit/setrlimit calls check return codes.
  • No pointer aliasing or lifetime issues; all buffers are stack-allocated and passed by address.
  • The macOS sysctl call initialises maxfiles to 0 before use and checks the return value.

One theoretical issue on macOS: if sysctl succeeds but writes a negative value into maxfiles: libc::c_int, the maxfiles as libc::rlim_t cast sign-extends to a very large u64, and cmp::min(..., rlim.rlim_max) picks rlim_max instead — benign in practice since kern.maxfilesperproc is always positive.


Summary

This is a clean, focused fix for a real production incident. The code is a faithful port of a proven module, handles errors gracefully (warn, don't crash), and is correctly gated per platform. The two items above (RLIM_INFINITY edge case and from == to logging) are low-severity and inherited from upstream — the PR is ready to merge as-is, with the logging cosmetic as an optional follow-up.

The PR description's own follow-ups (converting expect("commit") in Store to Result, considering a bounded set_max_open_files) are the right longer-term mitigations and don't block this change.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR ports the raise_fd_limit module from ethrex to ethlambda, bumping the process soft RLIMIT_NOFILE to the hard limit at startup so RocksDB's unbounded table cache no longer exhausts the default container ulimit of 1024 and causes panics.

  • fd_limit.rs is a clean port with platform-specific branches: Linux sets soft = hard, macOS/iOS caps at min(kern.maxfilesperproc, hard), and other platforms are a no-op.
  • main.rs calls the function immediately after the banner, logs the from→to values on success, and only warns on failure — no change can abort the node.
  • Cargo.toml and bin/ethlambda/Cargo.toml add libc = "0.2" to the workspace and binary crate respectively; thiserror was already present.

Confidence Score: 4/5

Safe to merge — failure paths are non-fatal and the fix directly addresses the documented production panic.

The change is a narrow, well-understood system call sequence ported from a proven sibling codebase. The Linux path has a silent degradation when the hard limit is RLIM_INFINITY (setrlimit returns EINVAL, only a warning is logged), and the LimitRaised variant fires even when no actual change was made — both are cosmetic/edge-case issues that do not affect the primary fix for the container ulimit-1024 panic.

bin/ethlambda/src/fd_limit.rs — Linux branch edge case with RLIM_INFINITY hard limits and the always-fires LimitRaised variant.

Important Files Changed

Filename Overview
bin/ethlambda/src/fd_limit.rs New module ported from ethrex that raises RLIMIT_NOFILE soft limit to the hard limit on Linux, and to min(kern.maxfilesperproc, hard) on macOS/iOS; no-op elsewhere. The Linux path does not guard against RLIM_INFINITY hard limits, which causes setrlimit to fail silently (only a warn is emitted).
bin/ethlambda/src/main.rs Calls raise_fd_limit() right after the startup banner, logs from→to on success, and only warns on failure — safe and non-fatal.
Cargo.toml Adds libc = "0.2" to workspace dependencies; straightforward.
bin/ethlambda/Cargo.toml Wires libc.workspace = true into the binary crate; thiserror was already present so all derives compile.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[main: startup banner] --> B[fd_limit::raise_fd_limit]
    B --> C{Platform?}
    C -- Linux --> D[getrlimit RLIMIT_NOFILE]
    C -- macOS/iOS --> E[sysctl kern.maxfilesperproc]
    C -- Other --> F[return Unsupported]
    D --> G[rlim_cur = rlim_max]
    E --> H[getrlimit RLIMIT_NOFILE]
    H --> I["rlim_cur = min(maxfilesperproc, rlim_max)"]
    G --> J[setrlimit]
    I --> J
    J -- success --> K["Ok(LimitRaised { from, to })"]
    J -- error --> L["Err(FailedToSetLimit)"]
    K --> M["info! Raised RLIMIT_NOFILE from=X to=Y"]
    L --> N["warn! Failed to raise RLIMIT_NOFILE"]
    F --> O["info! Not supported on this platform"]
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
bin/ethlambda/src/fd_limit.rs:144
**RLIM_INFINITY hard limit causes setrlimit to fail**

When the process hard limit for `RLIMIT_NOFILE` is `RLIM_INFINITY` (e.g., `ulimit -H -n unlimited`), `rlim_cur` is set to `RLIM_INFINITY` (`u64::MAX`). Linux rejects this with `EINVAL` because `RLIMIT_NOFILE` is silently capped by `/proc/sys/fs/nr_open` (default 1048576). The call lands in the `Err` branch in `main.rs` and only a warning is logged, so the fix has no effect on those hosts. The macOS path avoids this by capping against `kern.maxfilesperproc`; the same approach — reading `/proc/sys/fs/nr_open` and using `min(rlim_max, nr_open)` — would make the Linux path robust in this scenario.

### Issue 2 of 2
bin/ethlambda/src/fd_limit.rs:155-158
**`LimitRaised` fires even when the limit is unchanged**

If the soft limit is already equal to the hard limit, the code calls `setrlimit` with an identical value (which succeeds), and returns `LimitRaised { from: N, to: N }`. The caller in `main.rs` then logs `"Raised RLIMIT_NOFILE soft limit from=N to=N"`, which may confuse operators inspecting startup logs. Adding a short-circuit before `setrlimit` — returning `Unsupported` or a dedicated `AlreadyAtMax` variant when `old_value == rlim.rlim_cur` — would make the log accurate.

Reviews (1): Last reviewed commit: "fix: raise RLIMIT_NOFILE soft limit at s..." | Re-trigger Greptile

let old_value = rlim.rlim_cur;

// Set soft limit to hard limit
rlim.rlim_cur = rlim.rlim_max;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 RLIM_INFINITY hard limit causes setrlimit to fail

When the process hard limit for RLIMIT_NOFILE is RLIM_INFINITY (e.g., ulimit -H -n unlimited), rlim_cur is set to RLIM_INFINITY (u64::MAX). Linux rejects this with EINVAL because RLIMIT_NOFILE is silently capped by /proc/sys/fs/nr_open (default 1048576). The call lands in the Err branch in main.rs and only a warning is logged, so the fix has no effect on those hosts. The macOS path avoids this by capping against kern.maxfilesperproc; the same approach — reading /proc/sys/fs/nr_open and using min(rlim_max, nr_open) — would make the Linux path robust in this scenario.

Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/ethlambda/src/fd_limit.rs
Line: 144

Comment:
**RLIM_INFINITY hard limit causes setrlimit to fail**

When the process hard limit for `RLIMIT_NOFILE` is `RLIM_INFINITY` (e.g., `ulimit -H -n unlimited`), `rlim_cur` is set to `RLIM_INFINITY` (`u64::MAX`). Linux rejects this with `EINVAL` because `RLIMIT_NOFILE` is silently capped by `/proc/sys/fs/nr_open` (default 1048576). The call lands in the `Err` branch in `main.rs` and only a warning is logged, so the fix has no effect on those hosts. The macOS path avoids this by capping against `kern.maxfilesperproc`; the same approach — reading `/proc/sys/fs/nr_open` and using `min(rlim_max, nr_open)` — would make the Linux path robust in this scenario.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread bin/ethlambda/src/fd_limit.rs Outdated
Comment on lines +155 to +158
Ok(Outcome::LimitRaised {
from: old_value.into(),
to: rlim.rlim_cur.into(),
})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 LimitRaised fires even when the limit is unchanged

If the soft limit is already equal to the hard limit, the code calls setrlimit with an identical value (which succeeds), and returns LimitRaised { from: N, to: N }. The caller in main.rs then logs "Raised RLIMIT_NOFILE soft limit from=N to=N", which may confuse operators inspecting startup logs. Adding a short-circuit before setrlimit — returning Unsupported or a dedicated AlreadyAtMax variant when old_value == rlim.rlim_cur — would make the log accurate.

Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/ethlambda/src/fd_limit.rs
Line: 155-158

Comment:
**`LimitRaised` fires even when the limit is unchanged**

If the soft limit is already equal to the hard limit, the code calls `setrlimit` with an identical value (which succeeds), and returns `LimitRaised { from: N, to: N }`. The caller in `main.rs` then logs `"Raised RLIMIT_NOFILE soft limit from=N to=N"`, which may confuse operators inspecting startup logs. Adding a short-circuit before `setrlimit` — returning `Unsupported` or a dedicated `AlreadyAtMax` variant when `old_value == rlim.rlim_cur` — would make the log accurate.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@MegaRedHand MegaRedHand force-pushed the fix/raise-fd-limit branch from 1f2611d to f80aba1 Compare June 8, 2026 15:13
The storage layer opens RocksDB with `set_max_open_files(-1)`, which
keeps an unbounded number of SST files in the table cache. On
containerized hosts the soft `ulimit -n` is 1024 by default, so on a
long-running devnet the table cache eventually grows past that and the
next `open(2)` returns `EMFILE`. RocksDB surfaces this as
`commit: IO error: ... Too many open files`, the `.expect("commit")`
in `Store` panics, the BlockChain actor dies, and the node stops
making progress while the container still reports "Up".

Mirror ethrex's approach: at startup, bump the soft `RLIMIT_NOFILE`
to the hard limit (Linux) or `kern.maxfilesperproc` (macOS). The
ported file is `fd_limit.rs`, originally from Parity Tech via ethrex
under Apache-2.0; copyright header is preserved.

Observed in a 4-day 16-node Kurtosis devnet where every node panicked
within a few minutes of each other once the table cache grew past
1024 FDs.
@MegaRedHand MegaRedHand force-pushed the fix/raise-fd-limit branch from f80aba1 to 0ca9d2a Compare June 8, 2026 15:14
@MegaRedHand MegaRedHand merged commit 899babe into main Jun 8, 2026
2 of 3 checks passed
@MegaRedHand MegaRedHand deleted the fix/raise-fd-limit branch June 8, 2026 21:25
@MegaRedHand MegaRedHand mentioned this pull request Jun 10, 2026
@MegaRedHand MegaRedHand linked an issue Jun 10, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tune RocksDB options

2 participants