feat: Shadow simulator compatibility behind shadow-integration feature#420
Conversation
🤖 Kimi Code ReviewReview of PR #420 (Shadow simulator integration) The PR introduces Shadow simulator compatibility by patching Critical Issues1. Typo in error message last_send_error.lock().expect("poisend lock");Change "poisend" to "poisoned". Code Quality & Safety2. Unused imports when logging disabled #[cfg(any(feature = "tracing", feature = "direct-log"))]
use std::sync::Mutex;
#[cfg(any(feature = "tracing", feature = "direct-log"))]
use std::time::Instant;3. Unsafe block needs justification // Safety: MaybeUninitSlice is #[repr(transparent)] over IoSliceMut,
// guaranteeing identical layout.
let bufs = unsafe { ... };4. Error swallowing behavior Build System5. Dockerfile ARG expansion Consensus/Security NoteThis PR modifies networking infrastructure only; no consensus-critical code (state transition, fork choice, XMSS verification) is affected. The SummaryThe PR is architecturally correct. Fix the typo (Item 1), address the unused imports (Item 2), and document the unsafe block (Item 3) before merging. The error handling (Item 4) should be verified as intentional for Shadow's requirements. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code Review
No fork-choice, attestation, STF, XMSS, or SSZ logic changed in this PR, so I don’t have consensus-layer findings beyond the transport-level availability risks above. I couldn’t verify with Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewHere is my review: PR Review: feat: Shadow simulator compatibility behind shadow-integration featureOverview: This PR gates Shadow network simulator compatibility behind a Significant Issue:
|
Greptile SummaryThis PR gates Shadow network simulator compatibility behind a
Confidence Score: 3/5The fallback recv path has an unguarded unwrap that can crash the node, and the global quinn-udp patch quietly removes batch UDP throughput from every build on this branch. The fallback recv function calls addr.as_socket().unwrap() with no error handling — if the kernel returns an unexpected address family the node panics rather than logging an error. The global [patch.crates-io] also means anyone building from this branch gets degraded UDP performance without any compile-time or runtime signal. patches/quinn-udp/src/fallback.rs warrants the closest look — it contains the unguarded unwrap and an undocumented unsafe layout cast. Cargo.toml is worth reviewing for the global-patch scope implications.
|
| Filename | Overview |
|---|---|
| patches/quinn-udp/src/fallback.rs | New fallback UDP implementation for Shadow; contains an unguarded addr.as_socket().unwrap() that can panic at runtime, and an unsafe cast with no documented layout-safety justification. |
| patches/quinn-udp/src/lib.rs | Public API surface for the patched quinn-udp crate; minor typo in a lock-panic message, otherwise structurally sound. |
| bin/ethlambda/src/main.rs | Adds feature-gated allocator/runtime changes; compile_error! mutual-exclusion guard, conditional jemalloc, and current_thread tokio flavor for shadow builds — all clean. |
| Cargo.toml | Adds a global [patch.crates-io] for quinn-udp; intentional but silently degrades UDP batch performance on all builds from this branch, not just shadow builds. |
| Dockerfile | Copies patches/ before cargo-chef resolves the graph and adds NO_DEFAULT_FEATURES build arg — correct ordering and flag threading. |
| Makefile | Adds shadow-build and docker-build-shadow targets with correct --no-default-features --features shadow-integration flags. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[cargo build] --> B{Features?}
B -->|default| C[jemalloc allocator\nmulti-thread tokio]
B -->|no-default-features + shadow-integration| D[system allocator\ncurrent_thread tokio]
C --> E[quinn-udp fallback patch\nBATCH_SIZE=1\nno GSO/GRO]
D --> F[quinn-udp fallback patch\nBATCH_SIZE=1\nno GSO/GRO]
E --> G[Production binary\nUDP perf degraded on branch]
F --> H[Shadow-compatible binary]
I[compile_error!] -. blocks .-> J[jemalloc + shadow-integration combined]
Comments Outside Diff (1)
-
Cargo.toml, line 83-90 (link)quinn-udp patch silently degrades production UDP performance on this branch
The
[patch.crates-io]table is global — everycargo build --releaseon this branch (including CI/CD) will use the fallbacksend_to/recv_frompath and getBATCH_SIZE = 1, forfeiting GSO/GRO batch throughput that the upstreamquinn-udpprovides. Anyone who pulls this branch and runs the default build will get a quietly degraded production binary without any warning at build time. The PR description mentions this tradeoff explicitly, but since there is no runtime warning or log message distinguishing a patched build from an upstream one, it could easily go unnoticed in the field.Prompt To Fix With AI
This is a comment left during a code review. Path: Cargo.toml Line: 83-90 Comment: **quinn-udp patch silently degrades production UDP performance on this branch** The `[patch.crates-io]` table is global — every `cargo build --release` on this branch (including CI/CD) will use the fallback `send_to`/`recv_from` path and get `BATCH_SIZE = 1`, forfeiting GSO/GRO batch throughput that the upstream `quinn-udp` provides. Anyone who pulls this branch and runs the default build will get a quietly degraded production binary without any warning at build time. The PR description mentions this tradeoff explicitly, but since there is no runtime warning or log message distinguishing a patched build from an upstream one, it could easily go unnoticed in the field. How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 4
patches/quinn-udp/src/fallback.rs:52
**Unguarded `unwrap` on received socket address**
`SockAddr::as_socket()` returns `None` for non-IPv4/IPv6 socket families (e.g. `AF_UNIX`, `AF_NETLINK`, abstract sockets). While UDP sockets should always produce IPv4/IPv6 addresses in practice, the `unwrap()` panics unconditionally if the kernel ever returns an unexpected address family — crashing the node process rather than surfacing an `io::Error`.
### Issue 2 of 4
patches/quinn-udp/src/lib.rs:81
Typo in the lock-poison message: "poisend" should be "poisoned".
```suggestion
let last_send_error = &mut *last_send_error.lock().expect("poisoned lock");
```
### Issue 3 of 4
patches/quinn-udp/src/fallback.rs:44-47
**Unsafe cast missing `// SAFETY:` justification**
The cast from `*mut [IoSliceMut<'_>]` to `*mut [socket2::MaybeUninitSlice<'_>]` is sound only because both types are thin wrappers over `struct iovec` (on Unix) / `WSABUF` (on Windows) and therefore share identical memory layout. This invariant is implicit and not documented; a future Rust or `socket2` version could silently break it. A `// SAFETY:` comment explaining the layout equivalence assumption and citing relevant ABI guarantees would make the invariant auditable.
### Issue 4 of 4
Cargo.toml:83-90
**quinn-udp patch silently degrades production UDP performance on this branch**
The `[patch.crates-io]` table is global — every `cargo build --release` on this branch (including CI/CD) will use the fallback `send_to`/`recv_from` path and get `BATCH_SIZE = 1`, forfeiting GSO/GRO batch throughput that the upstream `quinn-udp` provides. Anyone who pulls this branch and runs the default build will get a quietly degraded production binary without any warning at build time. The PR description mentions this tradeoff explicitly, but since there is no runtime warning or log message distinguishing a patched build from an upstream one, it could easily go unnoticed in the field.
Reviews (1): Last reviewed commit: "feat: Shadow simulator compatibility beh..." | Re-trigger Greptile
| meta[0] = RecvMeta { | ||
| len, | ||
| stride: len, | ||
| addr: addr.as_socket().unwrap(), |
There was a problem hiding this comment.
Unguarded
unwrap on received socket address
SockAddr::as_socket() returns None for non-IPv4/IPv6 socket families (e.g. AF_UNIX, AF_NETLINK, abstract sockets). While UDP sockets should always produce IPv4/IPv6 addresses in practice, the unwrap() panics unconditionally if the kernel ever returns an unexpected address family — crashing the node process rather than surfacing an io::Error.
Prompt To Fix With AI
This is a comment left during a code review.
Path: patches/quinn-udp/src/fallback.rs
Line: 52
Comment:
**Unguarded `unwrap` on received socket address**
`SockAddr::as_socket()` returns `None` for non-IPv4/IPv6 socket families (e.g. `AF_UNIX`, `AF_NETLINK`, abstract sockets). While UDP sockets should always produce IPv4/IPv6 addresses in practice, the `unwrap()` panics unconditionally if the kernel ever returns an unexpected address family — crashing the node process rather than surfacing an `io::Error`.
How can I resolve this? If you propose a fix, please make it concise.| transmit: &Transmit, | ||
| ) { | ||
| let now = Instant::now(); | ||
| let last_send_error = &mut *last_send_error.lock().expect("poisend lock"); |
There was a problem hiding this comment.
Typo in the lock-poison message: "poisend" should be "poisoned".
| let last_send_error = &mut *last_send_error.lock().expect("poisend lock"); | |
| let last_send_error = &mut *last_send_error.lock().expect("poisoned lock"); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: patches/quinn-udp/src/lib.rs
Line: 81
Comment:
Typo in the lock-poison message: "poisend" should be "poisoned".
```suggestion
let last_send_error = &mut *last_send_error.lock().expect("poisoned lock");
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| ) -> io::Result<usize> { | ||
| let bufs = unsafe { | ||
| &mut *(bufs as *mut [IoSliceMut<'_>] as *mut [socket2::MaybeUninitSlice<'_>]) | ||
| }; |
There was a problem hiding this comment.
Unsafe cast missing
// SAFETY: justification
The cast from *mut [IoSliceMut<'_>] to *mut [socket2::MaybeUninitSlice<'_>] is sound only because both types are thin wrappers over struct iovec (on Unix) / WSABUF (on Windows) and therefore share identical memory layout. This invariant is implicit and not documented; a future Rust or socket2 version could silently break it. A // SAFETY: comment explaining the layout equivalence assumption and citing relevant ABI guarantees would make the invariant auditable.
Prompt To Fix With AI
This is a comment left during a code review.
Path: patches/quinn-udp/src/fallback.rs
Line: 44-47
Comment:
**Unsafe cast missing `// SAFETY:` justification**
The cast from `*mut [IoSliceMut<'_>]` to `*mut [socket2::MaybeUninitSlice<'_>]` is sound only because both types are thin wrappers over `struct iovec` (on Unix) / `WSABUF` (on Windows) and therefore share identical memory layout. This invariant is implicit and not documented; a future Rust or `socket2` version could silently break it. A `// SAFETY:` comment explaining the layout equivalence assumption and citing relevant ABI guarantees would make the invariant auditable.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
40f3383 to
387cb1d
Compare
Adds opt-in support for running ethlambda under the Shadow network simulator (for the lean-shadow-fuzzer), gated so the default build, main, and the committed Cargo.lock are unchanged. jemalloc: Jemalloc causes programs to deadlock during process startup under Shadow (shadow/shadow#3763). The `jemalloc` feature (default-on) pulls the optional `tikv-jemallocator` dependency; the Shadow binary builds with `--no-default-features` to drop it entirely. A `compile_error!` catches enabling `shadow-integration` without `--no-default-features` (which would leave jemalloc linked). `jemalloc_pprof` stays: its `PROF_CTL` is `None` without the allocator, so the heap-profiling endpoints degrade gracefully. runtime: Shadow single-steps execution, so a multi-threaded runtime only adds scheduling noise. The `current_thread` flavor is gated behind `shadow-integration`. This is an optimization, not a requirement. quinn-udp: Shadow's UDP emulation lacks GSO/GRO batch syscalls, so quinn-udp must fall back to plain send_to/recv_from (see `shadow/quinn-udp-patch`). A Cargo `[patch]` cannot be feature-gated, so it is kept OUT of the committed manifest and injected at build time by `shadow/build.sh` (and the Dockerfile's SHADOW path). The committed Cargo.lock therefore stays identical to main. Adds `make shadow-build` / `make shadow-docker-build`, the `shadow/` directory (patch crate, injection script, README), plus `SHADOW`, `NO_DEFAULT_FEATURES` and `LOCKED` Docker build args. Ref: kamilsa/ethlambda@ed5a447
387cb1d to
354d29a
Compare
Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
What
Integrates Shadow network simulator compatibility (ref: kamilsa/ethlambda@ed5a447), gated so the default build,
main, and the committedCargo.lockare unchanged. Used for the lean-shadow-fuzzer integration. Seeshadow/README.mdfor full docs.Three Shadow build changes
malloc_confexportunprefixed_mallocinterposes the global Cmallocand self-deadlocks at startup under Shadow (shadow#3763): the shimfopens/proc/self/mapson its first intercepted syscall, callingmalloc, which re-enters jemalloc mid-init while it holds its non-recursive init lockjemallocis a default feature pulling the optionaltikv-jemallocatordep; Shadow build uses--no-default-featuresquinn-udp-> fallback (send_to/recv_from)sendmmsg, segmentation offload)[patch]injected at build time (see below)#[tokio::main]->current_threadflavorcfg(feature = "shadow-integration")jemalloc: the
jemallocfeaturejemallocis on by default and pulls the optionaltikv-jemallocatordependency. Cargo features are additive, so a feature can't subtract a default-on dependency, and merely disabling#[global_allocator]isn't enough (theunprefixed_mallocinterposition is what deadlocks). So the Shadow build drops it with--no-default-features. Acompile_error!catches the footgun of enablingshadow-integrationwhile jemalloc is still linked.quinn-udp: patch injected at build time (not committed)
A Cargo
[patch.crates-io]table cannot be feature-gated (patches are global). To keep normal builds and the lockfile pristine, the patch is not in the committedCargo.toml. Insteadshadow/build.shappends it (fromshadow/cargo-patch.toml) to the manifest, runs the build, and restoresCargo.toml+Cargo.lockafterward. The dormantshadow/quinn-udp-patchcrate lives in the repo but is referenced only during Shadow builds.Building
Verification
cargo check+clippypass for the default build,--no-default-features, and the Shadow config.make shadow-buildand defaultcargo build --release --lockedboth compile;Cargo.toml/Cargo.lockrestored cleanly after the Shadow build.Cargo.lockis byte-identical tomain.compile_error!fires whenshadow-integrationis enabled without--no-default-features.Using system allocator), forms a full QUIC mesh, gossips blocks + attestations, and reaches finality (all nodeshead_slot=14 justified_slot=12 finalized_slot=11) over a 120s simulated run, with no panics.Notes
jemalloc_pprofis retained: without the allocator itsPROF_CTLisNone, so/debug/pprof/allocsdegrades gracefully rather than failing to compile.SHADOW,NO_DEFAULT_FEATURES, andLOCKEDbuild args (Shadow builds inject the patch and build unlocked, since the patch is absent from the committed lockfile).