resource manager trait and impl by elnosh · Pull Request #4409 · lightningdevkit/rust-lightning

elnosh · 2026-02-10T22:44:23Z

Part of #4384

This PR introduces a ResourceManager trait and DefaultResourceManager implementation of that trait which is based on the proposed mitigation in lightning/bolts#1280.

It only covers the standalone implementation of the mitigation. I have done some testing with integrating it into the ChannelManager but that can be done separately. As mentioned in the issue, the resource manager trait defines these 4 methods to be called from the channel manager:

add_channel
remove_channel
add_htlc
resolve_htlc

Integrating into the `ChannelManager`

The ResourceManager is intended to be internal to the ChannelManager rather than users instantiating their own and passing it to a ChannelManager constructor.
add/remove_channel should be called when channels are opened/closed.
add_htlc: When processing HTLCs, the channel manager would call add_htlc which returns a ForwardingOutcome telling it whether to forward or fail the HTLC along with the accountable signal to use in case that it should be forwarded. For the initial "read-only" mode, the channel manager would log the results but not actually fail the HTLC if it was told to do so. A bit more specific on where it would be called: I think it will be when processing the forward_htlcs before we queue the add_htlc to the outgoing channel

rust-lightning/lightning/src/ln/channelmanager.rs

Line 7650 in caf0aac

if let Err((reason, msg)) = optimal_channel.queue_add_htlc(
resolve_htlc: Used to tell back the ResourceManager the resolution of an HTLC. It will be used to release bucket resources and update reputation/revenue values internally.

This could have more tests but opening early to get thoughts on design if possible

cc @carlaKC

ldk-reviews-bot · 2026-02-10T22:44:26Z

👋 Thanks for assigning @carlaKC as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

codecov · 2026-02-11T01:27:46Z

Codecov Report

❌ Patch coverage is 93.77028% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.29%. Comparing base (94d1e5e) to head (41d3ba9).
⚠️ Report is 315 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/ln/resource_manager.rs	93.77%	65 Missing and 31 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4409      +/-   ##
==========================================
+ Coverage   86.03%   86.29%   +0.26%     
==========================================
  Files         156      161       +5     
  Lines      103091   109078    +5987     
  Branches   103091   109078    +5987     
==========================================
+ Hits        88690    94134    +5444     
- Misses      11891    12286     +395     
- Partials     2510     2658     +148

Flag	Coverage Δ
tests	`86.29% <93.77%> (+0.26%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

carlaKC

Really great job on this! Done an overly-specific first review round for something that's in draft because I've taken a look at previous versions of this code before when we wrote simulations. Also haven't looked at the tests in detail yet, but coverage is looking ✨ great ✨ .

I think that taking a look at tracking slot usage in GeneralBucket with a single source of truth is worth taking a look at, seems like it could clean up a few places where we need to two hashmap lookups one after the other.

In the interest of one day fuzzing this, I think it could also use some validation that enforces our protocol assumptions (eg, number of slots <= 483).

lightning/src/ln/resource_manager.rs

ldk-reviews-bot · 2026-02-13T14:07:40Z

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

elnosh · 2026-02-16T22:43:21Z

think I have addressed most of the comments code-wise. Still need to add some requested comments/docs changes.

elnosh · 2026-02-17T18:33:51Z

pushed more fixups addressing requests for adding docs/comments, lmk if those look good

carlaKC · 2026-02-20T06:39:35Z

lightning/src/ln/resource_manager.rs

+	/// Tracks the occupancy of HTLC slots in the bucket.
+	slots_occupied: Vec<bool>,
+
+	/// SCID -> (slots assigned, salt)
+	/// Maps short channel IDs to an array of tuples with the slots that the channel is allowed
+	/// to use and the current usage state for each slot. It also stores the salt used to
+	/// generate the slots for the channel. This is used to deterministically generate the
+	/// slots for each channel on restarts.
+	channels_slots: HashMap<u64, (Vec<(u16, bool)>, [u8; 32])>,


this shouldn't accidentally double-assign them.

Yeah it shouldn't (provided we don't have bugs), but tracking the same information (whether a slot is occupied) in multiple places is a design that allows for inconsistency / the possibility of bugs. If we have a single source of truth, we move from "shouldn't double assign" to "can't double assign".

Gave it a shot here, lmk what you think!

lightning/src/ln/resource_manager.rs

carlaKC · 2026-02-20T08:02:55Z

lightning/src/ln/resource_manager.rs

+		let general_liquidity_allocation =
+			liquidity_allocated * general_slot_allocation as u64 / slots_allocated as u64;


Error if slots_allocated is zero otherwise we'll panic? Could happen with really small max in flight + general not given that much space?

lightning/src/ln/resource_manager.rs

TheBlueMatt · 2026-02-22T23:23:33Z

First of all not sure why all your commit messages are line-wrapped at 40 chars, but you can use like 60 or 70 lol.

TheBlueMatt

A few comments, I think the design is fine, but startup resync may be annoying.

TheBlueMatt · 2026-02-22T23:33:21Z

lightning/src/ln/resource_manager.rs

+	}
+}
+
+/// Tracks an average value over multiple rolling windows to smooth out volatility.


I'm kinda confused by this struct. First of all, the docs here are wrong - we aren't tracking "multiple windows" we're tracking a rolling average over one window of window * window_count. The only difference between this and DecayingAverage is it tries to compensate for if we don't have enough data to actually go back window_count * window. Why shouldn't we just have DecayingAverage do that instead of having a separate struct here?

I think it makes sense to keep separate because the use of DecayingAverage for reputation differs from AggregatedWindowAverage when tracking revenue. For reputation, we want the DecayingAverage over the full window (24 weeks). For revenue, using AggregatedWindowAverage, we track the decaying average over the same window (24 weeks) but divide by window_count because we want the revenue for 2 weeks.

I agree that we want to track two different things here:

Reputation (as DecayingAverage): we want shocks to reflect, so that we can quickly react to a change in attacker behavior

Revenue (as AggregatedWindowAverage): we want to smooth shocks to track our peer's average revenue in two weeks over a window_count periods.

But ran some numbers and it does look like we're penalizing old data a bit too much with this approach, as mentioned below.

TheBlueMatt · 2026-02-22T23:35:03Z

lightning/src/ln/resource_manager.rs

+struct DecayingAverage {
+	value: i64,
+	last_updated_unix_secs: u64,
+	window: Duration,


You don't actually use window (only decay_rate) so we can drop it here.

yeah, I was keeping window for the decay_rate when reading back here https://github.com/elnosh/rust-lightning/blob/90943195bee498f34247f65a68a8511d57997aae/lightning/src/ln/resource_manager.rs#L1042-L1045

Seems okay to me to just write the decay_rate directly. We'd only need the window if we wanted to change the way that we calculate it, and that seems unlikely?

TheBlueMatt · 2026-02-22T23:50:05Z

lightning/src/ln/resource_manager.rs

+		// We are not concerned with the rounding precision loss for this value because it is
+		// negligible when dealing with a long rolling average.
+		Ok((self.aggregated_revenue_decaying.value_at_timestamp(timestamp_unix_secs)? as f64
+			/ window_divisor)


I don't buy this? Let's say our windows_tracked is 4 and we have some data for the last 3 windows. On average, those 3 windows worth of data data will have been multiplied by 0.62175 (https://www.wolframalpha.com/input?i=%28integral+from+0+to+3+%280.5+%5E+0.5%29+%5E+x%29+%2F+3) but then we divide it by three. Whereas if we only have data for a single-window, that data will multiplied by, on average, 0.845111 (https://www.wolframalpha.com/input?i=%28integral+from+0+to+1+%280.5+%5E+0.5%29+%5E+x%29+%2F+1), and then we'll divide by one. We have to factor in the decrease in the data from the decay as well as just the increased amount of data here.

lightning/src/ln/resource_manager.rs

TheBlueMatt · 2026-02-23T00:05:32Z

lightning/src/ln/resource_manager.rs

+	/// Tracks the occupancy of HTLC slots in the bucket.
+	slots_occupied: Vec<bool>,
+
+	/// SCID -> (slots assigned, salt)
+	/// Maps short channel IDs to an array of tuples with the slots that the channel is allowed
+	/// to use and the current usage state for each slot. It also stores the salt used to
+	/// generate the slots for the channel. This is used to deterministically generate the
+	/// slots for each channel on restarts.
+	channels_slots: HashMap<u64, (Vec<(u16, bool)>, [u8; 32])>,


Does the protection algorithm break if slots are allocated probabilistically? We could reduce implementation complexity a good bit if we just drop channel_slots entirely and generate the list of slots the channel can occupy any time we need it and allow two channels to occupy the same slot (presumably leading to some extra HTLC failures in that case?). This feels very much like a bloom filter problem where we should be able to reduce FPs somehow, though maybe it isn't quite the same because we actually do want conflicts to be "common".

TheBlueMatt · 2026-02-23T00:45:13Z

lightning/src/ln/resource_manager.rs

+	}
+}
+
+impl Readable for DefaultResourceManager {


Hmmmmmmmmmmmmmmmmmmm. Reconciliation on startup is gonna be tricky here. What happens if we accept an HTLC then restart and actually it never made it to disk in the ChannelMonitor? Theoretically this can be persisted as a part of ChannelManager and it should be consistent-ish, but Val is hard at work making it so that we don't have to persist ChannelManager at all.

Instead, I wonder how easy we can make it to rebuild this from HTLC information. It would require some additional integration into "LDK core" but hopefully not much. If we have some HTLCSlotUsage struct that we return from add_htlc in the ForwardingOutcome::Forward case, we could presumably shove that into the HTLCSource (as the lots are "on" the inbound channel) and rebuild the resource manager very cheaply.

What happens if we accept an HTLC then restart and actually it never made it to disk in the ChannelMonitor? Theoretically this can be persisted as a part of ChannelManager and it should be consistent-ish, but Val is hard at work making it so that we don't have to persist ChannelManager at all.

hmmmm yeah I thought about that but was operating under the assumption that by persisting along with the ChannelManager it should stay consistent.

In a world where we don't persist the ChannelManager I was exploring your suggestion to rebuild the resource manager from HTLC data we have on startup and came up with the approach here: elnosh@cdd0bf8 With some caveats, I think we can replay HTLCs by calling add_htlc on the ResourceManager so we would only need general HTLC information and no need to shove bucket/resourcemanager specific information into HTLCSource. We would basically need this HTLC info on startup. I added 2 helper methods in channel.rs and the replay on the ChannelManager could look like this https://github.com/elnosh/rust-lightning/blob/cdd0bf80cb200d370995c4f859645c0a54b3a798/lightning/src/ln/channelmanager.rs#L19303-L19366

With this, I was able to restart a node with pending HTLCs and replayed them fine in the resource manager using Channel data. The only field I would need to add to HTLCSource is incoming_accountable

The caveat is that reputation and in-flight-risk when replaying the HTLCs might be somewhat (slightly) different if the shutdown time was long because the current timestamp is different.

Another approach would be to store the specific bucket usage in the HTLCSource so we replay HTLCs and add them directly to the bucket they were before shutdown. I went with previous approach mentioned since I think that will be less intrusive in the channel manager and would require less resourcemanager-specific information to leak into the channel manager. Let me know what you think

My only question there is what the performance cost is. If we have 500 channels and have to replay a hundred HTLCs per channel how bad does it get?

I'd have to run it but, indeed, it is not optimal because for each outbound HTLC in each channel it needs to lookup the inbound htlc on the incoming channel. It could store the missing fields in the HTLCSource as well to avoid the inbound htlc lookup.

did alternative approach in 9094319

Implements a decaying average over a rolling window. It will be used in upcoming commits by the resource manager to track reputation and revenue of channels.

elnosh · 2026-03-03T14:02:20Z

I have pushed changes for majority of comments from last round - diff here.

The most notable things are:

added a PendingHTLCReplay to be passed from upstream by the ChannelManager to replay pending HTLCs on startup instead of writing them in the ResourceManager
Do not double-track HTLC slot occupancy in general bucket and only track them in slots_occupied.
Use ChaCha instead of sha256 for slot generation in general bucket
Added more test cases

ldk-reviews-bot · 2026-03-05T14:41:06Z

🔔 1st Reminder

Hey @carlaKC! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-03-07T14:41:45Z

🔔 2nd Reminder

Hey @carlaKC! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-03-09T14:42:45Z

🔔 3rd Reminder

Hey @carlaKC! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

carlaKC

Didn't review tests yet, main comment is about how we handle replays on restart (+ saving needing to persist a few things).

carlaKC · 2026-03-09T12:58:12Z

lightning/src/ln/resource_manager.rs

+struct DecayingAverage {
+	value: i64,
+	last_updated_unix_secs: u64,
+	window: Duration,


Seems okay to me to just write the decay_rate directly. We'd only need the window if we wanted to change the way that we calculate it, and that seems unlikely?

carlaKC · 2026-03-09T14:13:02Z

lightning/src/ln/resource_manager.rs

+	}
+}
+
+/// Tracks an average value over multiple rolling windows to smooth out volatility.


I agree that we want to track two different things here:

Reputation (as DecayingAverage): we want shocks to reflect, so that we can quickly react to a change in attacker behavior

Revenue (as AggregatedWindowAverage): we want to smooth shocks to track our peer's average revenue in two weeks over a window_count periods.

But ran some numbers and it does look like we're penalizing old data a bit too much with this approach, as mentioned below.

carlaKC · 2026-03-09T14:55:11Z

lightning/src/ln/resource_manager.rs

+			// TODO: could return the slots already assigned instead of erroring.
+			Entry::Occupied(_) => Err(()),


Meant that assign_slots_for_channel doesn't need &self at all - we can just pass in our_scid + per_channel_slots, return the slots/salt we're adding and then have the caller be responsible for adding these values to self.channel_slots.

Saves us a double lookup because we're looking up the entry in the caller (to see if we need to assign_slots_for_channel and looking up again here).

carlaKC · 2026-03-10T12:15:32Z

lightning/src/ln/resource_manager.rs

+		self.slots_used += 1;
+		self.liquidity_used += htlc_amount_msat;


nit: debug_assert that we never go over our _allocated values?

isn't that caught by the resources_available check above this?

lightning/src/ln/resource_manager.rs

carlaKC · 2026-03-10T13:01:37Z

lightning/src/ln/resource_manager.rs

+		let below_liquidity_limit = htlc_amount_msat
+			<= self.congestion_bucket.liquidity_allocated
+				/ self.congestion_bucket.slots_allocated as u64;


Is it possible that congestion_bucket.slots_allocated could be zero?

I think this and #4409 (comment) are highly unlikely given that the bucket percentages can't be set by the user. Zero slots allocated in any bucket could happen with very low max_accepted_htlcs but at that point it means that number is so low that the resource manager as a whole will be basically unusable? Should we have a generic error to catch this in Channel::new?

carlaKC · 2026-03-10T13:46:02Z

lightning/src/ln/resource_manager.rs

+				.map_err(|_| DecodeError::InvalidValue)?,
+			);
+		}
+		Ok(forwarding_outcomes)


Discussed offline: it seems possible that we end up in a state where we replay a pending htlc (which we know we previously forwarded) and get a ForwardingOutcome::Fail because the decaying averages we're tracking have changed since we first called add_htlc.

This will be fine in "read only" mode, but once we're running for real we'll need a way to handle this (because we won't have HTLCs in our internal state, and resolve_htlc will be called for an unknown htlc).

Seems like a reasonable solution to track these in a failed_replays map and have some special case handling for them. A really nice side effect of this is that we no longer need to persist GeneralBucket + ResourceManagerConfig (previously, we had the persist these to make sure we have the same bucket sizes so that everything we replay will fit in them - but if we accept that this isn't the world we live in already, then this issue is already handled).

Practically in terms of our defense, failing to re-add htlcs to our state means that we're a bit more forgiving on restarts (since each failed_replay isn't in our internal state). Since we're just shooting for readonly now, seems reasonable to live with this (log it to understand how much it happens) and improve as necessary.

The AggregatedWindowAverage implemented here will be used in upcoming commits to track the incoming revenue that channels have generated through HTLC forwards.

Resources available in the channel will be divided into general, congestion and protected resources. Here we implement the general bucket with basic denial of service protections. Co-authored-by: Carla Kirk-Cohen <kirkcohenc@gmail.com>

Resources available in the channel will be divided into general, congestion and protected resources. Here we implement the bucket resources that will be used for congestion and protected.

The Channel struct introduced here has the core information that will be used by the resource manager to make forwarding decisions on HTLCs: - Reputation that this channel has accrued as an outgoing link in HTLC forwards. - Revenue (forwarding fees) that the channel has earned us as an incoming link. - Pending HTLCs this channel is currently holding as an outgoing link. - Bucket resources that are currently in use in general, congestion and protected.

ldk-claude-review-bot · 2026-03-23T17:34:32Z

lightning/src/ln/resource_manager.rs

+			u8::max(5, u8::try_from((slots_allocated * 5).div_ceil(100)).unwrap());
+
+		let general_liquidity_allocation =
+			liquidity_allocated * general_slot_allocation as u64 / slots_allocated as u64;


Bug: Division by zero when slots_allocated is 0.

With small max_accepted_htlcs values (e.g., 1 or 2), bucket_allocations computes general_slots = max_accepted_htlcs * 40 / 100 = 0. This feeds into GeneralBucket::new(scid, 0, 0), which:

Sets per_channel_slots = max(5, 0) = 5 (5 slots per channel, but 0 total slots)

Hits division by zero here: liquidity_allocated * 5 / 0

Similarly, assign_slots_for_channel would do % self.total_slots → % 0.

You need either a minimum max_accepted_htlcs validation in Channel::new (e.g., require that each bucket gets at least 1 slot), or handle the zero case in GeneralBucket::new.

ldk-claude-review-bot · 2026-03-23T17:34:35Z

lightning/src/ln/resource_manager.rs

+		let below_liquidity_limit = htlc_amount_msat
+			<= self.congestion_bucket.liquidity_allocated
+				/ self.congestion_bucket.slots_allocated as u64;


Bug: Division by zero when congestion_bucket.slots_allocated is 0.

With max_accepted_htlcs of 3 or 4, congestion_slots = max_accepted_htlcs * 20 / 100 = 0. All three sub-expressions are eagerly evaluated (no short-circuit), so even when congestion_resources_available is false, this division still executes and panics.

Suggested change

let below_liquidity_limit = htlc_amount_msat

<= self.congestion_bucket.liquidity_allocated

/ self.congestion_bucket.slots_allocated as u64;

let below_liquidity_limit = self.congestion_bucket.slots_allocated > 0

&& htlc_amount_msat

<= self.congestion_bucket.liquidity_allocated

/ self.congestion_bucket.slots_allocated as u64;

ldk-claude-review-bot · 2026-03-23T17:34:39Z

lightning/src/ln/resource_manager.rs

+				let salt = salt.unwrap_or(entropy_source.get_secure_random_bytes());
+
+				let mut nonce = [0u8; 12];
+				nonce[..4].copy_from_slice(&self.scid.to_be_bytes()[..4]);


Bug: Nonce only uses the upper 4 bytes of self.scid, discarding the lower 4 bytes. Since Lightning SCIDs encode (block_height: 3 bytes, tx_index: 3 bytes, output_index: 2 bytes), the upper 4 bytes cover the block height + partial tx index. Two channels opened in the same block with different tx/output indices could share the same upper 4 bytes, producing identical nonces and thus identical slot assignments.

Consider using a different nonce construction that incorporates all 8 bytes of both SCIDs, e.g. XORing or hashing them into the 12-byte nonce.

ldk-claude-review-bot · 2026-03-23T17:34:43Z

lightning/src/ln/resource_manager.rs

+		);
+		outgoing_channel.outgoing_reputation.add_value(effective_fee, resolved_at)?;
+
+		let incoming_channel = channels_lock.get_mut(&incoming_channel_id).ok_or(())?;


Bug: Partial state corruption on error. If incoming_channel_id is not found in the map (e.g., the channel was removed while HTLCs were in flight), this returns Err(()) but outgoing_channel.outgoing_reputation was already updated on line 914. The HTLC also remains in outgoing_channel.pending_htlcs (line 941 was not reached), so any retry will update reputation again.

Consider either: (1) looking up both channels before mutating any state, or (2) removing the pending HTLC before returning the error so that at worst only reputation is slightly off, not permanently stuck.

ldk-claude-review-bot · 2026-03-23T17:34:48Z

lightning/src/ln/resource_manager.rs

+	pub fn remove_channel(&self, channel_id: u64) -> Result<(), ()> {
+		let mut channels_lock = self.channels.lock().unwrap();
+		channels_lock.remove(&channel_id);
+
+		// Remove slots assigned to channel being removed across all other channels.
+		for (_, channel) in channels_lock.iter_mut() {
+			channel.general_bucket.remove_channel_slots(channel_id);
+		}
+		Ok(())


Bug: remove_channel does not clean up pending HTLCs referencing the removed channel.

If an outgoing channel is removed, all PendingHTLC entries stored in it are silently lost. The incoming channels' bucket resources (general/congestion/protected) for those HTLCs are permanently leaked — resolve_htlc will fail to find the outgoing channel and return Err without releasing bucket resources.

If an incoming channel is removed, resolve_htlc will fail at the incoming channel lookup (line 916) after already updating outgoing reputation (line 914), as noted separately.

Consider iterating over other channels' pending_htlcs to release bucket resources for any HTLCs involving the removed channel.

ldk-claude-review-bot · 2026-03-23T17:34:52Z

lightning/src/ln/resource_manager.rs

+		let mut failed_replays = self.failed_replays.lock().unwrap();
+		if failed_replays.remove(&htlc_ref) {
+			return Ok(());
+		}


Issue: failed_replays lock is held for the entire function body, including all the channel operations below. Since resolve_htlc is expected to be called frequently, this serializes all resolve calls through the failed_replays mutex even after startup replay is done and failed_replays is empty.

Consider dropping the lock after the if check:

let htlc_ref = HtlcRef { incoming_channel_id, htlc_id }; { let mut failed_replays = self.failed_replays.lock().unwrap(); if failed_replays.remove(&htlc_ref) { return Ok(()); } } // failed_replays lock released here let mut channels_lock = self.channels.lock().unwrap();

ldk-claude-review-bot · 2026-03-23T17:34:57Z

lightning/src/ln/resource_manager.rs

+		let revenue_window_weeks_avg = self.config.revenue_window.as_secs() / (60 * 60 * 24 * 7);
+		match channels_lock.entry(channel_id) {
+			Entry::Vacant(entry) => {
+				let channel = Channel::new(
+					channel_id,
+					max_htlc_value_in_flight_msat,
+					max_accepted_htlcs,
+					self.config.general_allocation_pct,
+					self.config.congestion_allocation_pct,
+					self.config.revenue_window * self.config.reputation_multiplier.into(),
+					self.config.reputation_multiplier,
+					revenue_window_weeks_avg as u8,


Bug: Silent truncation. revenue_window_weeks_avg is u64 but is cast to u8 on line 757. If someone configures a revenue_window larger than ~255 weeks (~4.9 years), this silently wraps (e.g., 500 weeks → 244). This would produce incorrect AggregatedWindowAverage behavior.

Should either validate the range or use a wider type for revenue_week_avg in Channel::new / AggregatedWindowAverage::new.

ldk-claude-review-bot · 2026-03-23T17:35:00Z

lightning/src/ln/resource_manager.rs

+impl DefaultResourceManager {
+	pub fn add_channel(


Note: DefaultResourceManager has methods matching the ResourceManager trait signature but does not actually implement the trait (impl ResourceManager for DefaultResourceManager). This means the trait can't be used polymorphically. If this is intentional (trait will be implemented when integrating into ChannelManager), a TODO comment would help future readers. Otherwise, consider adding the trait impl.

ldk-claude-review-bot · 2026-03-23T17:35:04Z

lightning/src/ln/resource_manager.rs

+				let mut prng = ChaCha20::new(&salt, &nonce);
+				let mut buf = [0u8; 4];
+
+				let max_attempts = self.per_channel_slots * 2;


Bug: max_attempts = self.per_channel_slots * 2 can be insufficient when total_slots is close to per_channel_slots. For example, with total_slots = 8 and per_channel_slots = 5, each random draw has a 5/8 chance of hitting an already-assigned slot. With only 10 attempts, there's a meaningful probability of failing to find 5 unique slots. The birthday-problem collision rate increases significantly with higher per_channel_slots / total_slots ratios.

Consider using a higher multiplier (e.g., * 10) or a different approach (Fisher-Yates shuffle / reservoir sampling) for deterministic completion.

ldk-claude-review-bot · 2026-03-23T17:35:37Z

Re-review Summary for PR #4409 (Pass 3)

Three new inline comments posted on issues not covered by prior reviews:

New inline comments

resource_manager.rs:595-604 — Channel::read deserialization lacks validation of max_accepted_htlcs and max_htlc_value_in_flight_msat, making all division-by-zero bugs reachable from corrupted persistence data
resource_manager.rs:166 — Division by zero when per_slot_msat = 0 (distinct from the slots_allocated = 0 case at line 149; occurs when liquidity_allocated = 0 but slots_allocated > 0)
resource_manager.rs:1039-1047 — DecayingAverage::value_at_timestamp acts as a one-way timestamp ratchet; a single call with a future timestamp (clock skew, NTP step) permanently blocks all HTLC forwarding through the affected channel until the wall clock catches up

Prior inline comments (still open, 11 total)

resource_manager.rs:149 — Division by zero when slots_allocated is 0
resource_manager.rs:522 — Division by zero when congestion_bucket.slots_allocated is 0
resource_manager.rs:258 — Nonce only uses upper 4 of 8 bytes of incoming SCID
resource_manager.rs:894 — Partial state corruption in resolve_htlc (reputation updated before incoming channel lookup)
resource_manager.rs:736 — remove_channel doesn't clean up pending HTLCs (resource leak)
resource_manager.rs:877 — failed_replays lock held unnecessarily long in resolve_htlc
resource_manager.rs:716 — Silent u8 truncation of revenue_window_weeks_avg
resource_manager.rs:700 — DefaultResourceManager doesn't implement the ResourceManager trait
resource_manager.rs:263 — max_attempts may be insufficient for slot assignment
resource_manager.rs:1103 — Integer division by zero when avg_weeks is 0
resource_manager.rs:661 — u32 overflow in htlc_in_flight_risk

Cross-cutting concerns

Division-by-zero family (6 instances): Lines 149, 166, 522, 1103, and now two deserialization paths via Channel::read. All stem from missing minimum-value validation. A comprehensive validation pass in Channel::new, Channel::read, GeneralBucket::new, and AggregatedWindowAverage::new ensuring all denominators are non-zero would fix the entire family.
No runtime check that incoming_channel_id != outgoing_channel_id in add_htlc — only guarded by a debug_assert in assign_slots_for_channel which is stripped in release builds.
Clock-skew sensitivity: The DecayingAverage timestamp validation creates a one-way ratchet. Combined with the mutable-on-read pattern (value_at_timestamp updates last_updated_unix_secs), any non-monotonic timestamp permanently poisons the channel state. This is a systemic fragility across all reputation and revenue tracking.

Introduces the DefaultResourceManager struct. The core of methods that will be used to inform the HTLC forward decisions are add/resolve_htlc. - add_htlc: Based on resource availability and reputation, it evaluates whehther to forward or fail the HTLC. - resolve_htlc: Releases the bucket resources used from a HTLC previously added and updates the channel's reputation based on HTLC fees and resolution times.

Adds write and read implementations to persist the DefaultResourceManager.

ldk-claude-review-bot · 2026-03-23T17:54:24Z

lightning/src/ln/resource_manager.rs

+			return Err(());
+		}
+
+		let num_windows = (self.window_weeks / self.avg_weeks) as f64;


Bug: Integer division by zero panic when avg_weeks is 0.

self.window_weeks / self.avg_weeks is integer division and will panic if avg_weeks == 0. This happens in two ways:

Config-driven: In add_channel (line 705), revenue_window_weeks_avg = revenue_window.as_secs() / 604800. If someone configures revenue_window to less than 1 week, this produces 0, which is cast to u8 and passed here as avg_weeks.

Deserialization-driven: avg_weeks is read directly from persisted data (line 1117). A corrupted or tampered persistence file with avg_weeks = 0 will crash the node on the first value_at_timestamp call (triggered by any add_htlc → sufficient_reputation path).

Consider adding a validation check in AggregatedWindowAverage::new (and/or a deserialization guard) that avg_weeks > 0.

ldk-claude-review-bot · 2026-03-23T17:54:40Z

lightning/src/ln/resource_manager.rs

+	// To calculate the risk of pending HTLCs, we assume they will resolve in the worst
+	// possible case. Here we assume block times of 10 minutes.
+	fn htlc_in_flight_risk(&self, fee: u64, incoming_cltv_expiry: u32, height_added: u32) -> u64 {
+		let maximum_hold_time = (incoming_cltv_expiry.saturating_sub(height_added)) * 10 * 60;


Bug: u32 overflow in (incoming_cltv_expiry.saturating_sub(height_added)) * 10 * 60.

While saturating_sub prevents underflow, the subsequent * 10 * 60 is plain u32 multiplication. For CLTV deltas above ~7.1M blocks (u32::MAX / 600), this wraps silently in release mode, producing an incorrect (far too small) maximum_hold_time. This underestimates the in-flight risk, potentially allowing HTLCs through the protected bucket that should have been rejected.

While upstream validation should bound CLTV deltas to ~2016 blocks in practice, this function doesn't enforce that itself. Consider using u64 for the computation:

Suggested change

let maximum_hold_time = (incoming_cltv_expiry.saturating_sub(height_added)) * 10 * 60;

let cltv_delta = incoming_cltv_expiry.saturating_sub(height_added) as u64;

let maximum_hold_time = cltv_delta * 10 * 60;

elnosh · 2026-03-23T17:59:12Z

pushed changes addressing comments from last review. Changes to point out are to DecayingAverage and AggregatedWindowAverage addressing this comment #4409 (comment) I have added the test test_aggregated_window_average showing how the aggregate average approximates the real value.

ldk-claude-review-bot · 2026-03-23T18:04:11Z

lightning/src/ln/resource_manager.rs

+		let max_htlc_value_in_flight_msat: u64 = max_htlc_value_in_flight_msat.0.unwrap();
+		let max_accepted_htlcs: u16 = max_accepted_htlcs.0.unwrap();
+		let general_bucket_data: GeneralBucketData = general_bucket_data.0.unwrap();
+
+		let alloc = bucket_allocations(
+			max_accepted_htlcs,
+			max_htlc_value_in_flight_msat,
+			config.general_allocation_pct,
+			config.congestion_allocation_pct,
+		);


Bug: Channel::read doesn't validate deserialized max_accepted_htlcs and max_htlc_value_in_flight_msat before passing them to bucket_allocations and GeneralBucket::new. Unlike Channel::new (which validates max_accepted_htlcs <= 483 and the in-flight value), this deserialization path accepts any values.

Corrupted or tampered persistence data can crash the node during startup:

max_accepted_htlcs = 0 → general_slots = 0 → division by zero in GeneralBucket::new (line 149)

max_htlc_value_in_flight_msat = 0 with max_accepted_htlcs > 0 → per_slot_msat = 0 → division by zero in slots_for_amount (line 166)

Consider adding the same validation checks from Channel::new here, returning DecodeError::InvalidValue for out-of-range values.

ldk-claude-review-bot · 2026-03-23T18:04:19Z

lightning/src/ln/resource_manager.rs

+	fn slots_for_amount<ES: EntropySource>(
+		&mut self, outgoing_scid: u64, htlc_amount_msat: u64, entropy_source: &ES,
+	) -> Result<Option<Vec<u16>>, ()> {
+		let slots_needed = u64::max(1, htlc_amount_msat.div_ceil(self.per_slot_msat));


Bug: Division by zero when per_slot_msat = 0. div_ceil(0) panics.

This is a different path from the slots_allocated = 0 division-by-zero I flagged at line 149. This one occurs when slots_allocated > 0 but liquidity_allocated = 0, making per_slot_msat = 0. For example: max_htlc_value_in_flight_msat = 2, general_pct = 40 → general_liquidity = 2 * 40 / 100 = 0, but general_slots = max_accepted_htlcs * 40 / 100 can be > 0 for max_accepted_htlcs >= 3.

Also reachable via corrupted persistence data (see the deserialization comment on Channel::read).

ldk-claude-review-bot · 2026-03-23T18:04:31Z

lightning/src/ln/resource_manager.rs

+	fn value_at_timestamp(&mut self, timestamp_unix_secs: u64) -> Result<i64, ()> {
+		if timestamp_unix_secs < self.last_updated_unix_secs {
+			return Err(());
+		}
+
+		let elapsed_secs = (timestamp_unix_secs - self.last_updated_unix_secs) as f64;
+		let decay_rate = 0.5_f64.powf(elapsed_secs / self.half_life);
+		self.value = (self.value as f64 * decay_rate).round() as i64;
+		self.last_updated_unix_secs = timestamp_unix_secs;


Issue: value_at_timestamp mutates last_updated_unix_secs on every call, making it a one-way ratchet. If any caller passes a timestamp slightly ahead of the current time (e.g., due to clock skew, NTP adjustment, or VM migration), all subsequent calls with the "correct" time will return Err(()) until the wall clock catches up.

In resolve_htlc, this manifests as: one call with resolved_at slightly in the future poisons the channel's outgoing_reputation.last_updated_unix_secs. All subsequent add_htlc calls fail (line 768 calls value_at_timestamp(added_at) which returns Err), effectively blocking all HTLC forwarding through this outgoing channel until the system clock reaches the poisoned timestamp.

Consider clamping to max(timestamp, last_updated) instead of returning Err, or using monotonic timestamps internally. Alternatively, at minimum, document that callers must guarantee strictly non-decreasing timestamps to avoid bricking a channel's forwarding.

elnosh mentioned this pull request Feb 11, 2026

Proposed design for channel jamming mitigation #4384

Open

carlaKC self-requested a review February 11, 2026 07:04

carlaKC reviewed Feb 13, 2026

View reviewed changes

elnosh force-pushed the resource-mgr branch from 53a93ed to 056a65e Compare February 16, 2026 21:55

elnosh force-pushed the resource-mgr branch from 056a65e to 8fd724a Compare February 17, 2026 18:25

jkczyz assigned elnosh Feb 19, 2026

jkczyz added this to Weekly Goals Feb 19, 2026

carlaKC reviewed Feb 20, 2026

View reviewed changes

lightning/src/ln/resource_manager.rs Outdated Show resolved Hide resolved

TheBlueMatt reviewed Feb 23, 2026

View reviewed changes

Add DecayingAverage to resource manager

99b104d

Implements a decaying average over a rolling window. It will be used in upcoming commits by the resource manager to track reputation and revenue of channels.

elnosh force-pushed the resource-mgr branch from 8fd724a to 9094319 Compare March 3, 2026 13:50

elnosh marked this pull request as ready for review March 3, 2026 14:38

ldk-reviews-bot requested a review from valentinewallace March 3, 2026 14:38

valentinewallace requested review from carlaKC and removed request for valentinewallace March 3, 2026 14:40

elnosh force-pushed the resource-mgr branch from 9094319 to 36a79b7 Compare March 6, 2026 18:30

elnosh mentioned this pull request Mar 6, 2026

Integrate resource manager into ChannelManager #4468

Draft

carlaKC reviewed Mar 10, 2026

View reviewed changes

elnosh and others added 7 commits March 23, 2026 11:38

Add AggregatedWindowAverage to resource manager

09c93fe

The AggregatedWindowAverage implemented here will be used in upcoming commits to track the incoming revenue that channels have generated through HTLC forwards.

f - change aggregated window avg

b6f8893

Add GeneralBucket to resource manager

6912356

Resources available in the channel will be divided into general, congestion and protected resources. Here we implement the general bucket with basic denial of service protections. Co-authored-by: Carla Kirk-Cohen <kirkcohenc@gmail.com>

f - do not error if slots already assigned

0b75e6a

Add BucketResources to resource manager

0a6b6a8

Resources available in the channel will be divided into general, congestion and protected resources. Here we implement the bucket resources that will be used for congestion and protected.

f - fix comment and remove unnecessary channel id

de0a929

elnosh force-pushed the resource-mgr branch from 36a79b7 to 34ecfd4 Compare March 23, 2026 17:28

ldk-claude-review-bot reviewed Mar 23, 2026

View reviewed changes

elnosh force-pushed the resource-mgr branch from 34ecfd4 to 1b239db Compare March 23, 2026 17:45

elnosh added 3 commits March 23, 2026 13:49

f - remove htlc after all error checks

ec62220

impl write and read for resource manager

fa0127e

Adds write and read implementations to persist the DefaultResourceManager.

ldk-claude-review-bot reviewed Mar 23, 2026

View reviewed changes

f - stored failed replays and do not store config

41d3ba9

elnosh force-pushed the resource-mgr branch from 1b239db to 41d3ba9 Compare March 23, 2026 17:55

ldk-claude-review-bot reviewed Mar 23, 2026

View reviewed changes

		let general_liquidity_allocation =
		liquidity_allocated * general_slot_allocation as u64 / slots_allocated as u64;

		// TODO: could return the slots already assigned instead of erroring.
		Entry::Occupied(_) => Err(()),

		self.slots_used += 1;
		self.liquidity_used += htlc_amount_msat;

	let maximum_hold_time = (incoming_cltv_expiry.saturating_sub(height_added)) * 10 * 60;
	let cltv_delta = incoming_cltv_expiry.saturating_sub(height_added) as u64;
	let maximum_hold_time = cltv_delta * 10 * 60;

Conversation

elnosh commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Integrating into the ChannelManager

Uh oh!

ldk-reviews-bot commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Feb 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

carlaKC left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ldk-reviews-bot commented Feb 13, 2026

Uh oh!

elnosh commented Feb 16, 2026

Uh oh!

elnosh commented Feb 17, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

TheBlueMatt commented Feb 22, 2026

Uh oh!

TheBlueMatt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elnosh Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

elnosh commented Mar 3, 2026

Uh oh!

ldk-reviews-bot commented Mar 5, 2026

elnosh commented Feb 10, 2026 •

edited

Loading

Integrating into the `ChannelManager`

ldk-reviews-bot commented Feb 10, 2026 •

edited

Loading

codecov bot commented Feb 11, 2026 •

edited

Loading

elnosh Feb 25, 2026 •

edited

Loading