
fix: improve packet scheduling for not-validated paths#444

Merged
flub merged 49 commits into main from
flub/packet-scheduling-1
Mar 19, 2026

Conversation

@flub
Collaborator

@flub flub commented Feb 20, 2026

Description

This improves the packet scheduling for non-validated paths. In particular, the path-exclusive-only measure was wrong. To decide whether data frames can be carried, we need to consider both the frames that must be sent on a path and the path's validation status.

This isn't finished business; there will be more tweaks to packet scheduling. But it is a decent next step towards improving it, and fixes a few bugs.
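To illustrate the decision described above, here is a minimal sketch in Rust. All names here (`PathStatus`, `Pending`, `schedule`) are illustrative assumptions, not the PR's actual API: the point is only that an unvalidated path may still carry path-specific frames such as PATH_CHALLENGE/PATH_RESPONSE, while data frames require a validated path.

```rust
/// Validation status of a network path (illustrative subset).
#[derive(Clone, Copy, PartialEq)]
enum PathStatus {
    /// The path passed address validation (PATH_CHALLENGE answered).
    Validated,
    /// Validation is still in progress.
    Validating,
}

/// What is waiting to be sent for one path (simplified).
struct Pending {
    /// Frames that must go out on this specific path,
    /// e.g. PATH_CHALLENGE or PATH_RESPONSE.
    path_specific: bool,
    /// Data frames (STREAM etc.) that any validated path could carry.
    data: bool,
}

/// Returns (build a packet for this path?, may it carry data frames?).
/// A not-yet-validated path may still send path-specific frames, but
/// data frames are restricted to validated paths.
fn schedule(status: PathStatus, pending: &Pending) -> (bool, bool) {
    let data_allowed = status == PathStatus::Validated && pending.data;
    let send_packet = pending.path_specific || data_allowed;
    (send_packet, data_allowed)
}
```

In this toy model, a validating path with a pending PATH_CHALLENGE gets a packet but no data; a validated path with pending stream data gets both.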

Breaking Changes

n/a

Notes & open questions

  • I've split off as much as possible from this, so it should contain only the relevant changes.
  • I've spent a great deal of time trying to make the logic as comprehensible as possible, both in how it is described and in the code layout. This is deliberately far from the most compact way to express the logic, so please pay attention to how readable you find it and don't hesitate to nitpick on how to express it. You might have better ideas than I do!

So let's check in totally broken code.
@flub flub changed the title --- fix: improve packet scheduling for not-validated paths Feb 20, 2026
@flub flub marked this pull request as draft February 20, 2026 18:14

github-actions bot commented Feb 20, 2026

Performance Comparison Report

98efdff3f249e69dee299311b4b3b82fb0adeee7 - artifacts

No results available

---
5de87e12310b73113f22a08f51f8e173c65aaa10 - artifacts

No results available

---
ff11b1c69b15c27f90963169076e9d868dbd2b9d - artifacts

No results available

---
cec40b266637af7778ae5095e77591f986d6980e - artifacts

No results available

---
cb9e52aa685a7606f179a832f5f143274b59fc73 - artifacts

No results available

---
e9832764982048515e9d70cde799c2ff72068814 - artifacts

No results available

---
580e0445f5cd1b35963c50c49df51322c9977498 - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | N/A | 7935.4 Mbps | N/A | N/A |
| medium-concurrent | N/A | 7741.4 Mbps | N/A | N/A |
| medium-single | N/A | 4749.1 Mbps | N/A | N/A |
| small-concurrent | N/A | 5259.7 Mbps | N/A | N/A |
| small-single | N/A | 4709.0 Mbps | N/A | N/A |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | N/A | 4008.8 Mbps | N/A |
| lan | N/A | 810.4 Mbps | N/A |
| lossy | N/A | 55.9 Mbps | N/A |
| wan | N/A | 83.8 Mbps | N/A |
---
b67d78083f447c69f2dc68dcc918955984ec5b2f - artifacts

No results available

---
c3eed892cf6ab907aeb98ecea15d44d7548c6654 - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | N/A | 7871.6 Mbps | N/A | N/A |
| medium-concurrent | N/A | 7738.9 Mbps | N/A | N/A |
| medium-single | N/A | 4612.9 Mbps | N/A | N/A |
| small-concurrent | N/A | 5073.2 Mbps | N/A | N/A |
| small-single | N/A | 4709.9 Mbps | N/A | N/A |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 748.4 Mbps | 3980.5 Mbps | -81.2% |
| lan | 572.7 Mbps | 800.6 Mbps | -28.5% |
| lossy | 69.9 Mbps | 69.9 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 70.1% slower on average

---
453fa89b88ef3d18fb15da390fd464e916bc5d1f - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | N/A | 7916.7 Mbps | N/A | N/A |
| medium-concurrent | N/A | 7605.0 Mbps | N/A | N/A |
| medium-single | N/A | 4749.1 Mbps | N/A | N/A |
| small-concurrent | N/A | 5384.7 Mbps | N/A | N/A |
| small-single | N/A | 4844.6 Mbps | N/A | N/A |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 739.2 Mbps | 3951.3 Mbps | -81.3% |
| lan | 572.7 Mbps | 810.3 Mbps | -29.3% |
| lossy | 55.9 Mbps | 55.9 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 70.4% slower on average

---
23241b1cbef620696deb6eed31e88c2875ade235 - artifacts

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | N/A | 3917.3 Mbps | N/A |
---
540afe0910af81463bea1dce451c719ab46e5cf2 - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | 5435.8 Mbps | 7740.6 Mbps | -29.8% | 91.3% / 97.5% |
| medium-concurrent | 5287.0 Mbps | 7859.0 Mbps | -32.7% | 88.8% / 96.3% |
| medium-single | 3713.8 Mbps | 4749.8 Mbps | -21.8% | 87.8% / 96.2% |
| small-concurrent | 3727.7 Mbps | 5307.4 Mbps | -29.8% | 87.7% / 97.7% |
| small-single | 3376.9 Mbps | 4832.8 Mbps | -30.1% | 98.8% / 130.0% |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 2946.8 Mbps | 4036.7 Mbps | -27.0% |
| lan | 782.5 Mbps | 810.3 Mbps | -3.4% |
| lossy | 69.8 Mbps | 57.8 Mbps | +20.9% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 28.3% slower on average

---
68221360a206d7749e88c9d491a55ac126ea25c2 - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | 5452.5 Mbps | N/A | N/A | 94.8% / 108.0% |
| medium-concurrent | 5614.5 Mbps | N/A | N/A | 94.5% / 104.0% |
| medium-single | 3955.3 Mbps | N/A | N/A | 95.1% / 108.0% |
| small-concurrent | 3778.8 Mbps | N/A | N/A | 99.8% / 164.0% |
| small-single | 3533.8 Mbps | N/A | N/A | 93.0% / 107.0% |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 2873.7 Mbps | 3798.2 Mbps | -24.3% |
| lan | 771.7 Mbps | 796.4 Mbps | -3.1% |
| lossy | 69.8 Mbps | 55.9 Mbps | +25.0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 19.8% slower on average

---
2fdae31c42c4594c1447347cf11020a531453fff - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | 5329.2 Mbps | 7961.0 Mbps | -33.1% | 95.3% / 107.0% |
| medium-concurrent | 5485.0 Mbps | 7612.6 Mbps | -27.9% | 97.2% / 157.0% |
| medium-single | 3812.0 Mbps | 4624.5 Mbps | -17.6% | 91.2% / 105.0% |
| small-concurrent | 3816.8 Mbps | 5205.1 Mbps | -26.7% | 94.8% / 106.0% |
| small-single | 3439.1 Mbps | 4701.9 Mbps | -26.9% | 89.2% / 97.0% |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 3003.2 Mbps | 3911.3 Mbps | -23.2% |
| lan | 782.5 Mbps | 810.3 Mbps | -3.4% |
| lossy | 69.8 Mbps | 55.9 Mbps | +25.0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 26.2% slower on average

---
b038976a960ff48253a89945012df65fdb46a1d2 - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | 5379.2 Mbps | 7906.2 Mbps | -32.0% | 90.5% / 96.6% |
| medium-concurrent | 5608.8 Mbps | 7729.6 Mbps | -27.4% | 94.1% / 101.0% |
| medium-single | 4129.2 Mbps | 4583.2 Mbps | -9.9% | 86.7% / 96.0% |
| small-concurrent | 3777.1 Mbps | 5175.9 Mbps | -27.0% | 95.5% / 130.0% |
| small-single | 3536.5 Mbps | 4769.1 Mbps | -25.8% | 87.7% / 97.0% |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 3142.6 Mbps | 3940.6 Mbps | -20.3% |
| lan | 782.4 Mbps | 797.3 Mbps | -1.9% |
| lossy | 69.8 Mbps | 69.8 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 24.4% slower on average

---
2da4a66cac937c8e559921bc0e3b4eabdac73ea0 - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | 5401.4 Mbps | 7922.6 Mbps | -31.8% | 96.2% / 122.0% |
| medium-concurrent | 5606.5 Mbps | 7757.6 Mbps | -27.7% | 92.7% / 97.5% |
| medium-single | 3969.8 Mbps | 4469.9 Mbps | -11.2% | 88.2% / 96.6% |
| small-concurrent | 3829.4 Mbps | 4901.3 Mbps | -21.9% | 93.1% / 102.0% |
| small-single | 3497.4 Mbps | 4649.4 Mbps | -24.8% | 99.7% / 185.0% |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 2907.7 Mbps | 3897.0 Mbps | -25.4% |
| lan | 782.5 Mbps | 810.3 Mbps | -3.4% |
| lossy | 69.9 Mbps | 69.8 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 24.3% slower on average

---
e5cb78dbb4de16df02916475f27a1c5851bc3d45 - artifacts

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 3185.7 Mbps | 3952.9 Mbps | -19.4% |

Summary

noq is 19.4% slower on average

---
f07f4438c9b9cc392d362ec78e78316d19bb3782 - artifacts

No results available

---
16c9164f68c4d1a5f7e9c879bc5ccc4316c4435a - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | 5707.7 Mbps | 7974.1 Mbps | -28.4% | 96.0% / 119.0% |
| medium-concurrent | 5172.8 Mbps | 7792.7 Mbps | -33.6% | 92.7% / 97.8% |
| medium-single | 4036.8 Mbps | 4749.3 Mbps | -15.0% | 87.4% / 97.2% |
| small-concurrent | 3810.8 Mbps | 5332.0 Mbps | -28.5% | 101.8% / 184.0% |
| small-single | 3493.3 Mbps | 4778.5 Mbps | -26.9% | 88.5% / 97.3% |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 3179.8 Mbps | 3967.0 Mbps | -19.8% |
| lan | 782.4 Mbps | 810.4 Mbps | -3.5% |
| lossy | 69.8 Mbps | 55.9 Mbps | +25.0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 25.9% slower on average

---
298b7beb57ae2a5a545bc63b288e9c5fd9991cf5 - artifacts

No results available

---
391b4589b6cb2e72d9a23091c424c5439e49c5b4 - artifacts

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 2877.3 Mbps | N/A | N/A |
| lan | 782.4 Mbps | N/A | N/A |
| wan | 83.8 Mbps | N/A | N/A |
---
6bbe83c1b69ff44094745b5c60d8406d9db41e87 - artifacts

Raw Benchmarks (localhost)

| Scenario | noq | upstream | Delta | CPU (avg/max) |
| --- | --- | --- | --- | --- |
| large-single | 5338.7 Mbps | 8007.9 Mbps | -33.3% | 90.4% / 96.7% |
| medium-concurrent | 5360.2 Mbps | 7990.0 Mbps | -32.9% | 97.1% / 131.0% |
| medium-single | 4063.5 Mbps | 4748.9 Mbps | -14.4% | 97.8% / 131.0% |
| small-concurrent | 3841.4 Mbps | 5180.2 Mbps | -25.8% | 87.9% / 97.5% |
| small-single | 3413.9 Mbps | 4773.9 Mbps | -28.5% | 92.7% / 100.0% |

Netsim Benchmarks (network simulation)

| Condition | noq | upstream | Delta |
| --- | --- | --- | --- |
| ideal | 3112.2 Mbps | 3995.1 Mbps | -22.1% |
| lan | 782.4 Mbps | 810.4 Mbps | -3.5% |
| lossy | 55.9 Mbps | 55.9 Mbps | ~0% |
| wan | 83.8 Mbps | 83.8 Mbps | ~0% |

Summary

noq is 26.9% slower on average

@n0bot n0bot bot added this to iroh Feb 20, 2026
@github-project-automation github-project-automation bot moved this to 🚑 Needs Triage in iroh Feb 20, 2026
Comment thread noq-proto/src/connection/mod.rs Outdated
Comment thread noq-proto/src/connection/mod.rs Outdated
@matheus23
Member

I know this is WIP code, but I want to call out that this mixes some really useful refactors (moving PollPathTransmitStatus etc. around, matching on rhs instead of using rhs.*) with some other unrelated changes (introducing a PathSchedulingInfo struct).

Stacking these two in separate stacked PRs might make it easier to review.

I know this is draft - so feel free to ignore me :)

@flub flub linked an issue Feb 23, 2026 that may be closed by this pull request
@dignifiedquire dignifiedquire moved this from 🚑 Needs Triage to 🏗 In progress in iroh Feb 23, 2026

github-actions bot commented Mar 2, 2026

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/444/docs/noq/

Last updated: 2026-03-19T10:42:08Z

@flub flub marked this pull request as ready for review March 11, 2026 16:56
@flub flub moved this from 🏗 In progress to 👀 In review in iroh Mar 12, 2026
@flub
Collaborator Author

flub commented Mar 13, 2026

Hi, since no one is reviewing this: I know this is more like dark magic and takes a lot of effort to verify whether any of it is right. I suggest you primarily review for readability and maintainability, and trust the passing tests for correctness and the sanity of the approach.

Member

@matheus23 matheus23 left a comment


Generally a huge fan of this PR :)

Just a nit that pulling stuff into a function would be nice.

Having @divagant-martian look over this might be cool! I can't really judge the scheduling bits too much.

I was hopeful this would magically fix #451, but it turns out it doesn't :( That's fine though! The current code will make debugging it much easier, I think :)

Comment thread noq-proto/src/connection/mod.rs Outdated
Comment thread noq-proto/src/connection/mod.rs Outdated
Comment thread noq-proto/src/connection/mod.rs Outdated
Comment thread noq-proto/src/connection/spaces.rs Outdated
Comment on lines +872 to +876
/// Whether there are any frames that must be sent on this specific space ID.
///
/// These are ack-eliciting, and a subset of [`SendableFrames::other`]. This is useful
/// for QUIC-MULTIPATH, which may desire not to send any frames on a backup path, which
/// can also be sent on an active path.
pub(super) path_exclusive: bool,
/// These are ack-eliciting. Some frames are scheduled per path, e.g. PING,
/// IMMEDIATE_ACK, PATH_CHALLENGE or PATH_RESPONSE.
pub(super) space_id_only: bool,
Member


"space ID" here in the sense of e.g. Initial, or Handshake, or Data on path 2, but not SpaceKind::Data (without a path ID), right?

Not sure if this is a helpful comment or not, but space_id_only was the most confusing part of this PR for me. Maybe because of the name? It seems like path_exclusive sounds simpler, but that might just be because it's technically incorrect. Maybe something like path_exclusive_only? I wish I didn't have to know what a "space ID" is :/ (and I assume "path" just isn't fully correct).

(I'm also getting increasingly unsure if I understand space_id_only correctly. It doesn't mean that we only have PING/IMMEDIATE_ACK pending, right? It might mean we have those pending among other things, right? So in the above I might've misunderstood that yet again...)

Copy link
Copy Markdown
Collaborator Author


This is a fair comment. "space ID" here is used in the eventual meaning we want it to have: Initial, Handshake, or Data(PathId). I've been using "space kind" for just Initial, Handshake and Data, as that already exists in the code base.

I appreciate that this is not the best term right now. And indeed "path" is not fully correct either, as it excludes the Initial and Handshake spaces. Though it's not entirely wrong there either.

And yes, space_id_only: true means that there are some packets pending that can only be sent on this space+path combination, irrespective of whatever else might be pending. This is the same for all the fields in this struct; e.g. acks: true doesn't mean there's nothing else to send.

I'm not entirely sure what to use instead of space_id_only right now. I can improve its doc comment though. Would it help if it said space_only? Probably not. path_and_space_only? I don't particularly like that either, but maybe.

Collaborator Author


Updated this a little, PTAL.

Member


Thanks! The doc comment is super helpful.
Personally I think space_only is slightly worse than space_id_only. I do like the distinguishing of SpaceKind and SpaceId.

After reading this, and I think now having all the context, I'd suggest space_id_specific instead. This avoids the confusion I had earlier, where I wasn't sure whether the only meant sending only these kinds of frames, or sending frames that are only sendable on that space ID.

Collaborator Author


"space" is generally short for "Packet Number Space" I think. SpaceKind and SpaceId are two concrete types. SpaceId refers to a specific packet number space, which in this struct is more implicit. "space" is already used at various places throughout the codebase, even in original Quinn.

I agree on the suffix suggestion though: SendableFrames::space_specific makes sense.

Co-authored-by: Philipp Krüger <philipp.krueger1@gmail.com>
Comment thread noq-proto/src/connection/mod.rs Outdated
},
)
})
.collect()
Contributor


allocations make me sad (not sure it can be avoided)

Collaborator Author


Yes it can, but at some cost as well:

  • I now have two loops that look for the PathId by doing next_path_id = self.path.keys().find(|i| **i > path_id).copied();. It might be possible to fold the MTU discovery into the main poll loop; MTU packets would get a slightly higher priority, but that's probably not really harmful overall.
  • I now need to do the computation for have_validated_status_available_space many more times.

I'm not sure how much of that the compiler manages to remove. Is it smart enough to figure out that have_validated_status_available_space won't change between the calls and hoist it out? Does it make the iteration as fast as the previous version?

On the other hand, we now have some situations where we don't have to compute the scheduling information at all, and we no longer need to compute it for all paths if we don't send on the last path.

What do you think, which version is better? (I also adopted @matheus23's feedback about splitting it off into a function, but that doesn't really affect this. It does make the diff a little bigger though.)

As an aside, while working out how scheduling should work, it was really helpful to have it be extremely explicit, as a bunch of data computed up-front. But it's fair that, now that we know how it should work, we can implement it in the most optimal way.
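The next-path lookup quoted above can be sketched like this. `PathId` and the map shape are illustrative assumptions, not the actual types in noq-proto; the sketch shows the same result as the quoted `keys().find(...)` expression, but using the BTreeMap's ordering instead of a linear scan:

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Illustrative stand-in for the codebase's path identifier.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct PathId(u32);

/// Smallest PathId strictly greater than `current`, i.e. the same
/// result as `paths.keys().find(|i| **i > current).copied()`.
fn next_path_id<V>(paths: &BTreeMap<PathId, V>, current: PathId) -> Option<PathId> {
    paths
        // Skip straight past `current` using the tree's ordering.
        .range((Bound::Excluded(current), Bound::Unbounded))
        .next()
        .map(|(id, _)| *id)
}
```

Whether this is measurably faster than the linear scan depends on how many paths exist; for the handful of paths a connection typically has, both are cheap.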

Contributor


I think the new version reads a lot clearer to me, so I like it a lot more.

Two bits of PR review:

- Move the scheduling to a separate function.
- Avoid an allocation, but this has some tradeoffs.

@flub
Collaborator Author

flub commented Mar 16, 2026

I was hopeful this would magically fix #451, but it turns out it doesn't :( That's fine though! The current code will make debugging it much easier, I think :)

Would you like me to take over #451? How important is it really? Does it happen in real life? Also, #507 would probably fix it implicitly because it would not send empty packets, whereas the current version adds padding instead. That's really more of another argument that #507 is a better design than anything else.

@matheus23
Member

Would you like me to take over #451?

No, I don't think so.

How important is this really?

Not very, I suspect.

Does it happen in real life?

Yeah - I haven't tested it, but IIRC the test cases in #451 aren't super crazy things.

Contributor

@dignifiedquire dignifiedquire left a comment


I like it from both a code-logic and a readability perspective.

Member

@matheus23 matheus23 left a comment


I see you requested review on this from me again. I already accepted this - I'm happy with it, although I do want to point out I've commented on some recent changes: #444 (comment)

@flub
Collaborator Author

flub commented Mar 17, 2026

I see you requested review on this from me again. I already accepted this

Well yes, I made changes to things you commented on or about, so it felt natural to ask for your thoughts on those changes :)

dignifiedquire added a commit that referenced this pull request Mar 18, 2026
PR #444 (packet scheduling): verify unvalidated paths don't carry
stream data — only PATH_CHALLENGE/RESPONSE. Stream data should flow
exclusively on validated Available paths.

PR #509 (PATH_ABANDON on self): verify PATH_ABANDON is sent on the
abandoned path itself when no other validated path exists. Also verify
PATH_ABANDON is delivered to remote after network change replaces the
only validated path.
@flub flub enabled auto-merge March 19, 2026 10:30
@flub flub added this pull request to the merge queue Mar 19, 2026
Merged via the queue into main with commit 4bf3dc9 Mar 19, 2026
36 checks passed
@flub flub deleted the flub/packet-scheduling-1 branch March 19, 2026 11:01
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in iroh Mar 19, 2026


Development

Successfully merging this pull request may close these issues.

Improve packet scheduling

3 participants