Skip to content

net: Revert unconditional tcp nodelay#30470

Open
StephanDollberg wants to merge 3 commits into
devfrom
stephan/revert-nodelay
Open

net: Revert unconditional tcp nodelay#30470
StephanDollberg wants to merge 3 commits into
devfrom
stephan/revert-nodelay

Conversation

@StephanDollberg
Copy link
Copy Markdown
Member

This reverts commit f3169e6.

The commit set TCP_NODELAY on the base client and hence it affected all
forms of clients.

This includes the internal RPC client.

In CPD we saw a throughput performance regression:

RedPandaOpenMessagingBenchmarkPerf pub50pct -1%
RpkBenchmarkPerf requests_per_sec -7%

Looking at metrics it becomes clear why. Avg reactor util is up ~5%.
recvfrom syscalls are up ~20%.

The delayed sending causes implicit batching on the TCP level when
multiple append entry requests are in flight at the same time. This
reduces recv calls on the followers and probably also has higher
efficiency at the application level (even if the amount of actual append
entry requests is unchanged or even increased).

While this sort of implicit reliance on NODELAY isn't great the
performance impact can't be neglected. Hence revert.

Longer team we'd probably want to replace this with more explicit
batching.

Set it only for HTTP.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reverts the previously-introduced unconditional TCP_NODELAY behavior in net::base_transport (which impacted all clients, including internal RPC) and limits TCP_NODELAY enablement to the HTTP client path to avoid the observed throughput regression.

Changes:

  • Remove unconditional fd.set_nodelay(true) from net::base_transport::do_connect.
  • Add net::base_transport::set_nodelay(bool) to allow opt-in socket configuration after connection establishment.
  • Enable TCP_NODELAY for HTTP connections by calling set_nodelay(true) after get_connected() succeeds.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/v/net/transport.h Adds set_nodelay(bool) to the base transport interface for opt-in use by specific clients.
src/v/net/transport.cc Removes unconditional TCP_NODELAY during connect and implements base_transport::set_nodelay.
src/v/http/client.cc Opts HTTP client connections into TCP_NODELAY after establishing a connected socket.

Comment thread src/v/net/transport.h

void set_keepalive_parameters(const ss::net::keepalive_params& params);
void set_keepalive(bool);
void set_nodelay(bool);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing doc

travisdowns
travisdowns previously approved these changes May 19, 2026
This reverts commit f3169e6.

The commit set TCP_NODELAY on the base client and hence it affected all
forms of clients.

This includes the internal RPC client.

In CPD we saw a throughput performance regression:

```
RedPandaOpenMessagingBenchmarkPerf pub50pct -1%
RpkBenchmarkPerf requests_per_sec -7%
```

Looking at metrics it becomes clear why. Avg reactor util is up ~5%.
recvfrom syscalls are up ~20%.

The delayed sending causes implicit batching on the TCP level when
multiple append entry requests are in flight at the same time. This
reduces recv calls on the followers and probably also has higher
efficiency at the application level (even if the amount of actual append
entry requests is unchanged or even increased).

While this sort of implicit reliance on NODELAY isn't great the
performance impact can't be neglected. Hence revert.

Longer team we'd probably want to replace this with more explicit
batching.
After previously enabling it unconditionally and then having to revert
it in 55d7346 this patch adds the API
such that sub-classes can enable it selectively.
HTTP doesn't allow for batching in the protocol (no one uses pipelining)
and hence it's effectively always bad to use it for HTTP especially
where client code might have nagle-bad patterns like send(headers) ->
send(body).
@StephanDollberg StephanDollberg force-pushed the stephan/revert-nodelay branch from d7759a2 to eba85eb Compare May 22, 2026 14:31
@vbotbuildovich
Copy link
Copy Markdown
Collaborator

CI test results

test results on build#84873
test_status test_class test_method test_arguments test_kind job_url passed reason test_history
FLAKY(PASS) ManualFinalizationTest test_manual_request_lost_on_leader_failover null integration https://buildkite.com/redpanda/redpanda/builds/84873#019e502f-ef98-4b1d-a033-c39d43edbd87 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ManualFinalizationTest&test_method=test_manual_request_lost_on_leader_failover

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants