[DNM] dt: bump franz-bench and client-swarm to current versions #30580

Draft
nguyen-andrew wants to merge 2 commits into redpanda-data:dev from nguyen-andrew:CORE-16245


@nguyen-andrew
Member

@nguyen-andrew nguyen-andrew commented May 21, 2026

Part of the effort to bring our test clients up to Kafka 4.x support
(ENG-1185):

  • franz-bench: franz-go pin v1.5.0 -> v1.20.7.
  • client-swarm: rdkafka 0.37.0 -> 0.39.0 (bundled librdkafka 2.12),
    plus a workaround in topic_creation_test.py for a librdkafka
    leader_epoch behavior gap that the bump exposes.

See the commit messages for more details.

[DNM] because the client-swarm SHA currently pins the bump-rdkafka-0.39
branch on redpanda-data/client-swarm. The client-swarm bump's commit
will be amended with a real main-branch SHA once that branch lands; this
PR shouldn't merge before that.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

Brings the franz-go bench example consumed by franzgo_bench.py up to
the v1.20.x line used by kgo-verifier and transform-verifier. The
bench progress output format ("X.XX MiB/s; Y.YYk records/s") and CLI
flags are unchanged between v1.5.0 and v1.20.7, so no wrapper changes
in franzgo_examples.py are needed.
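
As an illustration of why an unchanged progress format means no wrapper changes, here is a minimal sketch of how a wrapper could parse such a line. The regex and the `parse_progress` helper are hypothetical and are not taken from franzgo_examples.py; the only grounded detail is the "X.XX MiB/s; Y.YYk records/s" shape quoted above.

```python
import re

# Assumed line shape, based only on the format quoted in the commit message:
#   "X.XX MiB/s; Y.YYk records/s"
PROGRESS_RE = re.compile(
    r"(?P<mib>\d+\.\d+) MiB/s; (?P<krec>\d+\.\d+)k records/s"
)


def parse_progress(line):
    """Return (MiB/s, records/s) from a progress line, or None if absent."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    mib_per_sec = float(match.group("mib"))
    # The bench reports records in thousands ("k"), so scale back up.
    records_per_sec = float(match.group("krec")) * 1000.0
    return mib_per_sec, records_per_sec
```

A parser of this kind keeps working across the bump as long as the emitted line keeps that shape, which is what the commit message asserts for v1.5.0 through v1.20.7.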
NOTE: the SHA pins the client-swarm bump-rdkafka-0.39 branch HEAD for
now; this will be updated to a real client-swarm main-branch commit
once the client-swarm PR is merged.

Bumps client-swarm to rdkafka 0.39 (bundled librdkafka 2.12) to align
with the broader non-Java client modernization for Kafka 4.x. rdkafka
0.39 pulls in librdkafka-sys 4.10+, whose build uses bindgen, which
requires libclang at compile time, so the client-swarm Docker stage
now installs libclang-dev.

The bump causes test_topic_recreation_while_producing to fail due to a
behavior gap in librdkafka: after a topic is deleted and recreated,
the producer's per-partition leader_epoch cache isn't reset, so the
new topic's fresh leader_epoch is rejected as stale and the producer
keeps sending to the previous incarnation's leader. Earlier librdkafka
tore down per-partition state on topic deletion; v2.10 changed that
path to keep the state in place across delete+recreate, and under
librdkafka's defaults (topic.metadata.refresh.interval.ms = 5min,
topic.metadata.propagation.max.ms = 30s) the producer doesn't
reconverge within the test's 30s wait_until.

This commit works around the gap by passing 2-second values for those
two settings via the producer properties, so the cache reconverges
within the test's 30s window.
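
The workaround described above can be sketched roughly as follows. This is a hedged illustration, not the code in topic_creation_test.py: the `fast_metadata_properties` helper and the `FAST_METADATA_MS` constant are hypothetical names, and how the resulting dict reaches the client-swarm producer is assumed rather than shown. Only the two librdkafka setting names and the 2-second value come from the commit message.

```python
# Hypothetical helper for illustration; the real test may wire this differently.
FAST_METADATA_MS = 2_000  # 2 seconds, per the commit message


def fast_metadata_properties(base=None):
    """Return producer properties with both metadata timers set to 2s.

    Any properties in `base` are preserved; the two metadata settings
    are overridden so the producer picks up the recreated topic sooner.
    """
    props = dict(base or {})
    # librdkafka default is 5 minutes; too slow for a 30s wait_until.
    props["topic.metadata.refresh.interval.ms"] = str(FAST_METADATA_MS)
    # librdkafka default is 30 seconds; equal to the whole test window.
    props["topic.metadata.propagation.max.ms"] = str(FAST_METADATA_MS)
    return props
```

For example, `fast_metadata_properties({"acks": "all"})` keeps the `acks` entry and adds the two overrides, leaving well over 20 seconds of the test's 30s window for the producer to reconverge.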
@nguyen-andrew
Member Author

/ci-repeat 1
/skip-unit
/skip-rebase

@vbotbuildovich
Collaborator

vbotbuildovich commented May 21, 2026

CI test results

test results on build#84833

| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| FLAKY(PASS) | ShadowLinkingReplicationTests | test_with_restart | {"storage_mode": "cloud"} | integration | https://buildkite.com/redpanda/redpanda/builds/84833#019e4c43-d84e-42e2-bb82-435efa2b65e6 | 19/21 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0421, p0=0.5772, reject_threshold=0.0100. adj_baseline=0.1211, p1=0.2839, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart |
| FLAKY(PASS) | RedpandaNodeOperationsSmokeTest | test_node_ops_smoke_test | {"cloud_storage_type": 1, "mixed_versions": true} | integration | https://buildkite.com/redpanda/redpanda/builds/84833#019e4c44-1e25-4591-8b2e-8ccf7233732d | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0017, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test |
| FLAKY(PASS) | TopicRecreateTest | test_topic_recreation_while_producing | {"cleanup_policy": "compact", "workload": "ACKS_ALL"} | integration | https://buildkite.com/redpanda/redpanda/builds/84833#019e4c44-1e24-473c-9872-bdb02abd9220 | 10/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_topic_recreation_while_producing |

test results on build#84848

| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| FLAKY(FAIL) | TopicRecreateTest | test_topic_recreation_while_producing | {"cleanup_policy": "compact", "workload": "ACKS_1"} | integration | https://buildkite.com/redpanda/redpanda/builds/84848#019e4cdc-f8b9-4e6a-8b14-67298cd1b9e0 | 17/20 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_topic_recreation_while_producing |
| FLAKY(PASS) | TopicRecreateTest | test_topic_recreation_while_producing | {"cleanup_policy": "compact", "workload": "ACKS_ALL"} | integration | https://buildkite.com/redpanda/redpanda/builds/84848#019e4cdc-f8b6-4f2e-b17a-93fe4f943e8a | 19/20 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_topic_recreation_while_producing |
| FLAKY(PASS) | TopicRecreateTest | test_topic_recreation_while_producing | {"cleanup_policy": "delete", "workload": "ACKS_ALL"} | integration | https://buildkite.com/redpanda/redpanda/builds/84848#019e4cdc-f8b6-4a1e-a32e-f8dc80667197 | 19/20 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_topic_recreation_while_producing |
| FLAKY(PASS) | TopicRecreateTest | test_topic_recreation_while_producing | {"cleanup_policy": "delete", "workload": "ACKS_ALL"} | integration | https://buildkite.com/redpanda/redpanda/builds/84848#019e4cdc-f8b9-4e6a-8b14-67298cd1b9e0 | 19/20 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_topic_recreation_while_producing |
| FLAKY(FAIL) | TopicRecreateTest | test_topic_recreation_while_producing | {"cleanup_policy": "compact", "workload": "IDEMPOTENT"} | integration | https://buildkite.com/redpanda/redpanda/builds/84848#019e4cdc-f8b9-4910-8f85-067d277ffa6b | 18/20 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_topic_recreation_while_producing |
| FLAKY(PASS) | TopicRecreateTest | test_topic_recreation_while_producing | {"cleanup_policy": "delete", "workload": "IDEMPOTENT"} | integration | https://buildkite.com/redpanda/redpanda/builds/84848#019e4cdc-f8b6-4f2e-b17a-93fe4f943e8a | 19/20 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TopicRecreateTest&test_method=test_topic_recreation_while_producing |

@nguyen-andrew
Member Author

/ci-repeat 1
skip-rebase
skip-redpanda-build
skip-units
dt-repeat=10
dt-nodes=3
tests/rptest/tests/topic_creation_test.py::TopicRecreateTest.test_topic_recreation_while_producing
tests/rptest/scale_tests/franzgo_bench.py

@vbotbuildovich
Collaborator

vbotbuildovich commented May 22, 2026

Retry command for Build#84848

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/topic_creation_test.py::TopicRecreateTest.test_topic_recreation_while_producing@{"cleanup_policy":"compact","workload":"IDEMPOTENT"}
tests/rptest/tests/topic_creation_test.py::TopicRecreateTest.test_topic_recreation_while_producing@{"cleanup_policy":"compact","workload":"ACKS_1"}
