rptest: some shadow indexing & linking CI fixes #30586
This test would fail because segments from other irrelevant topics would be probed for a drop in size due to compaction. Limit the files we sample to only the segment files relevant to the test topic.
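The filtering idea can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it assumes the stat helper yields `(relative_path, size)` pairs and that segment files live under a `kafka/<topic>/<partition-dir>/` layout. The function name `topic_segment_bytes` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, Tuple


def topic_segment_bytes(
    stats: Iterable[Tuple[str, int]], topic: str, namespace: str = "kafka"
) -> int:
    """Sum the sizes of .log segment files that belong to one topic.

    Assumption: each path is relative to the data directory and looks like
    <namespace>/<topic>/<partition-dir>/<segment-file>.
    """
    total = 0
    for path, size in stats:
        parts = PurePosixPath(path).parts
        if len(parts) < 3:
            continue  # too shallow to be a partition segment
        if parts[0] != namespace or parts[1] != topic:
            continue  # another topic or an internal log
        if not parts[-1].endswith(".log"):
            continue  # index files and other non-segment files
        total += size
    return total


# Hypothetical sample: only the first entry belongs to the test topic.
sample_stats = [
    ("kafka/test-topic/0_24/0-1-v1.log", 100),
    ("kafka/test-topic/0_24/0-1-v1.base_index", 10),
    ("kafka/other-topic/0_10/0-1-v1.log", 500),
    ("redpanda/controller/0_0/0-1-v1.log", 900),
]
```

With a filter like this, compaction in an unrelated topic or growth in an internal log no longer changes the size the test compares before and after compaction.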
…edTopic` This test calls `_transfer_topic_leadership` and expects the partition move to be respected. However, the autobalancer can race with this command and cause a timeout (because the partition is moved to a different node entirely). Disable it for this test.
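As a rough sketch of what disabling balancing might look like, the snippet below merges two cluster properties into a test's extra configuration. `enable_leader_balancer` and `partition_autobalancing_mode` are Redpanda cluster properties; the helper name `with_balancing_disabled` and the way the resulting dict reaches the test's constructor are assumptions, and the PR may set a different combination.

```python
from typing import Any, Dict, Optional

# Cluster properties that turn off automatic leadership and partition movement.
NO_BALANCING_CONF: Dict[str, Any] = {
    "enable_leader_balancer": False,
    "partition_autobalancing_mode": "off",
}


def with_balancing_disabled(
    extra_rp_conf: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of the given config with balancing features disabled.

    Hypothetical helper: the caller's dict is not mutated, and the balancing
    settings override any conflicting values already present.
    """
    conf = dict(extra_rp_conf or {})
    conf.update(NO_BALANCING_CONF)
    return conf
```

Returning a copy keeps a shared base configuration from being silently altered for other test variants.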
Specifically, for cloud topics backed storage modes, which seem to need a bit more slack.
Pull request overview
This PR improves CI stability for rptest shadow indexing and cluster linking end-to-end tests by reducing sensitivity to unrelated on-disk data and by tuning cluster/test configuration and timeouts for slower (cloud-backed) cases.
Changes:
- Restrict compacted-topic size accounting to the specific test topic’s `.log` segments (avoids interference from other topics/internal logs).
- Disable leader balancing / partition autobalancing for the compacted-topic shadow indexing e2e test variant.
- Make cluster linking verifier progress timeout configurable and increase it for cloud-backed storage modes in the failure-injection scenario.
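The configurable progress timeout can be illustrated with a small standalone sketch. The 120 and 60 second values come from the diff in this PR; everything else is hypothetical, including the function names, the storage-mode strings, and the polling loop, which stands in for whatever the real progress verifier does.

```python
import time
from typing import Callable, Collection


def pick_progress_timeout(
    storage_mode: str,
    cloud_modes: Collection[str],
    default_sec: int = 60,
    cloud_sec: int = 120,
) -> int:
    """Give cloud-backed storage modes a longer progress timeout."""
    return cloud_sec if storage_mode in cloud_modes else default_sec


def wait_for_progress(
    read_counter: Callable[[], int],
    timeout_sec: float,
    backoff_sec: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the counter exceeds its starting value, or False
    if the timeout expires first."""
    start_value = read_counter()
    deadline = clock() + timeout_sec
    while clock() < deadline:
        if read_counter() > start_value:
            return True
        sleep(backoff_sec)
    return False


# Hypothetical mode names, used only for this illustration.
CLOUD_MODES = {"cloud", "tiered_cloud"}
```

Passing the timeout in as a parameter, rather than hard-coding it in the verifier, lets a single failure-injection scenario tolerate slower object-storage round trips without loosening the check for local storage.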
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `tests/rptest/tests/shadow_indexing_compacted_topic_test.py` | Filters `data_stat()` sizes to only the test topic’s Kafka `.log` segments when checking compaction impact. |
| `tests/rptest/tests/e2e_shadow_indexing_test.py` | Adds an `__init__` to the compacted-topic test to disable balancing features that can add noise/flakiness. |
| `tests/rptest/tests/cluster_linking_test_base.py` | Extends `verify()` to accept and forward a `progress_timeout` to the progress verifier. |
| `tests/rptest/tests/cluster_linking_e2e_test.py` | Uses a longer verification progress timeout when running in cloud-backed storage modes. |
            TopicSpec.STORAGE_MODE_CLOUD,
            TopicSpec.STORAGE_MODE_TIERED_CLOUD,
        )
        progress_timeout = 120 if cloud_backed else 60
i would definitely run this a few hundred times in CDT. last time I was messing with similar timeouts i saw pretty huge variance. fairly minor s3 hiccups eat into these timeouts pretty quickly.
> i would definitely run this a few hundred times in CDT. last time I was messing with similar timeouts i saw pretty huge variance. fairly minor s3 hiccups eat into these timeouts pretty quickly.
Not sure it's worth hitting CDT a million times on this PR to check that these timeouts are permissive. I'm in favour of letting it just sit in CI and re-adjusting in the future if necessary.
See commits.
Backports Required
Release Notes