pandaproxy/rest: return every partition in iceberg translation state #30576
Open
mmaslankaprv wants to merge 5 commits into
When using the aws_sigv4 authentication mode with the sts credentials source, AWS access and secret keys are not required; only the region is needed, mirroring the existing behavior for aws_instance_metadata. Extend the cluster config test to cover the new combination as well as the previously uncovered aws_instance_metadata path.
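A minimal, self-contained C++ sketch of the region-only rule described above. The enum `creds_source`, the struct `sigv4_config`, and the function `validate_sigv4` are hypothetical stand-ins, not Redpanda's `model::cloud_credentials_source` or its validator API. What the sketch requires for the other credential sources (static keys plus a region) is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for model::cloud_credentials_source (illustrative subset).
enum class creds_source { config_file, aws_instance_metadata, sts };

// Hypothetical flattened view of the relevant cluster config properties.
struct sigv4_config {
    std::optional<creds_source> iceberg_creds_source; // catalog-specific override
    creds_source cloud_storage_creds_source{};        // fallback source
    std::optional<std::string> region;
    std::optional<std::string> access_key;
    std::optional<std::string> secret_key;
};

// Returns an error message, or std::nullopt when the config is acceptable.
std::optional<std::string> validate_sigv4(const sigv4_config& cfg) {
    // The Iceberg-specific source, if set, takes precedence.
    const creds_source effective
      = cfg.iceberg_creds_source.value_or(cfg.cloud_storage_creds_source);

    if (
      effective == creds_source::aws_instance_metadata
      || effective == creds_source::sts) {
        // Credentials are obtained at runtime, but the region is still needed.
        if (!cfg.region) {
            return "Must set AWS region when using SigV4 authentication with "
                   "aws_instance_metadata/sts credentials source.";
        }
        return std::nullopt;
    }

    // Assumption: other sources need static keys as well as a region.
    if (!cfg.access_key || !cfg.secret_key || !cfg.region) {
        return "Must set AWS access key, secret key and region for SigV4.";
    }
    return std::nullopt;
}
```

With `sts` as the effective source, a config that sets only the region passes, while the same config without a region is rejected.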
Pull request overview
Exposes the Kafka topic partition count in the Iceberg translation state response, enabling clients to discover this information without an extra metadata lookup.
Changes:
- Add `topic_partition_count` to the `IcebergTopicState` protobuf and regenerate Python bindings.
- Populate `topic_partition_count` in the `pandaproxy/rest` translation state handler from topic configuration.
- Extend ducktape coverage to assert the partition count is returned; additionally, adjust Iceberg SigV4 config validation/tests to treat the `sts` credentials source as region-only.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| proto/redpanda/core/rest/iceberg.proto | Adds optional topic_partition_count to TopicState. |
| src/v/pandaproxy/rest/iceberg_handlers.cc | Populates topic_partition_count from topic configuration in get_translation_state. |
| tests/rptest/clients/admin/proto/redpanda/core/rest/iceberg_pb2.py | Regenerated Python protobuf bindings reflecting the new field. |
| tests/rptest/clients/admin/proto/redpanda/core/rest/iceberg_pb2.pyi | Regenerated Python type stubs reflecting the new field/presence API. |
| tests/rptest/tests/datalake/iceberg_translation_state_test.py | Asserts the translation state includes the expected partition count. |
| src/v/config/validators.cc | Treats sts credentials source like aws_instance_metadata for SigV4 validation (region-only). |
| tests/rptest/tests/cluster_config_test.py | Updates config validation tests to cover SigV4 + sts/aws_instance_metadata combinations. |
Comment on lines 328 to 333

```diff
 // When using aws_instance_metadata, AWS credentials are not required
 if (
   effective_creds_source
-  == model::cloud_credentials_source::aws_instance_metadata) {
+  == model::cloud_credentials_source::aws_instance_metadata
+  || effective_creds_source == model::cloud_credentials_source::sts) {
     // We still require the region of the Glue endpoint.
```
Comment on lines 339 to +341

```diff
 return fmt::format(
   "Must set AWS region when using SigV4 authentication with "
-  "aws_instance_metadata credentials source.");
+  "aws_instance_metadata/sts credentials source.");
```
Comment on lines 321 to 333

```diff
 case datalake_catalog_auth_mode::aws_sigv4: {
     // Determine effective credentials source
     auto effective_creds_source
       = config.iceberg_rest_catalog_aws_credentials_source().has_value()
           ? config.iceberg_rest_catalog_aws_credentials_source().value()
           : config.cloud_storage_credentials_source();

     // When using aws_instance_metadata, AWS credentials are not required
     if (
       effective_creds_source
-      == model::cloud_credentials_source::aws_instance_metadata) {
+      == model::cloud_credentials_source::aws_instance_metadata
+      || effective_creds_source == model::cloud_credentials_source::sts) {
         // We still require the region of the Glue endpoint.
```
/ci-repeat
The translation state handler previously only emitted partitions that the datalake coordinator had already seen pending files for. Drive the partition state map off the topic configuration instead, so consumers always receive an entry for every partition of the topic, even when the coordinator has not yet observed it. Partitions without a known committed offset are returned with `last_catalog_committed_offset` unset.
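The change above amounts to looping over partition ids taken from the topic configuration rather than over the partitions the coordinator already knows. This is a minimal C++ sketch, not the handler's actual code: `partition_state`, `build_partition_states`, and the `coordinator_offsets` map are hypothetical, and the sketch assumes partition ids form the contiguous range `[0, partition_count)`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical per-partition entry; the real response uses the generated
// protobuf types.
struct partition_state {
    std::optional<int64_t> last_catalog_committed_offset; // unset if unknown
};

// Emit one entry per partition in [0, partition_count). Offsets are filled in
// only for partitions the coordinator has reported; the rest stay unset.
std::map<int32_t, partition_state> build_partition_states(
  int32_t partition_count,
  const std::map<int32_t, int64_t>& coordinator_offsets) {
    std::map<int32_t, partition_state> states;
    for (int32_t p = 0; p < partition_count; ++p) {
        partition_state st;
        if (auto it = coordinator_offsets.find(p);
            it != coordinator_offsets.end()) {
            st.last_catalog_committed_offset = it->second;
        }
        states.emplace(p, st);
    }
    return states;
}
```

Because the loop is driven by `partition_count`, the map size no longer depends on how many partitions the coordinator has observed.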
Extend the smoke test to assert that the translation state response contains an entry for every partition of the topic, regardless of whether the coordinator has observed them yet.
7f0b3da to 7eabcba
Add a `partition_count` field to the iceberg `TopicState` message and populate it from the topic configuration when building the response. The field is redundant with the size of `partition_states`, but lets consumers read the partition count directly without iterating the map. Regenerate the ducktape proto bindings accordingly.
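A hedged C++ sketch of how a consumer might use the redundant field: it sizes its per-partition buffer from `partition_count` up front instead of deriving the count from `partition_states`. `topic_state`, `make_topic_state`, and `to_offset_vector` are hypothetical illustrations, not the generated protobuf API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical in-memory mirror of the TopicState message.
struct topic_state {
    int32_t partition_count = 0;
    // partition id -> last catalog-committed offset (unset if unknown)
    std::map<int32_t, std::optional<int64_t>> partition_states;
};

// Producer side: partition_count comes from the topic configuration, and
// every partition gets an entry, so the two always agree.
topic_state make_topic_state(int32_t partition_count) {
    topic_state ts;
    ts.partition_count = partition_count;
    for (int32_t p = 0; p < partition_count; ++p) {
        ts.partition_states.emplace(p, std::nullopt);
    }
    return ts;
}

// Consumer side: size the output from partition_count directly. at() throws
// std::out_of_range if the map holds an id outside [0, partition_count).
std::vector<std::optional<int64_t>> to_offset_vector(const topic_state& ts) {
    std::vector<std::optional<int64_t>> out(
      static_cast<std::size_t>(ts.partition_count));
    for (const auto& [partition, offset] : ts.partition_states) {
        out.at(static_cast<std::size_t>(partition)) = offset;
    }
    return out;
}
```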
Assert that the new `partition_count` field reflects the partition count used when creating the iceberg-enabled topic.
Make the iceberg translation state REST response carry complete
per-topic information so callers can size their per-partition
processing without a separate metadata lookup:
- Return every partition of the topic in `partition_states`, not only those the datalake coordinator has already seen pending files for. Partitions without a known committed offset are returned with `last_catalog_committed_offset` unset.
- Add a `partition_count` field to `TopicState` and populate it from the topic configuration.
Backports Required
Release Notes
Improvements
- The iceberg translation state REST response now includes a state entry for every partition of the topic and reports the topic's `partition_count`.