pandaproxy/rest: return every partition in iceberg translation state #30576

Open
mmaslankaprv wants to merge 5 commits into redpanda-data:dev from mmaslankaprv:proxy-iceberg-partitions

mmaslankaprv (Member) commented May 21, 2026

Make the iceberg translation state REST response carry complete
per-topic information so callers can size their per-partition
processing without a separate metadata lookup:

  • Iterate over every partition of the topic when building
    partition_states, not only those the datalake coordinator has
    already seen pending files for. Partitions without a known committed
    offset are returned with last_catalog_committed_offset unset.
  • Add a partition_count field to TopicState and populate it from
    the topic configuration.
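
The two changes above can be modelled in a few lines of Python. This is only a sketch of the described behaviour, not the C++ handler: `build_topic_state` and the `committed_offsets` argument are hypothetical names, plain dicts stand in for the protobuf messages, and `None` stands in for an unset `last_catalog_committed_offset`.

```python
from typing import Optional


def build_topic_state(
    partition_count: int,
    committed_offsets: dict[int, int],
) -> dict:
    """Model of the new behaviour: one entry per partition of the topic.

    `committed_offsets` stands in for what the datalake coordinator already
    knows; partitions missing from it get an unset (None) offset.
    """
    partition_states: dict[int, dict[str, Optional[int]]] = {}
    # Drive the loop off the topic's partition count, not the coordinator's view.
    for partition_id in range(partition_count):
        partition_states[partition_id] = {
            "last_catalog_committed_offset": committed_offsets.get(partition_id),
        }
    return {
        "partition_count": partition_count,
        "partition_states": partition_states,
    }
```

For a three-partition topic where the coordinator has only seen partition 1, `build_topic_state(3, {1: 42})` yields entries for partitions 0, 1 and 2, with only partition 1 carrying an offset.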

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

Improvements

  • The iceberg translation state REST endpoint now returns a partition
    state entry for every partition of the topic and reports the topic's
    partition_count.

When using the aws_sigv4 authentication mode with the sts credentials
source, AWS access/secret keys are not required — only the region is
needed, mirroring the existing behavior for aws_instance_metadata.
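
As a rough illustration, the rule can be written as a small Python model. It is a sketch under assumptions: `validate_sigv4` and its parameters are hypothetical, credentials sources are plain strings, the fallback to the cloud storage credentials source follows the validator snippet quoted in the review, and the error text for the non-region-only branch is a placeholder rather than Redpanda's message.

```python
from typing import Optional

# Sources that obtain credentials on their own, so only the region is needed.
REGION_ONLY_SOURCES = {"aws_instance_metadata", "sts"}


def validate_sigv4(
    catalog_creds_source: Optional[str],
    cloud_storage_creds_source: str,
    region: Optional[str],
    access_key: Optional[str],
    secret_key: Optional[str],
) -> Optional[str]:
    """Return an error message, or None when the configuration is valid."""
    # The catalog-specific source wins; otherwise fall back to cloud storage.
    effective = (
        catalog_creds_source
        if catalog_creds_source is not None
        else cloud_storage_creds_source
    )
    if effective in REGION_ONLY_SOURCES:
        if not region:
            return (
                "Must set AWS region when using SigV4 authentication with "
                "aws_instance_metadata/sts credentials source."
            )
        return None
    # Assumption: every other source needs region and both keys.
    if not (region and access_key and secret_key):
        return "placeholder: region, access key and secret key are required"
    return None
```

With this model, `validate_sigv4("sts", "config_file", "us-east-1", None, None)` returns `None`, while dropping the region produces the error message.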

Extend the cluster config test to cover the new combination as well as
the previously-uncovered aws_instance_metadata path.
Copilot AI review requested due to automatic review settings May 21, 2026 16:24
Copilot AI (Contributor) left a comment

Pull request overview

Exposes the Kafka topic partition count in the Iceberg translation state response, enabling clients to discover this information without an extra metadata lookup.

Changes:

  • Add topic_partition_count to the Iceberg TopicState protobuf and regenerate Python bindings.
  • Populate topic_partition_count in pandaproxy/rest translation state handler from topic configuration.
  • Extend ducktape coverage to assert the partition count is returned; additionally adjusts Iceberg SigV4 config validation/tests to treat sts credentials source as region-only.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `proto/redpanda/core/rest/iceberg.proto` | Adds optional `topic_partition_count` to `TopicState`. |
| `src/v/pandaproxy/rest/iceberg_handlers.cc` | Populates `topic_partition_count` from topic configuration in `get_translation_state`. |
| `tests/rptest/clients/admin/proto/redpanda/core/rest/iceberg_pb2.py` | Regenerated Python protobuf bindings reflecting the new field. |
| `tests/rptest/clients/admin/proto/redpanda/core/rest/iceberg_pb2.pyi` | Regenerated Python type stubs reflecting the new field/presence API. |
| `tests/rptest/tests/datalake/iceberg_translation_state_test.py` | Asserts the translation state includes the expected partition count. |
| `src/v/config/validators.cc` | Treats `sts` credentials source like `aws_instance_metadata` for SigV4 validation (region-only). |
| `tests/rptest/tests/cluster_config_test.py` | Updates config validation tests to cover SigV4 + `sts`/`aws_instance_metadata` combinations. |

Comment on lines 328 to 333
  // When using aws_instance_metadata, AWS credentials are not required
  if (
    effective_creds_source
-   == model::cloud_credentials_source::aws_instance_metadata) {
+   == model::cloud_credentials_source::aws_instance_metadata
+   || effective_creds_source == model::cloud_credentials_source::sts) {
      // We still require the region of the Glue endpoint.
Comment on lines 339 to +341
  return fmt::format(
    "Must set AWS region when using SigV4 authentication with "
-   "aws_instance_metadata credentials source.");
+   "aws_instance_metadata/sts credentials source.");
Comment on lines 321 to 333
  case datalake_catalog_auth_mode::aws_sigv4: {
      // Determine effective credentials source
      auto effective_creds_source
        = config.iceberg_rest_catalog_aws_credentials_source().has_value()
            ? config.iceberg_rest_catalog_aws_credentials_source().value()
            : config.cloud_storage_credentials_source();

      // When using aws_instance_metadata, AWS credentials are not required
      if (
        effective_creds_source
-       == model::cloud_credentials_source::aws_instance_metadata) {
+       == model::cloud_credentials_source::aws_instance_metadata
+       || effective_creds_source == model::cloud_credentials_source::sts) {
          // We still require the region of the Glue endpoint.

mmaslankaprv (Member, Author) commented:

/ci-repeat

The translation state handler previously only emitted partitions that
the datalake coordinator had already seen pending files for. Drive the
partition state map off the topic configuration instead, so consumers
always receive an entry for every partition of the topic, even when
the coordinator has not yet observed it. Partitions without a known
committed offset are returned with `last_catalog_committed_offset`
unset.

Extend the smoke test to assert that the translation state response
contains an entry for every partition of the topic, regardless of
whether the coordinator has observed them yet.

mmaslankaprv force-pushed the proxy-iceberg-partitions branch from 7f0b3da to 7eabcba (May 21, 2026 16:38)

mmaslankaprv changed the title from "pandaproxy/rest: expose topic_partition_count in translation state" to "pandaproxy/rest: return every partition in iceberg translation state" (May 21, 2026)

Add a `partition_count` field to the iceberg `TopicState` message and
populate it from the topic configuration when building the response.
The field is redundant with the size of `partition_states`, but lets
consumers read the partition count directly without iterating the map.
Regenerate the ducktape proto bindings accordingly.

Assert that the new `partition_count` field reflects the partition
count used when creating the iceberg-enabled topic.
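
The checks these commit messages describe could look roughly like the helper below. It is a hedged sketch, not the ducktape test from the PR: `check_translation_state` is a hypothetical name, and the response is assumed to be a plain dict with integer partition ids as keys (a JSON rendering of a protobuf map would use string keys).

```python
def check_translation_state(topic_state: dict, expected_partitions: int) -> list[str]:
    """Return a list of problems; an empty list means the response is complete."""
    problems = []

    # partition_count should match the count used when the topic was created.
    reported = topic_state.get("partition_count")
    if reported != expected_partitions:
        problems.append(
            f"partition_count is {reported}, expected {expected_partitions}"
        )

    # Every partition needs an entry, whether or not it has a committed offset.
    present = set(topic_state.get("partition_states", {}))
    missing = sorted(set(range(expected_partitions)) - present)
    if missing:
        problems.append(f"missing partition entries: {missing}")

    return problems
```

A test can then assert that `check_translation_state(state, partition_count)` returns an empty list, which keeps the failure message readable when an entry is missing.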