Flink: Allow setting slot sharing group for fine-grained resource management #16065

Open
sqd wants to merge 2 commits into apache:main from sqd:oss_slot_sharing_group

sqd commented Apr 20, 2026

Currently, all operators created by the dynamic sink belong to the default slot sharing group and therefore get an equal share of the TaskManager resources. However, the sink and generator operators are usually far more resource-heavy than the rest of the operators, which makes the default resource allocation inefficient.

Flink already supports a fine-grained resource management mechanism for use cases exactly like this. This change wires the dynamic sink into that mechanism by allowing users to set slot sharing groups for (1) the shuffle writer and (2) the generator plus the forward writer, which need to share the same slot sharing group to enable operator chaining.
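
The chaining constraint mentioned above can be illustrated with a small, self-contained sketch. It is a simplified model, not Flink's or Iceberg's actual code: `Op` and `canChain` are hypothetical names, and only three of the conditions Flink checks before chaining are modeled (a forward connection, equal parallelism, and the same slot sharing group).

```java
import java.util.Objects;

/** Simplified illustration of why the generator and forward writer must share a group. */
final class ChainingSketch {

    /** Hypothetical stand-in for an operator node; this is not a Flink class. */
    static final class Op {
        final String name;
        final String slotSharingGroup;
        final int parallelism;

        Op(String name, String slotSharingGroup, int parallelism) {
            this.name = name;
            this.slotSharingGroup = slotSharingGroup;
            this.parallelism = parallelism;
        }
    }

    /**
     * Returns true only if the edge is a forward connection, both operators
     * have the same parallelism, and both are in the same slot sharing group.
     * Flink checks further conditions that are left out of this sketch.
     */
    static boolean canChain(Op upstream, Op downstream, boolean forwardEdge) {
        return forwardEdge
                && upstream.parallelism == downstream.parallelism
                && Objects.equals(upstream.slotSharingGroup, downstream.slotSharingGroup);
    }

    public static void main(String[] args) {
        Op generator = new Op("generator", "generator-ssg", 4);
        Op forwardWriter = new Op("forward-writer", "generator-ssg", 4);
        // Assumed misconfiguration: the writer is left in the default group.
        Op misplacedWriter = new Op("forward-writer", "default", 4);

        System.out.println(canChain(generator, forwardWriter, true));   // true
        System.out.println(canChain(generator, misplacedWriter, true)); // false
    }
}
```

In this model, leaving the forward writer in a different group from the generator is enough to prevent chaining, which is why the two are configured through a single slot sharing group.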

sqd commented Apr 20, 2026

@mxm @pvary I would appreciate it if you could take a look.

sqd commented Apr 20, 2026

We have been running this internally for a while now. It lets Flink pipelines that use the Iceberg dynamic sink slice the TaskManager resources very flexibly, like this:
image

pvary commented Apr 21, 2026

CC: @mxm, @Guosmilesmile

@mxm mxm left a comment

How important is it for the slot sharing groups to be set explicitly? Could we add an option like disableSlotSharing() to put the components into different slots?

sqd commented Apr 21, 2026

> How important is it for the slot sharing groups to be set explicitly? Could we add an option like disableSlotSharing() to put the components into different slots?

The goal is to allow tailoring the resources allocated to each operator using Flink's fine-grained resource management, so the user needs to pass in a slot sharing group (SSG) like this:

sinkBuilder
  .generatorSlotSharingGroup(
    SlotSharingGroup.newBuilder("generator-ssg")
      .setCpuCores(1)
      .setTaskHeapMemoryMB(512)
      .build())
  .otherSinkBuilderOptions(...)
...
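
To make the resource tailoring concrete, here is a rough, self-contained sketch of the arithmetic involved. It is a simplified model and does not use Flink classes: `Spec` and `maxSlots` are hypothetical names, only CPU and task heap are considered (Flink also tracks other memory types), and the TaskManager size and the writer spec are assumed values. Only the generator spec (1 core, 512 MB) comes from the snippet above.

```java
/** Simplified model of carving slots of a given size out of one TaskManager. */
final class SlotFitSketch {

    /** Hypothetical resource spec: CPU cores and task heap memory in MB. */
    static final class Spec {
        final double cpuCores;
        final int taskHeapMb;

        Spec(double cpuCores, int taskHeapMb) {
            this.cpuCores = cpuCores;
            this.taskHeapMb = taskHeapMb;
        }
    }

    /**
     * Number of slots with the given spec that fit into the TaskManager,
     * limited by whichever resource (CPU or heap) runs out first.
     */
    static int maxSlots(Spec taskManager, Spec slot) {
        int byCpu = (int) Math.floor(taskManager.cpuCores / slot.cpuCores);
        int byHeap = taskManager.taskHeapMb / slot.taskHeapMb;
        return Math.min(byCpu, byHeap);
    }

    public static void main(String[] args) {
        Spec taskManager = new Spec(8.0, 4096); // assumed TaskManager size
        Spec generator = new Spec(1.0, 512);    // matches the generator-ssg snippet
        Spec writer = new Spec(2.0, 2048);      // assumed spec for a heavier writer

        System.out.println(maxSlots(taskManager, generator)); // 8
        System.out.println(maxSlots(taskManager, writer));    // 2
    }
}
```

With the default slot sharing group every slot would get the same share, whereas explicit specs let a heavy operator claim a larger slice and a light one a smaller slice.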

sqd commented Apr 21, 2026

@mxm I also think it's a bit confusing that Flink manages resources through "slot sharing groups", a name that suggests some sort of resource isolation mechanism, but here we are. :-)

3 participants