[flink] add split assignment mechanism w/o partitionId by zuston · Pull Request #3272 · apache/fluss

zuston · 2026-05-08T08:02:19Z

Purpose

Linked issue: close #3269

This PR aligns source split assignment with the sink channel selector. It also allows all data of a bucket to be routed to the same subtask even when splits carry a partitionId, provided numBuckets % parallelism == 0. That eliminates the shuffle and reduces backfill time for large-scale data.

Brief change log

Tests

API and Format

Documentation

Copilot

Pull request overview

This PR adjusts Flink source split-to-subtask assignment to better align with the sink channel selection logic, aiming to co-locate buckets on the same subtasks (notably when numBuckets % parallelism == 0) to reduce shuffle and improve large-scale backfill performance (issue #3269).

Changes:

Update FlinkSourceEnumerator#getSplitOwner to use ChannelComputer.shouldCombinePartitionInSharding(...) and route either by bucket-only or by (partition,bucket).
Add ChannelComputer.select(Long partitionId, int bucket, int numChannels) to support partition-id-based channel selection.
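The routing decision in the change list above can be sketched as a standalone program. This is a minimal illustration, not the Fluss source: the class name `SplitOwnerSketch` and the helper `splitOwner` are hypothetical, and the bodies of `shouldCombinePartitionInSharding` and the bucket-only `select` are assumptions, since the diff in this PR does not show them. Only the partition-id overload mirrors the diff.

```java
public class SplitOwnerSketch {

    // Assumption: the partition is mixed into the routing only for partitioned
    // tables whose bucket count is not a multiple of the parallelism.
    static boolean shouldCombinePartitionInSharding(
            boolean isPartitioned, int numBuckets, int numChannels) {
        return isPartitioned && numBuckets % numChannels != 0;
    }

    // Assumption: bucket-only routing is a plain modulo.
    static int select(int bucket, int numChannels) {
        return bucket % numChannels;
    }

    // Mirrors the overload added in the PR diff.
    static int select(Long partitionId, int bucket, int numChannels) {
        int startChannel = ((partitionId.hashCode() * 31) & 0x7FFFFFFF) % numChannels;
        return (startChannel + bucket) % numChannels;
    }

    // Hypothetical stand-in for FlinkSourceEnumerator#getSplitOwner.
    static int splitOwner(Long partitionId, int bucket, int numBuckets, int numChannels) {
        if (shouldCombinePartitionInSharding(partitionId != null, numBuckets, numChannels)) {
            return select(partitionId, bucket, numChannels);
        }
        return select(bucket, numChannels);
    }

    public static void main(String[] args) {
        // 8 buckets, parallelism 4: bucket 5 lands on the same subtask for every partition.
        System.out.println(splitOwner(1L, 5, 8, 4));
        System.out.println(splitOwner(2L, 5, 8, 4));
        // 6 buckets, parallelism 4: the partition id shifts the starting subtask.
        System.out.println(splitOwner(1L, 5, 6, 4));
        System.out.println(splitOwner(2L, 5, 6, 4));
    }
}
```

Under these assumptions, the first two calls return the same subtask index because the bucket count divides evenly by the parallelism, while the last two differ because the partition id offsets the start channel.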

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/source/enumerator/FlinkSourceEnumerator.java	Switch split ownership logic to reuse `ChannelComputer` sharding decisions and enable bucket-only routing when appropriate.
fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/sink/ChannelComputer.java	Add a partition-id-based `select(...)` overload for channel selection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

    @VisibleForTesting
    protected int getSplitOwner(SourceSplitBase split) {
        TableBucket tableBucket = split.getTableBucket();


+        Long partitionId = tableBucket.getPartitionId();
+        int bucketId = tableBucket.getBucket();
+        if (ChannelComputer.shouldCombinePartitionInSharding(
+                partitionId != null, tableInfo.getNumBuckets(), numChannels)) {
+            return ChannelComputer.select(partitionId, bucketId, numChannels);
+        }
+        return ChannelComputer.select(bucketId, numChannels);




 import org.apache.fluss.flink.lake.LakeSplitGenerator;
 import org.apache.fluss.flink.lake.split.LakeSnapshotAndFlussLogSplit;
 import org.apache.fluss.flink.lake.split.LakeSnapshotSplit;
+import org.apache.fluss.flink.sink.ChannelComputer;


+    static int select(Long partitionId, int bucket, int numChannels) {
+        int startChannel = ((partitionId.hashCode() * 31) & 0x7FFFFFFF) % numChannels;
+        return (startChannel + bucket) % numChannels;
+    }
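A note on the `& 0x7FFFFFFF` mask in the start-channel line: `hashCode() * 31` can be negative or overflow, and Java's `%` keeps the sign of the dividend, so the mask is what keeps the result a valid channel index. The sketch below contrasts the masked arithmetic with an unmasked variant. The class `StartChannelSketch` and the method `startChannelUnmasked` are hypothetical and exist only for this comparison.

```java
public class StartChannelSketch {

    // Same arithmetic as the start-channel line in the diff.
    static int startChannel(Long partitionId, int numChannels) {
        return ((partitionId.hashCode() * 31) & 0x7FFFFFFF) % numChannels;
    }

    // Hypothetical variant without the mask, shown only for contrast.
    static int startChannelUnmasked(Long partitionId, int numChannels) {
        return (partitionId.hashCode() * 31) % numChannels;
    }

    public static void main(String[] args) {
        // For Long.MAX_VALUE, Long.hashCode() is Integer.MIN_VALUE.
        Long id = Long.MAX_VALUE;
        System.out.println(startChannel(id, 3));         // non-negative index
        System.out.println(startChannelUnmasked(id, 3)); // negative remainder
    }
}
```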




[flink] add split assignment mechanism w/o partitionId

3508482

luoyuxia requested a review from Copilot May 8, 2026 08:06

Copilot started reviewing on behalf of luoyuxia May 8, 2026 08:07

Copilot AI reviewed May 8, 2026

zuston wants to merge 1 commit into apache:main from zuston:split-assign
