Add multidisk-jbod-balancing.md #175

Merged
SaltTan merged 4 commits into main from multidisk-config, Apr 9, 2026

Conversation

@eyyu eyyu commented Mar 28, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions Bot commented Mar 28, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@eyyu eyyu force-pushed the multidisk-config branch from 2b310de to 98d8e61 Compare March 28, 2026 00:41
eyyu commented Mar 28, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request Mar 28, 2026
Comment thread content/en/altinity-kb-setup-and-maintenance/multidisk-jbod-balancing.md Outdated
Comment thread content/en/altinity-kb-setup-and-maintenance/multidisk-jbod-balancing.md Outdated
eyyu and others added 3 commits April 2, 2026 15:39
…lancing.md

Co-authored-by: SaltTan <20357526+SaltTan@users.noreply.github.com>
Clarify wording in the round robin
@SaltTan SaltTan merged commit 69b86fc into main Apr 9, 2026
2 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 9, 2026

Configuration to assist rebalancing:

- MergeTree setting: `min_bytes_to_rebalance_partition_over_jbod`. Setting is not about where the data is written on insert. This setting considers redistribution of parts across disks of the same volume on a merge.
Contributor

It also applies during a fetch: if a big part is inserted on one replica, the other replica does a GET_PART, and that code path also goes through balancedReservation.

<default>
<path>/var/lib/clickhouse/</path>
<keep_free_space_bytes>10737418240</keep_free_space_bytes>
</disk1>
Contributor

Bad closing tag here; it must be `</default>`.
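A quick way to catch this kind of mismatch before review is to parse the fragment. The sketch below uses only the four lines quoted in this thread; it checks well-formedness and nothing else, so it says nothing about whether ClickHouse accepts the resulting settings.

```python
import xml.etree.ElementTree as ET

# Fragment as quoted in the review thread (closing tag does not match).
quoted = """<default>
<path>/var/lib/clickhouse/</path>
<keep_free_space_bytes>10737418240</keep_free_space_bytes>
</disk1>"""

# Same fragment with only the closing tag changed, as the reviewer asks.
fixed = quoted.replace("</disk1>", "</default>")


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(quoted))  # False
print(is_well_formed(fixed))   # True
```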


ClickHouse selects the disk with the most available space and writes to that disk.

Changing to least_used when even disk space consumption is desirable or when you have a JBOD volume with differing disk sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.
Contributor

Suggested change
- Changing to least_used when even disk space consumption is desirable or when you have a JBOD volume with differing disk sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.
+ Changing to `least_used` when even disk space consumption is desirable or when you have a JBOD volume with differing disk sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.


This is the default setting and is most effective when parts created on insert are roughly the same size.

Drawbacks: may lead to disk skew
Contributor

It would be good to explain what disk skew is, so newcomers don't have to look it up elsewhere and can stay in the KB.
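One possible illustration of disk skew for the article is a toy model, not ClickHouse code: strict rotation ignores part size, so when insert parts differ a lot in size the disks fill unevenly. The function name and the workload below are hypothetical.

```python
from itertools import cycle


def round_robin_fill(disks, part_sizes):
    """Assign parts to disks in strict rotation, ignoring free space."""
    used = {d: 0 for d in disks}
    for disk, size in zip(cycle(disks), part_sizes):
        used[disk] += size
    return used


# Hypothetical workload: large and small parts alternate.
parts = [100, 1, 100, 1, 100, 1]
used = round_robin_fill(["disk1", "disk2"], parts)
print(used)  # {'disk1': 300, 'disk2': 3}
```

Both disks received the same number of parts, yet one holds almost all the bytes. That imbalance is what "skew" refers to here.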


Changing to least_used when even disk space consumption is desirable or when you have a JBOD volume with differing disk sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.

Drawbacks: may lead to hot-spots
Contributor

It would be good to explain what hot-spots are, so newcomers don't have to look it up elsewhere and can stay in the KB.
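A hot-spot could be illustrated with a similar toy model, again not ClickHouse code: if every part goes to the disk with the most free space, a newly added or much emptier disk absorbs all writes until it catches up. The function name and the numbers are hypothetical.

```python
def least_used_fill(free_space, part_sizes):
    """Send each part to the disk with the most free space at that moment."""
    writes = {d: 0 for d in free_space}
    free = dict(free_space)
    for size in part_sizes:
        target = max(free, key=free.get)
        free[target] -= size
        writes[target] += 1
    return writes


# Hypothetical volume: disk3 was just added and is nearly empty.
free_space = {"disk1": 100, "disk2": 120, "disk3": 1000}
writes = least_used_fill(free_space, [10] * 50)
print(writes)  # {'disk1': 0, 'disk2': 0, 'disk3': 50}
```

All 50 writes land on one disk, which is why the quoted text recommends switching to `least_used` on a fresh or already balanced volume.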

arrayJoin(arrayRemove(v.disks, parts.disk_name)) AS other_disk_candidate,
candidate_disks.free_space AS candidate_disk_free_space
FROM system.parts AS parts
INNER JOIN ( SELECT database, `table`, storage_policy FROM system.tables where (name LIKE target_tables) AND (database LIKE target_databases) group by 1, 2, 3 ) AS sp ON sp.`table` = parts.`table` AND sp.database = parts.database
Contributor

`system.tables` only got a `table` column on June 21, 2024; better to use `name`.

parts.disk_name as part_disk_name,
parts.bytes_on_disk AS part_bytes_on_disk,
sp.storage_policy as part_storage_policy,
arrayJoin(arrayRemove(v.disks, parts.disk_name)) AS other_disk_candidate,
Contributor

The query joins every JBOD volume in a storage policy and does not constrain candidates to the part's current volume. Because system.storage_policies has one row per volume, it can emit invalid cross-volume MOVE PART ... TO DISK statements.

arrayRemove(v.disks, parts.disk_name) only removes parts.disk_name from the disks of the currently joined v row. It does not prove that v is the part's current volume.

Example:

  - Policy p1
  - Volume hot_jbod: ['disk1','disk2']  
  - Volume cold_jbod: ['disk3','disk4'] 
  - Part is on disk1

Because the join is only:

ON sp.storage_policy = v.policy_name

that part joins to both volume rows.

Then:

  • For hot_jbod: arrayRemove(['disk1','disk2'], 'disk1') = ['disk2']
  • For cold_jbod: arrayRemove(['disk3','disk4'], 'disk1') = ['disk3','disk4']

So arrayJoin(...) produces these candidates:

  • disk2 [hot_jbod]
  • disk3 [cold_jbod]
  • disk4 [cold_jbod]

That means cross-volume disks still enter the candidate set.
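The example above can be reproduced outside ClickHouse with plain lists. This is a sketch that mimics only the `arrayRemove` and `arrayJoin` steps; the helper name and the "constrained" variant are assumptions for illustration, not the query from the PR.

```python
def array_remove(values, item):
    """Mimic the arrayRemove(...) step: drop every occurrence of item."""
    return [v for v in values if v != item]


# Example policy p1 from the comment: one row per volume.
volumes = {
    "hot_jbod": ["disk1", "disk2"],
    "cold_jbod": ["disk3", "disk4"],
}
part_disk = "disk1"

# Join on policy name only: every volume row contributes candidates.
unconstrained = [
    (disk, name)
    for name, disks in volumes.items()
    for disk in array_remove(disks, part_disk)
]

# Hypothetical fix: keep only the volume that contains the part's disk.
constrained = [
    (disk, name)
    for name, disks in volumes.items()
    if part_disk in disks
    for disk in array_remove(disks, part_disk)
]

print(unconstrained)  # disk2, disk3 and disk4 are all candidates
print(constrained)    # only disk2 remains
```

In the SQL itself, a filter such as `has(v.disks, parts.disk_name)` might express the same constraint; that is a suggestion, not something this thread confirms.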

Configurations that can affect disk selected:

- storage policy volume configuration: `least_used_ttl_ms`. Only applies to `least_used` policy, 60s default.
- disk setting: `keep_free_space_bytes` , `keep_free_space_ratio`
Contributor

Maybe add notes on what affects volume selection (it happens before disk balancing):

  • volume_priority: earlier volumes are filled first, subject to other storage-policy rules
  • max_data_part_size_bytes: limits which volume can accept a part of a given size
  • move_factor: affects automatic movement to the next volume when free space drops below the configured threshold
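A rough mental model of the first two bullets, as a simplified sketch under stated assumptions: volumes are tried in priority order, and a volume is skipped when the part exceeds its size limit. The `Volume` class and `pick_volume` function are hypothetical, and the sketch ignores free space and `move_factor` entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    name: str
    priority: int                  # stands in for volume_priority
    max_data_part_size_bytes: int  # 0 means "no limit" in this sketch


def pick_volume(volumes: list[Volume], part_size: int) -> Optional[str]:
    """Return the first volume, in priority order, that accepts the part size."""
    for vol in sorted(volumes, key=lambda v: v.priority):
        limit = vol.max_data_part_size_bytes
        if limit == 0 or part_size <= limit:
            return vol.name
    return None


# Hypothetical policy: small parts stay on hot_jbod, large ones go to cold_jbod.
policy = [Volume("cold_jbod", 2, 0), Volume("hot_jbod", 1, 1_000)]
print(pick_volume(policy, 500))    # hot_jbod
print(pick_volume(policy, 5_000))  # cold_jbod
```

Only after a volume is chosen does the `round_robin` or `least_used` choice between its disks come into play.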

linkTitle: "MultiDisk (JBOD) Balancing"
---

ClickHouse provides two options to balance an insert across disks in a volume with more than one disk: `round_robin` and `least_used` .
Contributor

Suggested change
- ClickHouse provides two options to balance an insert across disks in a volume with more than one disk: `round_robin` and `least_used` .
+ ClickHouse provides two options to balance an insert across disks in a volume with more than one disk: `round_robin` and `least_used`.

3 participants