Add multidisk-jbod-balancing.md#175
All contributors have signed the CLA ✍️ ✅

I have read the CLA Document and I hereby sign the CLA
…lancing.md Co-authored-by: SaltTan <20357526+SaltTan@users.noreply.github.com>
Clarify wording in the round robin
> Configuration to assist rebalancing:
>
> - MergeTree setting: `min_bytes_to_rebalance_partition_over_jbod`. Setting is not about where the data is written on insert. This setting considers redistribution of parts across disks of the same volume on a merge.
…and during a fetch, so if a big part is inserted on one replica, the other one does a GET_PART and the code path then goes through `balancedReservation`.
```xml
<default>
    <path>/var/lib/clickhouse/</path>
    <keep_free_space_bytes>10737418240</keep_free_space_bytes>
</disk1>
```
Bad closing tag here; it must be `</default>`.
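For reference, the quoted snippet with the matching closing tag (same path and value as in the quote):

```xml
<default>
    <path>/var/lib/clickhouse/</path>
    <keep_free_space_bytes>10737418240</keep_free_space_bytes>
</default>
```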
> ClickHouse selects the disk with the most available space and writes to that disk.
> Changing to least_used when even disk space consumption is desirable or when you have a JBOD volume with differing disk sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.

Suggested change: format `least_used` as inline code.
> This is the default setting and is most effective when parts created on insert are roughly the same size.
> Drawbacks: may lead to disk skew
It would be good to explain what disk skew is, so newcomers don't have to search the web and can stay on the KB longer.
> Changing to least_used when even disk space consumption is desirable or when you have a JBOD volume with differing disk sizes. To prevent hot-spots, it is best to set this policy on a fresh volume or on a volume that has already been (re)balanced.
> Drawbacks: may lead to hot-spots
It would be good to explain what hot-spots are, so newcomers don't have to search the web and can stay on the KB longer.
```sql
    arrayJoin(arrayRemove(v.disks, parts.disk_name)) AS other_disk_candidate,
    candidate_disks.free_space AS candidate_disk_free_space
FROM system.parts AS parts
INNER JOIN ( SELECT database, `table`, storage_policy FROM system.tables where (name LIKE target_tables) AND (database LIKE target_databases) group by 1, 2, 3 ) AS sp ON sp.`table` = parts.`table` AND sp.database = parts.database
```
`system.tables` only got a `table` column (alias of `name`) on June 21, 2024; better to use `name` for compatibility with older versions.
```sql
    parts.disk_name AS part_disk_name,
    parts.bytes_on_disk AS part_bytes_on_disk,
    sp.storage_policy AS part_storage_policy,
    arrayJoin(arrayRemove(v.disks, parts.disk_name)) AS other_disk_candidate,
```
The query joins every JBOD volume in a storage policy and does not constrain candidates to the part's current volume. Because `system.storage_policies` has one row per volume, it can emit invalid cross-volume `MOVE PART ... TO DISK` statements.

`arrayRemove(v.disks, parts.disk_name)` only removes `parts.disk_name` from the disks of the currently joined `v` row. It does not prove that `v` is the part's current volume.

Example:

- Policy `p1`
  - Volume `hot_jbod`: `['disk1','disk2']`
  - Volume `cold_jbod`: `['disk3','disk4']`
- Part is on `disk1`

Because the join is only `ON sp.storage_policy = v.policy_name`, that part joins to both volume rows. Then:

- For `hot_jbod`: `arrayRemove(['disk1','disk2'], 'disk1') = ['disk2']`
- For `cold_jbod`: `arrayRemove(['disk3','disk4'], 'disk1') = ['disk3','disk4']`

So `arrayJoin(...)` produces these candidates:

- `disk2` (`hot_jbod`)
- `disk3` (`cold_jbod`)
- `disk4` (`cold_jbod`)

That means cross-volume disks still enter the candidate set.
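A minimal Python sketch (not ClickHouse code) of the candidate generation described in this comment: joining only on the policy name matches the part against every volume row, so cross-volume disks leak into the candidate set. Policy, volume, and disk names are the hypothetical ones from the example.

```python
# Hypothetical policy p1 with two JBOD volumes.
volumes = {
    "hot_jbod": ["disk1", "disk2"],
    "cold_jbod": ["disk3", "disk4"],
}
part_disk = "disk1"  # the part currently lives on disk1 (hot_jbod)

# Buggy variant: every volume row of the policy joins the part;
# arrayRemove only drops part_disk from the joined row's disk list.
candidates = [
    (disk, volume_name)
    for volume_name, disks in volumes.items()
    for disk in disks
    if disk != part_disk
]
print(candidates)
# [('disk2', 'hot_jbod'), ('disk3', 'cold_jbod'), ('disk4', 'cold_jbod')]

# Fixed variant: additionally require that the joined volume actually
# contains the part's current disk, i.e. that v is the part's volume.
fixed = [
    (disk, volume_name)
    for volume_name, disks in volumes.items()
    if part_disk in disks
    for disk in disks
    if disk != part_disk
]
print(fixed)
# [('disk2', 'hot_jbod')]
```

In SQL terms, the fix corresponds to adding a predicate to the join that checks the part's current disk is in `v.disks`.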
> Configurations that can affect disk selected:
>
> - storage policy volume configuration: `least_used_ttl_ms`. Only applies to `least_used` policy, 60s default.
> - disk setting: `keep_free_space_bytes`, `keep_free_space_ratio`
Maybe add notes on what affects volume selection (because it happens before disk balancing):

- `volume_priority`: earlier volumes are filled first, subject to other storage-policy rules
- `max_data_part_size_bytes`: limits which volume can accept a part of a given size
- `move_factor`: affects automatic movement to the next volume when free space drops below the configured threshold
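A sketch of a storage policy showing where those volume-selection settings live; policy, volume, and disk names are hypothetical:

```xml
<storage_configuration>
    <policies>
        <p1>
            <volumes>
                <hot_jbod>
                    <disk>disk1</disk>
                    <disk>disk2</disk>
                    <!-- parts larger than this cannot land on this volume -->
                    <max_data_part_size_bytes>10737418240</max_data_part_size_bytes>
                    <!-- lower value = filled earlier -->
                    <volume_priority>1</volume_priority>
                </hot_jbod>
                <cold_jbod>
                    <disk>disk3</disk>
                    <disk>disk4</disk>
                    <volume_priority>2</volume_priority>
                </cold_jbod>
            </volumes>
            <!-- move parts to the next volume once free space on the
                 current volume drops below 10% -->
            <move_factor>0.1</move_factor>
        </p1>
    </policies>
</storage_configuration>
```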
> linkTitle: "MultiDisk (JBOD) Balancing"
> ---
>
> ClickHouse provides two options to balance an insert across disks in a volume with more than one disk: `round_robin` and `least_used` .
Suggested change: remove the stray space before the final period:

> ClickHouse provides two options to balance an insert across disks in a volume with more than one disk: `round_robin` and `least_used`.
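For context, a sketch of where that choice is configured, assuming the volume-level `load_balancing` setting; volume and disk names are hypothetical:

```xml
<volumes>
    <jbod_volume>
        <disk>disk1</disk>
        <disk>disk2</disk>
        <!-- round_robin (default) or least_used -->
        <load_balancing>least_used</load_balancing>
    </jbod_volume>
</volumes>
```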