@@ -561,12 +561,14 @@ paths:
561561 upon several possible configurable formulas as defined by their types.
562562 The supported types are:
563563
564- - `"tier"`: consolidate based on segment byte size and live
564+ - `"tier"`: consolidate based on segment byte size skew and live
565565 document count as dictated by the customization attributes.
566566 type: string
567567 default: tier
568568 segmentsBytesFloor:
569569 description: |
570+ This option is only available up to v3.12.6:
571+
570572 Defines the value (in bytes) to treat all smaller segments as equal for
571573 consolidation selection.
572574 type: integer
@@ -578,21 +580,86 @@ paths:
578580 default: 8589934592
579581 segmentsMax:
580582 description: |
583+ This option is only available up to v3.12.6:
584+
581585 The maximum number of segments that are evaluated as candidates for
582586 consolidation.
583587 type: integer
584588 default: 200
585589 segmentsMin:
586590 description: |
591+ This option is only available up to v3.12.6:
592+
587593 The minimum number of segments that are evaluated as candidates for
588594 consolidation.
589595 type: integer
590596 default: 50
591597 minScore:
592598 description: |
599+ This option is only available up to v3.12.6:
600+
593601 Filter out consolidation candidates with a score less than this.
594602 type: integer
595603 default: 0
604+ maxSkewThreshold:
605+ description: |
606+ This option is available from v3.12.7 onward:
607+
608+ The skew describes how much segment files vary in file size. It is a number
609+ between `0.0` and `1.0` and is calculated by dividing the largest file size
610+ of a set of segment files by their total size. For example, the skew of
611+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
612+
613+ A large `maxSkewThreshold` value allows merging large segment files with
614+ smaller ones, consolidation occurs more frequently, and there are fewer
615+ segment files on disk at any given time. While this can improve read
616+ performance and reduce the number of open file descriptors, frequent
617+ consolidations cause a higher write load and thus higher write amplification.
618+
619+ On the other hand, a small threshold value triggers consolidation only
620+ when there is a large number of segment files that are similar in size.
621+ Consolidation occurs less frequently, reducing the write amplification, but
622+ it can result in a greater number of segment files on disk.
623+
624+ Multiple combinations of candidate segments are checked and the one with
625+ the lowest skew value is selected for consolidation. The selection process
626+ picks the greatest number of segments that together have the lowest skew value
627+ while ensuring that the size of the new consolidated segment remains under
628+ the configured `segmentsBytesMax`.
629+ type: number
630+ minimum: 0.0
631+ maximum: 1.0
632+ default: 0.4
633+ minDeletionRatio:
634+ description: |
635+ This option is available from v3.12.7 onward:
636+
637+ The `minDeletionRatio` represents the minimum required deletion ratio
638+ in one or more segments to perform a cleanup of those segments.
639+ It is a number between `0.0` and `1.0`.
640+
641+ The deletion ratio is the fraction of deleted documents across one or
642+ more segment files and is calculated by dividing the number of deleted
643+ documents by the total number of documents in a segment or a group of
644+ segments. For example, if there is a segment with 1000 documents of which
645+ 300 are deleted and another segment with 1000 documents of which 700 are
646+ deleted, the deletion ratio is `0.5` (50%, calculated as `(300 + 700) / 2000`).
647+
648+ The `minDeletionRatio` threshold must be carefully selected. A smaller
649+ value leads to earlier cleanup of deleted documents from segments and
650+ thus earlier reclamation of disk space, but it generates a higher write load.
651+ A very large value lowers the write amplification, but at the same time
652+ the system can be left with a large number of segment files with a high
653+ percentage of deleted documents that occupy disk space unnecessarily.
654+
655+ During cleanup, the segment files are first arranged in decreasing
656+ order of their individual deletion ratios. Then the largest subset of
657+ segments whose collective deletion ratio is greater than or equal to
658+ `minDeletionRatio` is picked.
659+ type: number
660+ minimum: 0.0
661+ maximum: 1.0
662+ default: 0.5
596663 writebufferIdle:
597664 description: |
598665 Maximum number of writers (segments) cached in the pool
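
The skew formula and candidate selection described for `maxSkewThreshold` can be sketched in Python. This is a minimal illustration, not ArangoDB's implementation: `skew` and `pick_candidates` are hypothetical names, and two details are assumptions because the description does not pin them down. First, only contiguous runs of the size-sorted segments are examined instead of every combination. Second, a candidate set is eligible only if its skew is no greater than `maxSkewThreshold`. Among eligible sets the lowest skew wins, with ties going to the larger set. The 8 GiB default for `segments_bytes_max` is assumed to mirror the `segmentsBytesMax` default.

```python
from typing import List, Optional, Sequence, Tuple


def skew(sizes: Sequence[int]) -> float:
    """Largest segment size divided by the total size of the set."""
    total = sum(sizes)
    return max(sizes) / total if total else 0.0


def pick_candidates(
    sizes: Sequence[int],
    max_skew_threshold: float = 0.4,
    segments_bytes_max: int = 8589934592,  # assumed to mirror segmentsBytesMax
) -> Optional[List[int]]:
    """Return the sizes of the segments to merge, or None if no set qualifies.

    Assumption: only contiguous runs of the size-sorted segments are checked,
    which keeps the search quadratic instead of trying every combination.
    """
    ordered = sorted(sizes)
    best: Optional[Tuple[float, int, List[int]]] = None
    for start in range(len(ordered)):
        total = 0
        for end in range(start, len(ordered)):
            total += ordered[end]
            if total > segments_bytes_max:
                break  # extending the run only makes the merged segment bigger
            run = ordered[start : end + 1]
            if len(run) < 2:
                continue  # a single segment has nothing to merge with
            run_skew = skew(run)
            if run_skew > max_skew_threshold:
                continue  # too uneven in size to consolidate
            # Prefer the lowest skew; on a tie, prefer more segments.
            key = (run_skew, -len(run))
            if best is None or key < best[:2]:
                best = (run_skew, -len(run), run)
    return None if best is None else best[2]


MIB = 1024 * 1024
print(skew([200 * MIB, 300 * MIB, 500 * MIB]))  # 0.5
print(pick_candidates([200 * MIB, 300 * MIB, 500 * MIB]))  # None
chosen = pick_candidates([100 * MIB] * 4 + [900 * MIB])
print([s // MIB for s in chosen])  # [100, 100, 100, 100]
```

With the default threshold of `0.4`, the 200/300/500 MiB example from the description is not consolidated at all, because even its most balanced run still has a skew of `0.5`. Adding more equally sized segments lowers the skew (four equal segments have a skew of `0.25`), which is why this rule naturally favors merging many similar segments.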
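
The cleanup step described for `minDeletionRatio` can be sketched the same way. Again, this is a hypothetical illustration: `Segment`, `deletion_ratio`, and `pick_cleanup` are made-up names. The sketch assumes that "the largest subset" means the longest prefix of the segments sorted by decreasing individual deletion ratio. The description implies this reading but does not state it outright.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str  # hypothetical label, for illustration only
    docs: int  # total number of documents in the segment
    deleted: int  # number of those documents that are marked as deleted


def deletion_ratio(segments: List[Segment]) -> float:
    """Deleted documents divided by total documents across all given segments."""
    total = sum(s.docs for s in segments)
    return sum(s.deleted for s in segments) / total if total else 0.0


def pick_cleanup(segments: List[Segment], min_deletion_ratio: float = 0.5) -> List[Segment]:
    """Pick segments to clean up, assuming 'largest subset' means longest prefix."""
    # Arrange segments by their individual deletion ratio, highest first.
    ordered = sorted(segments, key=lambda s: deletion_ratio([s]), reverse=True)
    picked: List[Segment] = []
    for seg in ordered:
        # Stop at the first segment that would pull the collective ratio
        # of the prefix below the threshold.
        if deletion_ratio(picked + [seg]) < min_deletion_ratio:
            break
        picked.append(seg)
    return picked


a = Segment("a", docs=1000, deleted=300)
b = Segment("b", docs=1000, deleted=700)
print(deletion_ratio([a, b]))  # 0.5
print([s.name for s in pick_cleanup([a, b], 0.5)])  # ['b', 'a']
print([s.name for s in pick_cleanup([a, b], 0.6)])  # ['b']
```

Using the two 1000-document segments from the description, a threshold of `0.5` selects both segments because their collective ratio is exactly `0.5`. A threshold of `0.6` selects only the segment with 700 deleted documents, and `0.8` selects nothing.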