Commit 523355b

DOC-754 | Added and removed consolidation options for inverted indexes and arangosearch Views (#850)
* Added and removed consolidation options for inverted indexes and arangosearch Views
* Feedback
* Update text in two more places
* Update consolidation policy description
1 parent 19f41fe commit 523355b

12 files changed: +1402 -212 lines changed

site/content/arangodb/3.12/develop/http-api/indexes/inverted.md

Lines changed: 68 additions & 1 deletion
@@ -561,12 +561,14 @@ paths:
 upon several possible configurable formulas as defined by their types.
 The supported types are:
-- `"tier"`: consolidate based on segment byte size and live
+- `"tier"`: consolidate based on segment byte size skew and live
 document count as dictated by the customization attributes.
 type: string
 default: tier
 segmentsBytesFloor:
 description: |
+This option is only available up to v3.12.6:
+
 Defines the value (in bytes) to treat all smaller segments as equal for
 consolidation selection.
 type: integer
@@ -578,21 +580,86 @@ paths:
 default: 8589934592
 segmentsMax:
 description: |
+This option is only available up to v3.12.6:
+
 The maximum number of segments that are evaluated as candidates for
 consolidation.
 type: integer
 default: 200
 segmentsMin:
 description: |
+This option is only available up to v3.12.6:
+
 The minimum number of segments that are evaluated as candidates for
 consolidation.
 type: integer
 default: 50
 minScore:
 description: |
+This option is only available up to v3.12.6:
+
 Filter out consolidation candidates with a score less than this.
 type: integer
 default: 0
+maxSkewThreshold:
+description: |
+This option is available from v3.12.7 onward:
+
+The skew describes how much segment files vary in file size. It is a number
+between `0.0` and `1.0` and is calculated by dividing the largest file size
+of a set of segment files by the total size. For example, the skew of a
+200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`).
+
+A large `maxSkewThreshold` value allows merging large segment files with
+smaller ones, so consolidation occurs more frequently and there are fewer
+segment files on disk at all times. While this may potentially improve the
+read performance and use fewer file descriptors, frequent consolidations
+cause a higher write load and thus a higher write amplification.
+
+On the other hand, a small threshold value triggers the consolidation only
+when there are a large number of segment files that don't vary in size a lot.
+Consolidation occurs less frequently, reducing the write amplification, but
+it can result in a greater number of segment files on disk.
+
+Multiple combinations of candidate segments are checked and the one with
+the lowest skew value is selected for consolidation. The selection process
+picks the greatest number of segments that together have the lowest skew value
+while ensuring that the size of the new consolidated segment remains under
+the configured `segmentsBytesMax`.
+type: number
+minimum: 0.0
+maximum: 1.0
+default: 0.4
+minDeletionRatio:
+description: |
+This option is available from v3.12.7 onward:
+
+The `minDeletionRatio` represents the minimum required deletion ratio
+in one or more segments to perform a cleanup of those segments.
+It is a number between `0.0` and `1.0`.
+
+The deletion ratio is the fraction of deleted documents across one or
+more segment files and is calculated by dividing the number of deleted
+documents by the total number of documents in a segment or a group of
+segments. For example, if there is a segment with 1000 documents of which
+300 are deleted and another segment with 1000 documents of which 700 are
+deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+The `minDeletionRatio` threshold must be carefully selected. A smaller
+value leads to earlier cleanup of deleted documents from segments and
+thus reclamation of disk space, but it generates a higher write load.
+A very large value lowers the write amplification but at the same time
+the system can be left with a large number of segment files with a high
+percentage of deleted documents that occupy disk space unnecessarily.
+
+During cleanup, the segment files are first arranged in decreasing
+order of their individual deletion ratios. Then the largest subset of
+segments whose collective deletion ratio is greater than or equal to
+`minDeletionRatio` is picked.
+type: number
+minimum: 0.0
+maximum: 1.0
+default: 0.5
 writebufferIdle:
 description: |
 Maximum number of writers (segments) cached in the pool
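
Note on `maxSkewThreshold`: the skew formula in the description (largest segment size divided by total size) can be checked with a few lines of Python. This is only an illustrative sketch, not the server's implementation. The function names `skew` and `is_candidate` are hypothetical, and the rule that a set of segments qualifies when its skew is at most `maxSkewThreshold` and its combined size stays under `segmentsBytesMax` is an assumption inferred from the description text.

```python
MIB = 1024 * 1024


def skew(segment_sizes):
    """Largest segment size divided by the total size of the set."""
    total = sum(segment_sizes)
    if total == 0:
        return 0.0
    return max(segment_sizes) / total


def is_candidate(segment_sizes, max_skew_threshold, segments_bytes_max):
    """Assumed eligibility rule (not confirmed by the source):
    the set qualifies if its skew does not exceed the threshold and the
    merged segment would stay under the size limit."""
    merged_size = sum(segment_sizes)
    return (skew(segment_sizes) <= max_skew_threshold
            and merged_size < segments_bytes_max)


# Example from the description: 200 MiB, 300 MiB, and 500 MiB segments.
sizes = [200 * MIB, 300 * MIB, 500 * MIB]
example_skew = skew(sizes)  # 500 / 1000
```

Under this assumed rule, the example set would not qualify with the default threshold of `0.4`, but would qualify with a larger threshold such as `0.6`, which matches the statement that larger thresholds allow merging large segment files with smaller ones.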

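Note on `minDeletionRatio`: the cleanup selection described above (sort by individual deletion ratio in decreasing order, then pick the largest group whose collective ratio meets the threshold) can be sketched as follows. This is a hedged illustration rather than the actual implementation. The helper names and the `(total_docs, deleted_docs)` tuple layout are hypothetical, and reading "largest subset" as the longest prefix of the sorted list is an assumption.

```python
def deletion_ratio(segments):
    """Collective deletion ratio of a list of (total_docs, deleted_docs)."""
    total = sum(total_docs for total_docs, _ in segments)
    deleted = sum(deleted_docs for _, deleted_docs in segments)
    return deleted / total if total else 0.0


def pick_for_cleanup(segments, min_deletion_ratio=0.5):
    """Return the longest prefix of the segments, ordered by decreasing
    individual deletion ratio, whose collective ratio is still greater
    than or equal to min_deletion_ratio (assumed interpretation)."""
    ordered = sorted(segments,
                     key=lambda seg: deletion_ratio([seg]),
                     reverse=True)
    best = []
    for end in range(1, len(ordered) + 1):
        prefix = ordered[:end]
        if deletion_ratio(prefix) >= min_deletion_ratio:
            best = prefix
    return best


# Example from the description: two segments with 1000 documents each,
# 300 and 700 of which are deleted.
segments = [(1000, 300), (1000, 700)]
combined = deletion_ratio(segments)  # 1000 / 2000
```

With the default threshold of `0.5`, both example segments are picked because their collective ratio is exactly `0.5`. Raising the threshold to `0.6` leaves only the segment with 700 deleted documents.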