diff --git a/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md b/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md
index d2c5939c25..1120c9dd4a 100644
--- a/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md
+++ b/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md
@@ -561,12 +561,14 @@ paths:
upon several possible configurable formulas as defined by their types.
The supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
type: string
default: tier
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments as equal for
consolidation selection.
type: integer
@@ -578,21 +580,86 @@ paths:
default: 8589934592
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as candidates for
consolidation.
type: integer
default: 200
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are evaluated as candidates for
consolidation.
type: integer
default: 50
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
default: 0
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+            The skew describes how much segment files vary in file size. It is a number
+            between `0.0` and `1.0` and is calculated by dividing the largest file size
+            of a set of segment files by the total size. For example, the skew of
+            200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+            A large `maxSkewThreshold` value allows merging large segment files with
+            smaller ones, so consolidation occurs more frequently and there are fewer
+            segment files on disk at any time. While this can improve read performance
+            and reduce the number of open file descriptors, frequent consolidations
+            cause a higher write load and thus higher write amplification.
+
+            On the other hand, a small threshold value triggers consolidation only
+            when there is a large number of segment files of similar size.
+            Consolidation occurs less frequently, reducing the write amplification, but
+            it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+            segments. For example, if there is a segment with 1000 documents of which
+            300 are deleted and another segment with 1000 documents of which 700 are
+            deleted, the deletion ratio is `0.5` (50%, calculated as `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be carefully selected. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus reclamation of disk space but it generates a higher write load.
+ A very large value lowers the write amplification but at the same time
+ the system can be left with a large number of segment files with a high
+ percentage of deleted documents that occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+            type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool
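The skew and deletion-ratio arithmetic described in the new option descriptions can be sketched in Python. This is an illustration only: the function names are made up, and the brute-force subset search is an assumption about how "the greatest number of segments with the lowest skew under `segmentsBytesMax`" could be selected, not the actual merge-policy implementation.

```python
from itertools import combinations

def skew(sizes):
    """Skew of a set of segment files: largest file size / total size."""
    return max(sizes) / sum(sizes)

def deletion_ratio(segments):
    """Collective deletion ratio over (total_docs, deleted_docs) pairs:
    deleted documents / total documents."""
    return sum(d for _, d in segments) / sum(t for t, _ in segments)

def pick_candidates(sizes, segments_bytes_max):
    """Brute-force sketch of the described selection: among subsets whose
    consolidated size stays under segments_bytes_max, prefer the greatest
    number of segments, then the lowest skew."""
    best_key, best = None, ()
    for r in range(2, len(sizes) + 1):
        for combo in combinations(sizes, r):
            if sum(combo) > segments_bytes_max:
                continue  # consolidated segment would be too large
            key = (-len(combo), skew(combo))
            if best_key is None or key < best_key:
                best_key, best = key, combo
    return best

# Examples from the descriptions above:
print(skew([200, 300, 500]))                       # 0.5 (500 / 1000)
print(deletion_ratio([(1000, 300), (1000, 700)]))  # 0.5 (1000 / 2000)
```

With `pick_candidates([200, 300, 500], 1000)` all three segments fit and are merged together; lowering the cap to `600` leaves only the two smallest as a viable candidate set.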
diff --git a/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md b/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md
index 2f33e5c772..c8951af785 100644
--- a/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md
+++ b/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md
@@ -307,7 +307,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
_Background:_
@@ -330,7 +331,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -348,6 +349,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -359,21 +362,86 @@ paths:
default: 8589934592
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
default: 200
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation
type: integer
default: 50
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
default: 0
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+                The skew describes how much segment files vary in file size. It is a number
+                between `0.0` and `1.0` and is calculated by dividing the largest file size
+                of a set of segment files by the total size. For example, the skew of
+                200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+                A large `maxSkewThreshold` value allows merging large segment files with
+                smaller ones, so consolidation occurs more frequently and there are fewer
+                segment files on disk at any time. While this can improve read performance
+                and reduce the number of open file descriptors, frequent consolidations
+                cause a higher write load and thus higher write amplification.
+
+                On the other hand, a small threshold value triggers consolidation only
+                when there is a large number of segment files of similar size.
+                Consolidation occurs less frequently, reducing the write amplification, but
+                it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+                segments. For example, if there is a segment with 1000 documents of which
+                300 are deleted and another segment with 1000 documents of which 700 are
+                deleted, the deletion ratio is `0.5` (50%, calculated as `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be carefully selected. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus reclamation of disk space but it generates a higher write load.
+ A very large value lowers the write amplification but at the same time
+ the system can be left with a large number of segment files with a high
+ percentage of deleted documents that occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+                type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool
@@ -544,7 +612,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -553,7 +622,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -569,6 +638,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -578,18 +649,81 @@ paths:
type: integer
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation
type: integer
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+                The skew describes how much segment files vary in file size. It is a number
+                between `0.0` and `1.0` and is calculated by dividing the largest file size
+                of a set of segment files by the total size. For example, the skew of
+                200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+                A large `maxSkewThreshold` value allows merging large segment files with
+                smaller ones, so consolidation occurs more frequently and there are fewer
+                segment files on disk at any time. While this can improve read performance
+                and reduce the number of open file descriptors, frequent consolidations
+                cause a higher write load and thus higher write amplification.
+
+                On the other hand, a small threshold value triggers consolidation only
+                when there is a large number of segment files of similar size.
+                Consolidation occurs less frequently, reducing the write amplification, but
+                it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+                segments. For example, if there is a segment with 1000 documents of which
+                300 are deleted and another segment with 1000 documents of which 700 are
+                deleted, the deletion ratio is `0.5` (50%, calculated as `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be carefully selected. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus reclamation of disk space but it generates a higher write load.
+ A very large value lowers the write amplification but at the same time
+ the system can be left with a large number of segment files with a high
+ percentage of deleted documents that occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+                type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1016,7 +1150,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -1025,7 +1160,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1041,6 +1176,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -1050,18 +1187,81 @@ paths:
type: integer
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation
type: integer
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+                The skew describes how much segment files vary in file size. It is a number
+                between `0.0` and `1.0` and is calculated by dividing the largest file size
+                of a set of segment files by the total size. For example, the skew of
+                200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+                A large `maxSkewThreshold` value allows merging large segment files with
+                smaller ones, so consolidation occurs more frequently and there are fewer
+                segment files on disk at any time. While this can improve read performance
+                and reduce the number of open file descriptors, frequent consolidations
+                cause a higher write load and thus higher write amplification.
+
+                On the other hand, a small threshold value triggers consolidation only
+                when there is a large number of segment files of similar size.
+                Consolidation occurs less frequently, reducing the write amplification, but
+                it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+                segments. For example, if there is a segment with 1000 documents of which
+                300 are deleted and another segment with 1000 documents of which 700 are
+                deleted, the deletion ratio is `0.5` (50%, calculated as `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be carefully selected. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus reclamation of disk space but it generates a higher write load.
+ A very large value lowers the write amplification but at the same time
+ the system can be left with a large number of segment files with a high
+ percentage of deleted documents that occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+                type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1403,7 +1603,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
_Background:_
@@ -1426,7 +1627,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1444,6 +1645,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -1455,21 +1658,86 @@ paths:
default: 8589934592
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
default: 200
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation
type: integer
default: 50
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
default: 0
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+                The skew describes how much segment files vary in file size. It is a number
+                between `0.0` and `1.0` and is calculated by dividing the largest file size
+                of a set of segment files by the total size. For example, the skew of
+                200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+                A large `maxSkewThreshold` value allows merging large segment files with
+                smaller ones, so consolidation occurs more frequently and there are fewer
+                segment files on disk at any time. While this can improve read performance
+                and reduce the number of open file descriptors, frequent consolidations
+                cause a higher write load and thus higher write amplification.
+
+                On the other hand, a small threshold value triggers consolidation only
+                when there is a large number of segment files of similar size.
+                Consolidation occurs less frequently, reducing the write amplification, but
+                it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+                segments. For example, if there is a segment with 1000 documents of which
+                300 are deleted and another segment with 1000 documents of which 700 are
+                deleted, the deletion ratio is `0.5` (50%, calculated as `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be carefully selected. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus reclamation of disk space but it generates a higher write load.
+ A very large value lowers the write amplification but at the same time
+ the system can be left with a large number of segment files with a high
+ percentage of deleted documents that occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+                type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
responses:
'200':
description: |
@@ -1618,7 +1886,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -1627,7 +1896,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1643,6 +1912,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -1652,18 +1923,81 @@ paths:
type: integer
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation
type: integer
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+                The skew describes how much segment files vary in file size. It is a number
+                between `0.0` and `1.0` and is calculated by dividing the largest file size
+                of a set of segment files by the total size. For example, the skew of
+                200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+                A large `maxSkewThreshold` value allows merging large segment files with
+                smaller ones, so consolidation occurs more frequently and there are fewer
+                segment files on disk at any time. While this can improve read performance
+                and reduce the number of open file descriptors, frequent consolidations
+                cause a higher write load and thus higher write amplification.
+
+                On the other hand, a small threshold value triggers consolidation only
+                when there is a large number of segment files of similar size.
+                Consolidation occurs less frequently, reducing the write amplification, but
+                it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+                segments. For example, if there is a segment with 1000 documents of which
+                300 are deleted and another segment with 1000 documents of which 700 are
+                deleted, the deletion ratio is `0.5` (50%, calculated as `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be carefully selected. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus reclamation of disk space but it generates a higher write load.
+ A very large value lowers the write amplification but at the same time
+ the system can be left with a large number of segment files with a high
+ percentage of deleted documents that occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+                type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1912,7 +2246,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
_Background:_
@@ -1935,7 +2270,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1952,6 +2287,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -1963,21 +2300,86 @@ paths:
default: 8589934592
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
default: 200
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation
type: integer
default: 50
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
default: 0
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+                The skew describes how much segment files vary in file size. It is a number
+                between `0.0` and `1.0` and is calculated by dividing the largest file size
+                of a set of segment files by the total size. For example, the skew of
+                200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+                A large `maxSkewThreshold` value allows merging large segment files with
+                smaller ones, so consolidation occurs more frequently and there are fewer
+                segment files on disk at any time. While this can improve read performance
+                and reduce the number of open file descriptors, frequent consolidations
+                cause a higher write load and thus higher write amplification.
+
+                On the other hand, a small threshold value triggers consolidation only
+                when there is a large number of segment files of similar size.
+                Consolidation occurs less frequently, reducing the write amplification, but
+                it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the proportion of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if one segment has 1000 documents of which
+ 300 are deleted and another segment has 1000 documents of which 700 are
+ deleted, the combined deletion ratio is `0.5` (50%, calculated as
+ `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to an earlier cleanup of deleted documents from segments,
+ reclaiming disk space sooner, but it generates a higher write load.
+ A very large value lowers the write amplification, but the system can be
+ left with a large number of segment files that have a high percentage of
+ deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
responses:
'200':
description: |
@@ -2126,7 +2528,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -2135,7 +2538,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -2151,6 +2554,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -2160,18 +2565,81 @@ paths:
type: integer
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation.
type: integer
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by their total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones. Consolidation occurs more frequently, and there are fewer
+ segment files on disk at any given time. While this may improve the read
+ performance and reduce the number of open file descriptors, frequent
+ consolidations cause a higher write load and thus higher write amplification.
+
+ On the other hand, a small threshold value triggers consolidation only
+ when there is a large number of segment files that don't vary much in size.
+ Consolidation occurs less frequently, reducing write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the proportion of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if one segment has 1000 documents of which
+ 300 are deleted and another segment has 1000 documents of which 700 are
+ deleted, the combined deletion ratio is `0.5` (50%, calculated as
+ `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to an earlier cleanup of deleted documents from segments,
+ reclaiming disk space sooner, but it generates a higher write load.
+ A very large value lowers the write amplification, but the system can be
+ left with a large number of segment files that have a high percentage of
+ deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -2493,7 +2961,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -2502,7 +2971,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -2518,6 +2987,8 @@ paths:
maximum: 1.0
segmentsBytesFloor:
description: |
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments
as equal for consolidation selection.
type: integer
@@ -2527,18 +2998,81 @@ paths:
type: integer
segmentsMax:
description: |
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as
candidates for consolidation.
type: integer
segmentsMin:
description: |
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are
evaluated as candidates for consolidation.
type: integer
minScore:
description: |
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
type: integer
+ maxSkewThreshold:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by their total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones. Consolidation occurs more frequently, and there are fewer
+ segment files on disk at any given time. While this may improve the read
+ performance and reduce the number of open file descriptors, frequent
+ consolidations cause a higher write load and thus higher write amplification.
+
+ On the other hand, a small threshold value triggers consolidation only
+ when there is a large number of segment files that don't vary much in size.
+ Consolidation occurs less frequently, reducing write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
+ description: |
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the proportion of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if one segment has 1000 documents of which
+ 300 are deleted and another segment has 1000 documents of which 700 are
+ deleted, the combined deletion ratio is `0.5` (50%, calculated as
+ `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to an earlier cleanup of deleted documents from segments,
+ reclaiming disk space sooner, but it generates a higher write load.
+ A very large value lowers the write amplification, but the system can be
+ left with a large number of segment files that have a high percentage of
+ deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
diff --git a/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md b/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md
index 036758127f..7c483e2737 100644
--- a/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md
+++ b/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md
@@ -462,7 +462,7 @@ is used by these writers (in terms of "writers pool") one can use
- `"bytes_accum"`: Consolidation is performed based on current memory
consumption of segments and `threshold` property value.
- - `"tier"`: Consolidate based on segment byte size and live document count
+ - `"tier"`: Consolidate based on segment byte size skew and live document count
as dictated by the customization attributes.
{{< warning >}}
@@ -485,10 +485,14 @@ is used by these writers (in terms of "writers pool") one can use
- **segmentsMin** (_optional_; type: `integer`; default: `50`)
+ This option is only available up to v3.12.6:
+
The minimum number of segments that are evaluated as candidates for consolidation.
- **segmentsMax** (_optional_; type: `integer`; default: `200`)
+ This option is only available up to v3.12.6:
+
The maximum number of segments that are evaluated as candidates for consolidation.
- **segmentsBytesMax** (_optional_; type: `integer`; default: `8589934592`)
@@ -497,9 +501,66 @@ is used by these writers (in terms of "writers pool") one can use
- **segmentsBytesFloor** (_optional_; type: `integer`; default: `25165824`)
+ This option is only available up to v3.12.6:
+
Defines the value (in bytes) to treat all smaller segments as equal for consolidation
selection.
- **minScore** (_optional_; type: `integer`; default: `0`)
+ This option is only available up to v3.12.6:
+
Filter out consolidation candidates with a score less than this.
+
+ - **maxSkewThreshold** (_optional_; type: `number`; default: `0.4`)
+
+ This option is available from v3.12.7 onward:
+
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by their total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones. Consolidation occurs more frequently, and there are fewer
+ segment files on disk at any given time. While this may improve the read
+ performance and reduce the number of open file descriptors, frequent
+ consolidations cause a higher write load and thus higher write amplification.
+
+ On the other hand, a small threshold value triggers consolidation only
+ when there is a large number of segment files that don't vary much in size.
+ Consolidation occurs less frequently, reducing write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+
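The skew calculation described above is a simple largest-over-total division. The following is a minimal sketch of that arithmetic; the function name and signature are illustrative only and not part of ArangoDB's API:

```python
def skew(segment_sizes):
    """Skew of a set of segment files: largest file size divided by the
    total size of all files in the set. Result is between 0.0 and 1.0."""
    return max(segment_sizes) / sum(segment_sizes)

# 200 MiB, 300 MiB, and 500 MiB segment files (the unit cancels out)
print(skew([200, 300, 500]))  # 0.5
```

A candidate set whose skew exceeds `maxSkewThreshold` would not be selected for consolidation.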
+ - **minDeletionRatio** (_optional_; type: `number`; default: `0.5`)
+
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the proportion of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if one segment has 1000 documents of which
+ 300 are deleted and another segment has 1000 documents of which 700 are
+ deleted, the combined deletion ratio is `0.5` (50%, calculated as
+ `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to an earlier cleanup of deleted documents from segments,
+ reclaiming disk space sooner, but it generates a higher write load.
+ A very large value lowers the write amplification, but the system can be
+ left with a large number of segment files that have a high percentage of
+ deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
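The combined deletion ratio from the description above can be sketched as follows; this is a minimal illustration of the arithmetic, and the function name is made up rather than taken from ArangoDB:

```python
def deletion_ratio(segments):
    """Combined deletion ratio of a group of segments.

    `segments` is a list of (deleted_docs, total_docs) pairs. The ratio is
    the sum of deleted documents divided by the sum of all documents."""
    deleted = sum(d for d, _ in segments)
    total = sum(t for _, t in segments)
    return deleted / total

# Two segments: 300 of 1000 deleted, and 700 of 1000 deleted
print(deletion_ratio([(300, 1000), (700, 1000)]))  # 0.5
```

A cleanup would only be triggered for a subset of segments whose combined ratio reaches `minDeletionRatio`.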
diff --git a/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md b/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md
index c00008ab07..16456aaa60 100644
--- a/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md
+++ b/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md
@@ -365,6 +365,25 @@ By consolidating less often and with more data, less file descriptors are used.
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
- `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
+##### Added and removed consolidation options for `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The following options for consolidating `arangosearch` Views have been removed
+and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+ - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
#### Document API
The following endpoints accept a new `versionAttribute` query parameter that adds
@@ -503,6 +522,25 @@ By consolidating less often and with more data, less file descriptors are used.
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
- `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
+##### Added and removed consolidation options for inverted indexes
+
+Introduced in: v3.12.7
+
+The following options for consolidating inverted indexes have been removed
+and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+ - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
#### Optimizer rule descriptions
Introduced in: v3.10.9, v3.11.2
diff --git a/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md
index 959cea82cf..1fe5a31d18 100644
--- a/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md
+++ b/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md
@@ -994,6 +994,29 @@ more data, less file descriptors are used.
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
- `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
+## Added and removed consolidation options for inverted indexes and `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The following options for consolidating inverted indexes as well as
+`arangosearch` Views have been removed and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The consolidation works differently now and uses the new `maxSkewThreshold` and
+`minDeletionRatio` options together with the existing `segmentsBytesMax`. If you
+previously used customized settings for the removed options, check if the default
+values of the new options are acceptable or if you need to tune them according to
+your workload.
+
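For example, a `tier` consolidation policy that uses the new options may look like this in the View or index properties (the values shown are the documented defaults, given here only for illustration):

```json
{
  "consolidationPolicy": {
    "type": "tier",
    "segmentsBytesMax": 8589934592,
    "maxSkewThreshold": 0.4,
    "minDeletionRatio": 0.5
  }
}
```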
+For details, see:
+- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md)
+- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties)
+
## HTTP RESTful API
### JavaScript-based traversal using `/_api/traversal` removed
diff --git a/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md
index e1e7bced88..e550246d34 100644
--- a/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md
@@ -2509,6 +2509,38 @@ environment variable `NAME`. If there is an environment variable called `PID` or
`TEMP_BASE_DIR`, then `@PID@` or `@TEMP_BASE_DIR@` is substituted with the
value of the respective environment variable.
+### New consolidation algorithm for inverted indexes and `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The `tier` consolidation policy now uses a different algorithm for merging
+and cleaning up segments. Overall, it avoids consolidating segments where the
+cost of writing the new segment is high and the gain in read performance is low
+(e.g. combining a big segment file with a very small one).
+
+The following options have been removed for inverted indexes as well as
+`arangosearch` Views because the new consolidation algorithm doesn't use them:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+ - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
+If you previously used customized settings for the removed options, check if the
+default values of the new options are acceptable or if you need to tune them
+according to your workload.
+
+For details, see:
+- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md)
+- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties)
+
### Deployment metadata metrics
Introduced in: v3.12.7
diff --git a/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md b/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md
index d2c5939c25..77e860c288 100644
--- a/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md
+++ b/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md
@@ -561,38 +561,74 @@ paths:
upon several possible configurable formulas as defined by their types.
The supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
type: string
default: tier
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments as equal for
- consolidation selection.
- type: integer
- default: 25165824
segmentsBytesMax:
description: |
The maximum allowed size of all consolidated segments in bytes.
type: integer
default: 8589934592
- segmentsMax:
+ maxSkewThreshold:
description: |
- The maximum number of segments that are evaluated as candidates for
- consolidation.
- type: integer
- default: 200
- segmentsMin:
- description: |
- The minimum number of segments that are evaluated as candidates for
- consolidation.
- type: integer
- default: 50
- minScore:
+ This option is available from v3.12.7 onward:
+
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by their total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones. Consolidation occurs more frequently, and there are fewer
+ segment files on disk at any given time. While this may improve the read
+ performance and reduce the number of open file descriptors, frequent
+ consolidations cause a higher write load and thus higher write amplification.
+
+ On the other hand, a small threshold value triggers consolidation only
+ when there is a large number of segment files that don't vary much in size.
+ Consolidation occurs less frequently, reducing write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ This option is available from v3.12.7 onward:
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the proportion of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if one segment has 1000 documents of which
+ 300 are deleted and another segment has 1000 documents of which 700 are
+ deleted, the combined deletion ratio is `0.5` (50%, calculated as
+ `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to an earlier cleanup of deleted documents from segments,
+ reclaiming disk space sooner, but it generates a higher write load.
+ A very large value lowers the write amplification, but the system can be
+ left with a large number of segment files that have a high percentage of
+ deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
+ type: number
- default: 0
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool
diff --git a/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md b/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md
index 2f33e5c772..5c3d863fcb 100644
--- a/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md
+++ b/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md
@@ -307,7 +307,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, and `segmentsBytesMax` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
_Background:_
@@ -330,7 +331,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -346,34 +347,66 @@ paths:
default: 0
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
- default: 25165824
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
default: 8589934592
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- default: 200
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- default: 50
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by their total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones. Consolidation occurs more frequently, and there are fewer
+ segment files on disk at any given time. While this may improve the read
+ performance and reduce the number of open file descriptors, frequent
+ consolidations cause a higher write load and thus higher write amplification.
+
+ On the other hand, a small threshold value triggers consolidation only
+ when there is a large number of segment files that don't vary much in size.
+ Consolidation occurs less frequently, reducing write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the proportion of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if one segment has 1000 documents of which
+ 300 are deleted and another segment has 1000 documents of which 700 are
+ deleted, the combined deletion ratio is `0.5` (50%, calculated as
+ `(300 + 700) / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to an earlier cleanup of deleted documents from segments,
+ reclaiming disk space sooner, but it generates a higher write load.
+ A very large value lowers the write amplification, but the system can be
+ left with a large number of segment files that have a high percentage of
+ deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
+ type: number
- default: 0
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool
@@ -544,7 +577,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, and `segmentsBytesMax` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -553,7 +587,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -567,29 +601,63 @@ paths:
type: number
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, consolidation occurs more frequently, and there are fewer
+ segment files on disk at all times. While this may potentially improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers the consolidation only
+ when there are a large number of segment files that don't vary in size a lot.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus earlier reclamation of disk space, but it generates a higher write
+ load. A very large value lowers the write amplification, but it can
+ leave the system with many segment files that have a high percentage
+ of deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
+ type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1016,7 +1084,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -1025,7 +1094,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1039,29 +1108,63 @@ paths:
type: number
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, consolidation occurs more frequently, and there are fewer
+ segment files on disk at all times. While this may potentially improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers the consolidation only
+ when there are a large number of segment files that don't vary in size a lot.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus earlier reclamation of disk space, but it generates a higher write
+ load. A very large value lowers the write amplification, but it can
+ leave the system with many segment files that have a high percentage
+ of deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
+ type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1403,7 +1506,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
_Background:_
@@ -1426,7 +1530,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1442,34 +1546,66 @@ paths:
default: 0
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
- default: 25165824
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
default: 8589934592
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- default: 200
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- default: 50
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, consolidation occurs more frequently, and there are fewer
+ segment files on disk at all times. While this may potentially improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers the consolidation only
+ when there are a large number of segment files that don't vary in size a lot.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus earlier reclamation of disk space, but it generates a higher write
+ load. A very large value lowers the write amplification, but it can
+ leave the system with many segment files that have a high percentage
+ of deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
- default: 0
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
responses:
'200':
description: |
@@ -1618,7 +1754,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -1627,7 +1764,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1641,29 +1778,63 @@ paths:
type: number
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, consolidation occurs more frequently, and there are fewer
+ segment files on disk at all times. While this may potentially improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers the consolidation only
+ when there are a large number of segment files that don't vary in size a lot.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus earlier reclamation of disk space, but it generates a higher write
+ load. A very large value lowers the write amplification, but it can
+ leave the system with many segment files that have a high percentage
+ of deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
+ type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1912,7 +2083,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
_Background:_
@@ -1935,7 +2107,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1950,34 +2122,66 @@ paths:
default: 0
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
- default: 25165824
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
default: 8589934592
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- default: 200
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- default: 50
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, consolidation occurs more frequently, and there are fewer
+ segment files on disk at all times. While this may potentially improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers the consolidation only
+ when there are a large number of segment files that don't vary in size a lot.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.4
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus earlier reclamation of disk space, but it generates a higher write
+ load. A very large value lowers the write amplification, but it can
+ leave the system with many segment files that have a high percentage
+ of deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
- default: 0
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ default: 0.5
responses:
'200':
description: |
@@ -2126,7 +2330,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -2135,7 +2340,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -2149,29 +2354,63 @@ paths:
type: number
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, consolidation occurs more frequently, and there are fewer
+ segment files on disk at all times. While this may potentially improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers the consolidation only
+ when there are a large number of segment files that don't vary in size a lot.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus earlier reclamation of disk space, but it generates a higher write
+ load. A very large value lowers the write amplification, but it can
+ leave the system with many segment files that have a high percentage
+ of deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
+ type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -2493,7 +2732,8 @@ paths:
description: |
The consolidation policy to apply for selecting which segments should be merged.
- - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+ - If the `tier` type is used, then the `maxSkewThreshold`,
+ `minDeletionRatio`, `segments*`, and `minScore` properties are available.
- If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
properties:
@@ -2502,7 +2742,7 @@ paths:
The segment candidates for the "consolidation" operation are selected based
upon several possible configurable formulas as defined by their types.
The currently supported types are:
- - `"tier"`: consolidate based on segment byte size and live
+ - `"tier"`: consolidate based on segment byte size skew and live
document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if
`{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -2516,29 +2756,63 @@ paths:
type: number
minimum: 0.0
maximum: 1.0
- segmentsBytesFloor:
- description: |
- Defines the value (in bytes) to treat all smaller segments
- as equal for consolidation selection.
- type: integer
segmentsBytesMax:
description: |
Maximum allowed size of all consolidated segments in bytes.
type: integer
- segmentsMax:
- description: |
- The maximum number of segments that are evaluated as
- candidates for consolidation.
- type: integer
- segmentsMin:
+ maxSkewThreshold:
description: |
- The minimum number of segments that are
- evaluated as candidates for consolidation
- type: integer
- minScore:
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of
+ 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, consolidation occurs more frequently, and there are fewer
+ segment files on disk at all times. While this may potentially improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers the consolidation only
+ when there are a large number of segment files that don't vary in size a lot.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+ type: number
+ minimum: 0.0
+ maximum: 1.0
+ minDeletionRatio:
description: |
- Filter out consolidation candidates with a score less than this.
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be selected carefully. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus earlier reclamation of disk space, but it generates a higher write
+ load. A very large value lowers the write amplification, but it can
+ leave the system with many segment files that have a high percentage
+ of deleted documents and occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
- type: integer
+ type: number
+ minimum: 0.0
+ maximum: 1.0
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool (`0` = disabled).
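The skew formula and the candidate selection repeated in the `maxSkewThreshold` descriptions above can be sketched as follows. This is an illustrative model under the stated assumptions, not the actual ArangoSearch merge policy, and the function names are hypothetical:

```python
from itertools import combinations

def skew(sizes):
    # Skew of a set of segment files: largest file size divided by total size.
    return max(sizes) / sum(sizes)

def pick_merge_candidates(sizes, max_skew_threshold=0.4,
                          segments_bytes_max=8 * 1024**3):
    """Illustrative model: prefer the greatest number of segments with the
    lowest skew while the consolidated size stays under segments_bytes_max."""
    best = None
    for k in range(len(sizes), 1, -1):  # try larger candidate sets first
        for combo in combinations(sizes, k):
            if sum(combo) > segments_bytes_max:
                continue  # consolidated segment would grow too large
            s = skew(combo)
            if s <= max_skew_threshold and (best is None or s < best[0]):
                best = (s, combo)
        if best is not None:
            break  # a qualifying set of this size exists; keep the lowest skew
    return best

# Example from the description: 200 MiB, 300 MiB, and 500 MiB segment files
# have a skew of 500 / 1000 = 0.5.
```

With the default threshold of `0.4`, a set such as three equally sized files (skew `1/3`) qualifies for consolidation, while one dominated by a single large file (for example, skew `0.8`) does not.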
diff --git a/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md b/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md
index 036758127f..e0c9ef42ed 100644
--- a/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md
+++ b/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md
@@ -462,7 +462,7 @@ is used by these writers (in terms of "writers pool") one can use
- `"bytes_accum"`: Consolidation is performed based on current memory
consumption of segments and `threshold` property value.
- - `"tier"`: Consolidate based on segment byte size and live document count
+ - `"tier"`: Consolidate based on segment byte size skew and live document count
as dictated by the customization attributes.
{{< warning >}}
@@ -483,23 +483,55 @@ is used by these writers (in terms of "writers pool") one can use
`consolidationPolicy` properties for `"tier"` type:
- - **segmentsMin** (_optional_; type: `integer`; default: `50`)
-
- The minimum number of segments that are evaluated as candidates for consolidation.
-
- - **segmentsMax** (_optional_; type: `integer`; default: `200`)
-
- The maximum number of segments that are evaluated as candidates for consolidation.
-
- **segmentsBytesMax** (_optional_; type: `integer`; default: `8589934592`)
Maximum allowed size of all consolidated segments in bytes.
- - **segmentsBytesFloor** (_optional_; type: `integer`; default: `25165824`)
-
- Defines the value (in bytes) to treat all smaller segments as equal for consolidation
- selection.
-
- - **minScore** (_optional_; type: `integer`; default: `0`)
-
- Filter out consolidation candidates with a score less than this.
+ - **maxSkewThreshold** (_optional_; type: `number`; default: `0.4`)
+
+ The skew describes how much segment files vary in file size. It is a number
+ between `0.0` and `1.0` and is calculated by dividing the largest file size
+ of a set of segment files by the total size. For example, the skew of a
+ 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`).
+
+ A large `maxSkewThreshold` value allows merging large segment files with
+ smaller ones, so consolidation occurs more frequently and there are fewer
+ segment files on disk at any given time. While this may improve the
+ read performance and use fewer file descriptors, frequent consolidations
+ cause a higher write load and thus a higher write amplification.
+
+ On the other hand, a small threshold value triggers consolidation only
+ when there is a large number of segment files of similar size.
+ Consolidation occurs less frequently, reducing the write amplification, but
+ it can result in a greater number of segment files on disk.
+
+ Multiple combinations of candidate segments are checked and the one with
+ the lowest skew value is selected for consolidation. The selection process
+ picks the greatest number of segments that together have the lowest skew value
+ while ensuring that the size of the new consolidated segment remains under
+ the configured `segmentsBytesMax`.
+
+ - **minDeletionRatio** (_optional_; type: `number`; default: `0.5`)
+
+ The `minDeletionRatio` represents the minimum required deletion ratio
+ in one or more segments to perform a cleanup of those segments.
+ It is a number between `0.0` and `1.0`.
+
+ The deletion ratio is the percentage of deleted documents across one or
+ more segment files and is calculated by dividing the number of deleted
+ documents by the total number of documents in a segment or a group of
+ segments. For example, if there is a segment with 1000 documents of which
+ 300 are deleted and another segment with 1000 documents of which 700 are
+ deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+ The `minDeletionRatio` threshold must be carefully selected. A smaller
+ value leads to earlier cleanup of deleted documents from segments and
+ thus reclamation of disk space but it generates a higher write load.
+ A very large value lowers the write amplification but at the same time
+ the system can be left with a large number of segment files with a high
+ percentage of deleted documents that occupy disk space unnecessarily.
+
+ During cleanup, the segment files are first arranged in decreasing
+ order of their individual deletion ratios. Then the largest subset of
+ segments whose collective deletion ratio is greater than or equal to
+ `minDeletionRatio` is picked.
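The skew and deletion ratio formulas described above can be sketched as follows. This is an illustrative Python snippet, not the server's implementation; the function names are made up for this example:

```python
def skew(sizes):
    """Skew of a set of segment file sizes: largest size divided by total size."""
    return max(sizes) / sum(sizes)

def deletion_ratio(segments):
    """Deleted documents divided by total documents across one or more
    segments, given as (total_docs, deleted_docs) pairs."""
    total = sum(t for t, _ in segments)
    deleted = sum(d for _, d in segments)
    return deleted / total

# Example from the text: 200 MiB, 300 MiB, and 500 MiB segment files.
print(skew([200, 300, 500]))  # 0.5

# 1000 documents with 300 deleted plus 1000 documents with 700 deleted.
print(deletion_ratio([(1000, 300), (1000, 700)]))  # 0.5
```

With the defaults, a candidate set whose skew exceeds `maxSkewThreshold` (`0.4`) is not consolidated, and a segment group whose collective deletion ratio reaches `minDeletionRatio` (`0.5`) qualifies for cleanup.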
diff --git a/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md b/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md
index c00008ab07..16456aaa60 100644
--- a/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md
+++ b/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md
@@ -365,6 +365,25 @@ By consolidating less often and with more data, less file descriptors are used.
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
- `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
+##### Added and removed consolidation options for `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The following options for consolidating `arangosearch` Views have been removed
+and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+ - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
#### Document API
The following endpoints accept a new `versionAttribute` query parameter that adds
@@ -503,6 +522,25 @@ By consolidating less often and with more data, less file descriptors are used.
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
- `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
+##### Added and removed consolidation options for inverted indexes
+
+Introduced in: v3.12.7
+
+The following options for consolidating inverted indexes have been removed
+and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+ - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
#### Optimizer rule descriptions
Introduced in: v3.10.9, v3.11.2
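As a hedged illustration of the options listed above, a `tier` consolidation policy using the new attributes alongside the retained `segmentsBytesMax` could look like this (the values shown are simply the documented defaults):

```json
{
  "consolidationPolicy": {
    "type": "tier",
    "segmentsBytesMax": 8589934592,
    "maxSkewThreshold": 0.4,
    "minDeletionRatio": 0.5
  }
}
```

A request that still includes the removed `segmentsMin`, `segmentsMax`, `segmentsBytesFloor`, or `minScore` attributes is accepted, but those attributes are ignored.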
diff --git a/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md
index 959cea82cf..1fe5a31d18 100644
--- a/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md
+++ b/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md
@@ -994,6 +994,29 @@ more data, less file descriptors are used.
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
- `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
+## Added and removed consolidation options for inverted indexes and `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The following options for consolidating inverted indexes as well as
+`arangosearch` Views have been removed and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The consolidation works differently now and uses the new `maxSkewThreshold` and
+`minDeletionRatio` options together with the existing `segmentsBytesMax`. If you
+previously used customized settings for the removed options, check if the default
+values of the new options are acceptable or if you need to tune them according to
+your workload.
+
+For details, see:
+- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md)
+- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties)
+
## HTTP RESTful API
### JavaScript-based traversal using `/_api/traversal` removed
diff --git a/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md
index e1e7bced88..e550246d34 100644
--- a/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md
@@ -2509,6 +2509,38 @@ environment variable `NAME`. If there is an environment variable called `PID` or
`TEMP_BASE_DIR`, then `@PID@` or `@TEMP_BASE_DIR@` is substituted with the
value of the respective environment variable.
+### New consolidation algorithm for inverted indexes and `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The `tier` consolidation policy now uses a different algorithm for merging
+and cleaning up segments. Overall, it avoids consolidating segments where the
+cost of writing the new segment is high and the gain in read performance is low
+(e.g. combining a big segment file with a very small one).
+
+The following options have been removed for inverted indexes as well as
+`arangosearch` Views because the new consolidation algorithm doesn't use them:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `segmentsMin`
+ - `segmentsMax`
+ - `segmentsBytesFloor`
+ - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+ - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+ - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
+If you previously used customized settings for the removed options, check if the
+default values of the new options are acceptable or if you need to tune them
+according to your workload.
+
+For details, see:
+- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md)
+- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties)
+
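The cleanup step tied to `minDeletionRatio` (sort segments by their individual deletion ratios in decreasing order, then pick the largest subset whose collective deletion ratio still meets the threshold) can be sketched like this. This is illustrative Python under that stated assumption, not the server's implementation:

```python
def pick_cleanup_candidates(segments, min_deletion_ratio=0.5):
    """Pick segments for cleanup. Each segment is a (total_docs, deleted_docs)
    pair. Segments are ordered by individual deletion ratio (descending), and
    the largest prefix whose collective deletion ratio is still greater than
    or equal to the threshold is selected."""
    ordered = sorted(segments, key=lambda s: s[1] / s[0], reverse=True)
    best = []
    for i in range(1, len(ordered) + 1):
        total = sum(t for t, _ in ordered[:i])
        deleted = sum(d for _, d in ordered[:i])
        if deleted / total >= min_deletion_ratio:
            best = ordered[:i]
    return best

# The 1000/700 and 1000/300 segments have a collective deletion ratio of
# exactly 0.5, so both are picked at the default threshold.
print(pick_cleanup_candidates([(1000, 300), (1000, 700)]))
```

Adding segments with lower individual ratios dilutes the collective ratio, which is why a very large `minDeletionRatio` can leave many mostly-deleted segments on disk, as described above.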
### Deployment metadata metrics
Introduced in: v3.12.7