diff --git a/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md b/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md index d2c5939c25..1120c9dd4a 100644 --- a/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md +++ b/site/content/arangodb/3.12/develop/http-api/indexes/inverted.md @@ -561,12 +561,14 @@ paths: upon several possible configurable formulas as defined by their types. The supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. type: string default: tier segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -578,21 +580,86 @@ paths: default: 8589934592 segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer default: 200 segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation. type: integer default: 50 minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer default: 0 + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of + 200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+ + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.4 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. 
A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space, but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.5 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool diff --git a/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md b/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md index 2f33e5c772..c8951af785 100644 --- a/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md +++ b/site/content/arangodb/3.12/develop/http-api/views/arangosearch-views.md @@ -307,7 +308,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. _Background:_ @@ -330,7 +331,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -348,6 +349,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -359,21 +362,86 @@ paths: default: 8589934592 segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer default: 200 segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer default: 50 minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer default: 0 + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. 
+ Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.4 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. 
+ type: number + minimum: 0.0 + maximum: 1.0 + default: 0.5 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool @@ -544,7 +612,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. type: object properties: @@ -553,7 +622,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. - `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -569,6 +638,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -578,18 +649,81 @@ paths: type: integer segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this.
type: integer + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. 
For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space, but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool (`0` = disabled). @@ -1016,7 +1150,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. type: object properties: @@ -1025,7 +1160,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -1041,6 +1176,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -1050,18 +1187,81 @@ paths: type: integer segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. 
+ + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space, but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool (`0` = disabled).
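The skew formula that the new `maxSkewThreshold` descriptions spell out can be illustrated with a short sketch. This is an illustrative reading of the documented calculation using a hypothetical helper name, not part of ArangoDB or its API:

```python
def skew(segment_sizes):
    """Skew of a set of segment files: the largest file size divided
    by the total size, a number between 0.0 and 1.0."""
    return max(segment_sizes) / sum(segment_sizes)

# The example from the description: 200 MiB, 300 MiB, and 500 MiB segments.
MiB = 1024 * 1024
print(skew([200 * MiB, 300 * MiB, 500 * MiB]))  # 0.5
```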
@@ -1403,7 +1603,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. _Background:_ @@ -1426,7 +1627,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. - `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -1444,6 +1645,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -1455,21 +1658,86 @@ paths: default: 8589934592 segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer default: 200 segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer default: 50 minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer default: 0 + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. 
It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.4 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. 
For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space, but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.5 responses: '200': description: | @@ -1618,7 +1886,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. type: object properties: @@ -1627,7 +1896,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -1643,6 +1912,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -1652,18 +1923,81 @@ paths: type: integer segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. 
+ + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space, but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool (`0` = disabled).
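The deletion ratio computation documented for `minDeletionRatio` can be sketched the same way. Again an illustrative example with a hypothetical helper name, not ArangoDB code:

```python
def deletion_ratio(segments):
    """Collective deletion ratio of one or more segments, each given as
    a (deleted_docs, total_docs) tuple: deleted documents divided by
    total documents across all given segments."""
    deleted = sum(d for d, _ in segments)
    total = sum(t for _, t in segments)
    return deleted / total

# The documented example: 1000 docs with 300 deleted plus 1000 docs
# with 700 deleted gives 1000 / 2000.
print(deletion_ratio([(300, 1000), (700, 1000)]))  # 0.5
```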
@@ -1912,7 +2246,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. _Background:_ @@ -1935,7 +2270,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. - `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -1952,6 +2287,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -1963,21 +2300,86 @@ paths: default: 8589934592 segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer default: 200 segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer default: 50 minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer default: 0 + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. 
It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.4 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. 
For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.5 responses: '200': description: | @@ -2126,7 +2528,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. type: object properties: @@ -2135,7 +2538,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes.
- `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -2151,6 +2554,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -2160,18 +2565,81 @@ paths: type: integer segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. 
+ + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -2493,7 +2961,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. type: object properties: @@ -2502,7 +2971,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. - `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -2518,6 +2987,8 @@ paths: maximum: 1.0 segmentsBytesFloor: description: | + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. type: integer @@ -2527,18 +2998,81 @@ paths: type: integer segmentsMax: description: | + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. type: integer segmentsMin: description: | + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation type: integer minScore: description: | + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. type: integer + maxSkewThreshold: + description: | + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. 
It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: + description: | + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. 
For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. + type: number + minimum: 0.0 + maximum: 1.0 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool (`0` = disabled). diff --git a/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md b/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md index 036758127f..7c483e2737 100644 --- a/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md +++ b/site/content/arangodb/3.12/indexes-and-search/arangosearch/arangosearch-views-reference.md @@ -462,7 +462,7 @@ is used by these writers (in terms of "writers pool") one can use - `"bytes_accum"`: Consolidation is performed based on current memory consumption of segments and `threshold` property value. - - `"tier"`: Consolidate based on segment byte size and live document count + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes.
{{< warning >}} @@ -485,10 +485,14 @@ is used by these writers (in terms of "writers pool") one can use - **segmentsMin** (_optional_; type: `integer`; default: `50`) + This option is only available up to v3.12.6: + The minimum number of segments that are evaluated as candidates for consolidation. - **segmentsMax** (_optional_; type: `integer`; default: `200`) + This option is only available up to v3.12.6: + The maximum number of segments that are evaluated as candidates for consolidation. - **segmentsBytesMax** (_optional_; type: `integer`; default: `8589934592`) @@ -497,9 +501,66 @@ is used by these writers (in terms of "writers pool") one can use - **segmentsBytesFloor** (_optional_; type: `integer`; default: `25165824`) + This option is only available up to v3.12.6: + Defines the value (in bytes) to treat all smaller segments as equal for consolidation selection. - **minScore** (_optional_; type: `integer`; default: `0`) + This option is only available up to v3.12.6: + Filter out consolidation candidates with a score less than this. + + - **maxSkewThreshold** (_optional_; type: `number`; default: `0.4`) + + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. 
+ + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + + - **minDeletionRatio** (_optional_; type: `number`; default: `0.5`) + + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. 
Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. diff --git a/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md b/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md index c00008ab07..16456aaa60 100644 --- a/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md +++ b/site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md @@ -365,6 +365,25 @@ By consolidating less often and with more data, less file descriptors are used. - `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB) - `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB) +##### Added and removed consolidation options for `arangosearch` Views + +Introduced in: v3.12.7 + +The following options for consolidating `arangosearch` Views have been removed +and are now ignored when specified in a request: + +- `consolidationPolicy` (with `type` set to `tier`): + - `segmentsMin` + - `segmentsMax` + - `segmentsBytesFloor` + - `minScore` + +The following new options have been added: + +- `consolidationPolicy` (with `type` set to `tier`): + - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`) + - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`) + #### Document API The following endpoints accept a new `versionAttribute` query parameter that adds @@ -503,6 +522,25 @@ By consolidating less often and with more data, less file descriptors are used. 
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB) - `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB) +##### Added and removed consolidation options for inverted indexes + +Introduced in: v3.12.7 + +The following options for consolidating inverted indexes have been removed +and are now ignored when specified in a request: + +- `consolidationPolicy` (with `type` set to `tier`): + - `segmentsMin` + - `segmentsMax` + - `segmentsBytesFloor` + - `minScore` + +The following new options have been added: + +- `consolidationPolicy` (with `type` set to `tier`): + - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`) + - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`) + #### Optimizer rule descriptions Introduced in: v3.10.9, v3.11.2 diff --git a/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md index 959cea82cf..1fe5a31d18 100644 --- a/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md +++ b/site/content/arangodb/3.12/release-notes/version-3.12/incompatible-changes-in-3-12.md @@ -994,6 +994,29 @@ more data, less file descriptors are used. 
- `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB) - `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB) +## Added and removed consolidation options for inverted indexes and `arangosearch` Views + +Introduced in: v3.12.7 + +The following options for consolidating inverted indexes as well as +`arangosearch` Views have been removed and are now ignored when specified in a request: + +- `consolidationPolicy` (with `type` set to `tier`): + - `segmentsMin` + - `segmentsMax` + - `segmentsBytesFloor` + - `minScore` + +Consolidation now works differently and uses the new `maxSkewThreshold` and +`minDeletionRatio` options together with the existing `segmentsBytesMax`. If you +previously used customized settings for the removed options, check if the default +values of the new options are acceptable or if you need to tune them according to +your workload. + +For details, see: +- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md) +- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties) + ## HTTP RESTful API ### JavaScript-based traversal using `/_api/traversal` removed diff --git a/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md index e1e7bced88..e550246d34 100644 --- a/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md +++ b/site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md @@ -2509,6 +2509,38 @@ environment variable `NAME`. If there is an environment variable called `PID` or `TEMP_BASE_DIR`, then `@PID@` or `@TEMP_BASE_DIR@` is substituted with the value of the respective environment variable.
+### New consolidation algorithm for inverted indexes and `arangosearch` Views + +Introduced in: v3.12.7 + +The `tier` consolidation policy now uses a different algorithm for merging +and cleaning up segments. Overall, it avoids consolidating segments where the +cost of writing the new segment is high and the gain in read performance is low +(e.g. combining a big segment file with a very small one). + +The following options have been removed for inverted indexes as well as +`arangosearch` Views because the new consolidation algorithm doesn't use them: + +- `consolidationPolicy` (with `type` set to `tier`): + - `segmentsMin` + - `segmentsMax` + - `segmentsBytesFloor` + - `minScore` + +The following new options have been added: + +- `consolidationPolicy` (with `type` set to `tier`): + - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`) + - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`) + +If you previously used customized settings for the removed options, check if the +default values of the new options are acceptable or if you need to tune them +according to your workload. + +For details, see: +- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md) +- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties) + ### Deployment metadata metrics Introduced in: v3.12.7 diff --git a/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md b/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md index d2c5939c25..77e860c288 100644 --- a/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md +++ b/site/content/arangodb/4.0/develop/http-api/indexes/inverted.md @@ -561,38 +561,74 @@ paths: upon several possible configurable formulas as defined by their types. 
The supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. type: string default: tier - segmentsBytesFloor: - description: | - Defines the value (in bytes) to treat all smaller segments as equal for - consolidation selection. - type: integer - default: 25165824 segmentsBytesMax: description: | The maximum allowed size of all consolidated segments in bytes. type: integer default: 8589934592 - segmentsMax: + maxSkewThreshold: description: | - The maximum number of segments that are evaluated as candidates for - consolidation. - type: integer - default: 200 - segmentsMin: - description: | - The minimum number of segments that are evaluated as candidates for - consolidation. - type: integer - default: 50 - minScore: + This option is available from v3.12.7 onward: + + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. 
+ + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.4 + minDeletionRatio: description: | - Filter out consolidation candidates with a score less than this. + This option is available from v3.12.7 onward: + + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. 
- type: integer - default: 0 + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.5 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool diff --git a/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md b/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md index 2f33e5c772..5c3d863fcb 100644 --- a/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md +++ b/site/content/arangodb/4.0/develop/http-api/views/arangosearch-views.md @@ -307,7 +307,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. _Background:_ @@ -330,7 +331,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types. The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. - `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -346,34 +347,66 @@ paths: default: 0 minimum: 0.0 maximum: 1.0 - segmentsBytesFloor: - description: | - Defines the value (in bytes) to treat all smaller segments - as equal for consolidation selection. - type: integer - default: 25165824 segmentsBytesMax: description: | Maximum allowed size of all consolidated segments in bytes. type: integer default: 8589934592 - segmentsMax: - description: | - The maximum number of segments that are evaluated as - candidates for consolidation.
- type: integer - default: 200 - segmentsMin: + maxSkewThreshold: description: | - The minimum number of segments that are - evaluated as candidates for consolidation - type: integer - default: 50 - minScore: + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.4 + minDeletionRatio: description: | - Filter out consolidation candidates with a score less than this. + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. 
+ + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. Then the largest subset of + segments whose collective deletion ratio is greater than or equal to + `minDeletionRatio` is picked. - type: integer - default: 0 + type: number + minimum: 0.0 + maximum: 1.0 + default: 0.5 writebufferIdle: description: | Maximum number of writers (segments) cached in the pool @@ -544,7 +577,8 @@ paths: description: | The consolidation policy to apply for selecting which segments should be merged. - - If the `tier` type is used, then the `segments*` and `minScore` properties are available. + - If the `tier` type is used, then the `maxSkewThreshold`, + `minDeletionRatio`, `segments*`, and `minScore` properties are available. - If the `bytes_accum` type is used, then the `threshold` property is available. type: object properties: @@ -553,7 +587,7 @@ paths: The segment candidates for the "consolidation" operation are selected based upon several possible configurable formulas as defined by their types.
The currently supported types are: - - `"tier"`: consolidate based on segment byte size and live + - `"tier"`: consolidate based on segment byte size skew and live document count as dictated by the customization attributes. - `"bytes_accum"`: consolidate if and only if `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes` @@ -567,29 +601,63 @@ paths: type: number minimum: 0.0 maximum: 1.0 - segmentsBytesFloor: - description: | - Defines the value (in bytes) to treat all smaller segments - as equal for consolidation selection. - type: integer segmentsBytesMax: description: | Maximum allowed size of all consolidated segments in bytes. type: integer - segmentsMax: - description: | - The maximum number of segments that are evaluated as - candidates for consolidation. - type: integer - segmentsMin: + maxSkewThreshold: description: | - The minimum number of segments that are - evaluated as candidates for consolidation - type: integer - minScore: + The skew describes how much segment files vary in file size. It is a number + between `0.0` and `1.0` and is calculated by dividing the largest file size + of a set of segment files by the total size. For example, the skew of a + 200 MiB, 300 MiB, and 500 MiB segment file is `0.5` (`500 / 1000`). + + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. 
+
+        Multiple combinations of candidate segments are checked and the one with
+        the lowest skew value is selected for consolidation. The selection process
+        picks the greatest number of segments that together have the lowest skew value
+        while ensuring that the size of the new consolidated segment remains under
+        the configured `segmentsBytesMax`.
+      type: number
+      minimum: 0.0
+      maximum: 1.0
+    minDeletionRatio:
       description: |
-        Filter out consolidation candidates with a score less than this.
+        The `minDeletionRatio` represents the minimum required deletion ratio
+        in one or more segments to perform a cleanup of those segments.
+        It is a number between `0.0` and `1.0`.
+
+        The deletion ratio is the percentage of deleted documents across one or
+        more segment files and is calculated by dividing the number of deleted
+        documents by the total number of documents in a segment or a group of
+        segments. For example, if there is a segment with 1000 documents of which
+        300 are deleted and another segment with 1000 documents of which 700 are
+        deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+        The `minDeletionRatio` threshold must be carefully selected. A smaller
+        value leads to earlier cleanup of deleted documents from segments and
+        thus reclamation of disk space but it generates a higher write load.
+        A very large value lowers the write amplification but at the same time
+        the system can be left with a large number of segment files with a high
+        percentage of deleted documents that occupy disk space unnecessarily.
+
+        During cleanup, the segment files are first arranged in decreasing
+        order of their individual deletion ratios. Then the largest subset of
+        segments whose collective deletion ratio is greater than or equal to
+        `minDeletionRatio` is picked.
-      type: integer
+      type: number
+      minimum: 0.0
+      maximum: 1.0
     writebufferIdle:
       description: |
         Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1016,7 +1084,8 @@ paths:
       description: |
         The consolidation policy to apply for selecting which segments
         should be merged.
-        - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+        - If the `tier` type is used, then the `maxSkewThreshold`,
+          `minDeletionRatio`, `segments*`, and `minScore` properties are available.
         - If the `bytes_accum` type is used, then the `threshold` property is available.
       type: object
       properties:
@@ -1025,7 +1094,7 @@ paths:
         The segment candidates for the "consolidation" operation are selected based
         upon several possible configurable formulas as defined by their types.
         The currently supported types are:
-        - `"tier"`: consolidate based on segment byte size and live
+        - `"tier"`: consolidate based on segment byte size skew and live
           document count as dictated by the customization attributes.
         - `"bytes_accum"`: consolidate if and only if
           `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1039,29 +1108,63 @@ paths:
       type: number
       minimum: 0.0
       maximum: 1.0
-    segmentsBytesFloor:
-      description: |
-        Defines the value (in bytes) to treat all smaller segments
-        as equal for consolidation selection.
-      type: integer
     segmentsBytesMax:
       description: |
         Maximum allowed size of all consolidated segments in bytes.
       type: integer
-    segmentsMax:
-      description: |
-        The maximum number of segments that are evaluated as
-        candidates for consolidation.
-      type: integer
-    segmentsMin:
+    maxSkewThreshold:
       description: |
-        The minimum number of segments that are
-        evaluated as candidates for consolidation
-      type: integer
-    minScore:
+        The skew describes how much segment files vary in file size. It is a number
+        between `0.0` and `1.0` and is calculated by dividing the largest file size
+        of a set of segment files by the total size. For example, the skew of
+        200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
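The skew arithmetic defined in this description (largest file size divided by the total size of the candidate set) can be sketched in Python. This is an editor's illustration of the documented formula, not ArangoDB or IResearch code; the segment sizes are hypothetical:

```python
def skew(sizes):
    """Skew of a set of segment files: largest file size / total size."""
    return max(sizes) / sum(sizes)

MIB = 1024 * 1024

# Worked example from the description: 200 MiB, 300 MiB, and 500 MiB files.
print(skew([200 * MIB, 300 * MIB, 500 * MIB]))  # 0.5 (500 / 1000)

# Ten equally sized segments barely vary in size, so the skew is low.
print(skew([100 * MIB] * 10))  # 0.1
```

A set of equally sized segments approaches a skew of `1 / n`, while one dominant segment pushes the skew toward `1.0`, which is why a low `maxSkewThreshold` restricts consolidation to groups of similarly sized segments.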
+ + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: description: | - Filter out consolidation candidates with a score less than this. + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. 
A smaller
+        value leads to earlier cleanup of deleted documents from segments and
+        thus reclamation of disk space but it generates a higher write load.
+        A very large value lowers the write amplification but at the same time
+        the system can be left with a large number of segment files with a high
+        percentage of deleted documents that occupy disk space unnecessarily.
+
+        During cleanup, the segment files are first arranged in decreasing
+        order of their individual deletion ratios. Then the largest subset of
+        segments whose collective deletion ratio is greater than or equal to
+        `minDeletionRatio` is picked.
-      type: integer
+      type: number
+      minimum: 0.0
+      maximum: 1.0
     writebufferIdle:
       description: |
         Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1403,7 +1506,8 @@ paths:
       description: |
         The consolidation policy to apply for selecting which segments
         should be merged.
-        - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+        - If the `tier` type is used, then the `maxSkewThreshold`,
+          `minDeletionRatio`, `segments*`, and `minScore` properties are available.
         - If the `bytes_accum` type is used, then the `threshold` property is available.
         _Background:_
@@ -1426,7 +1530,7 @@ paths:
         The segment candidates for the "consolidation" operation are selected based
         upon several possible configurable formulas as defined by their types.
         The currently supported types are:
-        - `"tier"`: consolidate based on segment byte size and live
+        - `"tier"`: consolidate based on segment byte size skew and live
           document count as dictated by the customization attributes.
         - `"bytes_accum"`: consolidate if and only if
           `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1442,34 +1546,66 @@ paths:
       default: 0
       minimum: 0.0
       maximum: 1.0
-    segmentsBytesFloor:
-      description: |
-        Defines the value (in bytes) to treat all smaller segments
-        as equal for consolidation selection.
-      type: integer
-      default: 25165824
     segmentsBytesMax:
       description: |
         Maximum allowed size of all consolidated segments in bytes.
       type: integer
       default: 8589934592
-    segmentsMax:
-      description: |
-        The maximum number of segments that are evaluated as
-        candidates for consolidation.
-      type: integer
-      default: 200
-    segmentsMin:
+    maxSkewThreshold:
       description: |
-        The minimum number of segments that are
-        evaluated as candidates for consolidation
-      type: integer
-      default: 50
-    minScore:
+        The skew describes how much segment files vary in file size. It is a number
+        between `0.0` and `1.0` and is calculated by dividing the largest file size
+        of a set of segment files by the total size. For example, the skew of
+        200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+        A large `maxSkewThreshold` value allows merging large segment files with
+        smaller ones, consolidation occurs more frequently, and there are fewer
+        segment files on disk at all times. While this may potentially improve the
+        read performance and use fewer file descriptors, frequent consolidations
+        cause a higher write load and thus a higher write amplification.
+
+        On the other hand, a small threshold value triggers the consolidation only
+        when there are a large number of segment files that don't vary in size a lot.
+        Consolidation occurs less frequently, reducing the write amplification, but
+        it can result in a greater number of segment files on disk.
+
+        Multiple combinations of candidate segments are checked and the one with
+        the lowest skew value is selected for consolidation. The selection process
+        picks the greatest number of segments that together have the lowest skew value
+        while ensuring that the size of the new consolidated segment remains under
+        the configured `segmentsBytesMax`.
+      type: number
+      minimum: 0.0
+      maximum: 1.0
+      default: 0.4
+    minDeletionRatio:
       description: |
-        Filter out consolidation candidates with a score less than this.
+        The `minDeletionRatio` represents the minimum required deletion ratio
+        in one or more segments to perform a cleanup of those segments.
+        It is a number between `0.0` and `1.0`.
+
+        The deletion ratio is the percentage of deleted documents across one or
+        more segment files and is calculated by dividing the number of deleted
+        documents by the total number of documents in a segment or a group of
+        segments. For example, if there is a segment with 1000 documents of which
+        300 are deleted and another segment with 1000 documents of which 700 are
+        deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+        The `minDeletionRatio` threshold must be carefully selected. A smaller
+        value leads to earlier cleanup of deleted documents from segments and
+        thus reclamation of disk space but it generates a higher write load.
+        A very large value lowers the write amplification but at the same time
+        the system can be left with a large number of segment files with a high
+        percentage of deleted documents that occupy disk space unnecessarily.
+
+        During cleanup, the segment files are first arranged in decreasing
+        order of their individual deletion ratios. Then the largest subset of
+        segments whose collective deletion ratio is greater than or equal to
+        `minDeletionRatio` is picked.
-      type: integer
-      default: 0
+      type: number
+      minimum: 0.0
+      maximum: 1.0
+      default: 0.5
   responses:
     '200':
       description: |
@@ -1618,7 +1754,8 @@ paths:
       description: |
         The consolidation policy to apply for selecting which segments
         should be merged.
-        - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+        - If the `tier` type is used, then the `maxSkewThreshold`,
+          `minDeletionRatio`, `segments*`, and `minScore` properties are available.
         - If the `bytes_accum` type is used, then the `threshold` property is available.
type: object
       properties:
@@ -1627,7 +1764,7 @@ paths:
         The segment candidates for the "consolidation" operation are selected based
         upon several possible configurable formulas as defined by their types.
         The currently supported types are:
-        - `"tier"`: consolidate based on segment byte size and live
+        - `"tier"`: consolidate based on segment byte size skew and live
           document count as dictated by the customization attributes.
         - `"bytes_accum"`: consolidate if and only if
           `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1641,29 +1778,63 @@ paths:
       type: number
       minimum: 0.0
       maximum: 1.0
-    segmentsBytesFloor:
-      description: |
-        Defines the value (in bytes) to treat all smaller segments
-        as equal for consolidation selection.
-      type: integer
     segmentsBytesMax:
       description: |
         Maximum allowed size of all consolidated segments in bytes.
       type: integer
-    segmentsMax:
-      description: |
-        The maximum number of segments that are evaluated as
-        candidates for consolidation.
-      type: integer
-    segmentsMin:
+    maxSkewThreshold:
       description: |
-        The minimum number of segments that are
-        evaluated as candidates for consolidation
-      type: integer
-    minScore:
+        The skew describes how much segment files vary in file size. It is a number
+        between `0.0` and `1.0` and is calculated by dividing the largest file size
+        of a set of segment files by the total size. For example, the skew of
+        200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+        A large `maxSkewThreshold` value allows merging large segment files with
+        smaller ones, consolidation occurs more frequently, and there are fewer
+        segment files on disk at all times. While this may potentially improve the
+        read performance and use fewer file descriptors, frequent consolidations
+        cause a higher write load and thus a higher write amplification.
+ + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: description: | - Filter out consolidation candidates with a score less than this. + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. A smaller + value leads to earlier cleanup of deleted documents from segments and + thus reclamation of disk space but it generates a higher write load. + A very large value lowers the write amplification but at the same time + the system can be left with a large number of segment files with a high + percentage of deleted documents that occupy disk space unnecessarily. + + During cleanup, the segment files are first arranged in decreasing + order of their individual deletion ratios. 
Then the largest subset of
+        segments whose collective deletion ratio is greater than or equal to
+        `minDeletionRatio` is picked.
-      type: integer
+      type: number
+      minimum: 0.0
+      maximum: 1.0
     writebufferIdle:
       description: |
         Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -1912,7 +2083,8 @@ paths:
       description: |
         The consolidation policy to apply for selecting which segments
         should be merged.
-        - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+        - If the `tier` type is used, then the `maxSkewThreshold`,
+          `minDeletionRatio`, `segments*`, and `minScore` properties are available.
         - If the `bytes_accum` type is used, then the `threshold` property is available.
         _Background:_
@@ -1935,7 +2107,7 @@ paths:
         The segment candidates for the "consolidation" operation are selected based
         upon several possible configurable formulas as defined by their types.
         The currently supported types are:
-        - `"tier"`: consolidate based on segment byte size and live
+        - `"tier"`: consolidate based on segment byte size skew and live
           document count as dictated by the customization attributes.
         - `"bytes_accum"`: consolidate if and only if
           `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -1950,34 +2122,66 @@ paths:
       default: 0
       minimum: 0.0
       maximum: 1.0
-    segmentsBytesFloor:
-      description: |
-        Defines the value (in bytes) to treat all smaller segments
-        as equal for consolidation selection.
-      type: integer
-      default: 25165824
     segmentsBytesMax:
       description: |
         Maximum allowed size of all consolidated segments in bytes.
       type: integer
       default: 8589934592
-    segmentsMax:
-      description: |
-        The maximum number of segments that are evaluated as
-        candidates for consolidation.
-      type: integer
-      default: 200
-    segmentsMin:
+    maxSkewThreshold:
       description: |
-        The minimum number of segments that are
-        evaluated as candidates for consolidation
-      type: integer
-      default: 50
-    minScore:
+        The skew describes how much segment files vary in file size. It is a number
+        between `0.0` and `1.0` and is calculated by dividing the largest file size
+        of a set of segment files by the total size. For example, the skew of
+        200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+        A large `maxSkewThreshold` value allows merging large segment files with
+        smaller ones, consolidation occurs more frequently, and there are fewer
+        segment files on disk at all times. While this may potentially improve the
+        read performance and use fewer file descriptors, frequent consolidations
+        cause a higher write load and thus a higher write amplification.
+
+        On the other hand, a small threshold value triggers the consolidation only
+        when there are a large number of segment files that don't vary in size a lot.
+        Consolidation occurs less frequently, reducing the write amplification, but
+        it can result in a greater number of segment files on disk.
+
+        Multiple combinations of candidate segments are checked and the one with
+        the lowest skew value is selected for consolidation. The selection process
+        picks the greatest number of segments that together have the lowest skew value
+        while ensuring that the size of the new consolidated segment remains under
+        the configured `segmentsBytesMax`.
+      type: number
+      minimum: 0.0
+      maximum: 1.0
+      default: 0.4
+    minDeletionRatio:
       description: |
-        Filter out consolidation candidates with a score less than this.
+        The `minDeletionRatio` represents the minimum required deletion ratio
+        in one or more segments to perform a cleanup of those segments.
+        It is a number between `0.0` and `1.0`.
+
+        The deletion ratio is the percentage of deleted documents across one or
+        more segment files and is calculated by dividing the number of deleted
+        documents by the total number of documents in a segment or a group of
+        segments. For example, if there is a segment with 1000 documents of which
+        300 are deleted and another segment with 1000 documents of which 700 are
+        deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+        The `minDeletionRatio` threshold must be carefully selected. A smaller
+        value leads to earlier cleanup of deleted documents from segments and
+        thus reclamation of disk space but it generates a higher write load.
+        A very large value lowers the write amplification but at the same time
+        the system can be left with a large number of segment files with a high
+        percentage of deleted documents that occupy disk space unnecessarily.
+
+        During cleanup, the segment files are first arranged in decreasing
+        order of their individual deletion ratios. Then the largest subset of
+        segments whose collective deletion ratio is greater than or equal to
+        `minDeletionRatio` is picked.
-      type: integer
-      default: 0
+      type: number
+      minimum: 0.0
+      maximum: 1.0
+      default: 0.5
   responses:
     '200':
       description: |
@@ -2126,7 +2330,8 @@ paths:
       description: |
         The consolidation policy to apply for selecting which segments
         should be merged.
-        - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+        - If the `tier` type is used, then the `maxSkewThreshold`,
+          `minDeletionRatio`, `segments*`, and `minScore` properties are available.
         - If the `bytes_accum` type is used, then the `threshold` property is available.
       type: object
       properties:
@@ -2135,7 +2340,7 @@ paths:
         The segment candidates for the "consolidation" operation are selected based
         upon several possible configurable formulas as defined by their types.
The currently supported types are:
-        - `"tier"`: consolidate based on segment byte size and live
+        - `"tier"`: consolidate based on segment byte size skew and live
           document count as dictated by the customization attributes.
         - `"bytes_accum"`: consolidate if and only if
           `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -2149,29 +2354,63 @@ paths:
       type: number
       minimum: 0.0
       maximum: 1.0
-    segmentsBytesFloor:
-      description: |
-        Defines the value (in bytes) to treat all smaller segments
-        as equal for consolidation selection.
-      type: integer
     segmentsBytesMax:
       description: |
         Maximum allowed size of all consolidated segments in bytes.
       type: integer
-    segmentsMax:
-      description: |
-        The maximum number of segments that are evaluated as
-        candidates for consolidation.
-      type: integer
-    segmentsMin:
+    maxSkewThreshold:
       description: |
-        The minimum number of segments that are
-        evaluated as candidates for consolidation
-      type: integer
-    minScore:
+        The skew describes how much segment files vary in file size. It is a number
+        between `0.0` and `1.0` and is calculated by dividing the largest file size
+        of a set of segment files by the total size. For example, the skew of
+        200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+        A large `maxSkewThreshold` value allows merging large segment files with
+        smaller ones, consolidation occurs more frequently, and there are fewer
+        segment files on disk at all times. While this may potentially improve the
+        read performance and use fewer file descriptors, frequent consolidations
+        cause a higher write load and thus a higher write amplification.
+
+        On the other hand, a small threshold value triggers the consolidation only
+        when there are a large number of segment files that don't vary in size a lot.
+        Consolidation occurs less frequently, reducing the write amplification, but
+        it can result in a greater number of segment files on disk.
+
+        Multiple combinations of candidate segments are checked and the one with
+        the lowest skew value is selected for consolidation. The selection process
+        picks the greatest number of segments that together have the lowest skew value
+        while ensuring that the size of the new consolidated segment remains under
+        the configured `segmentsBytesMax`.
+      type: number
+      minimum: 0.0
+      maximum: 1.0
+    minDeletionRatio:
       description: |
-        Filter out consolidation candidates with a score less than this.
+        The `minDeletionRatio` represents the minimum required deletion ratio
+        in one or more segments to perform a cleanup of those segments.
+        It is a number between `0.0` and `1.0`.
+
+        The deletion ratio is the percentage of deleted documents across one or
+        more segment files and is calculated by dividing the number of deleted
+        documents by the total number of documents in a segment or a group of
+        segments. For example, if there is a segment with 1000 documents of which
+        300 are deleted and another segment with 1000 documents of which 700 are
+        deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+        The `minDeletionRatio` threshold must be carefully selected. A smaller
+        value leads to earlier cleanup of deleted documents from segments and
+        thus reclamation of disk space but it generates a higher write load.
+        A very large value lowers the write amplification but at the same time
+        the system can be left with a large number of segment files with a high
+        percentage of deleted documents that occupy disk space unnecessarily.
+
+        During cleanup, the segment files are first arranged in decreasing
+        order of their individual deletion ratios. Then the largest subset of
+        segments whose collective deletion ratio is greater than or equal to
+        `minDeletionRatio` is picked.
-      type: integer
+      type: number
+      minimum: 0.0
+      maximum: 1.0
     writebufferIdle:
       description: |
         Maximum number of writers (segments) cached in the pool (`0` = disabled).
@@ -2493,7 +2732,8 @@ paths:
       description: |
         The consolidation policy to apply for selecting which segments
         should be merged.
-        - If the `tier` type is used, then the `segments*` and `minScore` properties are available.
+        - If the `tier` type is used, then the `maxSkewThreshold`,
+          `minDeletionRatio`, `segments*`, and `minScore` properties are available.
         - If the `bytes_accum` type is used, then the `threshold` property is available.
       type: object
       properties:
@@ -2502,7 +2742,7 @@ paths:
         The segment candidates for the "consolidation" operation are selected based
         upon several possible configurable formulas as defined by their types.
         The currently supported types are:
-        - `"tier"`: consolidate based on segment byte size and live
+        - `"tier"`: consolidate based on segment byte size skew and live
           document count as dictated by the customization attributes.
         - `"bytes_accum"`: consolidate if and only if
           `{threshold} > (segment_bytes + sum_of_merge_candidate_segment_bytes) / all_segment_bytes`
@@ -2516,29 +2756,63 @@ paths:
       type: number
       minimum: 0.0
      maximum: 1.0
-    segmentsBytesFloor:
-      description: |
-        Defines the value (in bytes) to treat all smaller segments
-        as equal for consolidation selection.
-      type: integer
     segmentsBytesMax:
       description: |
         Maximum allowed size of all consolidated segments in bytes.
       type: integer
-    segmentsMax:
-      description: |
-        The maximum number of segments that are evaluated as
-        candidates for consolidation.
-      type: integer
-    segmentsMin:
+    maxSkewThreshold:
       description: |
-        The minimum number of segments that are
-        evaluated as candidates for consolidation
-      type: integer
-    minScore:
+        The skew describes how much segment files vary in file size. It is a number
+        between `0.0` and `1.0` and is calculated by dividing the largest file size
+        of a set of segment files by the total size. For example, the skew of
+        200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+ + A large `maxSkewThreshold` value allows merging large segment files with + smaller ones, consolidation occurs more frequently, and there are fewer + segment files on disk at all times. While this may potentially improve the + read performance and use fewer file descriptors, frequent consolidations + cause a higher write load and thus a higher write amplification. + + On the other hand, a small threshold value triggers the consolidation only + when there are a large number of segment files that don't vary in size a lot. + Consolidation occurs less frequently, reducing the write amplification, but + it can result in a greater number of segment files on disk. + + Multiple combinations of candidate segments are checked and the one with + the lowest skew value is selected for consolidation. The selection process + picks the greatest number of segments that together have the lowest skew value + while ensuring that the size of the new consolidated segment remains under + the configured `segmentsBytesMax`. + type: number + minimum: 0.0 + maximum: 1.0 + minDeletionRatio: description: | - Filter out consolidation candidates with a score less than this. + The `minDeletionRatio` represents the minimum required deletion ratio + in one or more segments to perform a cleanup of those segments. + It is a number between `0.0` and `1.0`. + + The deletion ratio is the percentage of deleted documents across one or + more segment files and is calculated by dividing the number of deleted + documents by the total number of documents in a segment or a group of + segments. For example, if there is a segment with 1000 documents of which + 300 are deleted and another segment with 1000 documents of which 700 are + deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`). + + The `minDeletionRatio` threshold must be carefully selected. 
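The deletion-ratio arithmetic and the cleanup selection described in this property can be sketched in Python. This is a hypothetical illustration only; the `(total_docs, deleted_docs)` pair representation is an editorial assumption, not an ArangoDB data structure:

```python
def deletion_ratio(segments):
    """Collective deletion ratio across one or more segments, each given
    as a (total_docs, deleted_docs) pair: deleted / total."""
    total = sum(t for t, _ in segments)
    deleted = sum(d for _, d in segments)
    return deleted / total

def cleanup_candidates(segments, min_deletion_ratio=0.5):
    """Arrange segments by individual deletion ratio, descending, then
    pick the largest prefix whose collective ratio still meets the
    threshold (sketch of the selection described in the text)."""
    ordered = sorted(segments, key=lambda s: s[1] / s[0], reverse=True)
    for n in range(len(ordered), 0, -1):
        if deletion_ratio(ordered[:n]) >= min_deletion_ratio:
            return ordered[:n]
    return []

# Worked example from the description: 300 of 1000 and 700 of 1000 deleted.
segs = [(1000, 300), (1000, 700)]
print(deletion_ratio(segs))              # 0.5 (1000 / 2000)
print(cleanup_candidates(segs, 0.5))     # [(1000, 700), (1000, 300)]
print(cleanup_candidates(segs, 0.6))     # [(1000, 700)]
```

With the default threshold of `0.5`, both example segments are cleaned up together; raising the threshold to `0.6` leaves only the 70%-deleted segment eligible, illustrating how a larger value defers cleanup.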
A smaller
+        value leads to earlier cleanup of deleted documents from segments and
+        thus reclamation of disk space but it generates a higher write load.
+        A very large value lowers the write amplification but at the same time
+        the system can be left with a large number of segment files with a high
+        percentage of deleted documents that occupy disk space unnecessarily.
+
+        During cleanup, the segment files are first arranged in decreasing
+        order of their individual deletion ratios. Then the largest subset of
+        segments whose collective deletion ratio is greater than or equal to
+        `minDeletionRatio` is picked.
-      type: integer
+      type: number
+      minimum: 0.0
+      maximum: 1.0
     writebufferIdle:
       description: |
         Maximum number of writers (segments) cached in the pool (`0` = disabled).
diff --git a/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md b/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md
index 036758127f..e0c9ef42ed 100644
--- a/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md
+++ b/site/content/arangodb/4.0/indexes-and-search/arangosearch/arangosearch-views-reference.md
@@ -462,7 +462,7 @@ is used by these writers (in terms of "writers pool") one can use
   - `"bytes_accum"`: Consolidation is performed based on current memory consumption
     of segments and `threshold` property value.
-  - `"tier"`: Consolidate based on segment byte size and live document count
+  - `"tier"`: Consolidate based on segment byte size skew and live document count
     as dictated by the customization attributes.
   {{< warning >}}
@@ -483,23 +483,55 @@ is used by these writers (in terms of "writers pool") one can use
 `consolidationPolicy` properties for `"tier"` type:
-  - **segmentsMin** (_optional_; type: `integer`; default: `50`)
-
-    The minimum number of segments that are evaluated as candidates for consolidation.
-
-  - **segmentsMax** (_optional_; type: `integer`; default: `200`)
-
-    The maximum number of segments that are evaluated as candidates for consolidation.
-
   - **segmentsBytesMax** (_optional_; type: `integer`; default: `8589934592`)

     Maximum allowed size of all consolidated segments in bytes.

-  - **segmentsBytesFloor** (_optional_; type: `integer`; default: `25165824`)
-
-    Defines the value (in bytes) to treat all smaller segments as equal for consolidation
-    selection.
-
-  - **minScore** (_optional_; type: `integer`; default: `0`)
-
-    Filter out consolidation candidates with a score less than this.
+  - **maxSkewThreshold** (_optional_; type: `number`; default: `0.4`)
+
+    The skew describes how much segment files vary in file size. It is a number
+    between `0.0` and `1.0` and is calculated by dividing the largest file size
+    of a set of segment files by the total size. For example, the skew of
+    200 MiB, 300 MiB, and 500 MiB segment files is `0.5` (`500 / 1000`).
+
+    A large `maxSkewThreshold` value allows merging large segment files with
+    smaller ones, consolidation occurs more frequently, and there are fewer
+    segment files on disk at all times. While this may potentially improve the
+    read performance and use fewer file descriptors, frequent consolidations
+    cause a higher write load and thus a higher write amplification.
+
+    On the other hand, a small threshold value triggers the consolidation only
+    when there are a large number of segment files that don't vary in size a lot.
+    Consolidation occurs less frequently, reducing the write amplification, but
+    it can result in a greater number of segment files on disk.
+
+    Multiple combinations of candidate segments are checked and the one with
+    the lowest skew value is selected for consolidation. The selection process
+    picks the greatest number of segments that together have the lowest skew value
+    while ensuring that the size of the new consolidated segment remains under
+    the configured `segmentsBytesMax`.
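The candidate-selection process described for `maxSkewThreshold` can be sketched as a brute-force illustration in Python: prefer the largest candidate set that fits under `segmentsBytesMax`, within that size class take the combination with the lowest skew, and only consolidate if its skew is within the threshold. This is an editorial sketch of the documented behavior, not the actual IResearch implementation:

```python
from itertools import combinations

def skew(sizes):
    """Largest file size divided by the total size of the set."""
    return max(sizes) / sum(sizes)

def pick_candidates(sizes, segments_bytes_max, max_skew_threshold):
    """Return the segment sizes to merge, or None if no candidate set
    fits the byte limit with an acceptable skew."""
    for n in range(len(sizes), 1, -1):
        fitting = [c for c in combinations(sizes, n)
                   if sum(c) <= segments_bytes_max]
        if fitting:
            best = min(fitting, key=skew)
            return list(best) if skew(best) <= max_skew_threshold else None
    return None

print(pick_candidates([200, 300, 500], 1000, 0.5))  # [200, 300, 500]
print(pick_candidates([200, 300, 500], 1000, 0.4))  # None: skew 0.5 exceeds 0.4
print(pick_candidates([200, 300, 500], 700, 0.7))   # [200, 300]
```

The third call shows `segmentsBytesMax` in action: merging all three segments would exceed the 700-byte cap, so only the two smallest are considered, and their skew (`0.6`) still passes the `0.7` threshold.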
+
+- **minDeletionRatio** (_optional_; type: `number`; default: `0.5`)
+
+  The `minDeletionRatio` represents the minimum required deletion ratio
+  in one or more segments to perform a cleanup of those segments.
+  It is a number between `0.0` and `1.0`.
+
+  The deletion ratio is the percentage of deleted documents across one or
+  more segment files and is calculated by dividing the number of deleted
+  documents by the total number of documents in a segment or a group of
+  segments. For example, if there is a segment with 1000 documents of which
+  300 are deleted and another segment with 1000 documents of which 700 are
+  deleted, the deletion ratio is `0.5` (50%, calculated as `1000 / 2000`).
+
+  The `minDeletionRatio` threshold must be carefully selected. A smaller
+  value leads to earlier cleanup of deleted documents from segments and
+  thus reclamation of disk space, but it generates a higher write load.
+  A very large value lowers the write amplification, but at the same time
+  the system can be left with a large number of segment files with a high
+  percentage of deleted documents that occupy disk space unnecessarily.
+
+  During cleanup, the segment files are first arranged in decreasing
+  order of their individual deletion ratios. Then the largest subset of
+  segments whose collective deletion ratio is greater than or equal to
+  `minDeletionRatio` is picked.
diff --git a/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md b/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md
index c00008ab07..16456aaa60 100644
--- a/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md
+++ b/site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md
@@ -365,6 +365,25 @@ By consolidating less often and with more data, less file descriptors are used.
 - `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
 - `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
 
+##### Added and removed consolidation options for `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The following options for consolidating `arangosearch` Views have been removed
+and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+  - `segmentsMin`
+  - `segmentsMax`
+  - `segmentsBytesFloor`
+  - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+  - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+  - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
 #### Document API
 
 The following endpoints accept a new `versionAttribute` query parameter that adds
@@ -503,6 +522,25 @@ By consolidating less often and with more data, less file descriptors are used.
 
 - `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
 - `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
 
+##### Added and removed consolidation options for inverted indexes
+
+Introduced in: v3.12.7
+
+The following options for consolidating inverted indexes have been removed
+and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+  - `segmentsMin`
+  - `segmentsMax`
+  - `segmentsBytesFloor`
+  - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+  - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+  - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
 #### Optimizer rule descriptions
 
 Introduced in: v3.10.9, v3.11.2
diff --git a/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md b/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md
index 959cea82cf..1fe5a31d18 100644
--- a/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md
+++ b/site/content/arangodb/4.0/release-notes/version-3.12/incompatible-changes-in-3-12.md
@@ -994,6 +994,29 @@ more data, less file descriptors are used.
 
 - `segmentsBytesMax` increased from `5368709120` (5 GiB) to `8589934592` (8 GiB)
 - `segmentsBytesFloor` increased from `2097152` (2 MiB) to `25165824` (24 MiB)
 
+## Added and removed consolidation options for inverted indexes and `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The following options for consolidating inverted indexes as well as
+`arangosearch` Views have been removed and are now ignored when specified in a request:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+  - `segmentsMin`
+  - `segmentsMax`
+  - `segmentsBytesFloor`
+  - `minScore`
+
+The consolidation works differently now and uses the new `maxSkewThreshold` and
+`minDeletionRatio` options together with the existing `segmentsBytesMax`. If you
+previously used customized settings for the removed options, check if the default
+values of the new options are acceptable or if you need to tune them according to
+your workload.
+
+For details, see:
+- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md)
+- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties)
+
 ## HTTP RESTful API
 
 ### JavaScript-based traversal using `/_api/traversal` removed
diff --git a/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md
index e1e7bced88..e550246d34 100644
--- a/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md
@@ -2509,6 +2509,38 @@ environment variable `NAME`.
 If there is an environment variable called `PID` or `TEMP_BASE_DIR`, then
 `@PID@` or `@TEMP_BASE_DIR@` is substituted with the value of the respective
 environment variable.
 
+### New consolidation algorithm for inverted indexes and `arangosearch` Views
+
+Introduced in: v3.12.7
+
+The `tier` consolidation policy now uses a different algorithm for merging
+and cleaning up segments. Overall, it avoids consolidating segments where the
+cost of writing the new segment is high and the gain in read performance is low
+(e.g. combining a big segment file with a very small one).
+
+The following options have been removed for inverted indexes as well as
+`arangosearch` Views because the new consolidation algorithm doesn't use them:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+  - `segmentsMin`
+  - `segmentsMax`
+  - `segmentsBytesFloor`
+  - `minScore`
+
+The following new options have been added:
+
+- `consolidationPolicy` (with `type` set to `tier`):
+  - `maxSkewThreshold` (number in range `[0.0, 1.0]`, default: `0.4`)
+  - `minDeletionRatio` (number in range `[0.0, 1.0]`, default: `0.5`)
+
+If you previously used customized settings for the removed options, check if the
+default values of the new options are acceptable or if you need to tune them
+according to your workload.
+
+For details, see:
+- [HTTP interface for inverted indexes](../../develop/http-api/indexes/inverted.md)
+- [`arangosearch` View properties](../../indexes-and-search/arangosearch/arangosearch-views-reference.md#view-properties)
+
 ### Deployment metadata metrics
 
 Introduced in: v3.12.7