Commit 9ebf271

DOC-806 | Vector index: storedValues and indexHint (#848)
* Vector index: storedValues and indexHint
* Add introduced in remarks
* Add push-filter-into-enumerate-near optimizer rule to release notes
1 parent 1e95984 commit 9ebf271

8 files changed: +168 -0 lines changed

site/content/arangodb/3.12/develop/http-api/indexes/vector.md

Lines changed: 17 additions & 0 deletions
@@ -65,6 +65,23 @@ paths:
   maxItems: 1
   items:
     type: string
+storedValues:
+  description: |
+    Store additional attributes in the index (introduced in v3.12.7).
+    Unlike with other index types, this is not for covering projections
+    with the index but for adding attributes that you filter on.
+    This lets you make the lookup in the vector index more efficient
+    because it avoids materializing documents twice, once for the
+    filtering and once for the matches.
+
+    The maximum number of attributes that you can use in `storedValues` is 32.
+  type: array
+  uniqueItems: true
+  items:
+    description: |
+      A list of attribute paths. The `.` character denotes sub-attributes.
+    type: string
+    type: string
 sparse:
   description: |
     Whether to create a sparse index that excludes documents with

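The schema addition above can be illustrated with a minimal Python sketch that assembles a request body for creating a vector index with `storedValues`. The helper name `vector_index_body` is hypothetical, the `params` keys (`metric`, `dimension`, `nLists`) are taken from the wider vector index documentation rather than from this diff, and all values are placeholders. The sketch only builds and validates the body; it does not send a request.

```python
import json

# Limit stated in the storedValues description of this diff.
MAX_STORED_VALUES = 32


def vector_index_body(field, stored_values, *, metric, dimension, n_lists):
    """Build a JSON-serializable body for creating a vector index (hypothetical helper)."""
    stored_values = list(stored_values)
    if len(stored_values) > MAX_STORED_VALUES:
        raise ValueError(f"storedValues accepts at most {MAX_STORED_VALUES} attributes")
    if len(set(stored_values)) != len(stored_values):
        # Mirrors `uniqueItems: true` in the schema.
        raise ValueError("storedValues must not contain duplicate attribute paths")
    return {
        "type": "vector",
        "fields": [field],  # the schema context shows `maxItems: 1` for fields
        "storedValues": stored_values,  # `.` denotes sub-attributes, e.g. "meta.category"
        # Assumption: parameter names as used elsewhere in the vector index docs.
        "params": {"metric": metric, "dimension": dimension, "nLists": n_lists},
    }


body = vector_index_body(
    "vector", ["val", "meta.category"], metric="cosine", dimension=128, n_lists=100
)
print(json.dumps(body, indent=2))
```

Validating on the client side is optional; the server presumably enforces the same limits, but the diff does not describe the error it returns.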
site/content/arangodb/3.12/indexes-and-search/indexing/working-with-indexes/vector-indexes.md

Lines changed: 8 additions & 0 deletions
@@ -74,6 +74,14 @@ centroids and the quality of vector search thus degrades.
   Set this option to `true` to keep the collection/shards available for
   write operations by not using an exclusive write lock for the duration
   of the index creation. Default: `false`.
+- **storedValues** (array of strings, introduced in v3.12.7):
+  Store additional attributes in the index. Unlike with other index types, this
+  is not for covering projections with the index but for adding attributes that
+  you filter on. This lets you make the lookup in the vector index more efficient
+  because it avoids materializing documents twice, once for the filtering and
+  once for the matches.
+
+  The maximum number of attributes that you can use in `storedValues` is 32.
 - **params**: The parameters as used by the Faiss library.
   - **metric** (string): The measure for calculating the vector similarity:
     - `"cosine"`: Angular similarity. Vectors are automatically

site/content/arangodb/3.12/release-notes/version-3.12/api-changes-in-3-12.md

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,8 @@ A `replace-entries-with-object-iteration` rule has been added in v3.12.3.

 A `use-index-for-collect` and a `use-vector-index` rule have been added in v3.12.4.

+A `push-filter-into-enumerate-near` rule has been added in v3.12.7.
+
 The affected endpoints are `POST /_api/cursor`, `POST /_api/explain`, and
 `GET /_api/query/rules`.

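Because the new rule surfaces through `POST /_api/cursor` and `POST /_api/explain`, a client may want to compare plans with and without it. The following Python sketch builds such a request body. It assumes that `push-filter-into-enumerate-near` can be switched off like other optional optimizer rules (the diff does not say whether it can), and the helper name `cursor_body` is hypothetical. Nothing is sent over the network.

```python
def cursor_body(query, bind_vars, disabled_rules=()):
    """Build a body for POST /_api/cursor or POST /_api/explain (hypothetical helper)."""
    body = {"query": query, "bindVars": dict(bind_vars)}
    if disabled_rules:
        # In the optimizer options, a leading "-" disables a rule and "+" enables it.
        body["options"] = {"optimizer": {"rules": [f"-{rule}" for rule in disabled_rules]}}
    return body


query = """
FOR doc IN coll
  FILTER doc.val > 3
  SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
  LIMIT 3
  RETURN doc.attr
"""

# Placeholder query vector; a real one must match the index dimension.
body = cursor_body(query, {"q": [0.1, 0.2, 0.3]}, ["push-filter-into-enumerate-near"])
print(body["options"])
```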
site/content/arangodb/3.12/release-notes/version-3.12/whats-new-in-3-12.md

Lines changed: 57 additions & 0 deletions
@@ -1541,6 +1541,8 @@ FOR doc IN coll
   RETURN doc
 ```

+The filtering is handled by the `use-vector-index` optimizer rule in v3.12.6.
+
 Vector indexes can now be sparse to exclude documents with the embedding attribute
 for indexing missing or set to `null`.

@@ -1551,6 +1553,61 @@ The accompanying AQL function is the following:

 - `APPROX_NEAR_INNER_PRODUCT()`

+---
+
+<small>Introduced in: v3.12.7</small>
+
+Vector indexes now support `storedValues` to store additional attributes in the
+index. Unlike with other index types, this is not for covering projections with
+the index but for adding attributes that you filter on. This lets you make the
+lookup in the vector index more efficient because it avoids materializing
+documents twice, once for the filtering and once for the matches.
+
+For example, if you set `storedValues` to `["val"]` in a vector index over
+`["vector"]`, then the following query can utilize this index for the
+filtering by `val` and the lookup using `vector`, but not for the projection of
+`attr` even if you added it to `storedValues` as well:
+
+```aql
+FOR doc IN coll
+  FILTER doc.val > 3
+  SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
+  LIMIT 3
+  RETURN doc.attr
+```
+
+In the query execution plan, the utilization of `storedValues` for filtering is
+indicated by `/* covered by storedValues */`:
+
+```aql
+Execution plan:
+ Id   NodeType                  Par   Est.   Comment
+  1   SingletonNode                      1   * ROOT
+ 10   CalculationNode                    1     - LET #4 = [ ... ]   /* json expression */   /* const assignment */
+ 11   EnumerateNearVectorNode            3     - FOR doc OF coll IN TOP 3 NEAR #4 DISTANCE INTO #2 FILTER (doc.`val` > 3)   /* early pruning */   /* covered by storedValues */
+  7   LimitNode                          3     - LIMIT 0, 3
+ 12   MaterializeNode                    3     - MATERIALIZE doc INTO #5 /* (projections: `attr`) */   LET #6 = #5.`attr`
+  9   ReturnNode                         3     - RETURN #6
+
+Indexes used:
+ By   Name   Type     Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
+ 11   foo    vector   coll         false    false    false           n/a   [ `vector` ]   [ `val` ]       #4
+```
+
+The new `push-filter-into-enumerate-near` optimizer rule now handles everything
+related to vector index filtering (with and without `storedValues`).
+
+The `FOR` operation now supports `indexHint` and `forceIndexHint` for vector
+indexes to make the AQL optimizer prefer or require, respectively, specific
+vector indexes:
+
+```aql
+FOR doc IN c OPTIONS { indexHint: ["vec_idx_1", "vec_idx_2"], forceIndexHint: true }
+  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC
+  LIMIT 3
+  RETURN doc
+```
+
 ## Server options

 ### Effective and available startup options

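The release note above says that a plan node is annotated with `/* covered by storedValues */` when the filter is served from the index. A small Python sketch can look for that marker in the textual explain output. It assumes the plan text has the shape shown in the release note, which is not a stable interface, and the function name `covered_node_ids` is hypothetical.

```python
MARKER = "/* covered by storedValues */"


def covered_node_ids(plan_text):
    """Return the IDs of plan nodes whose comment carries the storedValues marker."""
    ids = []
    for line in plan_text.splitlines():
        parts = line.split()
        # Node rows start with a numeric ID; headers and blank lines are skipped.
        if parts and parts[0].isdigit() and MARKER in line:
            ids.append(int(parts[0]))
    return ids


# Abbreviated copy of the plan from the release note.
plan = """\
Execution plan:
 Id   NodeType                  Par   Est.   Comment
 10   CalculationNode                    1     - LET #4 = [ ... ]   /* json expression */
 11   EnumerateNearVectorNode            3     - FOR doc OF coll IN TOP 3 NEAR #4 DISTANCE INTO #2 FILTER (doc.`val` > 3)   /* early pruning */   /* covered by storedValues */
 12   MaterializeNode                    3     - MATERIALIZE doc INTO #5 /* (projections: `attr`) */   LET #6 = #5.`attr`
"""

print(covered_node_ids(plan))  # [11]
```

An empty result would suggest that the filtered attribute is missing from `storedValues` or that the optimizer rule did not apply.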
site/content/arangodb/4.0/develop/http-api/indexes/vector.md

Lines changed: 17 additions & 0 deletions
@@ -65,6 +65,23 @@ paths:
   maxItems: 1
   items:
     type: string
+storedValues:
+  description: |
+    Store additional attributes in the index (introduced in v3.12.7).
+    Unlike with other index types, this is not for covering projections
+    with the index but for adding attributes that you filter on.
+    This lets you make the lookup in the vector index more efficient
+    because it avoids materializing documents twice, once for the
+    filtering and once for the matches.
+
+    The maximum number of attributes that you can use in `storedValues` is 32.
+  type: array
+  uniqueItems: true
+  items:
+    description: |
+      A list of attribute paths. The `.` character denotes sub-attributes.
+    type: string
+    type: string
 sparse:
   description: |
     Whether to create a sparse index that excludes documents with

site/content/arangodb/4.0/indexes-and-search/indexing/working-with-indexes/vector-indexes.md

Lines changed: 8 additions & 0 deletions
@@ -74,6 +74,14 @@ centroids and the quality of vector search thus degrades.
   Set this option to `true` to keep the collection/shards available for
   write operations by not using an exclusive write lock for the duration
   of the index creation. Default: `false`.
+- **storedValues** (array of strings, introduced in v3.12.7):
+  Store additional attributes in the index. Unlike with other index types, this
+  is not for covering projections with the index but for adding attributes that
+  you filter on. This lets you make the lookup in the vector index more efficient
+  because it avoids materializing documents twice, once for the filtering and
+  once for the matches.
+
+  The maximum number of attributes that you can use in `storedValues` is 32.
 - **params**: The parameters as used by the Faiss library.
   - **metric** (string): The measure for calculating the vector similarity:
     - `"cosine"`: Angular similarity. Vectors are automatically

site/content/arangodb/4.0/release-notes/version-3.12/api-changes-in-3-12.md

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,8 @@ A `replace-entries-with-object-iteration` rule has been added in v3.12.3.

 A `use-index-for-collect` and a `use-vector-index` rule have been added in v3.12.4.

+A `push-filter-into-enumerate-near` rule has been added in v3.12.7.
+
 The affected endpoints are `POST /_api/cursor`, `POST /_api/explain`, and
 `GET /_api/query/rules`.

site/content/arangodb/4.0/release-notes/version-3.12/whats-new-in-3-12.md

Lines changed: 57 additions & 0 deletions
@@ -1541,6 +1541,8 @@ FOR doc IN coll
   RETURN doc
 ```

+The filtering is handled by the `use-vector-index` optimizer rule in v3.12.6.
+
 Vector indexes can now be sparse to exclude documents with the embedding attribute
 for indexing missing or set to `null`.

@@ -1551,6 +1553,61 @@ The accompanying AQL function is the following:

 - `APPROX_NEAR_INNER_PRODUCT()`

+---
+
+<small>Introduced in: v3.12.7</small>
+
+Vector indexes now support `storedValues` to store additional attributes in the
+index. Unlike with other index types, this is not for covering projections with
+the index but for adding attributes that you filter on. This lets you make the
+lookup in the vector index more efficient because it avoids materializing
+documents twice, once for the filtering and once for the matches.
+
+For example, if you set `storedValues` to `["val"]` in a vector index over
+`["vector"]`, then the following query can utilize this index for the
+filtering by `val` and the lookup using `vector`, but not for the projection of
+`attr` even if you added it to `storedValues` as well:
+
+```aql
+FOR doc IN coll
+  FILTER doc.val > 3
+  SORT APPROX_NEAR_INNER_PRODUCT(doc.vector, @q) DESC
+  LIMIT 3
+  RETURN doc.attr
+```
+
+In the query execution plan, the utilization of `storedValues` for filtering is
+indicated by `/* covered by storedValues */`:
+
+```aql
+Execution plan:
+ Id   NodeType                  Par   Est.   Comment
+  1   SingletonNode                      1   * ROOT
+ 10   CalculationNode                    1     - LET #4 = [ ... ]   /* json expression */   /* const assignment */
+ 11   EnumerateNearVectorNode            3     - FOR doc OF coll IN TOP 3 NEAR #4 DISTANCE INTO #2 FILTER (doc.`val` > 3)   /* early pruning */   /* covered by storedValues */
+  7   LimitNode                          3     - LIMIT 0, 3
+ 12   MaterializeNode                    3     - MATERIALIZE doc INTO #5 /* (projections: `attr`) */   LET #6 = #5.`attr`
+  9   ReturnNode                         3     - RETURN #6
+
+Indexes used:
+ By   Name   Type     Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
+ 11   foo    vector   coll         false    false    false           n/a   [ `vector` ]   [ `val` ]       #4
+```
+
+The new `push-filter-into-enumerate-near` optimizer rule now handles everything
+related to vector index filtering (with and without `storedValues`).
+
+The `FOR` operation now supports `indexHint` and `forceIndexHint` for vector
+indexes to make the AQL optimizer prefer or require, respectively, specific
+vector indexes:
+
+```aql
+FOR doc IN c OPTIONS { indexHint: ["vec_idx_1", "vec_idx_2"], forceIndexHint: true }
+  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC
+  LIMIT 3
+  RETURN doc
+```
+
 ## Server options

 ### Effective and available startup options
