feat(hnsw): batch neighbor vector reads via badger Txn.MultiGet #9732
Draft
shaunpatterson wants to merge 2 commits into
HNSW search reads each candidate's neighbor vectors one key at a time (getVecFromUid -> CacheType.Get), and each Get becomes a full single-key NewKeyIterator(AllVersions) in badger. For a fixed candidate these sibling reads are independent, so fold them into one batched read.

Changes:
- index.CacheType / Txn / LocalCache: add MultiGet(keys) (vals, errs), the batched counterpart of Get.
- posting: ReadPostingListFromVersions folds a key's version chain from a badger []ItemVersion exactly as ReadPostingList does from an iterator; MemoryLayer.ReadManyData resolves many keys in one badger Txn.MultiGet (warm keys served from the global cache; two-phase read mirroring ReadData); LocalCache.MultiGet adds the per-txn cache/delta layer; viLocalCache/viTxn.MultiGet expose resolved values.
- tok/hnsw: getVecsFromUids batch-fetches a frontier's vectors; searchPersistentLayer collects a candidate's unvisited neighbors and reads their vectors in one MultiGet (traversal/heap logic unchanged).
- TxnCache/QueryCache and the test mocks implement MultiGet.

DEPENDS ON the badger Txn.MultiGet change (dgraph-io/badger#2297). Until a badger release with MultiGet is available, build/test locally with:

    go mod edit -replace github.com/dgraph-io/badger/v4=/path/to/badger

against a checkout of badger branch sp/badger_multiget; then bump the badger version here.

Tested (with the local badger replace): posting differential test (ReadManyData + viLocalCache.MultiGet match the per-key ReadPostingList/Get path over an on-disk round-trip of complete/delta/empty/absent posting shapes); tok/hnsw and posting suites pass; full dgraph build clean.

Benchmarks (cold cache, real badger):
- badger MultiGet vs per-key NewKeyIterator: -5..-16% time, -35..-40% allocs.
- posting frontier read (Get loop vs MultiGet): -12..-18% time, -15..-21% allocs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
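To make the batching idea concrete, here is a minimal, self-contained sketch of a `MultiGet` that sits next to `Get`, plus a frontier fetch that uses it. Everything in it is a hypothetical stand-in: `Cache`, `mapCache`, `fetchFrontier`, and `ErrNotFound` are illustration names, not dgraph's `index.CacheType` or `getVecsFromUids`, and the in-memory map replaces badger. Skipping missing vectors and marking neighbors visited before the read are assumptions of the sketch, not behavior confirmed by this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel for a missing key.
var ErrNotFound = errors.New("key not found")

// Cache is a hypothetical stand-in for a cache type with a single-key Get
// and a batched MultiGet whose results are aligned with the input keys.
type Cache interface {
	Get(key string) ([]byte, error)
	MultiGet(keys []string) (vals [][]byte, errs []error)
}

// mapCache is an in-memory implementation that counts backend read calls.
type mapCache struct {
	data  map[string][]byte
	reads int
}

func (c *mapCache) Get(key string) ([]byte, error) {
	c.reads++
	v, ok := c.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

func (c *mapCache) MultiGet(keys []string) ([][]byte, []error) {
	c.reads++ // the whole batch counts as one backend read
	vals := make([][]byte, len(keys))
	errs := make([]error, len(keys))
	for i, k := range keys {
		v, ok := c.data[k]
		if !ok {
			errs[i] = ErrNotFound
			continue
		}
		vals[i] = v
	}
	return vals, errs
}

// fetchFrontier collects one candidate's unvisited neighbors and reads their
// vectors in a single MultiGet. Keys that fail to resolve are skipped here.
func fetchFrontier(c Cache, neighbors []string, visited map[string]bool) map[string][]byte {
	var keys []string
	for _, n := range neighbors {
		if !visited[n] {
			visited[n] = true
			keys = append(keys, n)
		}
	}
	out := make(map[string][]byte, len(keys))
	if len(keys) == 0 {
		return out
	}
	vals, errs := c.MultiGet(keys)
	for i, k := range keys {
		if errs[i] == nil {
			out[k] = vals[i]
		}
	}
	return out
}

func main() {
	c := &mapCache{data: map[string][]byte{"a": {1}, "b": {2}, "c": {3}}}
	visited := map[string]bool{"a": true}
	got := fetchFrontier(c, []string{"a", "b", "c", "d"}, visited)
	fmt.Println(len(got), c.reads)
}
```

Returning a per-key error slice instead of one error lets a caller keep the keys that resolved when a single neighbor is missing, which matches the `(vals, errs)` shape named in the commit message.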
The branch did not build: ReadManyData/ReadPostingListFromVersions referenced badger.Txn.MultiGet and badger.ItemVersion, which do not exist in the pinned badger v4.9.1 (nor in any released version or badger main). go.mod was never bumped, so `go build ./posting/...` failed with "undefined: badger.ItemVersion" and "txn.MultiGet undefined".

Reimplement the batched cold-read without that API: ReadManyData now opens one read transaction and one AllVersions iterator per phase and Seeks to each key, folding the version chain with the existing, proven ReadPostingList (exactly as the single-key readFromDisk does). This still amortizes txn/iterator construction across the whole neighbor frontier (the dgraph-side batching win) while staying correct: the per-txn cache layering, two-phase MaxUint64-then-readTs read, and cache population are unchanged. ReadPostingListFromVersions is removed.

Validated: TestReadManyDataMatchesReadData (batched == single-key, value-for-value) and the vector integration suite (similar_to/HNSW search, delete, update, reindex, dot-product) all pass.
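The "one iterator, Seek to each key, fold the version chain" approach can be sketched without badger. The block below is a simplified model under stated assumptions: `entry`, `iterator`, `fold`, and `readMany` are hypothetical names, the iterator is a sorted slice rather than a badger iterator, entries are assumed to be ordered by key and then newest version first, and "applying a delta" is reduced to appending uids. It shows the shape of the read loop, not dgraph's actual ReadPostingList.

```go
package main

import (
	"fmt"
	"sort"
)

type kind int

const (
	complete kind = iota // full posting list; ends the fold
	delta                // incremental change layered on an older version
	empty                // empty or deleted marker; ends the fold
)

// entry is a hypothetical stand-in for one stored version of a key.
type entry struct {
	key     string
	version uint64
	kind    kind
	uids    []uint64
}

// iterator walks entries sorted by key ascending, then version descending.
type iterator struct {
	entries []entry
	pos     int
}

// Seek moves to the first entry whose key is >= key.
func (it *iterator) Seek(key string) {
	it.pos = sort.Search(len(it.entries), func(i int) bool {
		return it.entries[i].key >= key
	})
}

func (it *iterator) Valid() bool { return it.pos < len(it.entries) }
func (it *iterator) Next()       { it.pos++ }
func (it *iterator) Item() entry { return it.entries[it.pos] }

// fold reads one key's versions newest-first and stops at the first complete
// or empty entry. Collected deltas are then applied oldest-first on the base.
func fold(it *iterator, key string) []uint64 {
	var base []uint64
	var deltas [][]uint64
	for it.Seek(key); it.Valid() && it.Item().key == key; it.Next() {
		e := it.Item()
		if e.kind == delta {
			deltas = append(deltas, e.uids)
			continue
		}
		if e.kind == complete {
			base = e.uids
		}
		break
	}
	out := append([]uint64(nil), base...)
	for i := len(deltas) - 1; i >= 0; i-- {
		out = append(out, deltas[i]...)
	}
	return out
}

// readMany resolves every key with one shared iterator instead of building
// a fresh iterator per key.
func readMany(entries []entry, keys []string) map[string][]uint64 {
	it := &iterator{entries: entries}
	out := make(map[string][]uint64, len(keys))
	for _, k := range keys {
		out[k] = fold(it, k)
	}
	return out
}

func main() {
	entries := []entry{
		{"a", 3, delta, []uint64{30}},
		{"a", 2, complete, []uint64{10, 20}},
		{"a", 1, delta, []uint64{99}}, // older than the complete entry; never read
		{"b", 5, empty, nil},
		{"c", 4, delta, []uint64{7}},
	}
	fmt.Println(readMany(entries, []string{"c", "a", "b", "zz"}))
}
```

In this model key `a` folds to 10, 20, 30: the version-3 delta is layered on the version-2 complete list, and the version-1 delta is never visited because the fold stops at the complete entry. Keys `b` (empty marker) and `zz` (absent) both resolve to an empty result.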
What
HNSW vector search reads each candidate's neighbor vectors one key at a time (`getVecFromUid` → `CacheType.Get`), and every `Get` becomes a full single-key `NewKeyIterator(AllVersions)` in badger. For a fixed candidate those sibling reads are independent, so this PR batches them into a single read.

Changes
- `tok/index`: add `MultiGet(keys) (vals, errs)` to `CacheType` / `Txn` / `LocalCache` (batched counterpart of `Get`).
- `posting`:
  - `ReadPostingListFromVersions`: folds a key's version chain from a badger `[]ItemVersion` exactly as `ReadPostingList` does from an iterator (delta-on-complete, newest-first, stop on complete/empty/deleted).
  - `MemoryLayer.ReadManyData`: resolves many keys in one `badger.Txn.MultiGet`; warm keys served from the global cache; two-phase read mirroring `ReadData`.
  - `LocalCache.MultiGet`: adds the per-txn cache/delta layer; `viLocalCache`/`viTxn.MultiGet` expose resolved values.
- `tok/hnsw`: `getVecsFromUids` batch-fetches a frontier's vectors; `searchPersistentLayer` collects a candidate's unvisited neighbors and reads their vectors in one `MultiGet`. Traversal/heap logic is unchanged; only the vector fetch is batched.
- `TxnCache`/`QueryCache` and the test mocks implement `MultiGet`.

Testing (with the local badger replace)
- Differential test (`posting`): `ReadManyData` and `viLocalCache.MultiGet` return identical results to the per-key `ReadPostingList`/`Get` path over an on-disk round-trip of complete/delta/empty/absent posting shapes.
- `tok/hnsw` and `posting` suites pass; full dgraph build clean.

Benchmarks (cold cache, real badger; benchstat n=6)
- badger `MultiGet` vs per-key `NewKeyIterator`: −5…−16% time, −35…−40% allocs (grows with frontier size).
- posting frontier read (`Get` loop vs `MultiGet`, `BenchmarkHNSWFrontierRead`): −12…−18% time, −15…−21% allocs for K=16/64/256.

This speeds up the neighbor-vector read path only; whole-query latency improves in proportion to the share of query time spent reading vectors.
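As a rough way to read the last sentence, the whole-query gain is the read-path gain scaled by the share of query time spent reading vectors. The sketch below works that arithmetic through; `wholeQueryReduction` is a hypothetical helper, the 0.12 figure is the low end of the frontier-read range reported above, and the share values are illustrative assumptions, not measurements from this PR.

```go
package main

import "fmt"

// wholeQueryReduction returns the fractional reduction in total query time
// when readShare of that time is spent reading vectors and the read path
// itself takes readReduction less time (both values in the range 0..1).
func wholeQueryReduction(readShare, readReduction float64) float64 {
	return readShare * readReduction
}

func main() {
	// The share values are illustrative assumptions, not measurements.
	for _, share := range []float64{0.25, 0.5, 1.0} {
		r := wholeQueryReduction(share, 0.12)
		fmt.Printf("read share %.2f: about %.1f%% less query time\n", share, 100*r)
	}
}
```

Under these assumptions a query that spends a quarter of its time reading vectors would see roughly a 3% reduction, so profiling the read share of real queries is what determines the end-to-end effect.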
🤖 Generated with Claude Code