feat(hnsw): opt-in int8 scalar quantization for vector indexes #9733
Open
shaunpatterson wants to merge 4 commits into
Self-contained affine uint8 quantizer (per-vector min/step) with an asymmetric distance API (full-precision query vs. quantized vector): encode to a self-describing blob (~3.9x smaller than float32 at 384 dims), dequantize, and `AsymSquaredL2`/`AsymDot`/`AsymCosine`.

Tested: round-trip error bounds, distance approximation vs. exact, edge cases (empty/constant/malformed input), and recall@10 = 0.99 over a synthetic 2k x 384 corpus.

Benchmarks: 3.92x smaller; the scalar asymmetric distance takes ~450 ns vs. ~124 ns for the SIMD float32 baseline (the win is memory; distance is scalar for now).

This is the reusable core; HNSW integration follows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
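A minimal Go sketch of the per-vector affine scheme, assuming round-to-nearest coding. The names `quantize`, `dequantize`, and `asymSquaredL2` are hypothetical stand-ins for the PR's `QuantizeFloat32` and `AsymSquaredL2`; the blob header and NaN/Inf handling are omitted. Each vector keeps its own `min` and `step = (max - min) / 255`, so component i reconstructs as `min + step*code[i]` with error of at most about `step/2`; the asymmetric distance keeps the query in full precision and reconstructs only the stored side.

```go
package main

import (
	"fmt"
	"math"
)

// quantize maps each component of v to a uint8 code using a per-vector
// affine transform: code = round((x - min) / step), step = (max - min) / 255.
// Hypothetical sketch: no header, no NaN/Inf sanitization, float32 range math.
func quantize(v []float32) (minV, step float32, codes []uint8) {
	codes = make([]uint8, len(v))
	if len(v) == 0 {
		return 0, 0, codes
	}
	lo, hi := v[0], v[0]
	for _, x := range v[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	if hi == lo {
		// Constant vector: all codes are 0 and a zero step reconstructs it exactly.
		return lo, 0, codes
	}
	step = (hi - lo) / 255
	for i, x := range v {
		q := math.Round(float64((x - lo) / step))
		// Clamp in case float rounding lands just outside [0, 255].
		q = math.Max(0, math.Min(255, q))
		codes[i] = uint8(q)
	}
	return lo, step, codes
}

// dequantize reconstructs x ≈ min + step*code in float64, then casts to float32.
func dequantize(minV, step float32, codes []uint8) []float32 {
	out := make([]float32, len(codes))
	for i, c := range codes {
		out[i] = float32(float64(minV) + float64(step)*float64(c))
	}
	return out
}

// asymSquaredL2 compares a full-precision query with a quantized vector,
// reconstructing each stored component on the fly without allocating.
func asymSquaredL2(q []float32, minV, step float32, codes []uint8) float32 {
	var sum float32
	for i, c := range codes {
		d := q[i] - (minV + step*float32(c))
		sum += d * d
	}
	return sum
}

func main() {
	v := []float32{-1, 0, 0.5, 1}
	minV, step, codes := quantize(v)
	fmt.Println("codes:", codes)
	fmt.Println("reconstructed:", dequantize(minV, step, codes))
	fmt.Println("asym distance to itself:", asymSquaredL2(v, minV, step, codes))
}
```

Because `step` is per vector, a vector with a narrow value range gets finer resolution than one with a wide range; the price is 8 extra bytes per vector for `min` and `step`.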
Incorporates a 3-round gpt5/gemini design review of the codec:
- Versioned self-describing header `[magic|version|codec|flags|u16 dim|f32 min|f32 step|codes]` with strict validation (magic, version, codec, exact length), so a malformed or foreign blob is rejected, never coerced.
- NaN/Inf sanitized at encode time (NaN -> 0, +/-Inf -> +/-MaxFloat32), so a non-finite value never reaches a distance computation and corrupts HNSW ordering.
- Affine reconstruction done in float64 and then cast, avoiding float32 intermediate overflow on huge ranges; distances may still legitimately overflow to +Inf for absurd inputs, but never become NaN.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
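The header and validation rules can be sketched as below. The field order follows the PR's `[magic|version|codec|flags|u16 dim|f32 min|f32 step|codes]`, but the 1-byte widths for magic/version/codec/flags, the little-endian byte order, and the constant values are assumptions, not the PR's actual encoding. `decode` rejects any blob whose magic, version, codec, or exact length does not match, and reconstruction is done in float64 before the cast to float32.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Hypothetical constants and widths: 1-byte magic/version/codec/flags,
// then little-endian u16 dim, f32 min, f32 step, and one code per dimension.
const (
	blobMagic   = 0x51
	blobVersion = 1
	codecInt8   = 1
	headerSize  = 4 + 2 + 4 + 4
)

// sanitize maps NaN to 0 and ±Inf to ±MaxFloat32 so nothing non-finite is encoded.
func sanitize(x float32) float32 {
	f := float64(x)
	switch {
	case math.IsNaN(f):
		return 0
	case math.IsInf(f, 1):
		return math.MaxFloat32
	case math.IsInf(f, -1):
		return -math.MaxFloat32
	}
	return x
}

// encode writes the versioned header followed by the uint8 codes.
func encode(v []float32) ([]byte, error) {
	if len(v) > math.MaxUint16 {
		return nil, errors.New("dimension does not fit in u16")
	}
	s := make([]float32, len(v))
	for i, x := range v {
		s[i] = sanitize(x)
	}
	var lo, hi float32
	if len(s) > 0 {
		lo, hi = s[0], s[0]
	}
	for _, x := range s {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	// Range in float64: hi-lo overflows float32 for e.g. ±MaxFloat32.
	step := float32((float64(hi) - float64(lo)) / 255)
	b := make([]byte, headerSize+len(s))
	b[0], b[1], b[2], b[3] = blobMagic, blobVersion, codecInt8, 0
	binary.LittleEndian.PutUint16(b[4:], uint16(len(s)))
	binary.LittleEndian.PutUint32(b[6:], math.Float32bits(lo))
	binary.LittleEndian.PutUint32(b[10:], math.Float32bits(step))
	if step > 0 {
		for i, x := range s {
			q := math.Round((float64(x) - float64(lo)) / float64(step))
			b[headerSize+i] = uint8(math.Max(0, math.Min(255, q)))
		}
	}
	return b, nil
}

// decode validates the header and exact length before trusting the blob.
func decode(b []byte) ([]float32, error) {
	if len(b) < headerSize || b[0] != blobMagic || b[1] != blobVersion || b[2] != codecInt8 {
		return nil, errors.New("not an int8 quantized blob")
	}
	dim := int(binary.LittleEndian.Uint16(b[4:]))
	if len(b) != headerSize+dim {
		return nil, errors.New("blob length does not match header dim")
	}
	lo := float64(math.Float32frombits(binary.LittleEndian.Uint32(b[6:])))
	step := float64(math.Float32frombits(binary.LittleEndian.Uint32(b[10:])))
	out := make([]float32, dim)
	for i, c := range b[headerSize:] {
		out[i] = float32(lo + step*float64(c)) // float64 intermediate, then cast
	}
	return out, nil
}

func main() {
	b, _ := encode(make([]float32, 384))
	fmt.Println(len(b)) // 398 under the assumed header widths
	v, err := decode(b)
	fmt.Println(len(v), err)
}
```

Under these assumed widths the header is 14 bytes, so a 384-dim vector encodes to 398 bytes versus 1,536 bytes as float32, about 3.86x; without the header (only `min`, `step`, and codes, 392 bytes) the ratio is about 3.92x.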
Wires the int8 scalar quantization codec into the persistent HNSW index as a per-index opt-in (`quantize:"int8"` index option). Off by default: existing indexes are byte-for-byte unchanged and never read the new predicate.
- Factory/options: new "quantize" string option (`AddStringOption`); `applyOptions` validates it (only "int8" is accepted) and requires 32-bit float vectors; `GetPersistantOptions` round-trips it. A new per-index predicate `__vector_q` (`vecQKey`) holds the quantized blobs.
- Write: `insertHelper` persists `QuantizeFloat32(inVec)` to `__vector_q[uid]` up front (before the node can be read as a neighbor), atomically via the txn; raw float vectors in the user predicate are left untouched.
- Read: when quantization is enabled, `getVecFromUid` reads `__vector_q` and dequantizes into the caller's reused buffer, so all existing SIMD distance/HNSW code runs unchanged on float32. It falls back to the raw vector on a missing or undecodable blob (graceful degradation).
- `index.DequantizeInto[T]` is the generic hot-path decoder.

Design validated over 3 review rounds (gpt5 + gemini): separate predicate that keeps raw vectors, dequantize -> SIMD (not scalar asymmetric distance), asymmetric build distance, versioned/validated blob, NaN/Inf sanitization, fall back rather than fail on a miss.

Tests: option parsing and float-width guard; quantized `SearchWithOptions` finds the true nearest neighbor (same graph and vectors as the non-quantized recall test); `Insert` persists a round-tripping `__vector_q` blob. Full tok suite and dgraph build pass. Codec recall@10 = 0.99, ~3.8x smaller.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
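A sketch of a generic decode-into-buffer helper in the spirit of `index.DequantizeInto[T]`; the `Float` constraint and the signature are hypothetical. The caller passes in a reused slice, so reading a neighbor's vector during search does not allocate once the buffer has enough capacity, and the result is plain float32 (or float64) that the existing SIMD kernels can consume unchanged.

```go
package main

import "fmt"

// Float is a hypothetical constraint for the element types the decoder targets.
type Float interface {
	~float32 | ~float64
}

// dequantizeInto reconstructs lo + step*code into dst, reusing dst's backing
// array when it has enough capacity so the search hot path does not allocate.
func dequantizeInto[T Float](dst []T, lo, step float32, codes []uint8) []T {
	if cap(dst) < len(codes) {
		dst = make([]T, len(codes))
	}
	dst = dst[:len(codes)]
	for i, c := range codes {
		// Reconstruct in float64, then convert to the element type T.
		dst[i] = T(float64(lo) + float64(step)*float64(c))
	}
	return dst
}

func main() {
	buf := make([]float32, 0, 384) // one buffer reused across candidate reads
	for _, codes := range [][]uint8{{0, 2, 4}, {255, 128, 1}} {
		buf = dequantizeInto(buf, -1, 0.5, codes)
		fmt.Println(buf)
	}
}
```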
Hardening from a deep gpt5+gemini review of the int8 quantization feature.

Lifecycle for the new `__vector_q` predicate (so it isn't leaked or lost):
- posting/index.go: drop its prefix on index rebuild (it is a distinct predicate from `__vector_` and is not covered by that prefix).
- schema/schema.go: include it in a predicate's internal-predicate enumeration.
- worker/backup.go: include it in the per-predicate vector-predicate set.
- worker/online_restore.go, worker/restore_map.go: recognize its suffix.

(The `strings.Contains(__vector_)` sites in export/mutation/predicate_move already match "__vector_q" as a substring, intentionally: they must also catch leveled "__vector_<i>" predicates, which `HasSuffix` would miss.)

Correctness/safety:
- Dimension guard: `persistentHNSW` learns the vector dim lazily from full-precision (raw) vectors only; a quantized blob is accepted only when its decoded length matches that trusted dim, otherwise the read falls back to raw. A corrupt or wrong-dim blob can neither poison the dim nor feed a wrong-length slice to the SIMD distance kernels.
- The quantize option value is validated at the option layer (`AddCustomOption`), so a bad value (e.g. "int4") is rejected when the schema is altered.
- `writeQuantizedVec`: avoid the `[]T` -> `[]float32` copy when T is already float32 (the only supported width).

Schema: `@index(hnsw(quantize:"int8"))` flows end-to-end via the generic option mechanism (verified by a schema-parse test; "int4" is rejected).

Tests: schema parse (int8 accepted, int4 rejected); corrupt and wrong-dim `__vector_q` blobs (including the entry node) fall back to raw, and search still returns the true NN without panicking; suites pass under -race.

Cluster-only validations still pending (noted for review): export/backup/restore round-trip, drop-index leak scan, restart persistence, scale recall.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
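The dimension guard can be illustrated with a small in-memory reader. `reader`, `qblob`, and `getVec` are hypothetical stand-ins for `persistentHNSW` and `getVecFromUid`, with maps in place of the predicates. The trusted dim is set only from a raw vector, and a quantized blob is used only when its length equals that dim; anything else (missing blob, wrong length, dim not yet known) falls back to the raw vector.

```go
package main

import "fmt"

// qblob is a decoded __vector_q entry (hypothetical shape for illustration).
type qblob struct {
	lo, step float32
	codes    []uint8
}

// reader is a hypothetical stand-in for the index's vector lookup.
type reader struct {
	dim   int                  // 0 until learned from a full-precision vector
	raw   map[uint64][]float32 // stand-in for the user predicate
	quant map[uint64]qblob     // stand-in for __vector_q
}

// getVec serves the quantized vector only when its length matches the dim
// learned from raw vectors; otherwise it falls back to the raw vector.
func (r *reader) getVec(uid uint64, buf []float32) []float32 {
	if b, ok := r.quant[uid]; ok && r.dim > 0 && len(b.codes) == r.dim {
		buf = buf[:0]
		for _, c := range b.codes {
			buf = append(buf, float32(float64(b.lo)+float64(b.step)*float64(c)))
		}
		return buf
	}
	// Missing blob, wrong length, or dim not yet trusted: use full precision.
	v := r.raw[uid]
	if r.dim == 0 && len(v) > 0 {
		r.dim = len(v) // only raw vectors may set the trusted dim
	}
	return append(buf[:0], v...)
}

func main() {
	r := &reader{
		raw:   map[uint64][]float32{1: {1, 2, 3}, 2: {4, 5, 6}},
		quant: map[uint64]qblob{2: {lo: 4, step: 1, codes: []uint8{0, 1, 2}}},
	}
	v1 := r.getVec(1, nil) // raw read teaches the trusted dim
	fmt.Println(v1, r.dim)
	v2 := r.getVec(2, nil) // blob length matches, so it is served quantized
	fmt.Println(v2)
}
```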
What
Adds opt-in int8 scalar quantization to the persistent HNSW vector index, enabled per index with `@index(hnsw(..., quantize:"int8"))`. Off by default: existing indexes are byte-for-byte unchanged.
A per-vector affine uint8 quantization of each vector is stored in a new internal predicate `__vector_q`; the raw `float32` vectors in the user predicate are left untouched (no lossy reads, and exact re-ranking remains possible later). During search and build, the quantized blob is dequantized into a reused buffer and fed to the existing `vek` SIMD distance kernels, so the distance/HNSW code is unchanged.
Why
Raw `float32` vectors dominate the on-disk and read-cache footprint of large vector indexes. int8 scalar quantization is the standard first step in scaling ANN indexes (Faiss SQ8, HNSWlib), trading a small, well-bounded recall cost for a ~4× size reduction.
Design (validated over multiple gpt5+gemini review rounds)
- Separate `__vector_q` predicate with raw vectors kept, so reads are never lossy for users.
- Versioned, self-describing blob `[magic|version|codec|flags|u16 dim|f32 min|f32 step|codes]`; NaN/Inf sanitized at encode; malformed or foreign blobs rejected.
- Lifecycle: `__vector_q` is dropped on rebuild, enumerated for backup, and recognized by restore/move alongside `__vector_`/`__vector_entry`/`__vector_dead`.
- The `quantize` value is validated at the option layer (`int4` etc. rejected at schema-alter time); `int8` currently requires 32-bit vectors.
Tests
- Codec (`tok/index`): round-trip error bounds, distance vs. exact, recall@10 = 0.99, header/version/NaN/Inf validation, benchmarks (3.8× smaller).
- HNSW (`tok/hnsw`): option parsing and float-width guard; quantized `SearchWithOptions` returns the true nearest neighbor with neighbors served only from quantized blobs; `Insert` persists a round-tripping `__vector_q` blob; corrupt and wrong-dim blobs (including the entry node) fall back to raw, and search still returns the true NN without panicking.
- Schema (`schema`): `quantize:"int8"` parses end-to-end; `int4` is rejected at parse time.
- Suites pass under `-race`; full build clean.
Not yet validated (needs a cluster; flagging for reviewers)
export/backup/restore round-trip, drop-index leak scan, restart persistence, and end-to-end recall at production scale. The lifecycle code is in place; these paths should be exercised on a live cluster before release.
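For the pending production-scale recall check, recall@k (the metric behind the codec's recall@10 = 0.99) could be computed with a helper like the one below. The function name and the `[][]uint64` layout are hypothetical; the sketch assumes `exact` holds each query's true top-k computed by brute force over the raw float32 vectors with at least k entries per query, and that `approx` has one result list per query.

```go
package main

import "fmt"

// recallAtK returns the fraction of each query's exact top-k IDs that also
// appear in the approximate top-k, averaged over all queries.
func recallAtK(exact, approx [][]uint64, k int) float64 {
	if len(exact) == 0 || k <= 0 {
		return 0
	}
	hits := 0
	for qi, truth := range exact {
		if len(truth) > k {
			truth = truth[:k]
		}
		want := make(map[uint64]struct{}, len(truth))
		for _, id := range truth {
			want[id] = struct{}{}
		}
		got := approx[qi]
		if len(got) > k {
			got = got[:k]
		}
		for _, id := range got {
			if _, ok := want[id]; ok {
				hits++
				delete(want, id) // count each true neighbor at most once
			}
		}
	}
	return float64(hits) / float64(len(exact)*k)
}

func main() {
	exact := [][]uint64{{1, 2, 3}, {4, 5, 6}}
	approx := [][]uint64{{1, 2, 9}, {6, 5, 4}}
	fmt.Printf("recall@3 = %.3f\n", recallAtK(exact, approx, 3))
}
```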
Notes
This is the scale/disk lever; it is not a fix for the separate in-memory index-build OOM (that needs a disk-backed build). Exact final re-ranking is deferred to a follow-up (flag-gated).
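One possible shape for the deferred exact re-ranking step, as a hypothetical sketch rather than the planned follow-up: take an oversampled candidate list from the quantized search, recompute distances against the untouched raw float32 vectors, and keep the k closest. `rerankExact`, `candidate`, and the map standing in for the user predicate are illustrative names only.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate pairs a node UID with its exact distance to the query.
type candidate struct {
	uid  uint64
	dist float32
}

func squaredL2(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// rerankExact recomputes distances for approximate candidates from the raw
// float32 vectors and keeps the k closest. Candidates whose raw vector is
// missing or has the wrong length are skipped.
func rerankExact(query []float32, cands []uint64, raw map[uint64][]float32, k int) []candidate {
	out := make([]candidate, 0, len(cands))
	for _, uid := range cands {
		v, ok := raw[uid]
		if !ok || len(v) != len(query) {
			continue
		}
		out = append(out, candidate{uid: uid, dist: squaredL2(query, v)})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].dist < out[j].dist })
	if len(out) > k {
		out = out[:k]
	}
	return out
}

func main() {
	raw := map[uint64][]float32{1: {3, 0}, 2: {1, 0}, 3: {2, 0}}
	// Candidates in the order an oversampled quantized search might return them.
	top := rerankExact([]float32{0, 0}, []uint64{1, 2, 3}, raw, 2)
	fmt.Println(top)
}
```

Because the raw vectors already live in the user predicate, this step needs no extra storage; its cost is one full-precision distance per candidate.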
🤖 Generated with Claude Code