feat(hnsw): opt-in int8 scalar quantization for vector indexes by shaunpatterson · Pull Request #9733 · dgraph-io/dgraph

shaunpatterson · 2026-06-08T18:03:42Z

What

Adds opt-in int8 scalar quantization to the persistent HNSW vector index, enabled per-index with @index(hnsw(..., quantize:"int8")). Off by default — existing indexes are byte-for-byte unchanged.

A per-vector affine uint8 quantization of each vector is stored in a new internal predicate __vector_q; the raw float32 vectors in the user predicate are left untouched (no lossy reads, exact re-rank remains possible later). During search/build the quantized blob is dequantized into a reused buffer and fed to the existing vek SIMD distance kernels, so distance/HNSW code is unchanged.

~3.8× smaller vectors on disk and in the read cache (392 vs 1536 bytes at 384 dims).
Quantization codec recall@10 = 0.99 on a 2k×384 synthetic corpus.
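As a rough illustration of the per-vector affine scheme described above, here is a minimal Go sketch. The names `quantizeU8` and `dequantizeInto` are hypothetical stand-ins, not the PR's `QuantizeFloat32` / `DequantizeInto`, and the rounding and clamping choices are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// quantizeU8 maps each component to a uint8 code with a per-vector affine
// transform: code = round((x - vmin) / step), step = (vmax - vmin) / 255.
// Hypothetical helper for illustration only.
func quantizeU8(v []float32) (vmin, step float32, codes []uint8) {
	codes = make([]uint8, len(v))
	if len(v) == 0 {
		return 0, 0, codes
	}
	lo, hi := v[0], v[0]
	for _, x := range v[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	step = (hi - lo) / 255
	if step == 0 {
		return lo, 0, codes // constant vector: every code stays 0
	}
	for i, x := range v {
		c := math.Round(float64(x-lo) / float64(step))
		codes[i] = uint8(math.Min(255, math.Max(0, c)))
	}
	return lo, step, codes
}

// dequantizeInto reconstructs approximate float32 values into buf,
// reusing its backing array when the capacity is sufficient.
func dequantizeInto(buf []float32, vmin, step float32, codes []uint8) []float32 {
	buf = buf[:0]
	for _, c := range codes {
		// Reconstruct in float64, then cast back to float32.
		buf = append(buf, float32(float64(vmin)+float64(step)*float64(c)))
	}
	return buf
}

func main() {
	v := []float32{-1.0, -0.25, 0.0, 0.5, 1.0}
	vmin, step, codes := quantizeU8(v)
	buf := make([]float32, 0, len(v)) // reused across calls in a hot path
	buf = dequantizeInto(buf, vmin, step, codes)
	fmt.Println(codes, buf)
}
```

Under this scheme each reconstructed component is within about step/2 of the original. At 384 dimensions, 384 one-byte codes plus the two float32 parameters come to 392 bytes against 1536 bytes of float32, before any additional header fields.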

Why

Raw float32 vectors dominate the on-disk/read-cache footprint of large vector indexes. int8 scalar quantization is the standard first step to scale ANN indexes (Faiss SQ8, HNSWlib), trading a small, well-bounded recall cost for a ~4× reduction.

Design (validated over multiple gpt5+gemini review rounds)

Separate __vector_q predicate, raw kept — never lossy for users.
Dequantize → reused buffer → existing SIMD (asymmetric: full-precision query × quantized vector; build uses the same path as search).
Per-vector affine uint8, stateless (no training pass; robust to distribution shift).
Versioned, validated blob: [magic|version|codec|flags|u16 dim|f32 min|f32 step|codes]; NaN/Inf sanitized at encode; malformed/foreign blobs rejected.
Dimension guard: the index dim is learned only from full-precision vectors; a quantized blob is used only when its decoded length matches — a corrupt/wrong-dim blob can neither poison the dim nor reach the SIMD kernels (falls back to raw).
Lifecycle: __vector_q is dropped on rebuild, enumerated for backup, and recognized by restore/move alongside __vector_/__vector_entry/__vector_dead.
Option value validated at the option layer (int4 etc. rejected at schema-alter time); int8 currently requires 32-bit vectors.
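The dimension guard can be sketched roughly as follows. The `guardedIndex` type and its methods are hypothetical, and the behavior when the dimension has not yet been learned (fall back to raw, then learn from it) is an assumption about how the lazy learning interacts with the guard.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadBlob stands in for any decode failure of a quantized blob.
var errBadBlob = errors.New("undecodable quantized blob")

// guardedIndex is a hypothetical stand-in for the index state.
// dim is 0 until learned from a full-precision vector.
type guardedIndex struct {
	dim int
}

// learnDim records the dimension from a raw (full-precision) vector only.
func (g *guardedIndex) learnDim(raw []float32) {
	if g.dim == 0 && len(raw) > 0 {
		g.dim = len(raw)
	}
}

// pick returns the decoded quantized vector only when decoding succeeded and
// its length matches the trusted dim; otherwise it falls back to raw.
// The boolean reports whether the quantized vector was used.
func (g *guardedIndex) pick(decoded []float32, decodeErr error, raw []float32) ([]float32, bool) {
	if decodeErr == nil && g.dim != 0 && len(decoded) == g.dim {
		return decoded, true
	}
	g.learnDim(raw) // never learn the dim from a quantized blob
	return raw, false
}

func main() {
	g := &guardedIndex{}
	raw := []float32{1, 2, 3}

	_, used := g.pick([]float32{1, 2}, nil, raw) // wrong-dim blob
	fmt.Println("wrong dim used quantized:", used, "dim:", g.dim)

	_, used = g.pick(nil, errBadBlob, raw) // corrupt blob
	fmt.Println("corrupt used quantized:", used)

	_, used = g.pick([]float32{1.01, 2, 2.99}, nil, raw) // matching blob
	fmt.Println("matching used quantized:", used)
}
```

The point of the design is that only raw vectors can set `dim`, so a bad blob can at worst cost a fallback read.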

Tests

Codec (tok/index): round-trip error bounds, distance-vs-exact, recall@10=0.99, header/version/NaN/Inf validation, benchmarks (3.8× smaller).
HNSW (tok/hnsw): option parsing + float-width guard; quantized SearchWithOptions returns the true nearest neighbor with neighbors served only from quantized blobs; Insert persists a round-tripping __vector_q; corrupt + wrong-dim blobs (incl. the entry node) fall back to raw and search still returns the true NN without panic.
Schema (schema): quantize:"int8" parses end-to-end; int4 rejected at parse.
All suites pass under -race; full build clean.

Not yet validated (needs a cluster — flagging for reviewers)

export/backup/restore round-trip, drop-index leak scan, restart persistence, and end-to-end recall at production scale. The lifecycle code is in place; these paths should be exercised on a live cluster before release.

Notes

This is the scale/disk lever; it is not a fix for the separate in-memory index-build OOM (that needs a disk-backed build). Exact final re-ranking is deferred to a follow-up (flag-gated).

🤖 Generated with Claude Code

Self-contained affine (per-vector min/step) uint8 quantizer with an asymmetric distance API (full-precision query vs quantized vector): encode to a self-describing blob (~3.9x smaller than float32 at 384 dims), dequantize, and AsymSquaredL2/AsymDot/AsymCosine.

Tested: round-trip error bounds, distance approximation vs exact, edge cases (empty/constant/malformed), and recall@10 = 0.99 over a 2k x 384 synthetic corpus.

Benchmarks: 3.92x smaller; asym scalar distance ~450ns vs ~124ns SIMD float32 baseline (memory win, distance is scalar for now).

This is the reusable core; HNSW integration follows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
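For readers unfamiliar with asymmetric distances, the following is a minimal scalar sketch of a squared-L2 distance between a full-precision query and a quantized vector. The function `asymSquaredL2` is a hypothetical illustration and not the PR's `AsymSquaredL2`; its signature and error handling are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

var errDimMismatch = errors.New("query and code lengths differ")

// asymSquaredL2 computes sum_i (q_i - (vmin + step*code_i))^2.
// Only the stored vector is quantized; the query keeps full precision.
func asymSquaredL2(query []float32, vmin, step float32, codes []uint8) (float64, error) {
	if len(query) != len(codes) {
		return 0, errDimMismatch
	}
	var sum float64
	for i, c := range codes {
		recon := float64(vmin) + float64(step)*float64(c) // dequantize on the fly
		d := float64(query[i]) - recon
		sum += d * d
	}
	return sum, nil
}

func main() {
	q := []float32{0, 1}
	// Codes {2, 2} with vmin=0 and step=0.5 reconstruct to {1, 1}.
	d, err := asymSquaredL2(q, 0, 0.5, []uint8{2, 2})
	fmt.Println(d, err)
}
```

A scalar loop like this is the kind of path the later commits replace with dequantize-then-SIMD.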

Incorporates 3-round gpt5/gemini design review of the codec:
- Versioned self-describing header [magic|version|codec|flags|u16 dim|f32 min|f32 step|codes] with strict validation (magic/version/codec/exact-length) so a malformed/foreign blob is rejected, never coerced.
- Sanitize NaN/Inf at encode time (NaN->0, +/-Inf->+/-MaxFloat32) so a non-finite value never reaches a distance and corrupts HNSW ordering.
- Affine reconstruction done in float64 then cast, avoiding float32 intermediate overflow on huge ranges; distances may still legitimately overflow to +Inf for absurd inputs but never NaN.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wires the int8 scalar quantization codec into the persistent HNSW index as a per-index opt-in (`quantize:"int8"` index option). Off by default: existing indexes are byte-for-byte unchanged and never read the new predicate.

- Factory/options: new "quantize" string option (AddStringOption); applyOptions validates it (only "int8") and requires 32-bit float vectors; GetPersistantOptions round-trips it. New per-index predicate __vector_q (vecQKey) holds the quantized blobs.
- Write: insertHelper persists QuantizeFloat32(inVec) to __vector_q[uid] up front (before the node can be read as a neighbor), atomically via the txn; raw float vectors in the user predicate are left untouched.
- Read: getVecFromUid, when quantized, reads __vector_q and dequantizes into the caller's reused buffer, then all existing SIMD distance/HNSW code runs unchanged on float32. Falls back to the raw vector on a missing/undecodable blob (graceful degradation).
- index.DequantizeInto[T] is the generic hot-path decoder.

Design validated over 3 review rounds (gpt5 + gemini): separate predicate keeping raw, dequant->SIMD (not scalar asym), asymmetric build distance, versioned/validated blob, NaN/Inf sanitize, fallback-not-fail on miss.

Tests: option parsing + float-width guard; quantized SearchWithOptions finds the true nearest neighbor (same graph/vectors as the non-quantized recall test); Insert persists a round-tripping __vector_q blob. Full tok suite and dgraph build pass. Codec recall@10=0.99, ~3.8x smaller.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
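The option check described in this commit might look roughly like the sketch below. `validateQuantize` and its parameters are hypothetical and do not reflect the actual `applyOptions` code; treating an empty value as "quantization off" is an assumption based on the feature being opt-in.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errBadQuantize = errors.New(`quantize: only "int8" is supported`)
	errBadWidth    = errors.New("quantize: int8 requires 32-bit float vectors")
)

// validateQuantize accepts an empty value (quantization off) or "int8" on
// 32-bit float vectors, and rejects everything else.
func validateQuantize(value string, floatBits int) error {
	switch value {
	case "":
		return nil // off by default
	case "int8":
		if floatBits != 32 {
			return errBadWidth
		}
		return nil
	default:
		return errBadQuantize
	}
}

func main() {
	for _, v := range []string{"", "int8", "int4"} {
		fmt.Printf("%q on float32 -> %v\n", v, validateQuantize(v, 32))
	}
	fmt.Printf("%q on float64 -> %v\n", "int8", validateQuantize("int8", 64))
}
```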

Hardening from a deep gpt5+gemini review of the int8 quantization feature.

Lifecycle for the new __vector_q predicate (so it isn't leaked/lost):
- posting/index.go: drop its prefix on index rebuild (distinct predicate from __vector_, not covered by that prefix).
- schema/schema.go: include it in a predicate's internal-pred enumeration.
- worker/backup.go: include it in the per-predicate vector-pred set.
- worker/online_restore.go, worker/restore_map.go: recognize its suffix.

(The strings.Contains(__vector_) sites in export/mutation/predicate_move already match "__vector_q" as a substring, intentionally; they must also catch leveled "__vector_<i>" predicates, which HasSuffix would miss.)

Correctness/safety:
- Dimension guard: persistentHNSW learns the vector dim lazily from full-precision (raw) vectors only; a quantized blob is accepted solely when its decoded length matches that trusted dim, else it falls back to raw. A corrupt/wrong-dim blob can neither poison the dim nor feed a wrong-length slice to the SIMD distance kernels.
- quantize option value is validated at the option layer (AddCustomOption) so a bad value (e.g. "int4") is rejected when the schema is altered.
- writeQuantizedVec: avoid the []T->[]float32 copy when T is already float32 (the only supported width).

Schema: `@index(hnsw(quantize:"int8"))` flows end-to-end via the generic option mechanism (verified by a schema-parse test; "int4" rejected).

Tests: schema parse (int8 accepted / int4 rejected); corrupt- and wrong-dim __vector_q blobs (incl. the entry node) fall back to raw and search still returns the true NN without panic; suites pass under -race.

Cluster-only validations still pending (noted for review): export/backup/restore round-trip, drop-index leak scan, restart persistence, scale recall.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
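The substring-versus-suffix remark can be checked with a few lines of Go. The predicate names below (an `emb` user predicate with appended internal suffixes) are made up for illustration, and the helper functions are hypothetical; the real naming scheme may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// containsVectorMarker mirrors a strings.Contains-style check.
func containsVectorMarker(pred string) bool {
	return strings.Contains(pred, "__vector_")
}

// hasVectorSuffix mirrors a strings.HasSuffix-style check.
func hasVectorSuffix(pred string) bool {
	return strings.HasSuffix(pred, "__vector_")
}

func main() {
	// Illustrative predicate names only.
	preds := []string{"emb__vector_", "emb__vector_q", "emb__vector_3", "emb__vector_entry", "emb"}
	for _, p := range preds {
		fmt.Printf("%-20s contains=%-5v suffix=%v\n", p, containsVectorMarker(p), hasVectorSuffix(p))
	}
}
```

A substring match catches the quantized predicate and the leveled ones alike, while a suffix match on `__vector_` only catches the bare form.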

shaunpatterson and others added 4 commits June 8, 2026 12:58

shaunpatterson requested a review from a team as a code owner June 8, 2026 18:03

shaunpatterson wants to merge 4 commits into dgraph-io:main from shaunpatterson:sp/vector-quantization
