feat(hnsw): opt-in int8 scalar quantization for vector indexes #9733

Open
shaunpatterson wants to merge 4 commits into dgraph-io:main from shaunpatterson:sp/vector-quantization

@shaunpatterson

Contributor

What

Adds opt-in int8 scalar quantization to the persistent HNSW vector index, enabled per-index with @index(hnsw(..., quantize:"int8")). Off by default — existing indexes are byte-for-byte unchanged.

Each vector is encoded with a per-vector affine uint8 quantizer and stored in a new internal predicate __vector_q; the raw float32 vectors in the user predicate are left untouched (no lossy reads, and exact re-rank remains possible later). During search and build, the quantized blob is dequantized into a reused buffer and fed to the existing vek SIMD distance kernels, so the distance and HNSW code is unchanged.
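
As a rough illustration of the per-vector affine scheme, here is a minimal Go sketch. The helper names `quantizeAffine` and `dequantizeInto` are hypothetical and are not the PR's `QuantizeFloat32` or `index.DequantizeInto`. The sketch assumes finite inputs and the common `step = (max - min) / 255` convention, which the description does not spell out.

```go
package main

import "math"

// quantizeAffine is a hypothetical helper (not the PR's QuantizeFloat32).
// It maps each component to code = round((x - lo) / step), where lo is the
// vector's minimum and step = (max - lo) / 255. Inputs are assumed finite.
func quantizeAffine(v []float32) (lo, step float32, codes []uint8) {
	if len(v) == 0 {
		return 0, 0, nil
	}
	lo = v[0]
	hi := v[0]
	for _, x := range v[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	codes = make([]uint8, len(v))
	step = (hi - lo) / 255
	if step == 0 {
		return lo, 0, codes // constant vector: every code stays 0
	}
	for i, x := range v {
		q := math.Round(float64(x-lo) / float64(step))
		if q < 0 {
			q = 0
		} else if q > 255 {
			q = 255
		}
		codes[i] = uint8(q)
	}
	return lo, step, codes
}

// dequantizeInto reconstructs lo + code*step into dst, reusing dst's backing
// array when it is large enough so the hot path does not allocate.
func dequantizeInto(dst []float32, lo, step float32, codes []uint8) []float32 {
	if cap(dst) < len(codes) {
		dst = make([]float32, len(codes))
	}
	dst = dst[:len(codes)]
	for i, c := range codes {
		dst[i] = lo + float32(c)*step
	}
	return dst
}
```

Because the result is an ordinary `[]float32`, it can be handed to any existing float32 distance routine, which is the property the description relies on. The per-component reconstruction error in this sketch is at most about half a step.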

  • ~3.8× smaller vectors on disk and in the read cache (392 vs 1536 bytes at 384 dims).
  • Quantization codec recall@10 = 0.99 on a 2k×384 synthetic corpus.

Why

Raw float32 vectors dominate the on-disk/read-cache footprint of large vector indexes. int8 scalar quantization is the standard first step to scale ANN indexes (Faiss SQ8, HNSWlib), trading a small, well-bounded recall cost for a ~4× reduction.

Design (validated over multiple gpt5+gemini review rounds)

  • Separate __vector_q predicate, raw kept — never lossy for users.
  • Dequantize → reused buffer → existing SIMD (asymmetric: full-precision query × quantized vector; build uses the same path as search).
  • Per-vector affine uint8, stateless (no training pass; robust to distribution shift).
  • Versioned, validated blob: [magic|version|codec|flags|u16 dim|f32 min|f32 step|codes]; NaN/Inf sanitized at encode; malformed/foreign blobs rejected.
  • Dimension guard: the index dim is learned only from full-precision vectors; a quantized blob is used only when its decoded length matches — a corrupt/wrong-dim blob can neither poison the dim nor reach the SIMD kernels (falls back to raw).
  • Lifecycle: __vector_q is dropped on rebuild, enumerated for backup, and recognized by restore/move alongside __vector_/__vector_entry/__vector_dead.
  • Option value validated at the option layer (int4 etc. rejected at schema-alter time); int8 currently requires 32-bit vectors.
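
The blob layout and strict validation from the list above could look roughly like the following Go sketch. Only the `u16 dim`, `f32 min`, and `f32 step` widths come from the description. The one-byte widths for magic, version, codec, and flags, the constant values, the little-endian byte order, and the names `encodeBlob`/`decodeBlob` are all assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"math"
)

// All constants below are hypothetical; the PR does not state them.
const (
	blobMagic   = 0xD9          // assumed 1-byte magic
	blobVersion = 1             // assumed 1-byte version
	codecUint8  = 1             // assumed 1-byte codec id
	headerLen   = 4 + 2 + 4 + 4 // magic|version|codec|flags + u16 dim + f32 min + f32 step
)

var errBadBlob = errors.New("quantized blob: malformed or foreign")

// encodeBlob writes the header followed by the uint8 codes.
func encodeBlob(lo, step float32, codes []uint8) ([]byte, error) {
	if len(codes) > math.MaxUint16 {
		return nil, errors.New("quantized blob: dimension does not fit in u16")
	}
	b := make([]byte, headerLen+len(codes))
	b[0], b[1], b[2], b[3] = blobMagic, blobVersion, codecUint8, 0
	binary.LittleEndian.PutUint16(b[4:6], uint16(len(codes)))
	binary.LittleEndian.PutUint32(b[6:10], math.Float32bits(lo))
	binary.LittleEndian.PutUint32(b[10:14], math.Float32bits(step))
	copy(b[headerLen:], codes)
	return b, nil
}

// decodeBlob rejects anything whose magic, version, codec, or exact length
// does not match, rather than trying to coerce it. The returned codes alias b.
func decodeBlob(b []byte) (lo, step float32, codes []uint8, err error) {
	if len(b) < headerLen || b[0] != blobMagic || b[1] != blobVersion || b[2] != codecUint8 {
		return 0, 0, nil, errBadBlob
	}
	dim := int(binary.LittleEndian.Uint16(b[4:6]))
	if len(b) != headerLen+dim {
		return 0, 0, nil, errBadBlob
	}
	lo = math.Float32frombits(binary.LittleEndian.Uint32(b[6:10]))
	step = math.Float32frombits(binary.LittleEndian.Uint32(b[10:14]))
	return lo, step, b[headerLen:], nil
}
```

The exact-length check is what lets a truncated or padded blob be rejected before any code bytes are read.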

Tests

  • Codec (tok/index): round-trip error bounds, distance-vs-exact, recall@10=0.99, header/version/NaN/Inf validation, benchmarks (3.8× smaller).
  • HNSW (tok/hnsw): option parsing + float-width guard; quantized SearchWithOptions returns the true nearest neighbor with neighbors served only from quantized blobs; Insert persists a round-tripping __vector_q; corrupt + wrong-dim blobs (incl. the entry node) fall back to raw and search still returns the true NN without panic.
  • Schema (schema): quantize:"int8" parses end-to-end; int4 rejected at parse.
  • All suites pass under -race; full build clean.

Not yet validated (needs a cluster — flagging for reviewers)

The following have not been exercised: export/backup/restore round-trip, drop-index leak scan, restart persistence, and end-to-end recall at production scale. The lifecycle code is in place, but these paths should be exercised on a live cluster before release.

Notes

This is the scale/disk lever; it is not a fix for the separate in-memory index-build OOM (that needs a disk-backed build). Exact final re-ranking is deferred to a follow-up (flag-gated).

🤖 Generated with Claude Code

shaunpatterson and others added 4 commits June 8, 2026 12:58
Self-contained affine (per-vector min/step) uint8 quantizer with an
asymmetric distance API (full-precision query vs quantized vector): encode
to a self-describing blob (~3.9x smaller than float32 at 384 dims),
dequantize, and AsymSquaredL2/AsymDot/AsymCosine.
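
For readers unfamiliar with asymmetric distances, the sketch below shows the idea for squared L2: the query stays in full precision and each stored component is reconstructed on the fly from its code. The function name and signature are assumptions for illustration and may not match the PR's AsymSquaredL2.

```go
package main

// asymSquaredL2Sketch is a hypothetical scalar implementation: it compares a
// full-precision query against a quantized vector given as (lo, step, codes),
// reconstructing each component as lo + code*step without allocating.
// The boolean result is false when the dimensions disagree.
func asymSquaredL2Sketch(query []float32, lo, step float32, codes []uint8) (float32, bool) {
	if len(query) != len(codes) {
		return 0, false
	}
	var sum float32
	for i, c := range codes {
		d := query[i] - (lo + float32(c)*step)
		sum += d * d
	}
	return sum, true
}
```

A scalar loop like this trades speed for zero extra memory, which is consistent with the commit's note that the distance is scalar for now.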

Tested: round-trip error bounds, distance approximation vs exact, edge
cases (empty/constant/malformed), and recall@10 = 0.99 over a 2k x 384
synthetic corpus. Benchmarks: 3.92x smaller; asym scalar distance ~450ns
vs ~124ns SIMD float32 baseline (memory win, distance is scalar for now).

This is the reusable core; HNSW integration follows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Incorporates 3-round gpt5/gemini design review of the codec:
- Versioned self-describing header [magic|version|codec|flags|u16 dim|
  f32 min|f32 step|codes] with strict validation (magic/version/codec/
  exact-length) so a malformed/foreign blob is rejected, never coerced.
- Sanitize NaN/Inf at encode time (NaN->0, +/-Inf->+/-MaxFloat32) so a
  non-finite value never reaches a distance and corrupts HNSW ordering.
- Affine reconstruction done in float64 then cast, avoiding float32
  intermediate overflow on huge ranges; distances may still legitimately
  overflow to +Inf for absurd inputs but never NaN.
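
The sanitize and float64-reconstruction points can be sketched as follows. The function names are hypothetical; the NaN and Inf mappings are the ones stated in the list above, and the arithmetic is an illustration rather than the PR's code.

```go
package main

import "math"

// sanitizeComponent (hypothetical name) applies the mapping described above:
// NaN -> 0, +Inf -> +MaxFloat32, -Inf -> -MaxFloat32. Finite values pass through.
func sanitizeComponent(x float32) float32 {
	f := float64(x)
	switch {
	case math.IsNaN(f):
		return 0
	case math.IsInf(f, 1):
		return math.MaxFloat32
	case math.IsInf(f, -1):
		return -math.MaxFloat32
	}
	return x
}

// reconstructF64 (hypothetical name) evaluates lo + code*step in float64 and
// casts once at the end. In pure float32 the product code*step can overflow
// to +Inf for very wide ranges even though the final sum is representable.
func reconstructF64(lo, step float32, code uint8) float32 {
	return float32(float64(lo) + float64(code)*float64(step))
}
```

As an example of the overflow this avoids: with `lo = -3e38` and a step of roughly `6e38/255`, the float32 product `255*step` exceeds MaxFloat32, while the float64 path lands near `3e38`.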

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wires the int8 scalar quantization codec into the persistent HNSW index as
a per-index opt-in (`quantize:"int8"` index option). Off by default —
existing indexes are byte-for-byte unchanged and never read the new
predicate.

- Factory/options: new "quantize" string option (AddStringOption);
  applyOptions validates it (only "int8") and requires 32-bit float
  vectors; GetPersistantOptions round-trips it. New per-index predicate
  __vector_q (vecQKey) holds the quantized blobs.
- Write: insertHelper persists QuantizeFloat32(inVec) to __vector_q[uid]
  up front (before the node can be read as a neighbor), atomically via the
  txn — raw float vectors in the user predicate are left untouched.
- Read: getVecFromUid, when quantized, reads __vector_q and dequantizes
  into the caller's reused buffer, then all existing SIMD distance/HNSW
  code runs unchanged on float32. Falls back to the raw vector on a
  missing/undecodable blob (graceful degradation).
- index.DequantizeInto[T] is the generic hot-path decoder.
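
The fallback-not-fail read path described above might be structured like this sketch. The `vecSource` type, its map-backed stores, and the `decode` callback are stand-ins invented for illustration; the PR's getVecFromUid reads from the transaction cache instead.

```go
package main

import "errors"

// vecSource is a hypothetical stand-in for the two predicates involved:
// quantized plays the role of __vector_q and raw the user predicate.
type vecSource struct {
	quantized map[uint64][]byte
	raw       map[uint64][]float32
	// decode dequantizes a blob into dst (reusing it) or returns an error.
	decode func(blob []byte, dst []float32) ([]float32, error)
}

var errNoVector = errors.New("no vector stored for uid")

// get prefers the quantized blob and degrades to the raw vector when the blob
// is missing or cannot be decoded. The bool reports whether the quantized
// path was used.
func (s *vecSource) get(uid uint64, buf []float32) ([]float32, bool, error) {
	if blob, ok := s.quantized[uid]; ok {
		if v, err := s.decode(blob, buf); err == nil {
			return v, true, nil
		}
		// Undecodable blob: fall through to the raw vector instead of failing.
	}
	if v, ok := s.raw[uid]; ok {
		return v, false, nil
	}
	return nil, false, errNoVector
}
```

Keeping the raw vector as the fallback means a bad blob costs some memory savings for that node but does not fail the search.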

Design validated over 3 review rounds (gpt5 + gemini): separate predicate
keeping raw, dequant->SIMD (not scalar asym), asymmetric build distance,
versioned/validated blob, NaN/Inf sanitize, fallback-not-fail on miss.

Tests: option parsing + float-width guard; quantized SearchWithOptions
finds the true nearest neighbor (same graph/vectors as the non-quantized
recall test); Insert persists a round-tripping __vector_q blob. Full tok
suite and dgraph build pass. Codec recall@10=0.99, ~3.8x smaller.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Hardening from a deep gpt5+gemini review of the int8 quantization feature.

Lifecycle for the new __vector_q predicate (so it isn't leaked/lost):
- posting/index.go: drop its prefix on index rebuild (distinct predicate
  from __vector_, not covered by that prefix).
- schema/schema.go: include it in a predicate's internal-pred enumeration.
- worker/backup.go: include it in the per-predicate vector-pred set.
- worker/online_restore.go, worker/restore_map.go: recognize its suffix.
  (The strings.Contains(__vector_) sites in export/mutation/predicate_move
  already match "__vector_q" as a substring, intentionally — they must also
  catch leveled "__vector_<i>" predicates, which HasSuffix would miss.)
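
The difference between the substring and suffix checks mentioned in the parenthetical can be seen in a small sketch. The predicate names used here (a user predicate called `embedding` with the suffixes appended) and the two helper functions are assumptions for illustration only.

```go
package main

import "strings"

// isVectorInternal mirrors the substring style of check: it matches any
// predicate containing "__vector_", including leveled and quantized ones.
func isVectorInternal(pred string) bool {
	return strings.Contains(pred, "__vector_")
}

// hasQuantizedSuffix mirrors a suffix style of check: it matches only the
// quantized predicate and would miss leveled "__vector_<i>" predicates.
func hasQuantizedSuffix(pred string) bool {
	return strings.HasSuffix(pred, "__vector_q")
}
```

For example, `isVectorInternal("embedding__vector_2")` is true while `hasQuantizedSuffix("embedding__vector_2")` is false, which is why a suffix test alone would not be a drop-in replacement at those call sites.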

Correctness/safety:
- Dimension guard: persistentHNSW learns the vector dim lazily from
  full-precision (raw) vectors only; a quantized blob is accepted solely
  when its decoded length matches that trusted dim, else it falls back to
  raw. A corrupt/wrong-dim blob can neither poison the dim nor feed a
  wrong-length slice to the SIMD distance kernels.
- quantize option value is validated at the option layer (AddCustomOption)
  so a bad value (e.g. "int4") is rejected when the schema is altered.
- writeQuantizedVec: avoid the []T->[]float32 copy when T is already
  float32 (the only supported width).
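
The dimension guard from the list above can be illustrated with a small state holder. The `dimGuard` type and its methods are hypothetical, and treating a not-yet-learned dimension as "reject the quantized vector" is an assumption of this sketch rather than something the commit states.

```go
package main

// dimGuard is a hypothetical holder for the trusted vector dimension.
// A dim of 0 means the dimension has not been learned yet.
type dimGuard struct {
	dim int
}

// learnFromRaw records the dimension from a full-precision vector. Only raw
// vectors may set it, and only once, so a bad blob cannot change it.
func (g *dimGuard) learnFromRaw(raw []float32) {
	if g.dim == 0 && len(raw) > 0 {
		g.dim = len(raw)
	}
}

// acceptQuantized reports whether a decoded quantized vector may be used.
// It is accepted only when its length equals the trusted dimension; callers
// are expected to fall back to the raw vector otherwise.
func (g *dimGuard) acceptQuantized(decoded []float32) bool {
	return g.dim != 0 && len(decoded) == g.dim
}
```

With this shape, a wrong-length slice is filtered out before it could reach a distance kernel that assumes equal lengths.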

Schema: `@index(hnsw(quantize:"int8"))` flows end-to-end via the generic
option mechanism (verified by a schema-parse test; "int4" rejected).

Tests: schema parse (int8 accepted / int4 rejected); corrupt and
wrong-dim __vector_q blobs (incl. the entry node) fall back to raw and
search still returns the true NN without panic; suites pass under -race.

Cluster-only validations still pending (noted for review): export/backup/
restore round-trip, drop-index leak scan, restart persistence, scale recall.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shaunpatterson shaunpatterson requested a review from a team as a code owner June 8, 2026 18:03