QLNI/NodeMind
NodeMind — Binary Document Intelligence


32× smaller than float32 RAG · 48× smaller than HNSW · 96× with BGE-base PCA · perfect recall on multimodal data. No GPU. No vector database.

Patent Pending · AU 2026904283 · Built by Sai Kiran Bathula, Coleambally, NSW, Australia

🔗 Try the live demo → nodemind.space · 📊 Real-world benchmark · Multimodal benchmark · BEIR-Combined benchmark · Interactive page


What this repo is

NodeMind is a public demo and benchmark page for QLNI-style binary fingerprint compression for RAG document retrieval.

This repo lets users inspect:

  • corpus size
  • float/vector-style representation size
  • NodeMind compressed representation size
  • interactive benchmark comparisons
  • downloadable artifacts / links for verification

This is not a large LLM repository. It is a compact proof layer showing that RAG document indexes can be compressed 32–96× without losing retrieval quality.

What to check first

  1. Open the interactive benchmark page.
  2. Compare corpus vs float/vector vs NodeMind size.
  3. Download the artifacts.
  4. Follow the HiveMind link at the bottom for the broader architecture.

The Problem with RAG at Scale

When you index documents for RAG (Retrieval-Augmented Generation), your data expands ~10× in size:

  • A 1 GB document collection → ~10 GB float32 vector index
  • A 100 GB collection → ~1 TB in your vector database
  • Requires expensive GPUs for fast cosine similarity search at scale
  • Requires a managed vector database (Pinecone, Weaviate, Qdrant) running 24/7

This is the standard industry approach. NodeMind replaces it entirely.


NodeMind's Approach

NodeMind converts float32 RAG embeddings into compact binary fingerprints using our patent-pending codec, then searches them using Multi-Index Hashing (MIH) — pure integer arithmetic, no GPU, no vector DB. The result is a single portable .pkl file you can run on any CPU.

| Original Documents | RAG Index (float32 · ~10× expansion) | NodeMind Index (binary · 32× smaller) | Annual Savings vs Managed VDB |
|---|---|---|---|
| 1 GB | ~10 GB | ~310 MB | $290 / yr |
| 10 GB | ~100 GB | ~3.1 GB | $2,940 / yr |
| 100 GB | ~1 TB | ~31 GB | $29,400 / yr |
| 1 TB | ~10 TB | ~310 GB | $294,000 / yr |

Costs use S3 Standard ($0.023/GB/mo) vs Pinecone managed vector DB ($2.50/GB/mo). RAG ~10× expansion confirmed by Elasticsearch, Pure Storage, and Milvus benchmarks.
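The savings column can be reproduced from the prices quoted above. A minimal sketch, assuming the ~10× RAG expansion and 32× compression figures from the table (the table's printed figures appear to be rounded conservatively, so this naive arithmetic lands slightly above them):

```python
# Rough reproduction of the "Annual Savings" column.
S3_PER_GB_MONTH = 0.023       # S3 Standard, as quoted above
VDB_PER_GB_MONTH = 2.50       # managed vector DB price, as quoted above

def annual_savings(corpus_gb: float) -> float:
    float_index_gb = corpus_gb * 10        # ~10x RAG expansion
    nodemind_gb = float_index_gb / 32      # 32x binary compression
    vdb_cost = float_index_gb * VDB_PER_GB_MONTH * 12
    s3_cost = nodemind_gb * S3_PER_GB_MONTH * 12
    return vdb_cost - s3_cost

print(round(annual_savings(1)))      # ~$300/yr for a 1 GB corpus
print(round(annual_savings(100)))    # ~$29,991/yr for a 100 GB corpus
```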


Performance — Head-to-Head

| Metric | RAG (float32) | NodeMind (binary) |
|---|---|---|
| Index bytes per chunk (BGE-M3 1024-dim) | 4,096 B | 128 B |
| Compression vs float32 (BGE-M3) | | 32× |
| Compression vs float32 (BGE-base + PCA-256) | | 96× |
| Compression vs HNSW (incl. ~50% graph overhead) | | 48× |
| Search algorithm | Cosine similarity — float multiply-accumulate | Hamming distance — POPCNT on 64-bit ints |
| GPU required (production scale) | Yes | No — pure CPU |
| Portable / offline | No — needs live vector DB | Yes — runs from a .pkl file |

All compression numbers are mathematical — verifiable with os.path.getsize() on the downloadable indexes. See benchmarks/.
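The per-chunk arithmetic behind the 32× and 48× figures is short enough to check inline (the ~50% HNSW graph overhead is the estimate quoted in the table above):

```python
# Back-of-envelope check of the per-chunk numbers in the table above.
FLOAT32_BYTES = 4                          # bytes per float32 dimension
DIMS = 1024                                # BGE-M3 embedding dimensionality

float_chunk = DIMS * FLOAT32_BYTES         # 4,096 B per chunk
binary_chunk = DIMS // 8                   # 1,024 bits -> 128 B per chunk
hnsw_chunk = int(float_chunk * 1.5)        # float32 vectors + ~50% graph overhead

print(float_chunk // binary_chunk)         # 32  (vs raw float32)
print(hnsw_chunk // binary_chunk)          # 48  (vs HNSW)
```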


Modalities

| Modality | Status | Compression vs float32 | Source |
|---|---|---|---|
| Text / Documents (PDF, TXT, MD) | Live | 32× / 48× / 96× | real-world benchmark |
| Images (Unsplash photos) | Tested | 32×–128× | multimodal benchmark |
| Audio (ESC-10 environmental clips) | Tested | 32×–128× | multimodal benchmark |
| Tables (structured rows) | Tested | 32×–128× | multimodal benchmark |
| Code (Python / SQL / Bash) | Tested | 32×–128× | multimodal benchmark |
| Video | 🔜 Coming soon (transcript + frame embeddings) | | |

Benchmarks

Three reproducible benchmarks. Download the indexes, check file size, verify the numbers yourself.

| Benchmark | Corpus | Embedder | Headline Result |
|---|---|---|---|
| Real-world (500K chunks) | Wikipedia + arXiv + Project Gutenberg (~168 MB raw → 500,000 chunks) | BGE-M3 1024-dim | Recall@5 = 1.000 · 32×–96× compression |
| Multimodal (200 items, 50 queries) | 5 modalities × 50 items each (text, image, audio, table, code) | BGE-Visualized-M3 1024-dim · Gemini RAG baseline | Recall@1 = 1.000 across every modality at every compression level |
| BEIR-Combined (75K chunks, 2,677 queries) | NFCorpus + SciFact + ArguAna + FiQA combined into one corpus, official BEIR qrels | BGE-M3 1024-dim · BGE-base 256-bit | 32× / 96× compression verified · beats FAISS Fixed Binary at every same-size comparison |

The first two benchmarks use self-retrieval style protocols (queries derived from corpus items). The third — BEIR-Combined — uses the official BEIR qrels for true end-to-end retrieval, no perturbation. Honest caveats live in each benchmark's own page.

What these three benchmarks say, in plain English

Real-world (500,000 chunks). NodeMind matched exact float32 cosine — the retrieval gold standard — at Recall@5 = Recall@10 = 1.000. Same answers, 32× to 96× less storage. This is the most realistic production-scale test, and at that scale NodeMind gives up nothing on retrieval quality and saves everything on size and RAM.

Multimodal (text · image · audio · table · code). NodeMind hit Recall@1 = 1.000 on every modality at every compression level. The binary fingerprint format is encoder-agnostic, so the same approach that compresses BGE-M3 text embeddings also compresses image, audio, table, and code embeddings — without per-modality tuning.

BEIR-Combined (75,128 chunks, 2,677 official queries, official qrels). The most academically rigorous test — no perturbation, no self-retrieval. At the 96× cell NodeMind beats FAISS Fixed Binary on every single dataset (NFCorpus +12%, SciFact +6%, ArguAna +16%, FiQA +36%, combined +27% relative R@10) at the same 2.4 MB index size. Float32 cosine still has higher absolute recall on these small academic slices (3K–58K chunks each); the trend across corpus sizes is clear — the closer we get to production scale, the smaller the gap, until at 500K it closes completely.

Why NodeMind's compression is in a different class than FAISS

The honest comparison: FAISS configurations that preserve high recall don't really compress. Flat float32 and IVF-Flat store the full vectors. HNSW float32 stores the vectors plus a graph on top — ~1.5× the float32 size, so no compression at all, only overhead. To compress with FAISS you switch to Product Quantization (FAISS-PQ), which trades recall for size. At configurations that keep recall close to exact cosine, FAISS-PQ practical compression sits around 4×–8× (the FAISS wiki, Pinecone, and Milvus all document this trade-off). Push PQ further — say 32× compression — and you accept a meaningful recall drop versus exact float32.

NodeMind delivers 32× to 96× compression while matching exact float32 recall on the 500K-chunk real-world corpus. That is 4×–24× more compression than the FAISS configurations that preserve recall, at recall that meets or exceeds them — and 96× more compression than HNSW float32, which preserves recall but doesn't compress at all. Index size is verifiable with os.path.getsize() on the downloads; recall is verifiable against the float32 baselines that are also published alongside.


How It Works

Document (PDF / TXT / MD / image / audio / table / code)
        │
        ▼  embed
BGE-M3 / BGE-Visualized-M3 (1024-dim float32)
        │
        ▼  NodeMind binary codec   ← patent pending
1024-bit binary fingerprint  (128 bytes vs 4,096 bytes)
        │
        ▼  build MIH index
64 sub-tables · 16-bit keys  →  sub-linear Hamming search
        │
        ▼
Portable .pkl  →  run anywhere, any CPU

Stage 1 — BGE-M3 Embeddings

State-of-the-art multilingual embedding model. 1024-dimensional dense vectors. Runs on community hardware — RTX 3080-class GPU with 128 GB system RAM. No datacenter, no $2/hr A100, no cloud API.

Stage 2 — Proprietary Binary Codec

Each 4,096-byte float32 embedding is transformed into a 128-byte (1024-bit) binary fingerprint using our patent-pending algorithm. This is not standard binary quantization (which gives 32× at ~5% quality loss and breaks down on out-of-distribution queries). Our codec is integer-only, deterministic, and produces fingerprints with recall that beats fixed-threshold binary baselines on real BEIR queries. Result: 32× online · 48× vs HNSW · up to 96× with PCA-256 · up to 128× on multimodal. (The full algorithm is a trade secret protected under AU 2026904283. The downloadable .pkl indexes are self-contained — verify compression and run queries without reading the patent.)
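The codec itself is unpublished, so it can't be shown here, but the size arithmetic it shares with the fixed-threshold baseline it is contrasted against can be. A minimal sketch of that naive sign-binarization baseline (explicitly not NodeMind's codec — only the 4,096 B → 128 B footprint is the same):

```python
import random
import struct

random.seed(0)
embedding = [random.gauss(0, 1) for _ in range(1024)]   # stand-in for a BGE-M3 vector

float_bytes = struct.pack("<1024f", *embedding)         # the float32 representation

# Naive fixed-threshold binarization: keep only the sign bit of each dimension.
# NodeMind's patent-pending codec works differently, but emits the same size.
bits = 0
for x in embedding:
    bits = (bits << 1) | (1 if x > 0 else 0)
fingerprint = bits.to_bytes(128, "big")                 # 1,024 bits = 128 bytes

print(len(float_bytes))    # 4096
print(len(fingerprint))    # 128
```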

Stage 3 — Multi-Index Hashing (MIH) → ~75× faster retrieval

Once each chunk is a 1024-bit fingerprint, NodeMind replaces the heavy float-vector search with a lightning-fast bit-comparison task:

  • Hamming distance, not cosine. Standard RAG runs cosine similarity — O(N · D) float multiply on 1024-dim float32 vectors. NodeMind uses Hamming distance via POPCNT on 64-bit integers — pure integer arithmetic.
  • Multi-Index Hashing. The 1024-bit fingerprint is split into 64 sub-strings of 16 bits each. Each sub-string is its own hash table.
  • Sub-linear lookup. At query time the system does exact table lookups across the sub-tables and merges the candidate sets, giving sub-linear exact Hamming nearest-neighbour search — the query never has to touch the whole corpus.
  • No GPU. Because the search is integer-only, it runs entirely on the CPU — no float math, no FAISS, no HNSW, no ANN library, no GPU transfer cost.
| Search algorithm | RAG (float32) | NodeMind (binary) |
|---|---|---|
| Operation per chunk | Cosine — O(N·D) float multiply | XOR + POPCNT on 64-bit ints |
| Search speed | Baseline | ~75× faster ¹ |
| GPU required at scale | Yes | No — pure CPU |

¹ Asymptotic speedup, measured on corpora > 100,000 chunks. Below ~10,000 chunks both indexes hit a sub-millisecond latency floor.

(MIH structure follows Norouzi et al. CVPR 2012. The novel contribution — patent-pending binarisation + portable single-file index format — is covered under AU 2026904283.)

Why 48× and not 96× or 128×?

In earlier research using BGE-base (768-dim) we pushed compression to 96× with PCA-256, and the multimodal benchmark validates 128× at NM-256. We chose BGE-M3 at 32× / 48× as the sweet spot for the production text pipeline because it actually outperforms practical HNSW deployments while being 32× smaller. The compression ceiling depends on the encoder and the corpus — code-, structure-, and image-rich corpora compress further than pure prose.
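The 96× and 128× ceilings quoted above follow directly from per-chunk sizes, assuming NM-256 denotes a 256-bit fingerprint:

```python
# Per-chunk sizes behind the 96x and 128x ceilings.
bge_base_float = 768 * 4                  # BGE-base 768-dim float32 -> 3,072 B
pca256_binary = 256 // 8                  # PCA-256, binarized      ->    32 B
print(bge_base_float // pca256_binary)    # 96

bge_m3_float = 1024 * 4                   # BGE-M3 1024-dim float32 -> 4,096 B
nm256_binary = 256 // 8                   # NM-256 fingerprint      ->    32 B
print(bge_m3_float // nm256_binary)       # 128
```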


Live Demo

Visit nodemind.space to:

  1. Sign in with Google (one click — no password, no email round-trip)
  2. Upload any PDF, TXT, or Markdown file (10 MB per file, 50 MB lifetime per account)
  3. Watch it index on community hardware — a typical 5,500-page PDF indexes in ~7 minutes on an RTX 3080
  4. Download your NodeMind binary index + RAG float32 index
  5. Query both side-by-side and compare speed, size, and results

Honest Caveats

  • HNSW index size includes the float32 vectors. Standard FAISS HNSW stores raw float32 with a graph on top — that is why HNSW indexes are ~1.5× the float32 size. NodeMind's binary index does not store float32 at all; you can throw the vectors away after indexing.
  • Float32 cosine still wins on raw recall at small corpus sizes. Exact cosine search is the best retrieval if the entire float32 index fits in your RAM and cost is not a concern. NodeMind's value is storage and RAM at scale: at the 500K-chunk Real-world benchmark NodeMind matches float32 cosine at Recall@5 = Recall@10 = 1.000 at 32×–96× less storage. On the smaller per-dataset BEIR-Combined slices (3K–58K chunks) NodeMind closes the gap as corpus size grows and beats FAISS Fixed Binary at every same-size cell.

Patents

| Patent | Number | Status | Covers |
|---|---|---|---|
| NodeMind Codec & Index | AU 2026904283 | Provisional | Patent-pending binarisation method + portable single-file binary fingerprint index format |

Filed at IP Australia. Inventor: Sai Kiran Bathula, independent researcher, Coleambally NSW, Australia.


What's Next — HiveMind

We did the hard math on document compression. NodeMind is shipped. The next big bet is HiveMind — a public AI reasoning network where humans and agents leave compressed reasoning traces, register watches on ideas, surface contradictions, and connect tools through shared memory. Funded by NodeMind revenue — every NodeMind customer helps us build the next layer.

Happy to work with small startups on tight budgets. NodeMind makes vector-database-class retrieval affordable at any scale.

🌐 Concept site: nodemind.space/hivemind 🐦 Updates: @QLNI_AI 📧 Licensing & enterprise: saikiranbathula1@gmail.com


© 2026 Sai Kiran Bathula. Patent Pending AU 2026904283.