32× smaller than float32 RAG · 48× smaller than HNSW · 96× with BGE-base PCA · perfect recall on multimodal data. No GPU. No vector database.
Patent Pending · AU 2026904283 · Built by Sai Kiran Bathula · Coleambally, NSW, Australia
🔗 Try the live demo → nodemind.space 📊 Real-world benchmark · Multimodal benchmark · BEIR-Combined benchmark · Interactive page
NodeMind is a public demo and benchmark page for QLNI-style binary fingerprint compression for RAG document retrieval.
This repo lets users inspect:
- corpus size
- float/vector-style representation size
- NodeMind compressed representation size
- interactive benchmark comparisons
- downloadable artifacts / links for verification
This is not a large LLM repository. It is a compact proof layer showing RAG document indexes can be compressed 32–96× without losing retrieval quality.
- Open the interactive benchmark page.
- Compare corpus vs float/vector vs NodeMind size.
- Download the artifacts.
- Follow the HiveMind link at the bottom for the broader architecture.
When you index documents for RAG (Retrieval-Augmented Generation), your data expands ~10× in size:
- A 1 GB document collection → ~10 GB float32 vector index
- A 100 GB collection → ~1 TB in your vector database
- Requires expensive GPUs for fast cosine similarity search at scale
- Requires a managed vector database (Pinecone, Weaviate, Qdrant) running 24/7
This is the standard industry approach. NodeMind replaces it entirely.
NodeMind converts float32 RAG embeddings into compact binary fingerprints using our patent-pending codec, then searches them using Multi-Index Hashing (MIH) — pure integer arithmetic, no GPU, no vector DB. The result is a single portable .pkl file you can run on any CPU.
| Original Documents | RAG Index (float32 · ~10× expansion) | NodeMind Index (binary · 32× smaller) | Annual Savings vs Managed VDB |
|---|---|---|---|
| 1 GB | ~10 GB | ~310 MB | $290 / yr |
| 10 GB | ~100 GB | ~3.1 GB | $2,940 / yr |
| 100 GB | ~1 TB | ~31 GB | $29,400 / yr |
| 1 TB | ~10 TB | ~310 GB | $294,000 / yr |
Costs use S3 Standard ($0.023/GB/mo) vs Pinecone managed vector DB ($2.50/GB/mo). RAG ~10× expansion confirmed by Elasticsearch, Pure Storage, and Milvus benchmarks.
| Metric | RAG (float32) | NodeMind (binary) |
|---|---|---|
| Index bytes per chunk (BGE-M3 1024-dim) | 4,096 B | 128 B |
| Compression vs float32 (BGE-M3) | 1× | 32× |
| Compression vs float32 (BGE-base + PCA-256) | 1× | 96× |
| Compression vs HNSW (incl. ~50% graph overhead) | 1× | 48× |
| Search algorithm | Cosine similarity — float multiply-accumulate | Hamming distance — POPCNT on 64-bit ints |
| GPU required (production scale) | Yes | No — pure CPU |
| Portable / offline | No — needs live vector DB | Yes — runs from a .pkl file |
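The ratios in this table reduce to simple per-chunk byte arithmetic. A quick sanity check in Python (nothing here depends on NodeMind code):

```python
# Per-chunk byte arithmetic behind the table above.
dim = 1024                                  # BGE-M3 dense dimension
float32_bytes = dim * 4                     # 4,096 B per chunk
binary_bytes = dim // 8                     # 128 B per chunk (1 bit per dimension)
print(float32_bytes / binary_bytes)         # 32.0  -> 32x vs raw float32

hnsw_bytes = float32_bytes * 1.5            # float32 vectors + ~50% graph overhead
print(hnsw_bytes / binary_bytes)            # 48.0  -> 48x vs HNSW

bge_base_bytes = 768 * 4                    # BGE-base float32: 3,072 B per chunk
pca256_bytes = 256 // 8                     # 256-bit fingerprint: 32 B per chunk
print(bge_base_bytes / pca256_bytes)        # 96.0  -> 96x with BGE-base + PCA-256
```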
All compression numbers are mathematical — verifiable with `os.path.getsize()` on the downloadable indexes. See `benchmarks/`.
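A minimal verification sketch, assuming two downloaded index files — the filenames below are placeholders; substitute whatever the benchmark pages give you:

```python
import os

# Placeholder filenames — substitute the indexes downloaded from the benchmark pages.
rag_index = "rag_float32_index.pkl"
nodemind_index = "nodemind_binary_index.pkl"

rag_size = os.path.getsize(rag_index)
nm_size = os.path.getsize(nodemind_index)

print(f"float32 RAG index : {rag_size / 1e6:8.1f} MB")
print(f"NodeMind index    : {nm_size / 1e6:8.1f} MB")
print(f"compression       : {rag_size / nm_size:8.1f}x")
```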
| Modality | Status | Compression vs float32 | Source |
|---|---|---|---|
| Text / Documents (PDF, TXT, MD) | ✅ Live | 32× / 48× / 96× | real-world benchmark |
| Images (Unsplash photos) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Audio (ESC-10 environmental clips) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Tables (structured rows) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Code (Python / SQL / Bash) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Video | 🔜 Coming soon | (transcript + frame embeddings) | — |
Three reproducible benchmarks. Download the indexes, check file size, verify the numbers yourself.
| Benchmark | Corpus | Embedder | Headline Result |
|---|---|---|---|
| Real-world (500K chunks) | Wikipedia + arXiv + Project Gutenberg (~168 MB raw → 500,000 chunks) | BGE-M3 1024-dim | Recall@5 = 1.000 · 32×–96× compression |
| Multimodal (200 items, 50 queries) | 5 modalities × 50 items each (text, image, audio, table, code) | BGE-Visualized-M3 1024-dim · Gemini RAG baseline | Recall@1 = 1.000 across every modality at every compression level |
| BEIR-Combined (75K chunks, 2,677 queries) | NFCorpus + SciFact + ArguAna + FiQA combined into one corpus, official BEIR qrels | BGE-M3 1024-dim · BGE-base 256-bit | 32× / 96× compression verified · beats FAISS Fixed Binary at every same-size comparison |
The first two benchmarks use self-retrieval style protocols (queries derived from corpus items). The third — BEIR-Combined — uses the official BEIR qrels for true end-to-end retrieval, no perturbation. Honest caveats live in each benchmark's own page.
Real-world (500,000 chunks). NodeMind matched exact float32 cosine — the retrieval gold standard — at Recall@5 = Recall@10 = 1.000. Same answers, 32× to 96× less storage. This is the most realistic production-scale test, and at that scale NodeMind gives up nothing on retrieval quality and saves everything on size and RAM.
Multimodal (text · image · audio · table · code). NodeMind hit Recall@1 = 1.000 on every modality at every compression level. The binary fingerprint format is encoder-agnostic, so the same approach that compresses BGE-M3 text embeddings also compresses image, audio, table, and code embeddings — without per-modality tuning.
BEIR-Combined (75,128 chunks, 2,677 official queries, official qrels). The most academically rigorous test — no perturbation, no self-retrieval. At the 96× cell NodeMind beats FAISS Fixed Binary on every single dataset (NFCorpus +12%, SciFact +6%, ArguAna +16%, FiQA +36%, combined +27% relative R@10) at the same 2.4 MB index size. Float32 cosine still has higher absolute recall on these small academic slices (3K–58K chunks each); the trend across corpus sizes is clear — the closer we get to production scale, the smaller the gap, until at 500K it closes completely.
The honest comparison: FAISS configurations that preserve high recall don't really compress. Flat float32 and IVF-Flat store the full vectors. HNSW float32 stores the vectors plus a graph on top — ~1.5× the float32 size, so 0× compression with overhead. To compress with FAISS you switch to Product Quantization (FAISS-PQ), which trades recall for size. At configurations that keep recall close to exact cosine, FAISS-PQ practical compression sits around 4×–8× (the FAISS wiki, Pinecone, and Milvus all document this trade-off). Push PQ further — say 32× compression — and you accept a meaningful recall drop versus exact float32.
NodeMind delivers 32× to 96× compression while matching exact float32 recall on the 500K-chunk real-world corpus. That is 4×–24× more compression than the FAISS configurations that preserve recall, at recall that meets or exceeds them — and 48× more compression than HNSW float32, which preserves recall but doesn't compress at all. Index size is verifiable with `os.path.getsize()` on the downloads; recall is verifiable against the float32 baselines that are also published alongside.
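To see the baseline trade-off yourself, a sketch along these lines (standard FAISS APIs, random vectors standing in for real embeddings, illustrative parameters) shows that Flat and HNSW keep the full float32 payload while PQ shrinks it at the cost of recall:

```python
import os
import numpy as np
import faiss

d, n = 1024, 50_000
xb = np.random.rand(n, d).astype("float32")       # stand-in for real embeddings

flat = faiss.IndexFlatIP(d)                       # exact cosine/IP — stores raw float32
flat.add(xb)

hnsw = faiss.IndexHNSWFlat(d, 32)                 # HNSW — raw float32 plus a graph on top
hnsw.add(xb)

pq = faiss.IndexPQ(d, 512, 8)                     # PQ — 512 B/vector, ~8x compression, lossy
pq.train(xb)
pq.add(xb)

for name, idx in [("flat", flat), ("hnsw", hnsw), ("pq", pq)]:
    faiss.write_index(idx, f"{name}.faiss")
    print(name, round(os.path.getsize(f"{name}.faiss") / 1e6, 1), "MB")
```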
Document (PDF / TXT / MD / image / audio / table / code)
│
▼ embed
BGE-M3 / BGE-Visualized-M3 (1024-dim float32)
│
▼ NodeMind binary codec ← patent pending
1024-bit binary fingerprint (128 bytes vs 4,096 bytes)
│
▼ build MIH index
64 sub-tables · 16-bit keys → sub-linear Hamming search
│
▼
Portable .pkl → run anywhere, any CPU
BGE-M3 is a state-of-the-art multilingual embedding model producing 1024-dimensional dense vectors. Embedding runs on community hardware — an RTX 3080-class GPU with 128 GB system RAM. No datacenter, no $2/hr A100, no cloud API.
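For reference, one common way to produce these vectors is the FlagEmbedding package — not part of this repo, shown only as a sketch of the embedding step; any encoder emitting 1024-dim dense vectors will do:

```python
# Sketch of the embedding step using FlagEmbedding (pip install FlagEmbedding).
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)      # fits on an RTX 3080-class GPU
chunks = [
    "NodeMind compresses RAG indexes 32x to 96x.",
    "Hamming search runs on plain CPUs.",
]
out = model.encode(chunks, return_dense=True)
dense_vecs = out["dense_vecs"]                            # shape (2, 1024)
print(dense_vecs.shape)
```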
Each 4,096-byte float32 embedding is transformed into a 128-byte (1024-bit) binary fingerprint using our patent-pending algorithm. This is not standard binary quantization (which gives 32× at ~5% quality loss and breaks down on out-of-distribution queries). Our codec is integer-only, deterministic, and produces fingerprints with recall that beats fixed-threshold binary baselines on real BEIR queries. Result: 32× online · 48× vs HNSW · up to 96× with PCA-256 · up to 128× on multimodal.
(The full algorithm is a trade secret protected under AU 2026904283. The downloadable .pkl indexes are self-contained — verify compression and run queries without reading the patent.)
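The codec itself is not public, but the packing arithmetic is easy to see with a generic sign-threshold binarisation — shown purely to illustrate the 4,096 B → 128 B step, not the NodeMind algorithm:

```python
import numpy as np

# Generic illustration only — NOT the patent-pending NodeMind codec.
emb = np.random.randn(1024).astype("float32")   # stand-in for one BGE-M3 embedding
print(emb.nbytes)                               # 4096 bytes

bits = emb > 0                                  # naive sign threshold (size math only)
fingerprint = np.packbits(bits)                 # 1,024 bits packed into bytes
print(fingerprint.nbytes)                       # 128 bytes -> 32x smaller
```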
Once each chunk is a 1024-bit fingerprint, NodeMind replaces the heavy float-vector search with a lightning-fast bit-comparison task:
- Hamming distance, not cosine. Standard RAG runs cosine similarity — `O(N · D)` float multiply on 1024-dim float32 vectors. NodeMind uses Hamming distance via POPCNT on 64-bit integers — pure integer arithmetic.
- Multi-Index Hashing. The 1024-bit fingerprint is split into 64 sub-strings of 16 bits each. Each sub-string is its own hash table (see the sketch after this list).
- Sub-linear lookup. At query time the system does exact table lookups across the sub-tables and merges the candidate sets, giving sub-linear exact Hamming nearest-neighbour search — the query never has to touch the whole corpus.
- No GPU. Because the search is integer-only, it runs entirely on the CPU — no float math, no FAISS, no HNSW, no ANN library, no GPU transfer cost.
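A toy sketch of the two ideas — XOR + popcount distance, and exact lookups in 64 sub-tables keyed on 16-bit slices. The layout and tuning inside the real .pkl indexes will differ; this only illustrates the mechanics (Python ≥ 3.10 for `int.bit_count`):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
db_bits = rng.integers(0, 2, size=(20_000, 1024), dtype=np.uint8)   # toy fingerprints
db_bytes = np.packbits(db_bits, axis=1)                             # (N, 128) uint8
db_words = db_bytes.view(np.uint64)                                 # (N, 16) 64-bit words

def hamming(a_words, b_words):
    # XOR then popcount on 64-bit words — pure integer arithmetic, no floats.
    return sum(int(x).bit_count() for x in np.bitwise_xor(a_words, b_words))

# Multi-Index Hashing: 64 sub-tables, each keyed on one 16-bit slice of the fingerprint.
sub_keys = db_bytes.reshape(len(db_bytes), 64, 2)                   # 64 slices x 2 bytes
tables = [defaultdict(list) for _ in range(64)]
for i, row in enumerate(sub_keys):
    for t in range(64):
        tables[t][row[t].tobytes()].append(i)

def query(fp_bits, k=5):
    q_bytes = np.packbits(fp_bits)
    q_words = q_bytes.view(np.uint64)
    q_keys = q_bytes.reshape(64, 2)
    candidates = set()
    for t in range(64):                          # exact lookups, one per sub-table
        candidates.update(tables[t].get(q_keys[t].tobytes(), []))
    # re-rank the (small) candidate set by full Hamming distance
    return sorted(candidates, key=lambda i: hamming(db_words[i], q_words))[:k]

print(query(db_bits[123])[:1])                   # -> [123] (exact match, distance 0)
```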
| Search algorithm | RAG (float32) | NodeMind (binary) |
|---|---|---|
| Operation per chunk | Cosine — O(N·D) float multiply | XOR + POPCNT on 64-bit ints |
| Search speed | Baseline | ~75× faster ¹ |
| GPU required at scale | Yes | No — pure CPU |
¹ Asymptotic speedup, measured on corpora > 100,000 chunks. Below ~10,000 chunks both indexes hit a sub-millisecond latency floor.
(MIH structure follows Norouzi et al. CVPR 2012. The novel contribution — patent-pending binarisation + portable single-file index format — is covered under AU 2026904283.)
In earlier research using BGE-base (768-dim) we pushed compression to 96× with PCA-256, and the multimodal benchmark validates 128× at NM-256. We chose BGE-M3 at 32× / 48× as the sweet spot for the production text pipeline because it actually outperforms practical HNSW deployments while being 32× smaller. The compression ceiling depends on the encoder and the corpus — code-, structure-, and image-rich corpora compress further than pure prose.
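A generic illustration of the PCA-256 arithmetic — plain PCA plus sign binarisation standing in for the actual codec:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustration only — plain PCA + sign binarisation, not the NodeMind codec.
embs = np.random.randn(10_000, 768).astype("float32")   # stand-in for BGE-base vectors

reduced = PCA(n_components=256).fit_transform(embs)     # (10_000, 256)
fingerprints = np.packbits(reduced > 0, axis=1)         # 32 bytes per item

print(768 * 4 / fingerprints.shape[1])                  # 96.0 -> 96x vs 3,072-byte float32
```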
Visit nodemind.space to:
- Sign in with Google (one click — no password, no email round-trip)
- Upload any PDF, TXT, or Markdown file (10 MB per file, 50 MB lifetime per account)
- Watch it index on community hardware — typical 5,500-page PDF indexes in ~7 minutes on an RTX 3080
- Download your NodeMind binary index + RAG float32 index
- Query both side-by-side and compare speed, size, and results
- HNSW index size includes the float32 vectors. Standard FAISS HNSW stores raw float32 with a graph on top — that is why HNSW indexes are ~1.5× the float32 size. NodeMind's binary index does not store float32 at all; you can throw the vectors away after indexing.
- Float32 cosine still wins on raw recall at small corpus sizes. Exact cosine search is the best retrieval if the entire float32 index fits in your RAM and cost is not a concern. NodeMind's value is storage and RAM at scale: at the 500K-chunk Real-world benchmark NodeMind matches float32 cosine at Recall@5 = Recall@10 = 1.000 at 32×–96× less storage. On the smaller per-dataset BEIR-Combined slices (3K–58K chunks) NodeMind closes the gap as corpus size grows and beats FAISS Fixed Binary at every same-size cell.
| Patent | Number | Status | Covers |
|---|---|---|---|
| NodeMind Codec & Index | AU 2026904283 | Provisional | Patent-pending binarisation method + portable single-file binary fingerprint index format |
Filed at IP Australia. Inventor: Sai Kiran Bathula, independent researcher, Coleambally NSW, Australia.
We did the hard math on document compression. NodeMind is shipped. The next big bet is HiveMind — a public AI reasoning network where humans and agents leave compressed reasoning traces, register watches on ideas, surface contradictions, and connect tools through shared memory. Funded by NodeMind revenue — every NodeMind customer helps us build the next layer.
Happy to work with small startups on tight budgets. NodeMind makes vector-database-class retrieval affordable at any scale.
🌐 Concept site: nodemind.space/hivemind 🐦 Updates: @QLNI_AI 📧 Licensing & enterprise: saikiranbathula1@gmail.com
© 2026 Sai Kiran Bathula. Patent Pending AU 2026904283.