32× smaller than float32 RAG · 48× smaller than HNSW · 96× with BGE-base PCA · perfect recall on multimodal data. No GPU. No vector database.
Patent Pending · AU 2026904283 · Built by Sai Kiran Bathula · Coleambally, NSW, Australia
🔗 Try the live demo → nodemind.space 📊 Real-world benchmark · Multimodal benchmark · BEIR-Combined benchmark · Interactive page
NodeMind is a public demo and benchmark page for QLNI-style binary fingerprint compression for RAG document retrieval.
This repo lets users inspect:
- corpus size
- float/vector-style representation size
- NodeMind compressed representation size
- interactive benchmark comparisons
- downloadable artifacts / links for verification
This is not a large LLM repository. It is a compact proof layer showing RAG document indexes can be compressed 32–96× without losing retrieval quality.
- Open the interactive benchmark page.
- Compare corpus vs float/vector vs NodeMind size.
- Download the artifacts.
- Follow the HiveMind link at the bottom for the broader architecture.
When you index documents for RAG (Retrieval-Augmented Generation), your data expands ~10× in size:
- A 1 GB document collection → ~10 GB float32 vector index
- A 100 GB collection → ~1 TB in your vector database
- Requires expensive GPUs for fast cosine similarity search at scale
- Requires a managed vector database (Pinecone, Weaviate, Qdrant) running 24/7
This is the standard industry approach. NodeMind replaces it entirely.
NodeMind converts float32 RAG embeddings into compact binary fingerprints using our patent-pending codec, then searches them using Multi-Index Hashing (MIH) — pure integer arithmetic, no GPU, no vector DB. The result is a single portable .pkl file you can run on any CPU.
| Original Documents | RAG Index (float32 · ~10× expansion) | NodeMind Index (binary · 32× smaller) | Annual Savings vs Managed VDB |
|---|---|---|---|
| 1 GB | ~10 GB | ~310 MB | $290 / yr |
| 10 GB | ~100 GB | ~3.1 GB | $2,940 / yr |
| 100 GB | ~1 TB | ~31 GB | $29,400 / yr |
| 1 TB | ~10 TB | ~310 GB | $294,000 / yr |
Costs use S3 Standard ($0.023/GB/mo) vs Pinecone managed vector DB ($2.50/GB/mo). RAG ~10× expansion confirmed by Elasticsearch, Pure Storage, and Milvus benchmarks.
| Metric | RAG (float32) | NodeMind (binary) |
|---|---|---|
| Index bytes per chunk (BGE-M3 1024-dim) | 4,096 B | 128 B |
| Compression vs float32 (BGE-M3) | 1× | 32× |
| Compression vs float32 (BGE-base + PCA-256) | 1× | 96× |
| Compression vs HNSW (incl. ~50% graph overhead) | 1× | 48× |
| Search algorithm | Cosine similarity — float multiply-accumulate | Hamming distance — POPCNT on 64-bit ints |
| GPU required (production scale) | Yes | No — pure CPU |
| Portable / offline | No — needs live vector DB | Yes — runs from a .pkl file |
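The ratios in this table reduce to simple per-chunk byte arithmetic. A quick sanity check in Python (nothing here depends on NodeMind code):

```python
# Per-chunk byte arithmetic behind the table above.
dim = 1024                                  # BGE-M3 dense dimension
float32_bytes = dim * 4                     # 4,096 B per chunk
binary_bytes = dim // 8                     # 128 B per chunk (1 bit per dimension)
print(float32_bytes / binary_bytes)         # 32.0  -> 32x vs raw float32

hnsw_bytes = float32_bytes * 1.5            # float32 vectors + ~50% graph overhead
print(hnsw_bytes / binary_bytes)            # 48.0  -> 48x vs HNSW

bge_base_bytes = 768 * 4                    # BGE-base float32: 3,072 B per chunk
pca256_bytes = 256 // 8                     # 256-bit fingerprint: 32 B per chunk
print(bge_base_bytes / pca256_bytes)        # 96.0  -> 96x with BGE-base + PCA-256
```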
All compression numbers are mathematical — verifiable with `os.path.getsize()` on the downloadable indexes. See `benchmarks/`.
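A minimal verification sketch, assuming two downloaded index files — the filenames below are placeholders; substitute whatever the benchmark pages give you:

```python
import os

# Placeholder filenames — substitute the indexes downloaded from the benchmark pages.
rag_index = "rag_float32_index.pkl"
nodemind_index = "nodemind_binary_index.pkl"

rag_size = os.path.getsize(rag_index)
nm_size = os.path.getsize(nodemind_index)

print(f"float32 RAG index : {rag_size / 1e6:8.1f} MB")
print(f"NodeMind index    : {nm_size / 1e6:8.1f} MB")
print(f"compression       : {rag_size / nm_size:8.1f}x")
```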
| Modality | Status | Compression vs float32 | Source |
|---|---|---|---|
| Text / Documents (PDF, TXT, MD) | ✅ Live | 32× / 48× / 96× | real-world benchmark |
| Images (Unsplash photos) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Audio (ESC-10 environmental clips) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Tables (structured rows) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Code (Python / SQL / Bash) | ✅ Tested | 32× — 128× | multimodal benchmark |
| Video | 🔜 Coming soon | (transcript + frame embeddings) | — |
Three reproducible benchmarks. Download the indexes, check file size, verify the numbers yourself.
| Benchmark | Corpus | Embedder | Headline Result |
|---|---|---|---|
| Real-world (500K chunks) | Wikipedia + arXiv + Project Gutenberg (~168 MB raw → 500,000 chunks) | BGE-M3 1024-dim | Recall@5 = 1.000 · 32×–96× compression |
| Multimodal (200 items, 50 queries) | 5 modalities × 50 items each (text, image, audio, table, code) | BGE-Visualized-M3 1024-dim · Gemini RAG baseline | Recall@1 = 1.000 across every modality at every compression level |
| BEIR-Combined (75K chunks, 2,677 queries) | NFCorpus + SciFact + ArguAna + FiQA combined into one corpus, official BEIR qrels | BGE-M3 1024-dim · BGE-base 256-bit | 32× / 96× compression verified · beats FAISS Fixed Binary at every same-size comparison |
The first two benchmarks use self-retrieval style protocols (queries derived from corpus items). The third — BEIR-Combined — uses the official BEIR qrels for true end-to-end retrieval, no perturbation. Honest caveats live in each benchmark's own page.
Real-world (500,000 chunks). NodeMind matched exact float32 cosine — the retrieval gold standard — at Recall@5 = Recall@10 = 1.000. Same answers, 32× to 96× less storage. This is the most realistic production-scale test, and at that scale NodeMind gives up nothing on retrieval quality and saves everything on size and RAM.
Multimodal (text · image · audio · table · code). NodeMind hit Recall@1 = 1.000 on every modality at every compression level. The binary fingerprint format is encoder-agnostic, so the same approach that compresses BGE-M3 text embeddings also compresses image, audio, table, and code embeddings — without per-modality tuning.
BEIR-Combined (75,128 chunks, 2,677 official queries, official qrels). The most academically rigorous test — no perturbation, no self-retrieval. At the 96× cell NodeMind beats FAISS Fixed Binary on every single dataset (NFCorpus +12%, SciFact +6%, ArguAna +16%, FiQA +36%, combined +27% relative R@10) at the same 2.4 MB index size. Float32 cosine still has higher absolute recall on these small academic slices (3K–58K chunks each); the trend across corpus sizes is clear — the closer we get to production scale, the smaller the gap, until at 500K it closes completely.
The honest comparison: FAISS configurations that preserve high recall don't really compress. Flat float32 and IVF-Flat store the full vectors. HNSW float32 stores the vectors plus a graph on top — ~1.5× the float32 size, so 0× compression with overhead. To compress with FAISS you switch to Product Quantization (FAISS-PQ), which trades recall for size. At configurations that keep recall close to exact cosine, FAISS-PQ practical compression sits around 4×–8× (the FAISS wiki, Pinecone, and Milvus all document this trade-off). Push PQ further — say 32× compression — and you accept a meaningful recall drop versus exact float32.
NodeMind delivers 32× to 96× compression while matching exact float32 recall on the 500K-chunk real-world corpus. That is 4×–24× more compression than the FAISS configurations that preserve recall, at recall that meets or exceeds them — and 48× more compression than HNSW float32, which preserves recall but doesn't compress at all. Index size is verifiable with `os.path.getsize()` on the downloads; recall is verifiable against the float32 baselines that are also published alongside.
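To see the baseline trade-off yourself, a sketch along these lines (standard FAISS APIs, random vectors standing in for real embeddings, illustrative parameters) shows that Flat and HNSW keep the full float32 payload while PQ shrinks it at the cost of recall:

```python
import os
import numpy as np
import faiss

d, n = 1024, 50_000
xb = np.random.rand(n, d).astype("float32")       # stand-in for real embeddings

flat = faiss.IndexFlatIP(d)                       # exact cosine/IP — stores raw float32
flat.add(xb)

hnsw = faiss.IndexHNSWFlat(d, 32)                 # HNSW — raw float32 plus a graph on top
hnsw.add(xb)

pq = faiss.IndexPQ(d, 512, 8)                     # PQ — 512 B/vector, ~8x compression, lossy
pq.train(xb)
pq.add(xb)

for name, idx in [("flat", flat), ("hnsw", hnsw), ("pq", pq)]:
    faiss.write_index(idx, f"{name}.faiss")
    print(name, round(os.path.getsize(f"{name}.faiss") / 1e6, 1), "MB")
```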
Document (PDF / TXT / MD / image / audio / table / code)
│
▼ embed
BGE-M3 / BGE-Visualized-M3 (1024-dim float32)
│
▼ NodeMind binary codec ← patent pending
1024-bit binary fingerprint (128 bytes vs 4,096 bytes)
│
▼ build MIH index
64 sub-tables · 16-bit keys → sub-linear Hamming search
│
▼
Portable .pkl → run anywhere, any CPU
BGE-M3 is a state-of-the-art multilingual embedding model producing 1024-dimensional dense vectors. Embedding runs on community hardware — an RTX 3080-class GPU with 128 GB system RAM. No datacenter, no $2/hr A100, no cloud API.
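For reference, one common way to produce these vectors is the FlagEmbedding package — not part of this repo, shown only as a sketch of the embedding step; any encoder emitting 1024-dim dense vectors will do:

```python
# Sketch of the embedding step using FlagEmbedding (pip install FlagEmbedding).
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)      # fits on an RTX 3080-class GPU
chunks = [
    "NodeMind compresses RAG indexes 32x to 96x.",
    "Hamming search runs on plain CPUs.",
]
out = model.encode(chunks, return_dense=True)
dense_vecs = out["dense_vecs"]                            # shape (2, 1024)
print(dense_vecs.shape)
```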
Each 4,096-byte float32 embedding is transformed into a 128-byte (1024-bit) binary fingerprint using our patent-pending algorithm. This is not standard binary quantization (which gives 32× at ~5% quality loss and breaks down on out-of-distribution queries). Our codec is integer-only, deterministic, and produces fingerprints with recall that beats fixed-threshold binary baselines on real BEIR queries. Result: 32× online · 48× vs HNSW · up to 96× with PCA-256 · up to 128× on multimodal.
(The full algorithm is a trade secret protected under AU 2026904283. The downloadable .pkl indexes are self-contained — verify compression and run queries without reading the patent.)
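The codec itself is not public, but the packing arithmetic is easy to see with a generic sign-threshold binarisation — shown purely to illustrate the 4,096 B → 128 B step, not the NodeMind algorithm:

```python
import numpy as np

# Generic illustration only — NOT the patent-pending NodeMind codec.
emb = np.random.randn(1024).astype("float32")   # stand-in for one BGE-M3 embedding
print(emb.nbytes)                               # 4096 bytes

bits = emb > 0                                  # naive sign threshold (size math only)
fingerprint = np.packbits(bits)                 # 1,024 bits packed into bytes
print(fingerprint.nbytes)                       # 128 bytes -> 32x smaller
```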
Once each chunk is a 1024-bit fingerprint, NodeMind replaces the heavy float-vector search with a lightning-fast bit-comparison task:
- Hamming distance, not cosine. Standard RAG runs cosine similarity — `O(N · D)` float multiply on 1024-dim float32 vectors. NodeMind uses Hamming distance via POPCNT on 64-bit integers — pure integer arithmetic.
- Multi-Index Hashing. The 1024-bit fingerprint is split into 64 sub-strings of 16 bits each. Each sub-string is its own hash table (see the sketch after this list).
- Sub-linear lookup. At query time the system does exact table lookups across the sub-tables and merges the candidate sets, giving sub-linear exact Hamming nearest-neighbour search — the query never has to touch the whole corpus.
- No GPU. Because the search is integer-only, it runs entirely on the CPU — no float math, no FAISS, no HNSW, no ANN library, no GPU transfer cost.
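A toy sketch of the two ideas — XOR + popcount distance, and exact lookups in 64 sub-tables keyed on 16-bit slices. The layout and tuning inside the real .pkl indexes will differ; this only illustrates the mechanics (Python ≥ 3.10 for `int.bit_count`):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
db_bits = rng.integers(0, 2, size=(20_000, 1024), dtype=np.uint8)   # toy fingerprints
db_bytes = np.packbits(db_bits, axis=1)                             # (N, 128) uint8
db_words = db_bytes.view(np.uint64)                                 # (N, 16) 64-bit words

def hamming(a_words, b_words):
    # XOR then popcount on 64-bit words — pure integer arithmetic, no floats.
    return sum(int(x).bit_count() for x in np.bitwise_xor(a_words, b_words))

# Multi-Index Hashing: 64 sub-tables, each keyed on one 16-bit slice of the fingerprint.
sub_keys = db_bytes.reshape(len(db_bytes), 64, 2)                   # 64 slices x 2 bytes
tables = [defaultdict(list) for _ in range(64)]
for i, row in enumerate(sub_keys):
    for t in range(64):
        tables[t][row[t].tobytes()].append(i)

def query(fp_bits, k=5):
    q_bytes = np.packbits(fp_bits)
    q_words = q_bytes.view(np.uint64)
    q_keys = q_bytes.reshape(64, 2)
    candidates = set()
    for t in range(64):                          # exact lookups, one per sub-table
        candidates.update(tables[t].get(q_keys[t].tobytes(), []))
    # re-rank the (small) candidate set by full Hamming distance
    return sorted(candidates, key=lambda i: hamming(db_words[i], q_words))[:k]

print(query(db_bits[123])[:1])                   # -> [123] (exact match, distance 0)
```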
| Search algorithm | RAG (float32) | NodeMind (binary) |
|---|---|---|
| Operation per chunk | Cosine — O(N·D) float multiply | XOR + POPCNT on 64-bit ints |
| Search speed | Baseline | ~75× faster ¹ |
| GPU required at scale | Yes | No — pure CPU |
¹ Asymptotic speedup, measured on corpora > 100,000 chunks. Below ~10,000 chunks both indexes hit a sub-millisecond latency floor.
(MIH structure follows Norouzi et al. CVPR 2012. The novel contribution — patent-pending binarisation + portable single-file index format — is covered under AU 2026904283.)
In earlier research using BGE-base (768-dim) we pushed compression to 96× with PCA-256, and the multimodal benchmark validates 128× at NM-256. We chose BGE-M3 at 32× / 48× as the sweet spot for the production text pipeline because it actually outperforms practical HNSW deployments while being 32× smaller. The compression ceiling depends on the encoder and the corpus — code-, structure-, and image-rich corpora compress further than pure prose.
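A generic illustration of the PCA-256 arithmetic — plain PCA plus sign binarisation standing in for the actual codec:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustration only — plain PCA + sign binarisation, not the NodeMind codec.
embs = np.random.randn(10_000, 768).astype("float32")   # stand-in for BGE-base vectors

reduced = PCA(n_components=256).fit_transform(embs)     # (10_000, 256)
fingerprints = np.packbits(reduced > 0, axis=1)         # 32 bytes per item

print(768 * 4 / fingerprints.shape[1])                  # 96.0 -> 96x vs 3,072-byte float32
```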
Visit nodemind.space to:
- Sign in with Google (one click — no password, no email round-trip)
- Upload any PDF, TXT, or Markdown file (10 MB per file, 50 MB lifetime per account)
- Watch it index on community hardware — typical 5,500-page PDF indexes in ~7 minutes on an RTX 3080
- Download your NodeMind binary index + RAG float32 index
- Query both side-by-side and compare speed, size, and results
- HNSW index size includes the float32 vectors. Standard FAISS HNSW stores raw float32 with a graph on top — that is why HNSW indexes are ~1.5× the float32 size. NodeMind's binary index does not store float32 at all; you can throw the vectors away after indexing.
- Float32 cosine still wins on raw recall at small corpus sizes. Exact cosine search is the best retrieval if the entire float32 index fits in your RAM and cost is not a concern. NodeMind's value is storage and RAM at scale: at the 500K-chunk Real-world benchmark NodeMind matches float32 cosine at Recall@5 = Recall@10 = 1.000 at 32×–96× less storage. On the smaller per-dataset BEIR-Combined slices (3K–58K chunks) NodeMind closes the gap as corpus size grows and beats FAISS Fixed Binary at every same-size cell.
| Patent | Number | Status | Covers |
|---|---|---|---|
| NodeMind Codec & Index | AU 2026904283 | Provisional | Patent-pending binarisation method + portable single-file binary fingerprint index format |
Filed at IP Australia. Inventor: Sai Kiran Bathula, independent researcher, Coleambally NSW, Australia.
We did the hard math on document compression. NodeMind is shipped. The next big bet is HiveMind — a public AI reasoning network where humans and agents leave compressed reasoning traces, register watches on ideas, surface contradictions, and connect tools through shared memory. Funded by NodeMind revenue — every NodeMind customer helps us build the next layer.
Happy to work with small startups on tight budgets. NodeMind makes vector-database-class retrieval affordable at any scale.
🌐 Concept site: nodemind.space/hivemind 🐦 Updates: @QLNI_AI 📧 Licensing & enterprise: saikiranbathula1@gmail.com
© 2026 Sai Kiran Bathula. Patent Pending AU 2026904283.