Skip to content

adanSiqueira/vector-fraud-detection-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

11 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ‡ΊπŸ‡Έ English | πŸ‡§πŸ‡· PortuguΓͺs

Rinha de Backend 2026 β€” Fraud Detection API - Python

Submission for Rinha de Backend 2026: a fraud detection API using approximate nearest-neighbour vector search over 3 million labelled transactions.


Why Python?

This project was intentionally built in Python, even knowing that the competition is heavily dominated by lower-level languages such as Rust, Go and C, which naturally have advantages in raw throughput and latency.

The goal here is not necessarily to outperform those ecosystems, but to explore how far Python can be pushed when every millisecond matters.

The project combines low-overhead architectural decisions, Rust-powered dependencies, memory-conscious optimizations, vector-search compression techniques and careful hot-path tuning to make Python as performant as possible within the competition constraints.


The real challenge: memory, not CPU

At first, I tried to use hnswlib with an HNSW index.

The problem was discovered only after multiple local stress tests and full Docker orchestration runs: the memory footprint was fundamentally incompatible with the competition limits.

The competition rules require:

  • at least 2 API instances
  • 1 load balancer
  • maximum total budget of:
    • 1 CPU
    • 350 MB RAM

The original HNSW approach became impossible because HNSW stores the entire graph structure in memory for every vector.

With:

  • 3 million vectors
  • 14 dimensions
  • HNSW M=8

the index consumed approximately:

~721 MB RAM

No amount of parameter tuning could make two API containers coexist inside the allowed memory budget.

The bottleneck was not Python itself.

It was the memory model of the vector index.


The solution: Faiss IVF + SQ8

Faiss (Facebook AI Similarity Search) is a high-performance vector similarity search library developed by Meta AI and widely used in recommendation systems, semantic search, embeddings retrieval and large-scale nearest-neighbour workloads.

For this project, the chosen index type was:

faiss.IndexIVFScalarQuantizer(...)

which combines two complementary techniques:

  • IVF (Inverted File Index)
  • SQ8 (8-bit Scalar Quantization)

The reason for this choice was memory efficiency.

While HNSW provides excellent recall and latency, its graph-based structure becomes extremely memory-intensive at large scale because the full graph connectivity must remain resident in RAM.

Faiss IVF+SQ8 trades a small amount of recall for a massive reduction in memory usage while still keeping sub-millisecond query times.

This made it possible to fit 3 million vectors inside the competition's strict container memory limits.

The architecture was redesigned around two complementary Faiss techniques:

1. IVF β€” Inverted File Index

The dataset is partitioned into 1000 clusters during build time.

At query time, only 10 clusters are searched (nprobe=10), reducing the search space to roughly 1% of the full dataset.

Instead of:

O(N)

queries become approximately:

O(N / nlist * nprobe)

This dramatically reduces search cost while maintaining good recall.


2. SQ8 β€” Scalar Quantization (8-bit)

Each float32 dimension is compressed:

float32 (4 bytes) β†’ uint8 (1 byte)

This provides:

  • ~4Γ— memory reduction
  • minimal accuracy degradation
  • measured ~97.4% recall compared to IVFFlat

Because the vectors are already normalized into [0,1], scalar quantization works extremely well for this dataset.


Memory impact

The migration from HNSW to Faiss IVF+SQ8 completely changed the feasibility of the architecture.

Component RAM
Faiss IVF+SQ8 index ~64 MB
labels.npy ~3 MB
Python + granian ~50 MB
Total per container ~117 MB

Compared to the original HNSW approach:

Index type Approx RAM
hnswlib HNSW ~721 MB
Faiss IVF+SQ8 ~64 MB

This reduction made the competition constraints achievable.


Architecture

Client
  β”‚  POST /fraud-score  (port 9999)
  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ nginx 1.27-alpine                β”‚
β”‚ Round-robin load balancer        β”‚
β”‚ 0.20 CPU Β· 30 MB                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚ round-robin
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”
     β–Ό               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ api1    β”‚     β”‚ api2    β”‚
β”‚ granian β”‚     β”‚ granian β”‚
β”‚ 1 workerβ”‚     β”‚ 1 workerβ”‚
β”‚ 0.50CPU β”‚     β”‚ 0.40CPU β”‚
β”‚ 280 MB  β”‚     β”‚ 160 MB  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Faiss IVF+SQ8 index pre-built during docker build

Total declared resources: 1.00 CPU Β· 350 MB RAM

Fully compliant with the competition rules.


Startup orchestration problem

After solving the memory footprint problem, another issue appeared:

api1 started correctly, but api2 frequently crashed while loading the same index simultaneously.

The root cause was memory pressure during concurrent index loading.

Even though the steady-state RAM usage fit comfortably inside the limit, Docker briefly experienced a memory spike while both containers loaded their own copy of the index at the same time.

The operating system still had the first container's index pages warm in memory while the second container started reading the same file.


The fix: delayed startup sequencing

The solution was intentionally simple and deterministic:

api2 waits before starting.

command:
  [
    "sh",
    "-c",
    "sleep 15 && python -m granian ..."
  ]

This allows:

  • api1 to fully initialize
  • the OS page cache to stabilize
  • I/O pressure to drop
  • memory spikes to disappear

After introducing startup sequencing, both containers became stable simultaneously under the competition limits.


Performance decisions

1. Starlette instead of FastAPI

FastAPI adds dependency injection and validation layers that cost measurable latency per request.

Using raw Starlette removes that overhead while preserving ASGI ergonomics.


2. orjson instead of stdlib json

orjson is Rust-based and significantly faster than Python's standard JSON implementation for both serialization and deserialization.


3. No Pydantic

The competition guarantees valid payloads.

Avoiding schema validation removes unnecessary allocations and CPU overhead.

All fields are accessed directly from raw dictionaries.


4. Thread-local preallocated NumPy buffers

Each worker allocates a single:

(1, 14) float32

buffer and reuses it for every request.

Benefits:

  • zero per-request NumPy allocations
  • cache-friendly contiguous memory
  • exact layout expected by Faiss

5. granian instead of uvicorn

granian uses a Rust runtime and consistently performs better in throughput and p99 latency than pure-Python ASGI servers.


6. Build-time index generation

The Faiss index is built during Docker build time.

Containers only perform:

faiss.read_index(...)

during startup.

Benefits:

  • no runtime training cost
  • deterministic startup
  • avoids massive transient RAM spikes

7. Multi-stage Docker image

The runtime image contains:

  • no compiler
  • no build-essential
  • no pip install step

Only pre-built Python packages and the generated Faiss index are copied from the builder stage.

This reduces:

  • image size
  • startup complexity
  • runtime dependencies

8. nginx tuned for low latency

The load balancer is configured as a pure pass-through proxy with:

  • upstream keepalive
  • tcp_nodelay
  • disabled buffering
  • disabled access logs

The rules explicitly forbid business logic in the load balancer.


Stack

Component Choice Why
ASGI server granian 2.7.4 Rust runtime, lower p99 latency
Web framework Starlette 0.41.3 Minimal overhead
JSON orjson 3.11.9 Rust-based JSON
Vector search Faiss IVF+SQ8 Massive RAM reduction
Numerics numpy 2.4.4 Vectorized math
Load balancer nginx 1.27-alpine Lightweight reverse proxy
Containerization Docker Compose Competition orchestration

File tree

.
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ main.py
β”‚   └── requirements.txt
β”œβ”€β”€ scripts/
β”‚   └── build_index.py
β”œβ”€β”€ resources/
β”‚   └── references.json.gz
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ nginx.conf
β”œβ”€β”€ test_vectorize.py
β”œβ”€β”€ README.md
β”œβ”€β”€ README.pt-BR.md
└── .gitignore

Endpoints

Method Path Description
GET /ready Health check
POST /fraud-score Returns fraud decision and score

About

High-performance fraud detection API using Python, Faiss IVF+SQ8 and vector similarity search. Rinha de Backend 2026 submission.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors