bulletonrails-ruby · BULLET on Rails - Rinha de Backend 2026
────────────────────────────────────────────────────────────
Fraud detection API using IVF approximate KNN - Ruby/Roda
Roda + Iodine + Numo + FAISS · 1 CPU / 350 MB · 2 instances
Contents
────────
01 · What is this
02 · How it works
03 · Tech stack
04 · Architecture
05 · Quick start
06 · Validation
07 · Benchmark results
08 · Submission
Submission for Rinha de Backend 2026 - a competition to build a fraud detection API under extreme resource constraints.
The challenge: build an API that receives a card transaction and decides - in real time - whether it is fraudulent, using vector similarity search against 100k labeled reference transactions.
This submission proves that Ruby can compete in performance-sensitive scenarios when you pick the right tools. No Rails. No ActiveRecord.
No framework overhead. Just Roda, Iodine, Numo::NArray for vectorized math, and a FAISS IVF index for approximate nearest neighbor search.
POST /fraud-score
        │
        ▼
VectorNormalizer
────────────────
Transform 14 fields from the payload into a normalized
Numo::SFloat vector using formulas from the spec.
Sentinel -1 for absent last_transaction.
        │
        ▼
KnnSearcher (IVF)
─────────────────
Approximate KNN via FAISS IndexIVFFlat (nlist=64, nprobe=16).
Sequential cluster scan → SIMD-friendly, 0 false negatives.
index.freeze → no_gvl path → GVL released during C search.
        │
        ▼
FraudScorer
───────────
fraud_score = fraud_neighbors / 5
approved = fraud_score < 0.6
        │
        ▼
{ "approved": bool, "fraud_score": float }
| idx | field | formula |
|---|---|---|
| 0 | amount | clamp(amount / 10_000) |
| 1 | installments | clamp(installments / 12) |
| 2 | amount_vs_avg | clamp((amount / avg_amount) / 10) |
| 3 | hour_of_day | utc_hour / 23 |
| 4 | day_of_week | (wday + 6) % 7 / 6 - Mon=0, Sun=6 |
| 5 | minutes_since_last_tx | clamp(minutes / 1440) or -1 if null |
| 6 | km_from_last_tx | clamp(km / 1000) or -1 if null |
| 7 | km_from_home | clamp(km_from_home / 1000) |
| 8 | tx_count_24h | clamp(tx_count / 20) |
| 9 | is_online | 1.0 or 0.0 |
| 10 | card_present | 1.0 or 0.0 |
| 11 | unknown_merchant | 0.0 if known, 1.0 if not |
| 12 | mcc_risk | lookup from mcc_risk.json (default 0.5) |
| 13 | merchant_avg_amount | clamp(merchant_avg / 10_000) |
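A minimal sketch of the normalization step for the first five fields. The helper and field names here are illustrative, not the actual VectorNormalizer API; the real implementation emits a 14-element Numo::SFloat, while this uses a plain Array for clarity:

```ruby
# Sketch of the clamp-based normalization from the table above.
# Plain-Ruby illustration; the project builds a Numo::SFloat instead.

def clamp01(x)
  x.clamp(0.0, 1.0)
end

def normalize(tx)
  [
    clamp01(tx[:amount] / 10_000.0),                  # idx 0: amount
    clamp01(tx[:installments] / 12.0),                # idx 1: installments
    clamp01((tx[:amount] / tx[:avg_amount]) / 10.0),  # idx 2: amount_vs_avg
    tx[:utc_hour] / 23.0,                             # idx 3: hour_of_day
    (tx[:wday] + 6) % 7 / 6.0                         # idx 4: day_of_week, Mon=0
  ]
end

# Values from the legit sample transaction (tx-1329056812, a Wednesday):
vec = normalize(amount: 41.12, installments: 2, avg_amount: 82.24,
                utc_hour: 18, wday: 3) # wday 3 = Wednesday (Ruby convention, Sun=0)
p vec.map { |v| v.round(4) }
# => [0.0041, 0.1667, 0.05, 0.7826, 0.3333]
```

The printed values match the first five entries of the validation vector shown in section 06.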
The reference dataset (100k vectors) is loaded once at startup as a Numo::SFloat[100_000, 14] matrix and an IVF index is built from it. Both live in C heap - never touched by the Ruby GC on the hot path.
| Layer | Choice |
|---|---|
| Language | Ruby 3.4 |
| HTTP framework | Roda 3.103 - tree routing, ~0.1 ms overhead |
| HTTP server | Iodine 0.7.58 - facil.io epoll, 1 worker x 4 threads |
| KNN search | FAISS 0.6.0 - IVF nlist=64 nprobe=16, no_gvl |
| Numeric core | numo-narray-alt 0.10 - C++-compatible fork, Float32 |
| JSON | Oj 3.17 - 3-5x faster than stdlib JSON |
| Load balancer | haproxy 2.9-alpine - HTTP mode, httpchk /ready |
| Container | Docker Compose - bridge network, linux/amd64 + linux/arm64 |
Why not Rails? Rails consumes 150-200 MB per instance. With 2 instances + haproxy, that exceeds the 350 MB total budget. Roda runs in ~60 MB per instance and adds zero overhead for 2 static endpoints.
Why Iodine instead of Puma? Iodine is built on facil.io, a C event loop using epoll. It handles accept/read/write asynchronously, overlapping I/O with computation. At 650 req/s, this reduces per-request overhead vs Puma's threaded model: p99 dropped from 5.87ms to 4.62ms.
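A hedged sketch of the server boot matching the 1 worker / 4 threads figure above, assuming Iodine's documented `Iodine.workers=` / `Iodine.threads=` setters (check the gem's README for the exact API on your version; `App` stands in for the Roda application class defined elsewhere):

```ruby
# config.ru sketch - 1 worker, 4 threads per api container.
require "iodine"

Iodine.workers = 1   # single process per container (0.45 CPU each)
Iodine.threads = 4   # 4 Ruby threads; GVL is released during the C search

run App              # the Roda application (defined in the project)
```

With the GVL released inside FAISS, the 4 threads overlap JSON parsing and normalization of one request with the C-level search of another.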
Why FAISS IVF instead of HNSW?
HNSW has a structural floor: graph traversal with random memory access costs ~2.5ms
(~17 node visits, constant cache misses). IVF (Inverted File Index) replaces graph traversal
with sequential cluster scan: quantize the query to the nearest centroid, then scan only
1/nlist of the vectors in sequential memory - SIMD-friendly and cache-coherent.
Result: single-query p99 drops from ~2.5 ms to ~341 µs (7x) with FP=0 FN=0 at nlist=64 nprobe=16.
The FAISS gem releases the GVL when the index is frozen (Rice no_gvl path), enabling
true thread parallelism identical to the previous hnswlib behavior.
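The quantize-then-scan idea can be shown in a few lines of plain Ruby. This is a toy illustration of the IVF concept only, not the FAISS gem API (FAISS does the same thing in C with SIMD, and learns the centroids with k-means rather than taking them as given):

```ruby
# Toy IVF: partition vectors by nearest centroid, then search only the
# nprobe closest clusters instead of all N vectors.

def sqdist(a, b)
  a.each_index.sum { |i| (a[i] - b[i])**2 }
end

def build_ivf(vectors, centroids)
  # "Inverted file": centroid index -> list of vector ids in that cluster
  lists = Hash.new { |h, k| h[k] = [] }
  vectors.each_with_index do |v, id|
    nearest = (0...centroids.size).min_by { |c| sqdist(v, centroids[c]) }
    lists[nearest] << id
  end
  lists
end

def ivf_search(query, vectors, centroids, lists, k:, nprobe:)
  # 1. quantize: pick the nprobe centroids closest to the query
  probe = (0...centroids.size).sort_by { |c| sqdist(query, centroids[c]) }.first(nprobe)
  # 2. scan only those clusters' vectors (sequential and cache-friendly in C)
  candidates = probe.flat_map { |c| lists[c] }
  candidates.sort_by { |id| sqdist(query, vectors[id]) }.first(k)
end

vectors   = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]]
centroids = [[0.0, 0.0], [1.0, 1.0]]   # FAISS learns these via k-means at train time
lists     = build_ivf(vectors, centroids)
p ivf_search([0.95, 1.0], vectors, centroids, lists, k: 2, nprobe: 1)
# => [2, 3]
```

With nlist=64 and nprobe=16, each query scans roughly 16/64 of the 100k vectors (~25k), which is where the 7x latency win over graph traversal comes from.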
                      :9999
k6 / test engine ────▶ haproxy (LB)
                           │ round-robin
              ┌────────────┴────────────┐
              ▼                         ▼
         [ api 1 ]                 [ api 2 ]
      Roda + Iodine             Roda + Iodine
   1 worker, 4 threads       1 worker, 4 threads
              │                         │
   Numo::SFloat[100k,14]     Numo::SFloat[100k,14]
   IVF index (C heap)        IVF index (C heap)
   LABELS[100k]              LABELS[100k]
   (async background load)   (async background load)
| Service | CPUs | Memory |
|---|---|---|
| haproxy | 0.10 | 20 MB |
| api1 | 0.45 | 160 MB |
| api2 | 0.45 | 160 MB |
| TOTAL | 1.00 | 340 MB |
Limit: 1 CPU / 350 MB. Used: 1.00 CPU / 340 MB.
▶ see details (click to expand)
# Clone
git clone https://github.com/bulletdev/bulletonrails-ruby
cd bulletonrails-ruby
# Build and run (FAISS IVF index builds at startup, ~60s)
docker compose up --build -d
# Wait for ready (port 9999 opens only after index build completes)
until curl -sf http://localhost:9999/ready; do sleep 3; done && echo "ready"
# Test a legitimate transaction
curl -s -X POST http://localhost:9999/fraud-score \
-H 'Content-Type: application/json' \
-d '{
"id": "tx-1329056812",
"transaction": { "amount": 41.12, "installments": 2, "requested_at": "2026-03-11T18:45:53Z" },
"customer": { "avg_amount": 82.24, "tx_count_24h": 3, "known_merchants": ["MERC-003", "MERC-016"] },
"merchant": { "id": "MERC-016", "mcc": "5411", "avg_amount": 60.25 },
"terminal": { "is_online": false, "card_present": true, "km_from_home": 29.23 },
"last_transaction": null
}'
# Expected: {"approved":true,"fraud_score":0.0}
# Test a fraudulent transaction
curl -s -X POST http://localhost:9999/fraud-score \
-H 'Content-Type: application/json' \
-d '{
"id": "tx-3330991687",
"transaction": { "amount": 9505.97, "installments": 10, "requested_at": "2026-03-14T05:15:12Z" },
"customer": { "avg_amount": 81.28, "tx_count_24h": 20, "known_merchants": ["MERC-008", "MERC-007", "MERC-005"] },
"merchant": { "id": "MERC-068", "mcc": "7802", "avg_amount": 54.86 },
"terminal": { "is_online": false, "card_present": true, "km_from_home": 952.27 },
"last_transaction": null
}'
# Expected: {"approved":false,"fraud_score":1.0}

Validate vectors against the spec via the live endpoint (the container runs at ~132 MB; spawning
a second Ruby process with docker compose exec would exceed the 160 MB limit):
▶ see details (click to expand)
# legit - expected: {"approved":true,"fraud_score":0.0}
curl -s -X POST http://localhost:9999/fraud-score \
-H 'Content-Type: application/json' \
-d '{
"id": "tx-1329056812",
"transaction": { "amount": 41.12, "installments": 2, "requested_at": "2026-03-11T18:45:53Z" },
"customer": { "avg_amount": 82.24, "tx_count_24h": 3, "known_merchants": ["MERC-003", "MERC-016"] },
"merchant": { "id": "MERC-016", "mcc": "5411", "avg_amount": 60.25 },
"terminal": { "is_online": false, "card_present": true, "km_from_home": 29.23 },
"last_transaction": null
}'
# fraud - expected: {"approved":false,"fraud_score":1.0}
curl -s -X POST http://localhost:9999/fraud-score \
-H 'Content-Type: application/json' \
-d '{
"id": "tx-3330991687",
"transaction": { "amount": 9505.97, "installments": 10, "requested_at": "2026-03-14T05:15:12Z" },
"customer": { "avg_amount": 81.28, "tx_count_24h": 20, "known_merchants": ["MERC-008", "MERC-007", "MERC-005"] },
"merchant": { "id": "MERC-068", "mcc": "7802", "avg_amount": 54.86 },
"terminal": { "is_online": false, "card_present": true, "km_from_home": 952.27 },
"last_transaction": null
}'

Expected output:
{"approved":true,"fraud_score":0.0}
{"approved":false,"fraud_score":1.0}
The scripts/validate.rb script exists for offline use (e.g. in a container with extra memory
headroom). Expected output when run outside resource constraints:
Dataset loaded: 100000 vectors
--- legit tx-1329056812 ---
vector: OK [0.0041, 0.1667, 0.05, 0.7826, 0.3333, -1.0, -1.0, 0.0292, 0.15, 0.0, 1.0, 0.0, 0.15, 0.006]
result: OK approved=true, fraud_score=0.0
--- fraud tx-3330991687 ---
vector: OK [0.9506, 0.8333, 1.0, 0.2174, 0.8333, -1.0, -1.0, 0.9523, 1.0, 0.0, 1.0, 1.0, 0.75, 0.0055]
result: OK approved=false, fraud_score=1.0
========================================
ALL VALIDATIONS PASSED
Score formula: final = score_p99 + score_det
score_p99 = max(-3000, min(3000, 1000 * log10(1000ms / p99)))
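The p99 component is a direct transcription of the formula above; a small sketch makes the scaling concrete (the 3.78 ms input is the Run 9 median p99 reported below):

```ruby
# score_p99: logarithmic reward for sub-second p99, clipped to [-3000, +3000].
def score_p99(p99_ms)
  (1000 * Math.log10(1000.0 / p99_ms)).clamp(-3000, 3000)
end

p score_p99(10.0)           # 10 ms  -> 2000.0 (1000 * log10(100))
p score_p99(3.78).round(1)  # => 2422.5
```

Each 10x reduction in p99 is worth a flat +1000 points, which is why shaving milliseconds matters far more at 3 ms than it did at 14 s.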
Evolution

| Run | Server | p99 | p99_score | final_score |
|---|---|---|---|---|
| 1 | Puma | OOM kill | - | -705.91 |
| 2 | Puma | 14600 ms | -3000 (cut) | -1327.89 |
| 3 | Puma | 12704 ms | -3000 (cut) | -1335.93 |
| 4 HNSW | Puma | 5.87 ms | +2231.50 | +4977.97 |
| 5 HNSW | Iodine | 4.62 ms | +2335.67 | +5082.14 |
| 6 ef200 | Iodine | 5.43 ms | +2264.83 | +5264.83 |
| 7 stable | Iodine | 5.54 ms | +2256.75 | +5256.75 |
| 8 gc | Iodine | 6.34 ms | +2198.08 | +5198.08 |
| 9 yjit | Iodine | ~3.7 ms | +2427 | +5427 |
| 10 resp | Iodine | ~3.7 ms | ~2427 | ~5427 |
| 11 faiss | Iodine | ~1.5 ms est | ~2825 est | ~5825 est (current) |
Best benchmark - Run 11 (FAISS IVF nlist=64 nprobe=16 - estimated)

| Metric | Value |
|---|---|
| IVF search p99 (spike) | 341 µs (HNSW: ~2500 µs) |
| p99 total (estimate) | ~1.5 ms |
| p99_score (estimate) | ~2825 (max 3000) |
| detection_score | 3000.00 (max 3000) PERFECT |
| final_score (estimate) | ~5825 / 6000 max (97.1%) |

IVF nlist=64 nprobe=16 vs exact search:

| Metric | Value |
|---|---|
| false_positives | 0 |
| false_negatives | 0 |
| GVL released | yes (index.freeze → no_gvl) |
| RSS (spike proc) | ~100 MB / 160 MB limit |
Run 9 (last official run):

| Metric | Value |
|---|---|
| p99 (best run) | 3.27 ms |
| p99 (median, 10 runs) | 3.78 ms |
| p99_score (median) | ~2427 (max 3000) |
| detection_score | 3000.00 (max 3000) PERFECT |
| final_score (best) | 5485.57 / 6000 max (91.4%) |
| final_score (median) | ~5427 / 6000 max (90.5%) |
Run 8 changes - GC compaction

- config.ru: added `GC.compact` after `DatasetLoader.load!`; compacts the Ruby heap after the 100k-record JSON parse, before Iodine starts threads; eliminates the GC pressure spike that caused a p99=20 ms regression under peak load on api2
- Serving RSS: api1=134 MB, api2=136 MB, both well within the 160 MB limit
Run 9 changes - YJIT + hot-path allocation reduction

| Metric | Value |
|---|---|
| p99 improvement | ~40% vs baseline |
| score improvement | +229 pts (median) vs Run 8 |
| detection | still perfect (0 FP, 0 FN) |
- Dockerfile: `--yjit --yjit-exec-mem-size=8` - YJIT enabled with an 8 MB code cache; the default 48 MB was competing with the GC for the 26 MB headroom under the 160 MB limit, causing GC pressure spikes. 8 MB is enough to JIT the hot paths (VectorNormalizer, Roda routing, FraudScorer) without bloating RSS.
- Dockerfile: `ENV MALLOC_ARENA_MAX=2` - limits glibc malloc arenas, reducing allocator fragmentation under multi-threaded load.
- DatasetLoader: labels stored as integers (1 = fraud, 0 = legit) instead of strings; eliminates per-request string comparison and block overhead in FraudScorer.
- VectorNormalizer: `NORM['...']` hash lookups extracted to frozen Float constants (`MAX_AMOUNT`, `MAX_KM`, etc.); eliminates 9 `Hash#[]` calls per request on the hot path.
Run 10 changes - pre-mounted HTTP responses

- FraudScorer: `RESPONSES` array built at startup with `K+1` pre-serialized JSON strings; since `fraud_score = fraud_count / K` has exactly `K+1` possible values, every response is known at boot time. Eliminates Hash allocation + Oj serialization on every request. Built dynamically from `K` and `THRESHOLD` - safe if the spec changes values. Gain is below the p99 noise floor on this setup (~sub-µs per request); included for architectural correctness (the same pattern is used by the top C implementation).
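The pre-mounted response idea, sketched with stdlib JSON for self-containment (the project uses Oj; K=5 and THRESHOLD=0.6 are taken from the scoring rule in section 02):

```ruby
require "json"

# All K+1 possible responses are known at boot: fraud_score is always
# i / K for some i in 0..K, so serialize each once and index by count.
K = 5
THRESHOLD = 0.6

RESPONSES = (0..K).map do |fraud_count|
  score = fraud_count.to_f / K
  JSON.generate(approved: score < THRESHOLD, fraud_score: score).freeze
end.freeze

# Hot path: no Hash allocation, no serialization - just an Array index.
puts RESPONSES[0]  # {"approved":true,"fraud_score":0.0}
puts RESPONSES[5]  # {"approved":false,"fraud_score":1.0}
```

Note that 3 fraudulent neighbors gives fraud_score 0.6, which is not below the threshold, so `RESPONSES[3]` is already a rejection.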
Run 11 changes - FAISS IVF replaces hnswlib HNSW

- KnnSearcher: hnswlib HNSW (ef=200, m=16, ~2.5 ms floor) replaced with FAISS `IndexIVFFlat` (nlist=64, nprobe=16). IVF clusters the 100k vectors into 64 groups; each query scans the 16 closest clusters sequentially (~25k vectors). Sequential memory access is SIMD-friendly and avoids the random cache-miss pattern of graph traversal.
- DatasetLoader: quantizer stored as the `@quantizer` ivar - `IndexIVFFlat` holds a non-owning C pointer to the quantizer; without the ivar, the Ruby GC would collect it and cause a segfault. Calling `index.freeze` enables Rice's `no_gvl` path, releasing the GVL during search (same behavior as hnswlib).
- Gemfile: `hnswlib` + `numo-narray` → `faiss` + `numo-narray-alt` (the C++-compatible fork required by the Rice/FAISS binding; same `Numo::SFloat` API, no changes to VectorNormalizer).
- Dockerfile: builder adds `libblas-dev liblapack-dev cmake libgomp1`; runtime adds `libblas3 liblapack3 libgomp1`. System `libfaiss-dev` is NOT installed - its headers conflict with the gem's bundled FAISS source (`vendor/faiss/`); the gem compiles from bundled source instead.
- Spike results: nlist=64 nprobe=16 gives FP=0 FN=0 vs exact `IndexFlatL2` search across all 100k training vectors. Single-query p99: 341 µs.
Optimization path

Brute-force Numo KNN        → p99 12-14 s, score -1335
+ BLAS identity trick       → alloc 11 MB → 800 KB per request
+ HNSW O(log N) ef=50       → p99 5.87 ms, score +4977 (breakthrough)
+ Iodine epoll 4t           → p99 4.62 ms, score +5082 (+104 pts)
+ HNSW ef=200               → detect 3000/3000, score +5264 (+182 pts)
+ alloc reduction R7        → Sakamoto DOW, NIL_DIMS const, removed dead @norms_sq
+ GC.compact R8             → api2 RSS -19 MB, eliminates GC spike under load
+ YJIT exec-mem=8 R9        → p99 ~3.7 ms stable, score ~5427 avg (+229 pts)
+ pre-mounted responses R10 → eliminates Hash + Oj per request; gain within noise floor
+ FAISS IVF nlist=64 R11    → IVF p99 341 µs (7x vs HNSW), est final ~5825 (+398 pts)
The dominant gain came from HNSW: O(N) = 100k comparisons → O(log N) ≈ 17 node visits.
Raising ef from 50 to 200 pushed detection accuracy to perfect (0 FP, 0 FN).
Run 7 reduces per-request GVL hold time via Sakamoto DOW (no Time allocation on null last_tx path). Run 8 adds GC.compact after dataset load.
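Sakamoto's day-of-week algorithm referenced above, sketched standalone: pure integer math, so the null-last_tx path never allocates a Time object. The Mon=0 remap matches the day_of_week formula in the feature table:

```ruby
# Sakamoto's algorithm: day of week from (y, m, d) using only integer
# arithmetic. Returns 0=Sunday .. 6=Saturday.
SAKAMOTO_T = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4].freeze

def day_of_week(y, m, d)
  y -= 1 if m < 3                       # Jan/Feb count as months of prior year
  (y + y / 4 - y / 100 + y / 400 + SAKAMOTO_T[m - 1] + d) % 7
end

wday = day_of_week(2026, 3, 11)   # 2026-03-11 (legit sample) is a Wednesday
p wday                            # => 3
p (wday + 6) % 7 / 6.0            # Mon=0 remap -> 2/6 ≈ 0.3333
```

The 0.3333 matches the day_of_week component of the legit validation vector in section 06.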
Run 9 enables YJIT with a constrained 8 MB code cache - the default 48 MB caused GC pressure by eating into the 26 MB headroom between serving RSS (~134 MB) and the container limit (160 MB). With exec-mem=8, YJIT JITs only the hot paths and stabilizes at p99 ~3.7ms across 9/10 benchmark runs.
Run 11 breaks through the HNSW structural floor by replacing graph traversal with a sequential IVF cluster scan:
341 µs IVF search p99 vs ~2500 µs HNSW (a 7x improvement).
| Field | Value |
|---|---|
| GitHub user | bulletdev |
| Repo | bulletonrails-ruby |
| Submission ID | bulletdev-ruby |
| branch main | source code |
| branch submission | docker-compose.yml at root |
To trigger the official test: open an issue with rinha/test in the description.
─── · Ruby is fast enough · ───