Davemaina1 squeezes legal search into 320MB of RAM

A scrappy re-architecture that runs an 86,000-chunk Kenyan law search engine on a free-tier server - and points at a pattern worth stealing.

search, infrastructure

Most legal search tools assume you can throw a managed vector database at the problem. Davemaina1 went the other way. The fork splits search into a small Python helper service, then methodically strips it down to fit inside 512MB of memory - the ceiling on Render's free hosting tier.

The heart of the trick: the entire corpus of legal chunks is pre-exported, with embeddings quantized to int8, and parked on cheap object storage (a public Supabase Storage bucket). On startup the service downloads about 50MB, memory-maps the embeddings so they don't have to sit in RAM, and answers each query with a brute-force dot product against every vector in the file. No vector database, no reranker, no keyword index. Peak memory was measured at 320MB RSS, and the commit log puts brute-force search over the 86K vectors at around 180 milliseconds per query.
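The load-and-search path described above could look roughly like the following sketch. It is an illustration, not the fork's code: the function names, the single shared int8 scale, the `.npy` file layout, and the block size are all assumptions, and the demo uses random vectors in place of real embeddings.

```python
import os
import tempfile

import numpy as np


def quantize_int8(vectors):
    """Map float vectors to int8 using one shared scale (assumed scheme)."""
    scale = 127.0 / float(np.abs(vectors).max())
    return np.round(vectors * scale).astype(np.int8), scale


def open_corpus(path):
    """Memory-map a .npy file; pages are read from disk only when touched."""
    return np.load(path, mmap_mode="r")


def search(embeddings, query, top_k=5, block=8192):
    """Brute-force dot-product search over a (memory-mapped) int8 matrix.

    The shared scale is a positive constant, so ranking by the raw
    int8 dot product gives the same order as ranking after dequantizing.
    """
    n = embeddings.shape[0]
    q = np.asarray(query, dtype=np.float32)
    scores = np.empty(n, dtype=np.float32)
    for start in range(0, n, block):
        # Only one block is converted to float32 at a time.
        chunk = np.asarray(embeddings[start:start + block], dtype=np.float32)
        scores[start:start + block] = chunk @ q
    top_k = min(top_k, n)
    top = np.argpartition(-scores, top_k - 1)[:top_k]
    return top[np.argsort(-scores[top])]


def demo():
    """Round trip: quantize, save, memory-map, then query with row 42."""
    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((1000, 384)).astype(np.float32)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    quantized, _scale = quantize_int8(vectors)
    path = os.path.join(tempfile.mkdtemp(), "embeddings.npy")
    np.save(path, quantized)
    corpus = open_corpus(path)
    return search(corpus, vectors[42], top_k=5)
```

Calling `demo()` queries the memory-mapped matrix with one of its own rows, so that row should come back as the top hit. For scale: 86,000 vectors at 384 dimensions (the all-MiniLM-L6-v2 output size) take about 33MB as int8 versus about 132MB as float32, and an 8192-row float32 block is only about 12.6MB of working memory.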

So what? Anyone building a legal-tech product on a tight budget should look at this - it's a credible blueprint for document search that doesn't need a managed vector database underneath it.


Commits in this thread

5 commits from Davemaina1/iroh_, oldest first. Source extracted verbatim from the harvested git log.

SHA | Subject | Author | Date
c0684a41 | RAG sidecar: Python service for embedding+BM25+rerank; Node becomes HTTP client | Davemaina1 | 2026-05-13
bed9cc2f | fix(rag): pin Python 3.11, relax torch/numpy version constraints | Davemaina1 | 2026-05-14
Render defaults to Python 3.14 which doesn't have torch 2.5.1 wheels.
Pin to 3.11 via .python-version and allow any torch 2.x.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7a0da671 | feat(rag): replace torch with onnxruntime to fit 512MB RAM | Davemaina1 | 2026-05-14
Eliminates torch (1.5GB) and sentence-transformers entirely. Uses
onnxruntime + tokenizers + huggingface-hub to run the same all-MiniLM-L6-v2
model via its ONNX export (~90MB). Drops CrossEncoder reranker - RRF fusion
alone is sufficient for the testing phase. Estimated memory: ~340MB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0b5251e8 | feat(rag): eliminate ChromaDB dependency, load corpus from Supabase Storage | Davemaina1 | 2026-05-14
Removes both torch/sentence-transformers AND ChromaDB from production.
Corpus (86K chunks, embeddings, metadata) is pre-exported to .npz files
hosted on Supabase Storage (public bucket). On startup, the service
downloads ~50MB, dequantizes int8 embeddings, and builds a BM25 index.
Semantic search is brute-force numpy dot product (~180ms/query for 86K vectors).

Total runtime memory: ~350MB (fits in Render's 512MB free tier).
Zero additional services required.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3ec32238 | feat(rag): mmap embeddings + on-demand metadata - peak 320MB RSS | Davemaina1 | 2026-05-14
Downloads corpus files to /tmp, memory-maps the embeddings (zero RAM cost),
and reads metadata only for the top-K results on each query. Drops BM25
(150MB overhead) - semantic-only search is good enough for testing phase.

Removes chromadb and rank-bm25 dependencies entirely.
Measured peak RSS: 320MB (well within Render's 512MB free tier).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
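
Commit 7a0da671 above relies on RRF (reciprocal rank fusion) to merge the embedding and BM25 rankings once the cross-encoder reranker is dropped. The sketch below shows the general technique only: the function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions, not details taken from the fork.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every ID it contains,
    with rank starting at 1. IDs are returned by descending fused score;
    ties are broken by ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], str(doc_id)))
    return fused if top_n is None else fused[:top_n]


# Hypothetical rankings from a semantic search and a keyword (BM25) search.
semantic_ranking = ["c12", "c07", "c33", "c02"]
keyword_ranking = ["c07", "c02", "c12", "c90"]
fused_ranking = rrf_fuse([semantic_ranking, keyword_ranking])
```

Because RRF uses only rank positions, it needs no score normalization between the two retrievers. A chunk that ranks well in both lists, such as `c07` here, ends up ahead of one that tops a single list.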
