[feat-024] RAG chat over tabular-review docs

Nick Whitehouse · 2026-05-07 · a187e3a0

Embeds every doc on upload via a new embed_document job type on the
bug-007 worker pool, stores chunks + embeddings in document_chunks
(pgvector, HNSW), and injects top-K passages into the TR chat system
prompt before the LLM call.
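
As a rough illustration of the injection step, the sketch below appends the top-K retrieved passages to a system prompt. It is not taken from the diff: the `Passage` shape, `buildSystemPrompt`, the section labels, and the default `topK` of 5 are all hypothetical. An empty passage list yields a cell-only prompt, which is how the fallback described in this commit could behave.

```typescript
// Hypothetical shape of a row returned by the similarity search.
interface Passage {
  documentName: string;
  page: number;
  text: string;
  similarity: number; // higher means closer to the query
}

// Builds the system prompt: base instructions, cell context, then
// the topK most similar passages (if any were retrieved).
function buildSystemPrompt(
  basePrompt: string,
  cellContext: string,
  passages: Passage[],
  topK = 5, // assumed default; the commit does not state K
): string {
  const sections = [basePrompt, "Cell context:\n" + cellContext];
  // Copy before sorting so the caller's array is not mutated.
  const top = [...passages]
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
  if (top.length > 0) {
    const rendered = top
      .map((p, i) => `[${i + 1}] ${p.documentName}, page ${p.page}\n${p.text}`)
      .join("\n\n");
    sections.push("Retrieved passages:\n" + rendered);
  }
  return sections.join("\n\n");
}
```

Numbering the passages and tagging each with document name and page gives the model something to cite; whether the real prompt does this is an assumption.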

- migrations 007 (vector ext, table, RPC, RLS) + 008 (job_type column)
- text-embedding-3-small (1536 dim) via direct fetch with batched retry
- 800-tok chunks / 150-tok overlap, page-aware via "## Page N" markers
- POST /single-documents/embed-backfill, gated by ENABLE_EMBED_BACKFILL
- TR chat falls back to cell-only context when OPENAI_API_KEY unset
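
The chunking scheme in the list above (800-token windows, 150-token overlap, split per "## Page N" marker) might look roughly like the sketch below. It rests on stated assumptions: whitespace-separated words stand in for model tokens, text before the first marker is assigned to page 1, and `Chunk`, `tokenize`, and `chunkDocument` are hypothetical names rather than identifiers from the commit.

```typescript
// Hypothetical chunk record; the real document_chunks columns may differ.
interface Chunk {
  page: number;  // page number from the nearest preceding "## Page N" marker
  index: number; // position of the chunk within its page
  text: string;
}

// Assumption: whitespace-separated words approximate model tokens.
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

function chunkDocument(doc: string, chunkSize = 800, overlap = 150): Chunk[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const step = chunkSize - overlap;
  // Splitting on a capturing group yields [preamble, "1", body1, "2", body2, ...].
  const parts = doc.split(/^## Page (\d+)[ \t]*$/m);
  const pages: { page: number; body: string }[] = [];
  // Assumption: text before the first marker belongs to page 1.
  if (parts[0].trim().length > 0) pages.push({ page: 1, body: parts[0] });
  for (let i = 1; i < parts.length; i += 2) {
    pages.push({ page: Number(parts[i]), body: parts[i + 1] ?? "" });
  }
  const chunks: Chunk[] = [];
  for (const { page, body } of pages) {
    const tokens = tokenize(body);
    for (let start = 0, index = 0; start < tokens.length; start += step, index++) {
      chunks.push({ page, index, text: tokens.slice(start, start + chunkSize).join(" ") });
      // Stop once this window reaches the end of the page.
      if (start + chunkSize >= tokens.length) break;
    }
  }
  return chunks;
}
```

Chunks never cross a page boundary in this sketch, so every chunk maps to exactly one page number. A real tokenizer would change the counts but not the windowing logic.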

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repository: nwhitehouse/mike
Author: Nick Whitehouse <nick.whitehouse@mccarthyfinch.com>
Authored: 2026-05-07
Parents: ada3fc90
Stats: 9 files changed, +839, -3
Part of: RAG chat over tabular-review docs (pgvector embeddings)
