docs: add Phase 13 (Vector RAG with pgvector) design
The foundational retrieval upgrade. Replaces the current LLM-driven
in-context document scan with chunked hybrid retrieval (vector
similarity plus full-text) over pgvector.
Data model:
- pgvector extension on Postgres 16 (image swap to
pgvector/pgvector:pg16).
- document_chunks with structure-aware metadata: page_number for
PDF, heading_path text[] for DOCX, char_start/char_end for
highlight overlays, and a stored tsvector generated column for
the full-text arm.
- document_embeddings keyed one-to-one with chunks, with a single
  active embedding model at a time and the vector(1024) column type
  pinned at migration time. Changing the dimension requires a
  templated paired migration plus a full reindex.
- rag_ingest_jobs as a simple Postgres-backed queue, consumed by
an in-process worker (single backend replica assumption).
- Eight new org_settings columns covering provider, model, chunk
shape, top-K per arm, and final top-N.
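The dimension swap above could be templated roughly as follows. This is a minimal sketch that assumes "paired" means an up/down migration pair; the index name document_embeddings_hnsw, the column name embedding and the render_dimension_swap helper are hypothetical. Both directions discard stored vectors, which is why the swap has to be followed by a full reindex.

```python
from string import Template
from typing import Tuple

# pgvector can build an HNSW index on the `vector` type for up to 2,000 dimensions.
HNSW_MAX_DIM = 2000

# Hypothetical object names; $dim is filled in per direction.
_SWAP = Template("""\
DROP INDEX IF EXISTS document_embeddings_hnsw;
TRUNCATE document_embeddings;
ALTER TABLE document_embeddings ALTER COLUMN embedding TYPE vector($dim);
CREATE INDEX document_embeddings_hnsw ON document_embeddings
  USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
""")


def render_dimension_swap(old_dim: int, new_dim: int) -> Tuple[str, str]:
    """Return (up_sql, down_sql) for moving the embedding column between dimensions."""
    for dim in (old_dim, new_dim):
        if not 0 < dim <= HNSW_MAX_DIM:
            raise ValueError(f"vector({dim}) cannot carry an HNSW index")
    # Up moves to the new dimension; down restores the old one. Both truncate,
    # so each direction must be followed by a full reindex.
    return _SWAP.substitute(dim=new_dim), _SWAP.substitute(dim=old_dim)
```

For example, `render_dimension_swap(1024, 768)` yields the SQL for moving from bge-m3 to nomic-embed-text and back.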
Embedding model:
- bge-m3 via Ollama as the default (1024 dim, multilingual,
CPU-capable). nomic-embed-text as a lighter alternative.
- Hugging Face Text Embeddings Inference (TEI) as a dedicated-service
  option. The OpenAI text-embedding-3 family is opt-in, gated by
  EXTERNAL_AI_DISABLED.
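Provider selection combines two of the listed risks: an OpenAI call slipping through while EXTERNAL_AI_DISABLED is set, and a model whose output dimension does not match the pinned column. The sketch below checks both before any embedding call is made. The helper names, the error class, and the truthy-string parsing of the environment variable are assumptions. The dimension table uses each model's native output size; text-embedding-3 models also accept a `dimensions` request parameter, which the sketch models as an explicit `declared_dim` override.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Native output dimensions of the models named in the design.
KNOWN_DIMS = {
    ("ollama", "bge-m3"): 1024,
    ("ollama", "nomic-embed-text"): 768,
    ("openai", "text-embedding-3-small"): 1536,
    ("openai", "text-embedding-3-large"): 3072,
}
EXTERNAL_PROVIDERS = {"openai"}


class EmbeddingConfigError(ValueError):
    """Raised when the configured provider/model must not be used (hypothetical)."""


@dataclass(frozen=True)
class EmbeddingChoice:
    provider: str
    model: str
    dim: int


def external_ai_disabled_from_env(env: Mapping[str, str] = os.environ) -> bool:
    # Assumed parsing: common truthy strings switch the gate on.
    return env.get("EXTERNAL_AI_DISABLED", "").strip().lower() in {"1", "true", "yes"}


def resolve_embedding(provider: str, model: str, pinned_dim: int,
                      external_ai_disabled: bool,
                      declared_dim: Optional[int] = None) -> EmbeddingChoice:
    # Refuse external providers before anything else, so no request can leak out.
    if provider in EXTERNAL_PROVIDERS and external_ai_disabled:
        raise EmbeddingConfigError(f"{provider} is blocked while EXTERNAL_AI_DISABLED is set")
    # TEI serves arbitrary models, so unknown pairs must declare their dimension.
    dim = declared_dim if declared_dim is not None else KNOWN_DIMS.get((provider, model))
    if dim is None:
        raise EmbeddingConfigError(f"unknown dimension for {provider}/{model}; declare it")
    if dim != pinned_dim:
        raise EmbeddingConfigError(
            f"{provider}/{model} emits {dim}-dim vectors but the column is "
            f"vector({pinned_dim}); run the dimension swap and a full reindex first")
    return EmbeddingChoice(provider, model, dim)
```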
Chunking:
- Token-based (~512 tokens, 64-token overlap), preferring paragraph,
  then sentence, then word boundaries, with heading-forced boundaries
  for DOCX.
- cl100k_base tokenizer for reproducibility across models.
- Per-chunk metadata: document, version, index, page, heading
path, character offsets.
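The chunking rules above can be sketched as a sliding window over words. This is an illustrative version under simplifying assumptions: token counts come from a pluggable `count_tokens` callable (in production something like `len(enc.encode(word))` with `tiktoken.get_encoding("cl100k_base")`), counting per word only approximates BPE tokens that span whitespace, and heading-forced DOCX boundaries and page numbers are left out. The names `chunk_text`, `_best_cut` and `Chunk` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Chunk:
    index: int
    text: str
    char_start: int  # offsets into the source text, used for highlight overlays
    char_end: int


def _best_cut(text: str, spans: List[Tuple[int, int]], start: int, end: int) -> int:
    """Pick an exclusive end index in [floor, end], preferring paragraph, then sentence breaks."""
    floor = max(start + 1, start + (end - start) // 2)  # keep at least half the window
    checks = (
        lambda c: "\n\n" in text[spans[c - 1][1]:spans[c][0]],  # blank line before word c
        lambda c: text[spans[c - 1][1] - 1] in ".!?",            # word c-1 ends a sentence
    )
    for is_boundary in checks:
        for c in range(end, floor - 1, -1):
            if is_boundary(c):
                return c
    return end  # no better break: cut at the word boundary


def chunk_text(text: str, count_tokens: Callable[[str], int],
               max_tokens: int = 512, overlap_tokens: int = 64) -> List[Chunk]:
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    costs = [count_tokens(text[s:e]) for s, e in spans]
    chunks: List[Chunk] = []
    start = 0
    while start < len(spans):
        # Grow the window until the token budget is spent.
        end, used = start, 0
        while end < len(spans) and used + costs[end] <= max_tokens:
            used += costs[end]
            end += 1
        end = max(end, start + 1)  # an oversized single word still forms a chunk
        if end < len(spans):
            end = _best_cut(text, spans, start, end)
        s, e = spans[start][0], spans[end - 1][1]
        chunks.append(Chunk(len(chunks), text[s:e], s, e))
        if end == len(spans):
            break
        # Step back over up to overlap_tokens of words so neighbours share context.
        next_start, back = end, 0
        while next_start - 1 > start and back + costs[next_start - 1] <= overlap_tokens:
            next_start -= 1
            back += costs[next_start]
        start = next_start
    return chunks
```

With a one-token-per-word counter, a 7-token budget and 1-token overlap, the text "Alpha beta gamma. Delta epsilon zeta.\n\nEta theta iota kappa." splits at the blank line, and the second chunk starts by repeating "zeta.".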
Retrieval:
- Hybrid: vector cosine via HNSW (m=16, ef_construction=64) plus
ts_rank_cd full-text, merged with Reciprocal Rank Fusion (k=60).
- ACL filter computed up front from Phase 11 effective
permissions; RLS as defence-in-depth.
- search_documents tool refactored to call the new retriever;
same external shape so the model side is unchanged. Tool output
includes chunk_id, document_name, page, heading_path, excerpt,
score.
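The hybrid retrieval above can be sketched as two ACL-filtered arms fused with Reciprocal Rank Fusion, where each chunk scores the sum of 1 / (k + rank) over the arms that returned it (ranks start at 1). The SQL assumes psycopg-style named parameters, a `tsv` generated column, the `'simple'` text-search configuration and a query vector passed as a pgvector text literal such as `'[0.1,0.2]'`; these, the `rrf_fuse` helper and its `top_n` default are illustrative, not settled choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Both arms apply the precomputed ACL list; RLS remains the backstop.
VECTOR_ARM_SQL = """
SELECT c.id
FROM document_chunks c
JOIN document_embeddings e ON e.chunk_id = c.id
WHERE c.document_id = ANY(%(allowed_doc_ids)s)
ORDER BY e.embedding <=> %(query_vec)s::vector  -- cosine distance
LIMIT %(top_k_vector)s
"""

FULLTEXT_ARM_SQL = """
SELECT c.id
FROM document_chunks c, websearch_to_tsquery('simple', %(query_text)s) q
WHERE c.document_id = ANY(%(allowed_doc_ids)s) AND c.tsv @@ q
ORDER BY ts_rank_cd(c.tsv, q) DESC
LIMIT %(top_k_fulltext)s
"""


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60,
             top_n: int = 8) -> List[Tuple[str, float]]:
    """Merge ranked chunk-id lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ordered[:top_n]
```

For example, fusing a vector arm `["a", "b", "c"]` with a full-text arm `["b", "d"]` ranks `b` first, because it is the only chunk both arms agree on. One caveat from the pgvector documentation: with an HNSW index the `WHERE` filter is applied after the index scan, so a narrow ACL can return fewer than `top_k_vector` rows unless `hnsw.ef_search` is raised.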
Worker:
- One in-process worker, SELECT FOR UPDATE SKIP LOCKED on
rag_ingest_jobs, retry cap of 3, batched embedding calls
(default batch 64).
- Hooks: document upload, version creation, admin "reindex"
endpoint.
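The worker loop above could claim jobs and embed chunks along these lines. This is a minimal sketch: the column names in `CLAIM_SQL` (`status`, `attempts`, `created_at`, `document_version_id`) and the helpers `embed_all` and `status_after_attempt` are hypothetical, and the database round-trips are left out so that only the batching and retry-cap logic is shown. `SKIP LOCKED` lets a claim skip rows another transaction holds, which becomes relevant if the single-replica assumption is ever relaxed.

```python
from typing import Callable, Iterator, List, Sequence

MAX_ATTEMPTS = 3       # retry cap from the design
EMBED_BATCH_SIZE = 64  # default embedding batch from the design

# Claim one pending job atomically; hypothetical column names.
CLAIM_SQL = """
UPDATE rag_ingest_jobs
SET status = 'running', attempts = attempts + 1
WHERE id = (
  SELECT id FROM rag_ingest_jobs
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id, document_version_id, attempts
"""

Vector = List[float]


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_all(texts: Sequence[str], embed: Callable[[List[str]], List[Vector]],
              batch_size: int = EMBED_BATCH_SIZE) -> List[Vector]:
    """Embed chunk texts in fixed-size batches, checking the provider's reply length."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        out = embed(list(batch))
        if len(out) != len(batch):
            raise RuntimeError("provider returned the wrong number of vectors")
        vectors.extend(out)
    return vectors


def status_after_attempt(attempts: int, succeeded: bool,
                         max_attempts: int = MAX_ATTEMPTS) -> str:
    """Next job status: done on success, back to pending until the cap, then failed."""
    if succeeded:
        return "done"
    return "failed" if attempts >= max_attempts else "pending"
```

A document that yields 130 chunks would therefore cost three embedding calls (64, 64 and 2 texts). A malformed document that keeps failing stops being retried after its third attempt instead of stalling the queue.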
Frontend:
- Composer scope chip ("Searching: project X (47 documents)")
with a scope-edit modal.
- search_documents tool-call card renders the hit list with
  links that jump to the document viewer with the chunk
  highlighted.
- Cite-button hover shows a preview of the chunk excerpt.
- Admin AI Policy gets a Retrieval section with provider/model
selection, chunk/top-K knobs, queue and index stats, and
guarded "Reindex" / "Clear and reindex" actions.
Rollout in six steps gated by a new rag_enabled org switch;
rollback at any step means flipping the switch off, which restores
the legacy in-context tool. The Compose image swap to
pgvector/pgvector:pg16 is documented in the operator deployment guide.
Risks captured for: dimension mismatch, HNSW build time, worker
stalls on malformed docs, chunk explosion on multi-thousand-page
PDFs, permission bypass through retrieval, stale chunks after
version change, OpenAI leak under EXTERNAL_AI_DISABLED, Postgres
image change for operators.
Open questions parked: tokenizer choice, heading-aware tuning,
RRF weighting tuning UI, per-document index opt-out, cross-encoder
reranker, workspace-wide retrieval (deferred to Phase 14).
| Repository | cpatpa/PIP |
|---|---|
| Author | Claude <noreply@anthropic.com> |
| Authored | |
| Parents | 876049f9 |
| Stats | 1 file changed, +802 |
| Part of | Phases 10-14 - design docs for web search, groups, multi-model, vector RAG, knowledge collections |