cpatpa/PIP@56917d4b

The foundational retrieval upgrade. Replaces the current LLM-driven
in-context document scan with chunked semantic retrieval over
pgvector with hybrid search.

Data model:

- pgvector extension on Postgres 16 (image swap to
  pgvector/pgvector:pg16).
- document_chunks with structure-aware metadata: page_number for
  PDF, heading_path text[] for DOCX, char_start/char_end for
  highlight overlays, and a stored tsvector generated column for
  the full-text arm.
- document_embeddings keyed one-to-one with chunks, single active
  embedding model at a time, vector(1024) column type pinned at
  migration time. Dimension swap via templated paired migration
  plus full reindex.
- rag_ingest_jobs as a simple Postgres-backed queue, consumed by
  an in-process worker (assumes a single backend replica).
- Eight new org_settings columns covering provider, model, chunk
  shape, top-K per arm, and final top-N.

Embedding model:

- bge-m3 via Ollama as the default (1024 dim, multilingual,
  CPU-capable). nomic-embed-text as a lighter alternative.
- HF TEI as a dedicated-service option. OpenAI text-embedding-3
  family as opt-in, gated by EXTERNAL_AI_DISABLED.

Chunking:

- Token-based (~512 tokens, 64 overlap) with paragraph/sentence/
  word boundary preference and heading-forced boundaries for DOCX.
- cl100k_base tokenizer for reproducibility across models.
- Per-chunk metadata: document, version, index, page, heading
  path, character offsets.
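
A minimal sketch of the sliding-window part of the chunking above,
assuming the document has already been tokenized. chunk_tokens is a
hypothetical helper name. The paragraph/sentence/word and heading
boundary preferences are deliberately left out, so this only shows how
a 512-token window with 64 tokens of overlap advances.

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Return (start, end) token offsets for fixed-size overlapping chunks.

    Hypothetical helper: the real chunker also prefers paragraph,
    sentence, and word boundaries, which this sketch ignores.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # each window starts 448 tokens after the last
    spans = []
    start = 0
    while start < len(tokens):
        end = min(start + size, len(tokens))
        spans.append((start, end))
        if end == len(tokens):
            break  # last window reached the end of the document
        start += stride
    return spans


# 1000 tokens yield three windows; consecutive windows share 64 tokens.
spans = chunk_tokens(list(range(1000)))
```

With the defaults, a 1000-token document produces the spans (0, 512),
(448, 960), and (896, 1000).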

Retrieval:

- Hybrid: vector cosine via HNSW (m=16, ef_construction=64) plus
  ts_rank_cd full-text, merged with Reciprocal Rank Fusion (k=60).
- ACL filter computed up front from Phase 11 effective
  permissions; RLS as defence-in-depth.
- search_documents tool refactored to call the new retriever;
  same external shape so the model side is unchanged. Tool output
  includes chunk_id, document_name, page, heading_path, excerpt,
  score.
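
The fusion step can be sketched as follows, assuming each arm returns
chunk ids ordered best-first. rrf_merge and its top_n parameter are
hypothetical names. The sketch uses the standard unweighted Reciprocal
Rank Fusion formula, score = sum over arms of 1 / (k + rank), with
k=60 as stated above.

```python
from collections import defaultdict


def rrf_merge(vector_hits, fulltext_hits, k=60, top_n=10):
    """Merge two best-first rankings of chunk ids with unweighted RRF.

    Hypothetical helper: ranks are 1-based, and ties are broken by
    chunk id only to keep the output deterministic.
    """
    scores = defaultdict(float)
    for ranking in (vector_hits, fulltext_hits):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


# "b" ranks well in both arms, so it edges out "a".
merged = rrf_merge(["a", "b", "c"], ["b", "d", "a"])
```

A chunk that appears in both arms accumulates two terms, which is why
"b" (ranks 2 and 1) ends up ahead of "a" (ranks 1 and 3) here.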

Worker:

- One in-process worker, SELECT FOR UPDATE SKIP LOCKED on
  rag_ingest_jobs, retry cap of 3, batched embedding calls
  (default batch 64).
- Hooks: document upload, version creation, admin "reindex"
  endpoint.
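
A rough sketch of the claim query and retry bookkeeping described
above. The column names (id, status, attempts, created_at) and the
status values are assumptions, since the commit names only the table
and the locking strategy. "Retry cap of 3" is read here as three total
attempts, which is also an assumption.

```python
# Hypothetical schema: id, status, attempts, created_at are assumed columns.
CLAIM_SQL = """
UPDATE rag_ingest_jobs
SET status = 'running', attempts = attempts + 1
WHERE id = (
    SELECT id
    FROM rag_ingest_jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, attempts;
"""

MAX_ATTEMPTS = 3  # assumed reading of "retry cap of 3"


def status_after_failure(attempts):
    """Requeue a failed job until it has used up its attempts."""
    return "failed" if attempts >= MAX_ATTEMPTS else "pending"


def batched(items, size=64):
    """Yield consecutive slices of at most `size` items for embedding calls."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

SKIP LOCKED lets a second claimant pass over a row that is already
locked instead of blocking on it, so the query stays safe if the
single-replica assumption is later relaxed.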

Frontend:

- Composer scope chip ("Searching: project X (47 documents)")
  with a scope-edit modal.
- search_documents tool-call card renders the hit list with
  links that jump to the document viewer with the chunk
  highlighted.
- Cite-button hover preview of the chunk excerpt.
- Admin AI Policy gets a Retrieval section with provider/model
  selection, chunk/top-K knobs, queue and index stats, and
  guarded "Reindex" / "Clear and reindex" actions.

Rollout in six steps gated by a new rag_enabled org switch;
rollback at any step means flipping the switch off, which restores
the legacy in-context tool. Compose image swap to
pgvector/pgvector:pg16 documented in the operator deployment guide.

Risks captured for: dimension mismatch, HNSW build time, worker
stalls on malformed docs, chunk explosion on multi-thousand-page
PDFs, permission bypass through retrieval, stale chunks after
version change, OpenAI calls leaking while EXTERNAL_AI_DISABLED is
set, Postgres image change for operators.

Open questions parked: tokenizer choice, heading-aware tuning,
RRF weighting tuning UI, per-document index opt-out, cross-encoder
reranker, workspace-wide retrieval (deferred to Phase 14).
Repository	cpatpa/PIP
Author	Claude <noreply@anthropic.com>
Authored	2026-05-16T05:33:37Z
Parents	`876049f9`
Stats	1 file changed, +802
Part of	Phases 10-14 - design docs for web search, groups, multi-model, vector RAG, knowledge collections
docs: add Phase 13 (Vector RAG with pgvector) design
