feat(extraction): Phase 2 med-mal extraction pipeline
From the PR description
Summary
Phase 2 med-mal extraction: Postgres migrations (0002-0005); patch_document_extraction_run with GRANTs; per-page Claude JSON extraction; raster + vision for pages with empty text layers (node-canvas, R2 keys per run, end-of-run sweep); a §145.64 vision-page peer-review prescan that halts before any event call when markers are detected on scanned pages; optional queue mode (EXTRACTION_ASYNC_MODE=queue); REST + UI + chat tools; Vitest and Supertest coverage (403/404/409); backend CI.
Ops
- Apply backend/migrations/0002-0005 on each Supabase environment.
- Serverless: use queue mode + a worker process; set EXTRACTION_JOB_POLL_MS if needed.
- canvas native dependency required for scanned pages.
- §145.64 vision prescan cost: on a 3K-page Epic with ~60% scanned pages, expect ~$5-7 in Claude marker-detection calls before event extraction begins. The prescan is unconditional by design (no kill-switch env var) so the compliance gate cannot be bypassed.
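The prescan cost quoted in the Ops notes can be sanity-checked with a little arithmetic. The sketch below assumes one marker-detection call per scanned page, which the PR does not state, and estimatePrescanCost is a hypothetical helper, not part of the codebase.

```typescript
// Back-of-envelope check of the prescan cost quoted in the Ops notes.
// Assumption (not stated in the PR): one Claude marker-detection call
// per scanned page. The function and type names are hypothetical.
interface PrescanEstimate {
  scannedPages: number;
  minCostPerPageUsd: number;
  maxCostPerPageUsd: number;
}

function estimatePrescanCost(
  totalPages: number,
  scannedFraction: number,
  minTotalUsd: number,
  maxTotalUsd: number,
): PrescanEstimate {
  const scannedPages = Math.round(totalPages * scannedFraction);
  return {
    scannedPages,
    minCostPerPageUsd: minTotalUsd / scannedPages,
    maxCostPerPageUsd: maxTotalUsd / scannedPages,
  };
}

// 3K pages, ~60% scanned, ~$5-7 total (figures from the PR description).
const estimate = estimatePrescanCost(3000, 0.6, 5, 7);
console.log(estimate.scannedPages); // 1800
console.log(estimate.minCostPerPageUsd.toFixed(4)); // "0.0028"
console.log(estimate.maxCostPerPageUsd.toFixed(4)); // "0.0039"
```

Under that assumption the quoted range works out to roughly a third of a cent per scanned page, which gives a quick way to scale the estimate to larger or smaller charts.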
Compliance gate (closed in this PR)
The original prescan read only the text layer, so scanned pages whose peer review, M&M conference, RCA report, or similar markers were visible only in the raster bypassed the halt, and event rows could be written for protected content. The new peerReviewVisionPrescan.ts renders empty-text pages, asks Claude whether any canonical PEER_REVIEW_MARKERS phrase is visible, and halts via the existing red-flag insert path before any event-extraction call. Rasters are cached and reused by the main loop.
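The control flow described above might look roughly like the following. This is a sketch, not the contents of peerReviewVisionPrescan.ts: every type, function, and parameter name is hypothetical except PEER_REVIEW_MARKERS, and the marker list shown is only the subset named in this description. The renderer, the vision call, and the red-flag insert are injected as plain functions so the gate logic can be read and tested in isolation.

```typescript
// Illustrative subset only; the canonical list lives in the codebase.
const PEER_REVIEW_MARKERS = ["peer review", "M&M conference", "RCA report"];

interface Page {
  pageNumber: number;
  text: string; // extracted text layer; empty for scanned pages
}

// Hypothetical dependency interface (assumed, not the PR's actual API).
interface PrescanDeps {
  renderPage: (page: Page) => Promise<Uint8Array>;
  detectMarkers: (raster: Uint8Array, markers: string[]) => Promise<string[]>;
  insertRedFlag: (pageNumber: number, markers: string[]) => Promise<void>;
}

interface PrescanResult {
  halted: boolean;
  rasterCache: Map<number, Uint8Array>; // reused by the main loop
}

async function peerReviewVisionPrescan(
  pages: Page[],
  deps: PrescanDeps,
): Promise<PrescanResult> {
  const rasterCache = new Map<number, Uint8Array>();
  for (const page of pages) {
    // Only empty-text pages need the vision check.
    if (page.text.trim().length > 0) continue;
    const raster = await deps.renderPage(page);
    rasterCache.set(page.pageNumber, raster);
    const found = await deps.detectMarkers(raster, PEER_REVIEW_MARKERS);
    if (found.length > 0) {
      await deps.insertRedFlag(page.pageNumber, found);
      return { halted: true, rasterCache };
    }
  }
  return { halted: false, rasterCache };
}

// The gate runs to completion before the first event-extraction call.
async function runExtraction(
  pages: Page[],
  deps: PrescanDeps,
  extractEvents: (page: Page, raster?: Uint8Array) => Promise<void>,
): Promise<number> {
  const prescan = await peerReviewVisionPrescan(pages, deps);
  if (prescan.halted) return 0; // no event call is made
  let calls = 0;
  for (const page of pages) {
    await extractEvents(page, prescan.rasterCache.get(page.pageNumber));
    calls++;
  }
  return calls;
}
```

Running the whole prescan before the extraction loop, rather than checking each page just before extracting it, is what guarantees that a marker on a late page still prevents event rows for earlier pages.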
Follow-ups
- Apply migration 0005_extraction_async_jobs_document_index.sql on each environment; verify get_advisors --type performance no longer flags the unindexed FK.
- Gemini multimodal behind flag (deferred until Gemini path has its own JSON-schema tests).
- Periodic R2 sweeper for orphaned rasters from hard crashes (current cleanup is best-effort end-of-run).
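The periodic sweeper in the last follow-up could be shaped roughly as follows. This is a sketch only: the rasters/<runId>/... key layout, the type and function names, and the minimum-age threshold are all assumptions for illustration. The PR states only that R2 keys are per run.

```typescript
// Hypothetical object listing entry; the key layout is an assumption.
interface StoredObject {
  key: string;          // e.g. "rasters/<runId>/page-12.png"
  uploadedAtMs: number; // upload time, epoch milliseconds
}

// Returns raster keys that belong to no active run and are older than
// minAgeMs. The age check avoids racing a run that has just started.
function findOrphanedRasters(
  objects: StoredObject[],
  activeRunIds: Set<string>,
  nowMs: number,
  minAgeMs: number,
): string[] {
  const orphaned: string[] = [];
  for (const obj of objects) {
    const [prefix, runId] = obj.key.split("/");
    if (prefix !== "rasters" || runId === undefined) continue; // not a raster key
    const oldEnough = nowMs - obj.uploadedAtMs >= minAgeMs;
    if (!activeRunIds.has(runId) && oldEnough) orphaned.push(obj.key);
  }
  return orphaned;
}
```

Keeping the selection logic as a pure function separates the decision about what to delete from the storage calls that list and delete objects, so the former can be unit-tested without an R2 bucket.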