feat(extraction): Phase 3.5a - MAR/Vitals lenses + DocView memory-leak fix
From the PR description
Summary
Two related changes on the med-mal extraction surface:
- Phase 3.5a - MAR + Vitals lenses (`eb038cc`). Generalizes Phase 3's chronology Timeline into per-lens views over the event log. New routes `/projects/[id]/{mar,vitals}/[docId]` mirror the timeline shape; `medications` and `vitals` JSON shapes are now locked in the extractor prompt, with `coerceMedications`/`coerceVitals` validators dropping malformed entries. Default extraction model swapped to NVIDIA Catalog Kimi K2.6 (vision) via a new provider-dispatching `completeMedMalExtractionPage` with retry/backoff. Tuning knobs (concurrency, retry budget, reaper timeout, async mode) documented in `CLAUDE.md` for the multi-hour Epic ebook case.
- DocView memory-leak fix (`a088b31`). On the extraction page with a 3000-page PDF, repeated zoom/resize/bbox interactions and document switches retained worker-side PDF.js caches that JS GC can't reclaim, climbing RSS indefinitely. `pdfDoc.destroy()` and per-page `cleanup()` now run on the right transitions, and bbox-highlight changes no longer trigger a full re-render of every page canvas.
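For readers unfamiliar with the "drop malformed entries" pattern attributed to `coerceMedications`/`coerceVitals`, here is a minimal sketch of what such a validator could look like. The `MedicationEntry` fields and every helper name below are assumptions for illustration only; the real shapes are locked in the extractor prompt and are not shown in this description.

```typescript
// Hypothetical entry shape. The actual fields are defined by the extractor
// prompt and may differ from what is assumed here.
interface MedicationEntry {
  name: string;
  dose: string | null;
  route: string | null;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a trimmed non-empty string, or null for anything else.
function optionalString(value: unknown): string | null {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : null;
}

// Illustrative stand-in, not the PR's coerceMedications: keeps entries that
// have a usable name and drops everything else instead of throwing.
function coerceMedicationsSketch(raw: unknown): MedicationEntry[] {
  if (!Array.isArray(raw)) return [];
  const out: MedicationEntry[] = [];
  for (const item of raw) {
    if (!isRecord(item)) continue;
    const name = optionalString(item["name"]);
    if (name === null) continue; // an entry without a name is dropped
    out.push({
      name,
      dose: optionalString(item["dose"]),
      route: optionalString(item["route"]),
    });
  }
  return out;
}
```

Dropping rather than throwing means one bad entry from the model does not discard the rest of the page's events, at the cost of silently losing data that a stricter validator would surface.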
Phase split
This PR ships Phase 3.5a only (MAR + Vitals - both columns already exist on `document_events` as `jsonb`, so no SQL migration is needed). Phases 3.5b (Labs - needs a new `labs` jsonb column) and 3.5c (Bills - likely needs a new `document_charges` table) are deferred to follow-up PRs.
What's not fixed in the DocView change
The eager all-pages render (~10 GB of canvas pixel data on a 3K-page Epic at 1× scale) is untouched. That needs page virtualization and is a separate change. The leak fixes here mean memory at least plateaus during a session instead of climbing every interaction.
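As a rough illustration of the release-on-transition idea behind the leak fix (not the actual `DocView.tsx` code), the sketch below cancels in-flight renders, cleans up each page, and then destroys the document when the viewer switches away from it. The interfaces are minimal structural stand-ins for the PDF.js `PDFPageProxy`, `RenderTask`, and `PDFDocumentProxy` objects, and the `OpenDocument` bookkeeping is an assumption.

```typescript
// Minimal structural subsets of the PDF.js objects; only the methods used here.
interface PageLike {
  cleanup(): boolean | void;
}
interface RenderTaskLike {
  cancel(): void;
}
interface DocumentLike {
  destroy(): Promise<void>;
}

// Hypothetical per-document state a viewer component might keep.
interface OpenDocument {
  doc: DocumentLike;
  pages: Map<number, PageLike>;
  renderTasks: Map<number, RenderTaskLike>;
}

// Release resources held for a document that is being switched away from.
function releaseDocument(open: OpenDocument): Promise<void> {
  // Cancel in-flight renders before asking pages to clean up.
  open.renderTasks.forEach((task) => task.cancel());
  open.renderTasks.clear();
  // Drop per-page resources, then forget the proxies so they can be collected.
  open.pages.forEach((page) => page.cleanup());
  open.pages.clear();
  // Finally destroy the document itself.
  return open.doc.destroy();
}
```

In a React component this would typically be called from an effect's cleanup function keyed on the document id, so that a bbox-highlight change alone never reaches it.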
Test plan
- Backend `npm run build` clean.
- Frontend `npx tsc --noEmit` clean.
- Frontend `npm run lint` - no new errors in changed files (one pre-existing `scrollToHighlightOnPage` declaration-order warning in `DocView.tsx`, unrelated).
- Backend extraction tests green (43/43 per Phase 3.5a verification).
- Manual e2e - MAR/Vitals lenses on an extracted med-mal doc:
  - `+ MAR` and `+ Vitals Trend` buttons appear on the project page when ≥1 PDF is fully extracted.
  - Single-PDF case routes directly; multi-PDF case opens `DocPickerModal` with the right target.
  - Row click on the right panel scrolls the PDF preview to `source_page` and overlays the bbox highlight.
- Manual e2e - DocView memory leak on the extraction page (`/projects/[id]/extraction`):
  - Switch documents 5× in a row → DevTools Memory profile shows RSS plateauing instead of climbing per switch.
  - Click 10 events in the right panel → no full re-render flash; bbox overlay updates in place.
  - Pinch-zoom or trigger window resize → page-level renders happen but prior page proxies are released (verify: detached `PDFPageProxy` count in heap snapshot stays bounded).
- Manual e2e - extraction throughput on a real Epic ebook:
  - Default `MED_MAL_EXTRACTION_MODEL=moonshotai/kimi-k2.6` runs end-to-end without auth/4xx errors.
  - `MED_MAL_MAIN_LOOP_CONCURRENCY=8` produces visible parallelism in extraction logs without rate-limiting.
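For context on what a concurrency knob like `MED_MAL_MAIN_LOOP_CONCURRENCY` usually controls, the following is a generic bounded-concurrency sketch. It is not taken from the PR: `parseConcurrency`, `mapWithConcurrency`, and the fallback value are hypothetical, and the real main loop may be structured differently.

```typescript
// Parse a positive-integer knob from an env-style string.
// Falls back when the value is unset, non-numeric, or not positive.
function parseConcurrency(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the input is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // no await between check and claim, so no two lanes share an index
      results[i] = await worker(items[i] as T, i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

A caller might read the knob with something like `parseConcurrency(process.env.MED_MAL_MAIN_LOOP_CONCURRENCY, 4)`, where the fallback of 4 is purely illustrative, and pass the result as `limit` when mapping over pages.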