Phase 2 extraction pipeline: §145.64 vision prescan closes compliance gap on scanned pages

rmerk's Phase 2 med-mal extraction adds a peer-review compliance gate that the earlier text-only prescan left open. On scanned pages where peer-review marker phrases appear only in the raster - not the text layer - the original check would have passed them through to event extraction. The new path rasterizes those pages, asks Claude whether markers are visible, and halts before any event row is written.

The prescan halts extraction through the existing red-flag insert path when peerReviewVisionPrescan.ts detects peer-review markers on pages that had empty text layers. There is no environment variable to disable this. rmerk is explicit that the gate has no bypass, which means every scanned page on a 3,000-page Epic with a ~60% scan rate burns a marker-detection call before event extraction begins - the author estimates $5-7 in prescan spend alone on such a document.

Underneath the compliance gate, Phase 2 is a substantial pipeline: four Postgres migrations (0002-0005) covering document_events, document_red_flags, document_extraction_run, and extraction_async_jobs; per-page Claude JSON extraction in medMalExtractor.ts; raster-plus-vision fallback for pages without a usable text layer (pdfRegions.ts exports renderPageToPngBuffer with a 2048px edge cap); R2 key storage for per-run rasters swept at end-of-run, with prescan rasters cached so the main extraction loop doesn't re-render. Optional EXTRACTION_ASYNC_MODE=queue adds a worker table for serverless deployments where synchronous long-running extractions would time out.

The test coverage adds Vitest and Supertest cases for the 403/404/409 paths plus malformed-PDF error handling. A backend CI workflow (backend-ci.yml) is included. Three follow-up items are flagged: the 0005 index migration needs applying per environment with an advisor check; Gemini multimodal support is deferred behind a flag until it has its own JSON-schema tests; and there's no periodic R2 sweeper yet for rasters orphaned by hard crashes.