f870b0a2 | [feat-007a] Vision-mode auto-on: send PDF page images to Olava | Nick Whitehouse | 2026-05-04 | ↗ GitHub |
commit body When a chat has any attached PDF, render every page (capped at 30, the
vLLM --limit-mm-per-prompt setting) to PNG via @napi-rs/canvas + pdfjs
and splice them into the last user message as OpenAI-style image_url
content blocks. Olava-001 (served from a Qwen3-VL base) reasons over
the document visually rather than waiting for read_document text
extraction.
Implementation:
- lib/pdfRender.ts: PDF buffer → base64 PNG per page
- lib/visionContext.ts: download PDFs from docStore, render, splice
- llm/types.ts: LlmMessage.content now string | LlmContentBlock[]
- llm/olava.ts: pass-through (vLLM serves multimodal natively)
- llm/claude.ts, llm/gemini.ts: flatten text blocks (vision is
Olava-only for now; defensive in case content reaches them)
- lib/chatTools.ts: detect vision content and add a system-prompt
hint so the model reads from images instead of waiting for tools
Test path: any chat with an attached PDF auto-enters vision mode. No
user-visible toggle yet - that comes after we know the quality is good.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
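The splice and the defensive flattening described in f870b0a2 can be sketched roughly as follows. This is a minimal illustration, not the code in lib/visionContext.ts: the commit names only LlmMessage and LlmContentBlock, so the block shapes here are assumed to follow the OpenAI image_url convention, and the helper names spliceImages and flattenToText are hypothetical.

```typescript
// Assumed shapes: the commit names these types but does not show their fields.
type LlmContentBlock =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface LlmMessage {
  role: "system" | "user" | "assistant";
  content: string | LlmContentBlock[];
}

// Hypothetical helper: append page images to the last user message.
// Returns a new array; the input messages are not mutated.
function spliceImages(messages: LlmMessage[], pngBase64: string[]): LlmMessage[] {
  let lastUser = -1;
  messages.forEach((m, i) => {
    if (m.role === "user") lastUser = i;
  });
  if (lastUser === -1 || pngBase64.length === 0) return messages;

  const target = messages[lastUser];
  const existing: LlmContentBlock[] =
    typeof target.content === "string"
      ? [{ type: "text", text: target.content }]
      : target.content;
  const images = pngBase64.map(
    (b64): LlmContentBlock => ({
      type: "image_url",
      image_url: { url: `data:image/png;base64,${b64}` },
    }),
  );

  const out = messages.slice();
  out[lastUser] = { ...target, content: [...existing, ...images] };
  return out;
}

// Hypothetical helper for text-only providers: keep text blocks, drop images.
function flattenToText(content: string | LlmContentBlock[]): string {
  if (typeof content === "string") return content;
  return content
    .filter((b): b is { type: "text"; text: string } => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}
```

Keeping the original text as the first block means a provider that flattens the content still sees the user's question unchanged.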
436d028d | [feat-007a] Live citation rendering for vision mode | Nick Whitehouse | 2026-05-04 | ↗ GitHub |
commit body Three pieces working together so pills appear progressively as the
model streams, instead of all-at-once at end-of-turn:
1) Stream-parse the hidden <CITATIONS> JSON block. Once <CITATIONS>
opens we accumulate into a buffer and brace-depth scan for newly
completed {...} entries every delta, emitting a citation_added SSE
event per entry the moment it's parseable. Each ref is deduped via
a per-turn Set so the end-of-turn batched citations event doesn't
re-emit. Resets the per-iter buffer in flushText.
2) Per-marker citation verifier (Olava non-streaming, parallel). When
exactly one PDF is in scope we pre-extract its text once, then in
onContentDelta scan iterText for newly-complete [N] or superscript
runs and fire a verifyCitation Promise per marker without awaiting.
Each resolution emits citation_added live + pushes to events. v1
currently verifies 0 of N markers - debug logging added, but the
stream-parser path covers the live-pill UX independently.
3) Frontend rAF coalescer. A burst of 30+ citation_added events would
otherwise yield 30 setMessages calls → 30 ChatView re-renders → 30
updateScrollButton invocations, compounding into max-update-depth.
Buffer pending citations in a ref and flush once per animation
frame; force-flush at end-of-stream.
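The brace-depth scan from item 1 could look something like the sketch below. It is an illustration under assumptions, not the committed parser: the class name CitationStreamParser is hypothetical, the shape of a citation entry is not given in the commit, and deduplication here keys on the serialized entry rather than on a ref field.

```typescript
// Hypothetical incremental parser: feed it the text that arrives after
// <CITATIONS> opens, and it returns each top-level {...} entry once complete.
class CitationStreamParser {
  private buffer = "";
  private cursor = 0; // next index to scan, so each character is visited once
  private depth = 0;
  private start = -1; // index of the opening brace of the entry in progress
  private inString = false;
  private escaped = false;
  private seen = new Set<string>(); // per-turn dedupe (assumption: by content)

  push(delta: string): unknown[] {
    this.buffer += delta;
    const found: unknown[] = [];
    for (; this.cursor < this.buffer.length; this.cursor++) {
      const ch = this.buffer[this.cursor];
      if (this.inString) {
        // Braces inside JSON strings must not affect the depth counter.
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue;
      }
      if (ch === '"' && this.depth > 0) {
        this.inString = true;
      } else if (ch === "{") {
        if (this.depth === 0) this.start = this.cursor;
        this.depth++;
      } else if (ch === "}" && this.depth > 0) {
        this.depth--;
        if (this.depth === 0) {
          const raw = this.buffer.slice(this.start, this.cursor + 1);
          try {
            const entry = JSON.parse(raw);
            const key = JSON.stringify(entry);
            if (!this.seen.has(key)) {
              this.seen.add(key);
              found.push(entry);
            }
          } catch (_err) {
            // Malformed entry: skip it and keep scanning.
          }
        }
      }
    }
    return found;
  }
}
```

Each value returned by push would map to one citation_added SSE event; the scan state persists between calls, so an entry split across deltas is emitted as soon as its closing brace arrives.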
Plus: PDF render swapped from @napi-rs/canvas to node-canvas (napi
rejects pdfjs's internal Path2D objects in ctx.fill, breaking the
very first page). Frontend preprocessCitations also matches Unicode
superscript marker runs (¹²³⁴), which Olava sometimes prefers in
legal-style prose.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
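The frame coalescer from item 3 of 436d028d reduces to a small buffer-and-flush helper. The sketch below is illustrative only: createCoalescer is a hypothetical name, and the scheduler is injected so the logic can run outside a browser, whereas the commit describes buffering in a React ref and flushing with requestAnimationFrame.

```typescript
type Schedule = (cb: () => void) => void;

// Hypothetical helper: collect items and hand them to `flush` in one batch
// per scheduled tick, however many items arrive in between.
function createCoalescer<T>(flush: (batch: T[]) => void, schedule: Schedule) {
  let pending: T[] = [];
  let scheduled = false;

  const drain = (): void => {
    scheduled = false;
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    flush(batch);
  };

  return {
    add(item: T): void {
      pending.push(item);
      if (!scheduled) {
        scheduled = true;
        schedule(drain);
      }
    },
    // Force-flush, for example at end-of-stream.
    flushNow: drain,
  };
}
```

In the browser the scheduler would be something like `(cb) => requestAnimationFrame(() => cb())`, so a burst of 30 citation_added events results in a single state update per frame instead of 30.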
787258cb | [feat-008] pdf-4up rendering via pdftoppm - fixes blank-image bug + 3× compression | Nick Whitehouse | 2026-05-06 | ↗ GitHub |
commit body Two outcomes in one commit:
1) Fixes a silent vision-mode bug. The prior pdfjs+node-canvas glyph
rendering produced blank PNGs - canvas v3 dropped Path2D, pdfjs 4.x
needs it for glyph paths, and the path2d polyfill didn't bridge the
gap. Vision-mode answers were entirely from read_document text
fallback; vision input was noise. Rendering now shells out to
pdftoppm (poppler-utils), which is battle-tested and renders glyphs
correctly.
2) Switches to 4-up grid composition (default pagesPerImage=4). Two
independent spike rounds (25-page services agreement and 75-page
SEK financing doc, see backend/spike-out/text-compression*) showed
≈3× token compression vs 1-up at no fidelity loss on legal-grade
factual queries (dates, currency amounts, party names, repayment
terms). 8-up was rejected - round 1 hallucinated, round 3 returned
mostly empty.
Combined effect: 4× page capacity per request (100 images × 4 pages =
400 PDF pages), blank-image bug gone, ~3× cheaper. The 75-page SEK
test doc that previously truncated to 30 pages in 1-up now fits whole.
Files:
- nixpacks.toml: aptPkgs += poppler-utils
- src/lib/pdfRender.ts: rewritten - pdftoppm + grid composer
- src/lib/visionContext.ts: VISION_MAX_IMAGES_PER_REQUEST=100, pass
pagesPerImage:4 to renderer
read_document text-fallback stays in place as a safety net.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
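The capacity arithmetic in 787258cb can be checked with a small planning function. This is a sketch of the page-to-image grouping only, with the hypothetical name planGrid; the real composer in src/lib/pdfRender.ts also draws the grid, which is not shown here.

```typescript
interface GridPlan {
  images: number[][]; // 1-based page numbers grouped per composite image
  droppedPages: number; // pages that did not fit under the image cap
}

// Hypothetical planner: group pages N-up and stop at the per-request image cap.
function planGrid(pageCount: number, pagesPerImage = 4, maxImages = 100): GridPlan {
  const images: number[][] = [];
  let page = 1;
  while (page <= pageCount && images.length < maxImages) {
    const group: number[] = [];
    for (let i = 0; i < pagesPerImage && page <= pageCount; i++) {
      group.push(page++);
    }
    images.push(group);
  }
  return { images, droppedPages: pageCount - (page - 1) };
}
```

For the 75-page test document, the 4-up defaults yield 19 composites with nothing dropped, while the earlier 1-up setup with a 30-image cap drops 45 pages.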
26ef15f4 | [bug] verifier: use olava-extract model name (was hardcoded olava-001) | Nick Whitehouse | 2026-05-06 | ↗ GitHub |
commit body Spike-leftover hardcode caused every per-marker verifier call to 404
against vLLM (which registers olava-extract, not olava-001). The
end-of-stream <CITATIONS> block parser still produced pills, so user-
visible behaviour was just "no progressive pill rendering during
streaming." Behaviour now matches the design.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
6bf6d52d | [feat-009] Vision perf: parallel render + tiered cache + progress UI | Nick Whitehouse | 2026-05-06 | ↗ GitHub |
commit body Four wins for the vision-mode wait time, ordered by user impact:
1) PROGRESS UI. Backend emits vision_render_start/done SSE events
around the pdftoppm call. Frontend renders a "Reading <filename>..."
block (matching the existing DocReadBlock pattern) instead of a
dead spinner. SSE stream now opens BEFORE the render so the
placeholder reaches the browser immediately. ~10s+ wait now feels
intentional rather than broken.
2) PARALLEL RENDER. pdftoppm is CPU-bound; one process handles only
one page at a time. Split into 4 workers via -f/-l page ranges,
each writing to its own subdir to avoid filename collisions. 75
pages went from 28s → 11.7s on bench. Page count discovered via
pdfinfo before splitting (also from poppler-utils).
3) IN-MEMORY LRU CACHE (visionCache.ts). 5-entry cap (composites are
1-2MB each, ~30MB per 75-page doc - keeps worst-case ≤150MB
resident on the 512MB Railway box). Subsequent turns against the
same doc skip render entirely; sub-millisecond hit. No SSE
placeholder events on a memory hit so the UI doesn't flicker.
4) R2 PERSISTENT CACHE (visionR2Cache.ts). Sits behind memory cache.
Single JSON manifest at vision-cache/<base64url(storagePath|p|d)>.json
contains the array of base64 composites. Survives backend restarts
and Railway redeploys. Render → memory write → fire-and-forget R2
write; subsequent processes hit R2 once, then promote to memory.
Errors swallowed - cache is best-effort.
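The page-range split from item 2 might be computed along these lines. The function names are hypothetical and the argument list is a plausible pdftoppm invocation using its -png, -f and -l options; the exact flags used in backend/src/lib/pdfRender.ts are not shown in the commit.

```typescript
interface PageRange {
  first: number; // 1-based, inclusive
  last: number; // 1-based, inclusive
}

// Hypothetical helper: divide pages 1..pageCount into contiguous ranges,
// one per worker, with sizes differing by at most one page.
function splitPageRanges(pageCount: number, workers = 4): PageRange[] {
  const n = Math.min(workers, pageCount);
  const ranges: PageRange[] = [];
  let first = 1;
  for (let i = 0; i < n; i++) {
    const size = Math.floor(pageCount / n) + (i < pageCount % n ? 1 : 0);
    ranges.push({ first, last: first + size - 1 });
    first += size;
  }
  return ranges;
}

// Assumed argument layout: options, then the PDF path, then the output prefix.
// Giving each worker its own output prefix (subdir) avoids filename collisions.
function pdftoppmArgs(pdfPath: string, outPrefix: string, range: PageRange): string[] {
  return ["-png", "-f", String(range.first), "-l", String(range.last), pdfPath, outPrefix];
}
```

Each argument list would then be passed to a separate child process, and the per-worker outputs merged back in page order once all processes exit.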
Combined effect on a 75-page PDF:
- First chat ever: ~12s render, ~5MB R2 write
- Same chat session: sub-ms (memory)
- After backend restart: ~1-2s (R2 read + parse)
- New doc (no cache entry yet): back to first-chat numbers
Files:
- backend/src/lib/pdfRender.ts: parallelise pdftoppm; pdfinfo page count
- backend/src/lib/visionCache.ts: new - in-memory LRU
- backend/src/lib/visionR2Cache.ts: new - R2-backed manifest
- backend/src/lib/visionContext.ts: tiered lookup + SSE events + write hookup
- backend/src/routes/chat.ts: open SSE before render so placeholder ships
- frontend/src/app/hooks/useAssistantChat.ts: handle vision_render_start/done
- frontend/src/app/components/assistant/AssistantMessage.tsx: VisionRenderBlock
- frontend/src/app/components/shared/types.ts: vision_render variant on
AssistantEvent
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
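The tiered lookup in 6bf6d52d (memory first, then the persistent store, then render) can be sketched as below. Everything here is illustrative: LruCache, PersistentStore and tieredGet are hypothetical names, and the store interface stands in for the R2 manifest read and write, whose real signatures are not shown in the commit.

```typescript
// Minimal LRU built on Map insertion order: the first key is the oldest.
class LruCache<V> {
  private map = new Map<string, V>();
  private maxEntries: number;

  constructor(maxEntries = 5) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}

// Stand-in for the persistent tier (assumption, not the real R2 client).
interface PersistentStore<V> {
  read(key: string): Promise<V | undefined>;
  write(key: string, value: V): Promise<void>;
}

async function tieredGet<V>(
  key: string,
  memory: LruCache<V>,
  store: PersistentStore<V>,
  render: () => Promise<V>,
): Promise<V> {
  const hot = memory.get(key);
  if (hot !== undefined) return hot;

  // Persistent-tier errors are swallowed: the cache is best-effort.
  const cold = await store.read(key).catch(() => undefined);
  if (cold !== undefined) {
    memory.set(key, cold); // promote to memory
    return cold;
  }

  const fresh = await render();
  memory.set(key, fresh);
  void store.write(key, fresh).catch(() => {}); // fire-and-forget
  return fresh;
}
```

With a 5-entry cap and roughly 30MB per 75-page document, the memory tier stays near the 150MB worst case quoted in the commit.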
b2840147 | [feat-010] Pre-render PDFs at upload + shimmer chip + send-button gate | Nick Whitehouse | 2026-05-06 | ↗ GitHub |
commit body After a PDF upload completes, kick off a fire-and-forget vision pre-render
in the background. By the time the user opens a chat against the doc,
the R2 cache is warm and the chat skips the ~10s pdftoppm cost
entirely. Combined with feat-009's caches, first chat against a
freshly-uploaded doc is now near-instant from the user's perspective.
UX:
- Attachment chip in ChatInput shows immediately on attach
- Shimmer overlay (chip-shimmer keyframe in globals.css) plays while
pre-render is in flight - clear visual signal that the chat isn't
quite ready yet
- Send button disabled while ANY attached PDF is pending; tooltip
explains why. Belt-and-braces in handleSubmit so Enter doesn't
sneak past the disabled button
Backend:
- lib/visionPrerender.ts: in-process pending-renders map + R2 lookup
fallback so status survives restarts. kickOffVisionPrerender is
idempotent (no-op if already pending or ready).
- routes/documents.ts (handleDocumentUpload + version upload):
fire-and-forget kick-off after the documents.update completes.
Only PDFs - DOCX vision mode isn't wired.
- routes/projects.ts (project upload): same.
- GET /single-documents/:id/vision-status: returns
{status: pending|ready|failed|missing}. Cheap - combines memory
map with R2 manifest existence check.
Frontend:
- hooks/useVisionStatus.ts: polls vision-status every 1s per attached
PDF until status resolves; caps at 60 attempts so the UI never
locks if the backend goes weird.
- ChatInput uses the hook to drive shimmer + button-disable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
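The idempotent kick-off and the send-button gate from b2840147 can be illustrated with a small in-process tracker. This is a sketch under assumptions: PrerenderTracker and canSend are hypothetical names, and the R2 lookup fallback that lets status survive restarts in lib/visionPrerender.ts is left out.

```typescript
type VisionStatus = "pending" | "ready" | "failed" | "missing";

// Hypothetical in-process tracker for background pre-renders.
class PrerenderTracker {
  private status = new Map<string, VisionStatus>();

  // Returns true if a render was started, false if it was a no-op.
  kickOff(docId: string, render: () => Promise<void>): boolean {
    const current = this.status.get(docId);
    if (current === "pending" || current === "ready") return false;

    this.status.set(docId, "pending");
    // Wrapping in a Promise turns a synchronous throw into a rejection too.
    new Promise<void>((resolve) => resolve(render())).then(
      () => { this.status.set(docId, "ready"); },
      () => { this.status.set(docId, "failed"); },
    );
    return true;
  }

  getStatus(docId: string): VisionStatus {
    return this.status.get(docId) ?? "missing";
  }
}

// Send stays disabled while any attached PDF is still pending.
function canSend(statuses: VisionStatus[]): boolean {
  return !statuses.includes("pending");
}
```

A failed or missing status does not block sending in this sketch, on the assumption that the chat path can still render on demand.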
9c244fd2 | [maint] Doc tidy: Sprint 2 backlog entries + spike artifact | Nick Whitehouse | 2026-05-07 | ↗ GitHub |
commit body - backlog.md: append Sprint 2 (vision mode) covering feat-007a through
feat-010 with status, commits, and open issues. Add an "Open items"
section for bug-005 (verifier blocking [DONE]), feat-011 (vLLM
prefix caching), feat-012 (text-as-image compression), and the
feat-006 outcome note.
- backend/.gitignore: exclude spike-out/ - local benchmark output,
~70 paired markdown reports per run, regenerable on demand.
- backend/scripts/spike_compression.ts: keep the spike runner as a
reusable harness; current shape targets the round-3 SEK financing
doc with the 5 winner-candidate variants.
SECURITY.md (untracked, from the parallel security review) deliberately
left untouched - not part of this session's scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
b814d76a | [bug-005] Disable per-marker verifier by default; gate behind OLAVA_VERIFIER env | Nick Whitehouse | 2026-05-07 | ↗ GitHub |
commit body Verifier (feat-007a) was awaiting all in-flight per-marker Olava calls
at end-of-turn before sending [DONE]. Per-call latency observed 12-17s
on olava-extract, and 3-of-4 calls came back empty in practice. This
added ~12s to time-to-[DONE] on every chat with citations.
Empirically the model emits a clean <CITATIONS> JSON block on its own
the vast majority of the time - citations land via the existing block-
parser path regardless. The verifier is mostly redundant work today.
Single env-gated flag: when OLAVA_VERIFIER is unset (or anything other
than "on"), the pre-extract is skipped, verifierDocId stays null, and
the existing fireVerifier early-return turns marker detection into a
no-op. verifierPromises stays empty so the end-of-turn await collapses.
All supporting code preserved (verifyCitation, marker detection,
streaming SSE emit) so re-enabling is just OLAVA_VERIFIER=on if/when
we observe the model regressing on the JSON tail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
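The env gate in b814d76a comes down to one strict comparison, and the end-of-turn wait collapses because there is nothing left to await. The sketch below uses the hypothetical names isVerifierEnabled and finishTurn; only the OLAVA_VERIFIER variable and its "on" value come from the commit.

```typescript
// Enabled only for the exact value "on"; unset or anything else disables it.
function isVerifierEnabled(env: Record<string, string | undefined>): boolean {
  return env.OLAVA_VERIFIER === "on";
}

// Hypothetical end-of-turn step: with the verifier off, verifierPromises is
// empty, so this resolves almost immediately and [DONE] is not held back.
async function finishTurn(
  verifierPromises: Promise<unknown>[],
  sendDone: () => void,
): Promise<void> {
  await Promise.allSettled(verifierPromises);
  sendDone();
}
```

In the backend this would typically be called as `isVerifierEnabled(process.env)` once per turn, before the pre-extract step.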