[feat-007a] Vision-mode auto-on: send PDF page images to Olava

Nick Whitehouse · 2026-05-04 · f870b0a2

When a chat has any attached PDF, render each page to PNG via
@napi-rs/canvas + pdfjs (capped at 30 images, the vLLM
--limit-mm-per-prompt setting) and splice the images into the last user
message as OpenAI-style image_url content blocks. Olava-001 (served from
a Qwen3-VL base) then reasons over the document visually instead of
waiting for read_document text extraction.
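A minimal sketch of the splice step, assuming OpenAI-style chat content blocks. The names spliceImages and MAX_PAGE_IMAGES are hypothetical (not taken from lib/visionContext.ts). The sketch also assumes the renderer hands back already base64-encoded PNGs and that images go after the existing text; the commit does not say which order it uses.

```typescript
// Hypothetical types mirroring OpenAI-style chat content blocks.
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type LlmContentBlock = TextBlock | ImageUrlBlock;

interface LlmMessage {
  role: "system" | "user" | "assistant";
  content: string | LlmContentBlock[];
}

// Assumed to mirror the vLLM --limit-mm-per-prompt value (30 per the commit).
const MAX_PAGE_IMAGES = 30;

// Hypothetical helper: returns a new message list where the last user message
// carries its original text plus up to MAX_PAGE_IMAGES page images.
function spliceImages(messages: LlmMessage[], pagePngsBase64: string[]): LlmMessage[] {
  let idx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      idx = i;
      break;
    }
  }
  if (idx === -1 || pagePngsBase64.length === 0) return messages;

  const target = messages[idx];
  // Normalise plain-string content into a single text block.
  const existing: LlmContentBlock[] =
    typeof target.content === "string"
      ? [{ type: "text", text: target.content }]
      : target.content;

  // Each page becomes a data: URL image block; pages beyond the cap are dropped.
  const images = pagePngsBase64
    .slice(0, MAX_PAGE_IMAGES)
    .map((b64): ImageUrlBlock => ({
      type: "image_url",
      image_url: { url: `data:image/png;base64,${b64}` },
    }));

  const updated: LlmMessage = { ...target, content: [...existing, ...images] };
  return messages.map((m, i) => (i === idx ? updated : m));
}
```

Returning a new array instead of mutating the input keeps the stored chat history text-only. The commit does not show whether the real code takes this approach.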

Implementation:
  - lib/pdfRender.ts: PDF buffer → base64 PNG per page
  - lib/visionContext.ts: download PDFs from docStore, render, splice
  - llm/types.ts: LlmMessage.content now string | LlmContentBlock[]
  - llm/olava.ts: pass-through (vLLM serves multimodal natively)
  - llm/claude.ts, llm/gemini.ts: flatten text blocks (vision is
    Olava-only for now; defensive in case content reaches them)
  - lib/chatTools.ts: detect vision content and add a system-prompt
    hint so the model reads from images instead of waiting for tools
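A sketch of the two helpers implied by the list above, assuming the same block shapes: flattening for the text-only Claude/Gemini paths, and vision detection for the system-prompt hint. flattenToText, hasVisionContent, buildSystemPrompt, the newline joiner and the hint wording are all hypothetical. The commit does not show the actual code.

```typescript
// Hypothetical block shapes matching LlmMessage.content = string | LlmContentBlock[].
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type LlmContentBlock = TextBlock | ImageUrlBlock;
type MessageContent = string | LlmContentBlock[];

// Defensive flattening for text-only providers: keep text blocks, drop images.
function flattenToText(content: MessageContent): string {
  if (typeof content === "string") return content;
  return content
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}

// True if any message carries at least one image_url block.
function hasVisionContent(messages: { content: MessageContent }[]): boolean {
  return messages.some(
    (m) => typeof m.content !== "string" && m.content.some((b) => b.type === "image_url"),
  );
}

// Hypothetical hint wording; the real prompt text is not in the commit.
const VISION_HINT =
  "The attached PDF pages are included as images. Read the document from the images directly instead of waiting for read_document.";

// Appends the hint only when vision content is present.
function buildSystemPrompt(base: string, messages: { content: MessageContent }[]): string {
  return hasVisionContent(messages) ? `${base}\n\n${VISION_HINT}` : base;
}
```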

Test path: any chat with an attached PDF automatically enters vision
mode. There is no user-visible toggle yet; that comes once we know the
quality is good.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repository nwhitehouse/mike
Author Nick Whitehouse <nick.whitehouse@mccarthyfinch.com>
Authored
Parents 8731d95e
Stats 10 files changed, +381, -35
Part of Vision mode: PDF page images → Olava (feat-007a / 008 / 009 / 010 / bug-005)
