[feat-007a] Vision-mode auto-on: send PDF page images to Olava
When a chat has any attached PDF, render each page to PNG via
@napi-rs/canvas + pdfjs (capped at 30 pages, the vLLM
--limit-mm-per-prompt setting) and splice the images into the last
user message as OpenAI-style image_url content blocks. Olava-001
(served from a Qwen3-VL base) then reasons over the document visually
rather than waiting for read_document text extraction.
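The splice step described above might look roughly like the sketch below. It is not the code from lib/visionContext.ts: the type shapes, the splicePageImages name, the MAX_PAGES constant, and the choice to append images after the existing text are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real definitions live in llm/types.ts.
type LlmContentBlock =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface LlmMessage {
  role: "system" | "user" | "assistant";
  content: string | LlmContentBlock[];
}

// Assumed constant mirroring the 30-page cap named in the commit message.
const MAX_PAGES = 30;

// Returns a new message array in which the last user message carries the
// rendered pages as image_url blocks. The input array is not mutated.
function splicePageImages(
  messages: LlmMessage[],
  pagesBase64: string[],
): LlmMessage[] {
  // Locate the last user message, scanning from the end.
  let idx = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "user") {
      idx = i;
      break;
    }
  }
  const target = idx >= 0 ? messages[idx] : undefined;
  if (!target || pagesBase64.length === 0) return messages;

  // Normalise plain-string content into a single text block.
  const existing: LlmContentBlock[] =
    typeof target.content === "string"
      ? [{ type: "text", text: target.content }]
      : target.content;

  // One data-URL image block per page, truncated to the cap.
  const images = pagesBase64.slice(0, MAX_PAGES).map(
    (b64): LlmContentBlock => ({
      type: "image_url",
      image_url: { url: `data:image/png;base64,${b64}` },
    }),
  );

  const out = messages.slice();
  out[idx] = { ...target, content: [...existing, ...images] };
  return out;
}
```

Returning a copy rather than mutating in place keeps the original history reusable for providers that never receive image blocks.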
Implementation:
- lib/pdfRender.ts: PDF buffer → base64 PNG per page
- lib/visionContext.ts: download PDFs from docStore, render, splice
- llm/types.ts: LlmMessage.content now string | LlmContentBlock[]
- llm/olava.ts: pass-through (vLLM serves multimodal natively)
- llm/claude.ts, llm/gemini.ts: flatten text blocks (vision is
  Olava-only for now; defensive in case content reaches them)
- lib/chatTools.ts: detect vision content and add a system-prompt
  hint so the model reads from images instead of waiting for tools
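As a rough illustration of the last two bullets, the sketch below shows one way to flatten block content for text-only providers and to detect vision content before adding a prompt hint. The function names, the VISION_HINT wording, and the decision to drop image blocks when flattening are assumptions, not the contents of llm/claude.ts, llm/gemini.ts, or lib/chatTools.ts.

```typescript
// Hypothetical shapes; the real definitions live in llm/types.ts.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image_url"; image_url: { url: string } };
type LlmContentBlock = TextBlock | ImageBlock;

interface LlmMessage {
  role: "system" | "user" | "assistant";
  content: string | LlmContentBlock[];
}

// Collapse block content to a plain string for text-only providers.
// Assumption: image blocks are dropped and text blocks joined by newlines.
function flattenToText(content: string | LlmContentBlock[]): string {
  if (typeof content === "string") return content;
  return content
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("\n");
}

// True when any message carries at least one image_url block.
function hasVisionContent(messages: LlmMessage[]): boolean {
  return messages.some(
    (m) =>
      Array.isArray(m.content) &&
      m.content.some((b) => b.type === "image_url"),
  );
}

// Illustrative hint text only; the wording used in the commit is not shown.
const VISION_HINT =
  "The attached PDF pages are included as images. Read them directly " +
  "instead of waiting for a document tool call.";

// Append the hint to the system prompt only when images are present.
function withVisionHint(systemPrompt: string, messages: LlmMessage[]): string {
  return hasVisionContent(messages)
    ? `${systemPrompt}\n\n${VISION_HINT}`
    : systemPrompt;
}
```

Keeping detection separate from prompt assembly means the same check could later back a user-visible toggle without touching the provider adapters.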
Test path: any chat with an attached PDF auto-enters vision mode. No
user-visible toggle yet; that comes once we know the quality is good.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Repository | nwhitehouse/mike |
|---|---|
| Author | Nick Whitehouse <nick.whitehouse@mccarthyfinch.com> |
| Authored | |
| Parents | 8731d95e |
| Stats | 10 files changed, +381, -35 |
| Part of | Vision mode: PDF page images → Olava (feat-007a / 008 / 009 / 010 / bug-005) |