feat: native PDF vision for scanned documents

Emmanuel Carjat · 2026-05-11 · 74564426

Add multimodal PDF processing so scanned PDFs (no text layer) can be
analysed by Gemini and Claude models instead of returning empty results.

- streamGeminiMultimodal: pass raw PDF bytes as inlineData to Gemini
- streamClaudeMultimodal: pass raw PDF bytes as document content block to Claude
- loadSourceTexts: store rawPdfBase64 when pdfjs extracts no text
- queryGeminiAllColumns: dispatch to vision path for Gemini/Claude;
  return a clear grey error cell for OpenAI (no native PDF support)
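
The dispatch described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: `SourceDoc`, `planQuery`, `geminiPdfParts`, `claudePdfContent`, and the error wording are hypothetical names chosen for the example. Only the payload shapes (an `inlineData` part for Gemini, a base64 `document` content block for Claude) follow the providers' documented request formats. It assumes a document counts as scanned when text extraction returned only whitespace and `rawPdfBase64` was stored.

```typescript
// Hypothetical types for illustration; the real code may differ.
type Provider = "gemini" | "claude" | "openai";

interface SourceDoc {
  text: string;           // text extracted by pdfjs (empty for scanned PDFs)
  rawPdfBase64?: string;  // stored only when no text layer was found
}

type Plan =
  | { kind: "text" }
  | { kind: "vision"; payload: unknown }
  | { kind: "unsupported"; message: string };

// Gemini accepts raw file bytes as an inlineData part next to the text prompt.
function geminiPdfParts(pdfBase64: string, prompt: string) {
  return [
    { inlineData: { mimeType: "application/pdf", data: pdfBase64 } },
    { text: prompt },
  ];
}

// Claude accepts a PDF as a base64 "document" content block.
function claudePdfContent(pdfBase64: string, prompt: string) {
  return [
    {
      type: "document",
      source: { type: "base64", media_type: "application/pdf", data: pdfBase64 },
    },
    { type: "text", text: prompt },
  ];
}

// Decide which path a query takes for a given provider and document.
function planQuery(provider: Provider, doc: SourceDoc, prompt: string): Plan {
  const isScanned = doc.text.trim() === "" && doc.rawPdfBase64 !== undefined;
  if (!isScanned) {
    return { kind: "text" }; // normal text path, unchanged
  }
  const pdf = doc.rawPdfBase64 as string;
  switch (provider) {
    case "gemini":
      return { kind: "vision", payload: geminiPdfParts(pdf, prompt) };
    case "claude":
      return { kind: "vision", payload: claudePdfContent(pdf, prompt) };
    case "openai":
      // Assumed wording; the commit only says a clear grey error cell is shown.
      return {
        kind: "unsupported",
        message: "Scanned PDF: this model has no native PDF support",
      };
  }
}
```

Keeping the decision in a pure function like this makes the OpenAI fallback easy to check without any network calls.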

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Repository ecarjat/mike
Author Emmanuel Carjat <emmanuel.carjat@quanthouse.com>
Parents be1665ab
Stats 4 files changed, +180, -34
Part of Native PDF vision for scanned documents (Gemini + Claude)
