Multimodal fallback for scanned PDFs: Gemini and Claude get inline PDF vision, OpenAI gets an explicit stub

Scanned legal documents - the kind with no text layer - were silently returning empty cells in tabular reviews. ecarjat fixed that by detecting the failure and routing to native PDF vision on Gemini and Claude, with a deliberate "not supported" cell for OpenAI rather than a silent zero.

discovery, contract-review

The change touches four files. In routes/tabular.ts, loadSourceTexts now checks whether pdfjs extracted any text; if it didn't, it encodes the raw PDF bytes as base64 and attaches them to a new rawPdfBase64 field on the source object. The calling function queryGeminiAllColumns then dispatches on provider: Gemini gets streamGeminiMultimodal, which passes an inlineData part on the user turn; Claude gets streamClaudeMultimodal, which wraps the bytes in an Anthropic document content block. OpenAI, which the commit describes as having no native PDF support, gets a grey "OpenAI does not support PDF vision" cell. All three new exports are wired through backend/src/lib/llm/index.ts, and a GeminiUserPart type is exposed for callers.
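The detection and dispatch described above can be sketched roughly as follows. This is not the code from the commit: the SourceDoc and Route types, the buildSource and chooseRoute helpers, and the whitespace-only rule for "no text layer" are all assumptions made for illustration. Only the rawPdfBase64 field, the two stream function names, and the cell message come from the writeup.

```typescript
// Hypothetical shapes; the real types in routes/tabular.ts may differ.
type Provider = "gemini" | "claude" | "openai";

interface SourceDoc {
  name: string;
  text: string;          // text extracted by pdfjs, possibly empty
  rawPdfBase64?: string; // set only when extraction found no text
}

type Route =
  | { kind: "text" }
  | { kind: "vision"; via: "streamGeminiMultimodal" | "streamClaudeMultimodal" }
  | { kind: "unsupported"; message: string };

// Assumption: whitespace-only output counts as "no text layer".
function hasTextLayer(text: string): boolean {
  return text.trim().length > 0;
}

// Keep the raw PDF only when there is nothing to send down the text path.
// In Node the base64 string would typically come from
// Buffer.from(bytes).toString("base64").
function buildSource(name: string, extractedText: string, pdfBase64: string): SourceDoc {
  if (hasTextLayer(extractedText)) {
    return { name, text: extractedText };
  }
  return { name, text: "", rawPdfBase64: pdfBase64 };
}

// Decide which path a source takes for a given provider.
function chooseRoute(provider: Provider, source: SourceDoc): Route {
  if (source.rawPdfBase64 === undefined) {
    return { kind: "text" }; // existing text path, unchanged
  }
  if (provider === "gemini") {
    return { kind: "vision", via: "streamGeminiMultimodal" };
  }
  if (provider === "claude") {
    return { kind: "vision", via: "streamClaudeMultimodal" };
  }
  // Explicit stub rather than a silent empty cell.
  return { kind: "unsupported", message: "OpenAI does not support PDF vision" };
}
```

Keeping the routing decision in a pure function like this makes it easy to unit test without calling any provider.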

The two multimodal stream functions are additive - existing text-path behaviour is unchanged. The dispatch in queryGeminiAllColumns is the only integration point that runs on live traffic.
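For orientation, the request parts those two functions are described as building look roughly like the shapes below. This is a sketch based on the publicly documented Gemini inlineData part and Anthropic document content block, not the commit's code. The helper names geminiUserParts and claudeUserContent and the local type names are hypothetical, and the field names are worth checking against the current SDK docs before relying on them.

```typescript
// Stand-in for the GeminiUserPart type mentioned in the writeup (assumed shape).
type GeminiPartSketch =
  | { inlineData: { mimeType: "application/pdf"; data: string } }
  | { text: string };

// Anthropic-style content blocks for a user message (assumed shape).
type ClaudeBlockSketch =
  | { type: "document"; source: { type: "base64"; media_type: "application/pdf"; data: string } }
  | { type: "text"; text: string };

// Gemini: the PDF travels as an inlineData part next to the text prompt.
function geminiUserParts(pdfBase64: string, prompt: string): GeminiPartSketch[] {
  return [
    { inlineData: { mimeType: "application/pdf", data: pdfBase64 } },
    { text: prompt },
  ];
}

// Claude: the PDF travels as a document block ahead of the text block.
function claudeUserContent(pdfBase64: string, prompt: string): ClaudeBlockSketch[] {
  return [
    { type: "document", source: { type: "base64", media_type: "application/pdf", data: pdfBase64 } },
    { type: "text", text: prompt },
  ];
}
```

Both helpers take the same base64 string, which is why a single rawPdfBase64 field on the source is enough to serve either provider.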

A few things to think through before importing. Raw PDF bytes sent to an LLM are substantially larger than extracted text: a 50-page scanned document is going to hit your token budget hard. If you're pulling this in, pairing it with a per-user page cap is worth doing upfront. Also, the function name queryGeminiAllColumns is now misleading - it dispatches across three providers. Worth renaming on import to avoid confusion later.
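A per-user page cap could be as small as the check below. This is a minimal sketch under stated assumptions: the PageBudget shape, the daily accounting, and the checkPageCap name are invented for illustration and are not part of the commit. The page count is assumed to come from whatever already opened the PDF (pdfjs exposes numPages on the loaded document).

```typescript
// Hypothetical per-user budget; storage and reset policy are up to the caller.
interface PageBudget {
  dailyCap: number;  // maximum vision pages per user per day
  usedToday: number; // pages already sent down the vision path today
}

type CapDecision =
  | { allowed: true; remaining: number }
  | { allowed: false; reason: string };

// Decide whether a scanned PDF may go down the vision path.
function checkPageCap(pageCount: number, budget: PageBudget): CapDecision {
  if (!Number.isInteger(pageCount) || pageCount <= 0) {
    return { allowed: false, reason: "invalid page count" };
  }
  const remaining = budget.dailyCap - budget.usedToday;
  if (pageCount > remaining) {
    const left = Math.max(remaining, 0);
    return {
      allowed: false,
      reason: `document has ${pageCount} pages, ${left} left today`,
    };
  }
  return { allowed: true, remaining: remaining - pageCount };
}
```

Running the check before base64-encoding the file avoids paying for the encoding work on documents that will be rejected anyway.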

So what

Worth importing if scanned PDFs are a real input type for your users and you're running Gemini or Claude. The code is clean and the OpenAI stub is better than silent failure. Budget for the token cost of inline PDF data before turning it on.

Commits in this thread

1 commit from ecarjat/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA	Subject	Author	Date
`74564426`	feat: native PDF vision for scanned documents	Emmanuel Carjat	2026-05-11

commit body

Add multimodal PDF processing so scanned PDFs (no text layer) can be
analysed by Gemini and Claude models instead of returning empty results.

- streamGeminiMultimodal: pass raw PDF bytes as inlineData to Gemini
- streamClaudeMultimodal: pass raw PDF bytes as document content block to Claude
- loadSourceTexts: store rawPdfBase64 when pdfjs extracts no text
- queryGeminiAllColumns: dispatch to vision path for Gemini/Claude;
  return a clear grey error cell for OpenAI (no native PDF support)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
