ecarjat teaches Mike to read scanned PDFs
Tabular reviews stop returning blank cells when the document is a scan rather than a PDF with a real text layer.
Plenty of legal documents arrive as scans: photocopies saved as PDFs with no underlying text layer. Until now, when @ecarjat's fork tried to pull structured data out of those files for a tabular review, the cells just came back empty and the user was none the wiser.
The fix routes scanned PDFs straight to the AI as images instead of text. Google's Gemini and Anthropic's Claude both accept PDFs natively and now handle these documents directly; OpenAI doesn't support that yet, so those cells come back with a clear note rather than a silent blank. Worth flagging: sending whole scanned PDFs to an AI is meaningfully more expensive than sending extracted text, so anyone borrowing this will want a page cap alongside it.
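As a rough illustration of that routing, here is a minimal Python sketch. It is not the fork's code: every name in it (`route_document`, `NATIVE_PDF_PROVIDERS`, `MAX_SCANNED_PAGES`, `MIN_TEXT_CHARS`) is hypothetical, and the 20-page cap and the "almost no extracted text means it is a scan" heuristic are assumptions chosen for the example.

```python
from dataclasses import dataclass

# All names and numbers below are hypothetical, chosen for illustration only.
NATIVE_PDF_PROVIDERS = {"gemini", "claude"}  # providers the writeup says accept PDFs natively
MAX_SCANNED_PAGES = 20   # assumed page cap to keep costs bounded
MIN_TEXT_CHARS = 20      # assumed threshold: less text than this is treated as a scan


@dataclass
class Route:
    mode: str        # "text", "native_pdf", or "note"
    note: str = ""   # message shown in the cell when mode == "note"


def is_scanned(extracted_text: str) -> bool:
    """Treat a PDF as a scan when text extraction yields almost nothing."""
    return len(extracted_text.strip()) < MIN_TEXT_CHARS


def route_document(provider: str, extracted_text: str, page_count: int) -> Route:
    """Decide how a document should be sent for a tabular review."""
    if not is_scanned(extracted_text):
        # Normal case: the cheaper extracted-text path.
        return Route("text")
    if provider not in NATIVE_PDF_PROVIDERS:
        # No native PDF support: say so instead of returning a silent blank.
        return Route("note", "Scanned PDF: this provider cannot read it, so no data was extracted.")
    if page_count > MAX_SCANNED_PAGES:
        # Sending whole scans is expensive, so refuse oversized ones with a note.
        return Route("note", f"Scanned PDF has {page_count} pages, over the {MAX_SCANNED_PAGES}-page cap.")
    return Route("native_pdf")


if __name__ == "__main__":
    print(route_document("claude", "", 12).mode)
    print(route_document("openai", "", 12).note)
```

The point of the design is that every branch yields either a usable route or an explicit note, so a cell is never left blank without explanation.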