jamietso teaches Mike to read the redlines, not the clean copy

The AI now sees insertions, deletions and comments as edits, so it can tell you what the other side actually changed.

contract-review, chat-ui

Out of the box, Mike reads contracts the way a clean printout would: it accepts every tracked change and drops the comment bubbles before the document reaches the model. That is useful for summarising a final draft, and useless when the whole question is "what did the counterparty move?"

jamietso flips that. The assistant now ingests documents with the edits left in: each insertion, deletion and margin comment is flagged inline, so you can ask it to walk you through a markup the way an associate would. It handles both Word files and PDFs. On the PDF side it reads the colour-coded redlines produced by the usual comparison tools (Litera, Workshare, and Word's own Compare), treating red or struck-through text as deleted, blue or underlined text as inserted, and green text as moved.

So what? Anyone who lives in negotiation drafts (transactional lawyers, contract-review teams) should take a look: reviewing the changes is the job, and most AI tools quietly throw them away.

Commits in this thread

1 commit from jamietso/mike-redline, oldest first. Source extracted verbatim from the harvested git log.

SHA | Subject | Author | Date
394f2ba2 | Add redline-aware DOCX/PDF extraction and comment-bubble support | Jamie Tso | 2026-05-05
commit body
Feeds tracked changes and review comments to the LLM as inline markers
instead of stripping them ("accepted view"). Closes the redline-reading
gap that closed-source legal AI products like Harvey and Legora ship as
a paid feature.

DOCX
- extractDocxBodyText (lib/docxTrackedChanges.ts): walks document.xml and
  emits {++ins++} / {--del--} for w:ins/w:del, and {>>by AUTHOR: text<<}
  for comment bubbles loaded once from word/comments.xml.
- tabular's extractDocxMarkdown switches from mammoth to the same
  redline-aware extractor so column extraction sees redlines too.
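The DOCX walk described above can be sketched in Python with the standard library's ElementTree. This is a minimal illustration under stated assumptions, not the fork's TypeScript extractDocxBodyText: the helper names are hypothetical, the comment marker is emitted at each w:commentReference, paragraphs become newlines, and numbering, fields, tabs, move revisions and table layout are ignored.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional, Union

# WordprocessingML main namespace in ElementTree's "{uri}local" tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
XmlSource = Union[str, bytes]


def load_comments(comments_xml: XmlSource) -> dict[str, str]:
    """Parse word/comments.xml once into {comment id: rendered bubble marker}."""
    markers = {}
    for comment in ET.fromstring(comments_xml).iter(f"{W}comment"):
        author = comment.get(f"{W}author", "unknown")
        text = "".join(t.text or "" for t in comment.iter(f"{W}t"))
        markers[comment.get(f"{W}id")] = f"{{>>by {author}: {text}<<}}"
    return markers


def _run_text(element: ET.Element) -> str:
    """Concatenate visible (w:t) and deleted (w:delText) text under an element."""
    return "".join(
        node.text or ""
        for node in element.iter()
        if node.tag in (f"{W}t", f"{W}delText")
    )


def _walk(element: ET.Element, comments: dict[str, str], out: list[str]) -> None:
    """Append marked-up text for the element's children to out, in document order."""
    for child in element:
        if child.tag == f"{W}ins":
            text = _run_text(child)
            if text:  # skip empty revision marks, e.g. inserted paragraph marks
                out.append("{++" + text + "++}")
        elif child.tag == f"{W}del":
            text = _run_text(child)
            if text:
                out.append("{--" + text + "--}")
        elif child.tag == f"{W}t":
            out.append(child.text or "")
        elif child.tag == f"{W}commentReference":
            out.append(comments.get(child.get(f"{W}id"), ""))
        elif child.tag == f"{W}p":
            _walk(child, comments, out)
            out.append("\n")  # one line per paragraph
        else:
            _walk(child, comments, out)  # runs, tables, properties, ...


def extract_docx_body_text(
    document_xml: XmlSource, comments_xml: Optional[XmlSource] = None
) -> str:
    """Body text with {++ins++}, {--del--} and {>>by AUTHOR: text<<} markers."""
    comments = load_comments(comments_xml) if comments_xml else {}
    body = ET.fromstring(document_xml).find(f"{W}body")
    if body is None:
        return ""
    out: list[str] = []
    _walk(body, comments, out)
    return "".join(out).rstrip("\n")


def extract_docx_file(path: str) -> str:
    """Convenience wrapper: read both parts straight from a .docx (zip) file."""
    with zipfile.ZipFile(path) as archive:
        document_xml = archive.read("word/document.xml")
        comments_xml = (
            archive.read("word/comments.xml")
            if "word/comments.xml" in archive.namelist()
            else None
        )
    return extract_docx_body_text(document_xml, comments_xml)
```

On a paragraph where "30" was deleted, "60" inserted and a comment attached, this yields text like: Payment is due within {--30--}{++60++} days.{>>by Counsel: Why 60?<<}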

PDF
- New scripts/redline_extract.py uses PyMuPDF to detect color-based
  redlines per text span: red/strikethrough -> {--del--},
  blue/underline -> {++ins++}, green -> {<<moved>>}. Algorithm ported
  from Diff Master's browserPyMuPdfProcessor (Pyodide), now spawned as a
  Node subprocess via lib/pdfRedlineExtract.ts. Falls back to pdfjs-dist
  text-only extraction if Python or pymupdf are unavailable.
- extractPdfMarkdown (tabular) and extractPdfText (chatTools) both call
  the new extractor first.
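The per-span colour rule above can be illustrated as a pure function over PyMuPDF-style span dictionaries. This is a minimal Python sketch, not the Diff Master port: it assumes each span carries "text" and an sRGB integer "color" (the shape PyMuPDF's page.get_text("dict") returns), the RGB thresholds are hypothetical, the strikethrough and underline checks are left out, and consecutive spans of the same class are merged into one marker.

```python
from itertools import groupby
from typing import Iterable, Iterator, Optional

# Marker pairs from the commit: deletions, insertions, moved text.
MARKERS = {"del": ("{--", "--}"), "ins": ("{++", "++}"), "moved": ("{<<", ">>}")}


def classify(color: int) -> Optional[str]:
    """Map an sRGB integer (0xRRGGBB) to a redline class; thresholds are assumptions."""
    r, g, b = (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF
    if r >= 160 and g <= 90 and b <= 90:
        return "del"  # reddish
    if b >= 160 and r <= 90 and g <= 120:
        return "ins"  # bluish
    if g >= 120 and r <= 90 and b <= 90:
        return "moved"  # greenish
    return None  # ordinary (e.g. black) text


def render_spans(spans: Iterable[dict]) -> str:
    """Join span text, wrapping each run of same-class spans in a single marker."""
    parts = []
    for kind, group in groupby(spans, key=lambda span: classify(span["color"])):
        text = "".join(span["text"] for span in group)
        if kind is None:
            parts.append(text)
        else:
            start, end = MARKERS[kind]
            parts.append(start + text + end)
    return "".join(parts)


def pdf_spans(path: str) -> Iterator[dict]:
    """Yield text spans from a real PDF; needs PyMuPDF, imported lazily."""
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                for line in block.get("lines", []):  # image blocks have no lines
                    yield from line["spans"]
                    yield {"text": "\n", "color": 0}  # keep line breaks as plain text
```

With PyMuPDF installed, render_spans(pdf_spans("markup.pdf")) would produce marked-up text for a whole file; the filename here is only an example. Grouping by class rather than emitting one marker per span keeps a word split across several same-coloured spans inside a single {--...--} or {++...++}.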

Prompts
- chatTools SYSTEM_PROMPT and tabular EXTRACTION_SYSTEM / SYSTEM all
  document the {++/--/<<>>}, {>>...<<} markers so the LLM knows how to
  read them and what "current" vs "original" means.
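Read in reverse, the markers let a consumer recover both sides of the negotiation. A minimal Python sketch, assuming the marker grammar above with no nesting: "current" keeps insertions and drops deletions (the accepted view upstream Mike fed the model), "original" does the opposite, moved text is kept in both, and comment bubbles are dropped from both. The function names are hypothetical.

```python
import re

# One alternative per marker type; the named group says which one matched.
MARKER = re.compile(
    r"\{\+\+(?P<inserted>.*?)\+\+\}"
    r"|\{--(?P<deleted>.*?)--\}"
    r"|\{<<(?P<moved>.*?)>>\}"
    r"|\{>>(?P<comment>.*?)<<\}",
    re.DOTALL,
)


def view(marked: str, side: str) -> str:
    """Project marked-up text onto the "original" or "current" document."""
    if side not in ("original", "current"):
        raise ValueError(f"side must be 'original' or 'current', not {side!r}")

    def replace(match: re.Match) -> str:
        kind = match.lastgroup
        if kind == "inserted":
            return match.group(kind) if side == "current" else ""
        if kind == "deleted":
            return match.group(kind) if side == "original" else ""
        if kind == "moved":
            return match.group(kind)  # moved text exists on both sides
        return ""  # comment bubbles belong to neither side

    return MARKER.sub(replace, marked)


def changes(marked: str) -> list[tuple[str, str]]:
    """List (kind, text) for every marker, in document order."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in MARKER.finditer(marked)]
```

For "Payment is due within {--30--}{++60++} days.", view(..., "current") gives "Payment is due within 60 days." and view(..., "original") gives "Payment is due within 30 days.", which is exactly the difference an accepted-view extractor hides.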

Misc
- storage.ts: forcePathStyle: true on the S3 client so MinIO and other
  path-style S3 endpoints work locally without subdomain DNS.
- Sidebar / layout / site-logo: brand reads "Mike (v2)" so side-by-side
  comparisons against upstream are unambiguous.
- backend/.env.example: PYTHON_BIN documented; pymupdf install line in
  README.

Adds Python 3.10+ + pymupdf as an optional runtime dep - extractor
gracefully no-ops to text-only if either is missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
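The graceful degradation described in the commit lives in the fork's Node wrapper (lib/pdfRedlineExtract.ts), which falls back to pdfjs-dist. The Python sketch below only mirrors its shape: run the redline extractor as a subprocess and, if the interpreter is missing, the process exits non-zero (for example because import fitz fails), or it hangs, return a text-only fallback instead. Reading the interpreter path from PYTHON_BIN, the 60-second timeout, and passing the fallback as a plain callable are assumptions for illustration.

```python
import os
import subprocess
import sys
from typing import Callable


def python_bin() -> str:
    """Interpreter for the extractor: PYTHON_BIN if set (assumed), else this one."""
    return os.environ.get("PYTHON_BIN") or sys.executable


def extract_with_fallback(
    cmd: list[str], fallback: Callable[[], str], timeout: float = 60.0
) -> str:
    """Return the extractor's stdout, or fallback() if it cannot run or fails."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Interpreter not found or not executable, or the extractor hung.
        return fallback()
    if result.returncode != 0:
        # e.g. the script exited early because PyMuPDF is not installed.
        return fallback()
    return result.stdout
```

In the fork's setting the command would be along the lines of [python_bin(), "scripts/redline_extract.py", pdf_path]; how the real script takes its arguments is not stated in the commit.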

Capture this thread into my fork

Download a single Markdown prompt that tells Claude how to port every commit above into your working tree, adapting paths and structure to match your repo. Run it with claude -p < capture-thread-102.md from inside the repo you want the changes in.
