feat: add redline-aware document extraction

✅ merged · #1 · Mace-legal/mike ← Mace-legal/mike · opened 16d ago by JonasBoury · merged 16d ago · self

From the PR description

Summary

  • surface DOCX tracked changes and comment bubbles as inline markers for the assistant
  • add optional PyMuPDF-based PDF redline extraction with pdfjs fallback
  • teach chat and tabular prompts how to interpret insertions, deletions, moved text, and reviewer comments
  • preserve Mammoth tabular DOCX extraction when no review markup is detected
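To make the first bullet concrete, here is a minimal sketch of how tracked changes in a DOCX body could be flattened into inline markers. It is not the PR's implementation: the `[INS: …]` / `[DEL: …]` marker format and the function names are assumptions made for illustration. The only facts relied on are the WordprocessingML element names (`w:ins`, `w:del`, `w:t`, `w:delText`). It handles top-level runs in a paragraph only and ignores comments and moved text.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _text_of(node, tag):
    # Concatenate the text of all descendants with the given local tag.
    return "".join(t.text or "" for t in node.iter(f"{{{W}}}{tag}"))


def render_paragraph(p):
    # Walk the direct children of a w:p in document order.
    parts = []
    for child in p:
        name = child.tag.split("}")[-1]
        if name == "ins":    # tracked insertion wraps ordinary runs
            parts.append(f"[INS: {_text_of(child, 't')}]")    # hypothetical marker
        elif name == "del":  # tracked deletion stores text in w:delText
            parts.append(f"[DEL: {_text_of(child, 'delText')}]")  # hypothetical marker
        elif name == "r":    # unchanged run
            parts.append(_text_of(child, "t"))
    return "".join(parts)


def render_document_xml(xml_text):
    root = ET.fromstring(xml_text)
    return "\n".join(render_paragraph(p) for p in root.iter(f"{{{W}}}p"))


# Synthetic single-paragraph document, in the spirit of the DOCX smoke test.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body><w:p>
<w:r><w:t xml:space="preserve">The fee is </w:t></w:r>
<w:del><w:r><w:delText>100</w:delText></w:r></w:del>
<w:ins><w:r><w:t>120</w:t></w:r></w:ins>
<w:r><w:t xml:space="preserve"> USD.</w:t></w:r>
</w:p></w:body></w:document>"""

if __name__ == "__main__":
    print(render_document_xml(SAMPLE))
```

In a real DOCX the XML would be read from the `word/document.xml` entry of the zip archive; the sample string stands in for that here.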

Verification

  • npm run build --prefix backend
  • DOCX smoke test for insertion/deletion/comment markers using a synthetic DOCX
  • PDF smoke test for red/blue/green text markers using a PyMuPDF-generated sample
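The PDF smoke test mentions red, blue, and green text. As a rough sketch of how colored spans might be mapped to redline markers, the snippet below classifies a packed sRGB integer, which is how PyMuPDF reports span colors in `page.get_text("dict")`. The color-to-meaning mapping (red as deletion, blue as insertion, green as moved), the threshold, and all function names are assumptions for illustration; the PR description does not state them. The code is pure Python, so it runs without PyMuPDF installed.

```python
def srgb_to_rgb(color):
    # Unpack a 0xRRGGBB integer into its three 8-bit channels.
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


def classify_span_color(color, threshold=80):
    # A span counts as "colored" only if one channel clearly dominates
    # the second-strongest one; black, white, and grays stay "plain".
    r, g, b = srgb_to_rgb(color)
    top = max(r, g, b)
    second = sorted((r, g, b))[1]
    if top - second < threshold:
        return "plain"
    if top == r:
        return "deletion"   # assumed meaning of red text
    if top == b:
        return "insertion"  # assumed meaning of blue text
    return "moved"          # assumed meaning of green text


def mark_spans(spans):
    # spans: list of {"text": str, "color": int}, shaped like the span
    # dicts PyMuPDF returns. Marker strings are hypothetical.
    labels = {"deletion": "DEL", "insertion": "INS", "moved": "MOVED"}
    out = []
    for span in spans:
        kind = classify_span_color(span["color"])
        if kind == "plain":
            out.append(span["text"])
        else:
            out.append(f"[{labels[kind]}: {span['text']}]")
    return "".join(out)


if __name__ == "__main__":
    demo = [
        {"text": "Term is ", "color": 0x000000},
        {"text": "12", "color": 0xFF0000},
        {"text": "24", "color": 0x0000FF},
        {"text": " months.", "color": 0x000000},
    ]
    print(mark_spans(demo))
```

A color heuristic like this can misfire on documents that use colored text for ordinary emphasis, which is one reason an extractor of this kind is reasonably kept optional with a plain-text fallback.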

Our analysis

Surface DOCX and PDF redlines as inline markers for the assistant.
