Inline email-attachment text in read_document for .eml and .msg

✅ merged · #15 · easterbrooka/mike ← easterbrooka/mike · opened 15d ago by easterbrooka · merged 15d ago by easterbrooka · self · +588-10 across 7 files

From the PR description

When the LLM reads a .msg or .eml file, read_document now appends the extracted text of each attachment (PDF, DOCX, TXT, XLSX, and nested EML/MSG up to 3 levels deep) to the email body. This lets the LLM answer requests like "summarise the attached contract" without the user having to re-upload the attachments as separate documents.

The UI is unchanged: the /display endpoint still serves the existing ParsedEml shape, attachment chips still list filenames only, and no new schema or storage is involved. This is a pure read-side enhancement.

Implementation:

  • New lib/extract/emailAttachments.ts: a shared dispatcher that takes {filename, bytes} and renders text by filename suffix. Skips inline images (png/jpg/svg/etc.) and any unknown type. Caps each attachment at 50k chars and the total across all attachments at 400k chars; nested .eml/.msg attachments recurse up to MAX_RECURSION_DEPTH = 3, then bail with a placeholder.
  • eml.ts gains extractEmlForLLM(buf, depth=0): reuses the existing parse, filters out mailparser-flagged inline parts (related=true) so signature logos don't pollute output, then renders attachments through the shared dispatcher.
  • msg.ts gains extractMsgForLLM(buf, depth=0): same shape, sources attachment bytes via MsgReader.getAttachment(idx). Survives per-attachment getAttachment() failures.
  • chatTools.read_document eml/msg branches swap to the new For-LLM variants.
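
The dispatcher behaviour described above (suffix dispatch, skipped types, the two character caps, and the recursion limit) could be sketched roughly as follows. This is a minimal illustration and not the code from the PR: only MAX_RECURSION_DEPTH and the cap values come from the description, while renderAttachments, Extractor, the placeholder strings, the section header format, and the exact depth comparison are all assumptions made for the example.

```typescript
// Hypothetical sketch of a suffix-based attachment dispatcher.
// Only the three constants below are taken from the PR description.
const MAX_RECURSION_DEPTH = 3;
const PER_ATTACHMENT_CAP = 50000; // chars per attachment
const TOTAL_CAP = 400000; // chars across all attachments

interface Attachment {
  filename: string;
  bytes: Uint8Array;
}

// Assumed shape: an extractor turns raw bytes into text; nested-email
// extractors receive the depth they are running at.
type Extractor = (bytes: Uint8Array, depth: number) => string;

// Assumed skip list; the PR only says "png/jpg/svg/etc.".
const SKIPPED_SUFFIXES = new Set(["png", "jpg", "jpeg", "gif", "svg"]);

function suffixOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot < 0 ? "" : filename.slice(dot + 1).toLowerCase();
}

function renderAttachments(
  attachments: Attachment[],
  extractors: Map<string, Extractor>,
  depth: number = 0
): string {
  const sections: string[] = [];
  let total = 0;

  for (const att of attachments) {
    if (total >= TOTAL_CAP) break; // total budget exhausted

    const suffix = suffixOf(att.filename);
    if (SKIPPED_SUFFIXES.has(suffix)) continue; // inline images
    const extract = extractors.get(suffix);
    if (!extract) continue; // unknown type

    const isNestedEmail = suffix === "eml" || suffix === "msg";
    let text: string;
    if (isNestedEmail && depth >= MAX_RECURSION_DEPTH) {
      // Bail with a placeholder instead of recursing further.
      text = "[nested email omitted: max depth reached]";
    } else {
      try {
        text = extract(att.bytes, depth + 1);
      } catch {
        // One bad attachment should not sink the whole email.
        text = "[attachment could not be extracted]";
      }
    }

    // Apply the per-attachment cap, then whatever is left of the total cap.
    text = text.slice(0, Math.min(PER_ATTACHMENT_CAP, TOTAL_CAP - total));
    total += text.length;
    sections.push(`--- Attachment: ${att.filename} ---\n${text}`);
  }

  return sections.join("\n\n");
}
```

In this sketch the extractors are passed in as a map so that the PDF, DOCX, and XLSX parsers stay out of the example; a caller would register one extractor per suffix and have the "eml" and "msg" entries call back into the email parser with the incremented depth.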

Tests: 16 new vitest cases (9 emailAttachments dispatcher, 3 eml extractEmlForLLM, 4 msg extractMsgForLLM). Backend suite now 133/133 green; tsc clean.
