Inline email-attachment text in read_document for .eml and .msg
From the PR description
When the LLM reads a .msg or .eml, it now also gets the extracted text of each attachment (PDF, DOCX, TXT, XLSX, and nested EML/MSG up to 3 levels deep) appended to the email body. This lets the LLM answer "summarise the attached contract" without the user having to re-upload attachments as separate documents.
The UI is unchanged: the /display endpoint still serves the existing ParsedEml shape, attachment chips still list filenames only, and no new schema or storage is involved. This is a pure read-side enhancement.
Implementation:
- New lib/extract/emailAttachments.ts: a shared dispatcher that takes {filename, bytes} and renders text by suffix. Skips inline images (png/jpg/svg/etc.) and any unknown type. Caps each attachment at 50k chars and the total across all attachments at 400k chars; nested .eml/.msg attachments recurse up to MAX_RECURSION_DEPTH = 3, then bail with a placeholder.
- eml.ts gains extractEmlForLLM(buf, depth=0): reuses the existing parse, filters out mailparser-flagged inline parts (related=true) so signature logos don't pollute output, then renders attachments through the shared dispatcher.
- msg.ts gains extractMsgForLLM(buf, depth=0): same shape, sources attachment bytes via MsgReader.getAttachment(idx). Survives per-attachment getAttachment() failures.
- chatTools.read_document: the eml/msg branches switch to the new extractEmlForLLM and extractMsgForLLM variants.
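The dispatcher behaviour described in the list above can be sketched roughly as follows. This is not the PR's code: only the three limits (50k, 400k, depth 3) come from the description. The type names, function names, placeholder strings, skipped-suffix list, and the reading of "depth 3" as "nested levels 1 to 3 are rendered, level 4 becomes a placeholder" are all assumptions made for illustration. Real PDF/DOCX/XLSX extraction is left out, and the email parser is injected as a hypothetical `parse` callback so the sketch stays self-contained.

```typescript
// Illustrative sketch. Only the three limits come from the PR description;
// every other name here is hypothetical.
type Attachment = { filename: string; bytes: Uint8Array };
type ParsedEmail = { body: string; attachments: Attachment[] };
type ParseFn = (bytes: Uint8Array) => ParsedEmail;

const MAX_CHARS_PER_ATTACHMENT = 50_000;
const MAX_CHARS_TOTAL = 400_000;
const MAX_RECURSION_DEPTH = 3;

// Suffixes treated as inline images and skipped (assumed list).
const SKIPPED_SUFFIXES = new Set(["png", "jpg", "jpeg", "gif", "svg"]);

function suffixOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot < 0 ? "" : filename.slice(dot + 1).toLowerCase();
}

// ASCII-only stand-in for real text decoding.
function decodeAscii(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Returns extracted text, or null when the attachment should be skipped.
function renderAttachment(att: Attachment, depth: number, parse: ParseFn): string | null {
  const suffix = suffixOf(att.filename);
  if (SKIPPED_SUFFIXES.has(suffix)) return null;
  if (suffix === "txt") return decodeAscii(att.bytes);
  if (suffix === "eml" || suffix === "msg") {
    if (depth + 1 > MAX_RECURSION_DEPTH) {
      return "[nested email omitted: depth limit reached]";
    }
    return renderEmail(parse(att.bytes), depth + 1, parse);
  }
  return null; // unknown type (PDF/DOCX/XLSX extractors omitted from this sketch)
}

function renderEmail(email: ParsedEmail, depth: number, parse: ParseFn): string {
  const parts: string[] = [email.body];
  let remaining = MAX_CHARS_TOTAL; // total budget across this email's attachments
  for (const att of email.attachments) {
    if (remaining <= 0) break;
    let text: string | null;
    try {
      text = renderAttachment(att, depth, parse);
    } catch {
      // One unreadable attachment should not fail the whole read (assumed placeholder).
      text = "[attachment could not be read]";
    }
    if (text === null) continue;
    const clipped = text.slice(0, Math.min(MAX_CHARS_PER_ATTACHMENT, remaining));
    remaining -= clipped.length;
    parts.push(`--- Attachment: ${att.filename} ---\n${clipped}`);
  }
  return parts.join("\n\n");
}
```

In this sketch the total budget is tracked per email rather than shared across nesting levels, which keeps the code short; a nested email's rendered text is still clipped by the per-attachment cap of its parent. Whether the PR shares one budget across all levels is not stated in the description.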
Tests: 16 new vitest cases (9 emailAttachments dispatcher, 3 eml extractEmlForLLM, 4 msg extractMsgForLLM). Backend suite now 133/133 green; tsc clean.