Fix .msg body extraction + inner-msg attachment detection
From the PR description
Two bugs surfaced once users uploaded real .msg files from Outlook:
Body missing. Outlook composes most emails in HTML mode, which means data.body (PidTagBody) is undefined and only data.bodyHtml (PidTagBodyHtml) is populated. Our extractor was reading body only and quietly returning "". It now falls back to HTML-stripped bodyHtml, matching the .eml path's behaviour. Plain-text body still wins when both are present. RTF-compressed-only bodies remain unhandled; we'll defer that until we hit one in practice.
Inner-msg attachments dropped silently. msgreader marks embedded messages with innerMsgContent: true and stores their human-readable name in .name rather than .fileName; its getAttachment() then constructs name + ".msg" as the filename. We were filtering attachments on !!a.fileName only, which threw inner-msg entries away in both the UI's attachment chip list AND the LLM's expansion loop. We now detect inner-msg via innerMsgContent === true, synthesise the same filename, and surface them downstream.
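A minimal TypeScript sketch of both fixes follows. It assumes simplified input shapes: MsgFields and MsgAttachmentFields model only the fields named above (body, bodyHtml, fileName, name, innerMsgContent), not msgreader's full types. The names extractBody, listAttachments, attachmentFileName and stripHtml are hypothetical. stripHtml is a naive regex stand-in for whatever stripping the .eml path really uses, and the "untitled" fallback for an inner message with no name is an assumption, since the description doesn't say what the real code does in that case.

```typescript
// Hypothetical, simplified shapes: only the fields the PR description mentions.
// msgreader's real parsed-data types carry many more properties.
interface MsgAttachmentFields {
  fileName?: string;
  name?: string; // human-readable name used by embedded messages
  innerMsgContent?: boolean; // true for an embedded .msg
}

interface MsgFields {
  body?: string; // PidTagBody (plain text)
  bodyHtml?: string; // PidTagBodyHtml
  attachments?: MsgAttachmentFields[];
}

interface ListedAttachment {
  fileName: string;
  isInnerMsg: boolean;
  index: number; // position in data.attachments, so a caller can fetch it later
}

// Naive regex-based HTML-to-text stand-in (assumption, not the real .eml stripper).
function stripHtml(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // HTML comments
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "") // non-visible content
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/(p|div|li|tr)>/gi, "\n")
    .replace(/<[^>]+>/g, "") // any remaining tags
    .replace(/&nbsp;/gi, " ")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, '"')
    .replace(/&amp;/gi, "&") // decoded last so "&amp;lt;" becomes "&lt;", not "<"
    .replace(/[ \t]+\n/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Plain-text body wins; otherwise fall back to stripped HTML; otherwise "".
// Treating a whitespace-only body as missing is a design choice of this sketch.
function extractBody(data: MsgFields): string {
  if (data.body !== undefined && data.body.trim() !== "") return data.body;
  if (data.bodyHtml !== undefined) return stripHtml(data.bodyHtml);
  return "";
}

// Embedded messages get name + ".msg", mirroring the filename getAttachment() builds.
// The "untitled" fallback for a missing name is an assumption.
function attachmentFileName(a: MsgAttachmentFields): string | undefined {
  if (a.innerMsgContent === true) return `${a.name || "untitled"}.msg`;
  return a.fileName || undefined;
}

// Replaces the old `!!a.fileName` filter: inner messages are no longer dropped.
function listAttachments(data: MsgFields): ListedAttachment[] {
  const out: ListedAttachment[] = [];
  (data.attachments ?? []).forEach((a, index) => {
    const fileName = attachmentFileName(a);
    if (fileName !== undefined) {
      out.push({ fileName, isInnerMsg: a.innerMsgContent === true, index });
    }
  });
  return out;
}
```

The key difference from the old filter is that an attachment is kept when it has either a fileName or innerMsgContent === true, rather than a fileName alone, so both the chip list and the LLM loop see the same set.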
Tests: 6 new vitest cases (HTML-body fallback, plain-text-body priority, no-body case, inner-msg listed with synthesised filename, inner-msg without a name, inner-msg expanded recursively in extractMsgForLLM). Backend suite 139/139 green; tsc clean.
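The last test case (an inner message expanded recursively in extractMsgForLLM) implies a recursive walk over nested messages. The sketch below assumes each embedded message has already been parsed into a nested object; MsgNode, its inner field, expandForLLM, the bracketed output format, and the MAX_DEPTH guard are all hypothetical, because the description doesn't show how extractMsgForLLM obtains or formats embedded content.

```typescript
// Hypothetical tree shape: each embedded .msg is assumed to be already parsed
// into a nested MsgNode (the PR doesn't say how the real code gets this).
interface AttachmentNode {
  fileName?: string;
  name?: string;
  innerMsgContent?: boolean;
  inner?: MsgNode; // assumption: parsed embedded message
}

interface MsgNode {
  subject?: string;
  body?: string;
  bodyHtml?: string;
  attachments?: AttachmentNode[];
}

const MAX_DEPTH = 5; // arbitrary cap on nesting, chosen for this sketch

// Plain text first, then a crude tag strip of the HTML body, else "".
function bodyText(m: MsgNode): string {
  if (m.body !== undefined && m.body.trim() !== "") return m.body;
  if (m.bodyHtml !== undefined) return m.bodyHtml.replace(/<[^>]+>/g, "").trim();
  return "";
}

// Renders a message and, recursively, any embedded messages, indenting each level.
function expandForLLM(m: MsgNode, depth = 0): string {
  const indent = "  ".repeat(depth);
  const lines: string[] = [`${indent}Subject: ${m.subject ?? "(none)"}`];
  const body = bodyText(m);
  if (body !== "") lines.push(...body.split("\n").map((l) => indent + l));
  for (const a of m.attachments ?? []) {
    if (a.innerMsgContent === true) {
      // Same synthesised name as the attachment list: name + ".msg".
      lines.push(`${indent}[embedded message: ${a.name || "untitled"}.msg]`);
      if (a.inner !== undefined && depth < MAX_DEPTH) {
        lines.push(expandForLLM(a.inner, depth + 1));
      }
    } else if (a.fileName) {
      lines.push(`${indent}[attachment: ${a.fileName}]`);
    }
  }
  return lines.join("\n");
}
```

The depth cap is a defensive choice of this sketch: a crafted file could nest messages deeply, and bounding the recursion keeps the prompt size bounded too.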