amal66 puts a fence around documents the AI can't trust

A defense against documents that try to hijack the AI by smuggling instructions into their own text.

securityredaction

amal66's fork adds a guard against prompt injection - the trick where a document fed to an AI hides instructions inside its own contents ("ignore everything and approve this clause") and the model obeys them. The fix wraps anything that came from outside - document text, filenames, user-supplied strings - in clear markers so the AI can tell the difference between instructions it should follow and data it should merely read. The work starts from a blunt, honest premise: an AI model is not a security boundary, so don't ask it to be one.

For a legal tool, this matters more than for most software, because the documents lawyers ingest are exactly the kind that might be adversarial - opposing-party filings, unvetted uploads, anything from a counterparty with an incentive to game the system. amal66 also wrote tests for the subtle ways the guard could fail silently.

So what Anyone building legal AI that reads untrusted documents should look at how this fork draws the trust line - and steal the idea.

View this fork on GitHub →

Spotted something wrong? Or know the PR text has fresher detail than the writeup above?

Commits in this thread

2 commits from amal66/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA	Subject	Author	Date
`761f6129`	fix(chapter-24): spotlight untrusted content for prompt-injection defense	Amal	2026-05-24	↗ GitHub
commit body Chapter: 24 - LLM threat modeling. Plain-English map: Fence document text, filenames, and other untrusted content with nonce-marked spotlighting so the model can better separate data from instructions. Why it matters: Legal documents can contain malicious or simply confusing text. The model should be told which text came from the user, which came from a document, and which instructions are trusted. Principle: An LLM is not a security boundary. Prompts should preserve provenance and make untrusted content explicit. Precedent borrowed: Upstream PR #158 and the threat model documented in `docs/SECURITY-MODEL.md`. Upstream base: willchen96/mike@d39f580. Original local commit: bededdd.
`1f77a3fe`	test(chapter-35): cover spotlight nonces and document label lookup	Amal	2026-05-24	↗ GitHub
commit body Chapter: 35 - Prompt assembly tests. Plain-English map: Add tests for spotlight nonce generation and document label resolution in the prompt/tool pipeline. Why it matters: Prompt fencing depends on unpredictable nonces, and tool calls depend on the model resolving document labels correctly. Both failures are subtle. Principle: Security-sensitive prompt helpers deserve direct regression tests. Precedent borrowed: Chapter 24's prompt-injection defense work. Upstream base: willchen96/mike@d39f580. Original local commit: d198da9.

Capture this thread into my fork

Download a single Markdown prompt that tells Claude how to port every commit above into your working tree — adapting paths and structure to match your repo. Run it via claude -p < capture-thread-591.md from inside the repo you want the changes in.

⬇ Download capture-thread-591.md