1sbang hardens Mike against the questions it shouldn't answer

A prompt-only rewrite that teaches the legal assistant to refuse leaking its own setup, handing over personal data, or being talked into bulk data grabs.

securitycompliance

1sbang's change doesn't touch a single tool or feature - it rewrites the standing instructions Mike runs on. Three new guardrails go in. The assistant now refuses to repeat or paraphrase its own configuration, even when asked with a "just continue where you left off" trick that pretends it already started spilling. It refuses to pull personal data - social security numbers, medical records, criminal history, settlement figures tied to a named person - based on what you're asking for, not on whether a document happens to be open. And it pushes back on sweeping every document at once, copying files between client matters, and making silent edits without showing you the changes first.

The careful part: ordinary work survives untouched. Payment terms, the parties to a contract, business addresses all still come back. The change went through automated red-team testing, and each fix was kept only if it blocked attacks without breaking a legitimate request.

So what Anyone weighing a legal AI for client work should care: this is what it looks like to guard against the tool being socially engineered into spilling confidential data.

View this fork on GitHub →

Spotted something wrong? Or know the PR text has fresher detail than the writeup above?

Commits in this thread

1 commit from 1sbang/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA Subject Author Date

48c9f772 Security hardening: system prompt confidentiality, PII boundaries, and tool use guardrails Isaac Bang 2026-05-05 ↗ GitHub

SHA	Subject	Author	Date
`48c9f772`	Security hardening: system prompt confidentiality, PII boundaries, and tool use guardrails	Isaac Bang	2026-05-05	↗ GitHub
commit body Adds three security sections to SYSTEM_PROMPT in chatTools.ts: CONFIDENTIALITY: instructs Mike to never reveal, quote, or acknowledge its system instructions, including fake-prior-context social engineering patterns. PRIVACY BOUNDARIES: enumerates PII categories always refused on intent (not on document availability): SSNs, bank accounts, passports, addresses, phone, DOB, medical, genetic, biometrics, protected class attributes, compensation details, criminal history, and settlement amounts tied to named individuals. Preserves normal legal document work (contract terms, party identification). TOOL USE BOUNDARIES: adds intent-based refusal for bulk document/workflow enumeration, cross-client data replication, silent edits without review, injection payloads, and external forwarding clauses. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

commit body

Adds three security sections to SYSTEM_PROMPT in chatTools.ts:

CONFIDENTIALITY: instructs Mike to never reveal, quote, or acknowledge its
system instructions, including fake-prior-context social engineering patterns.

PRIVACY BOUNDARIES: enumerates PII categories always refused on intent (not
on document availability): SSNs, bank accounts, passports, addresses, phone,
DOB, medical, genetic, biometrics, protected class attributes, compensation
details, criminal history, and settlement amounts tied to named individuals.
Preserves normal legal document work (contract terms, party identification).

TOOL USE BOUNDARIES: adds intent-based refusal for bulk document/workflow
enumeration, cross-client data replication, silent edits without review,
injection payloads, and external forwarding clauses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Capture this thread into my fork

Download a single Markdown prompt that tells Claude how to port every commit above into your working tree — adapting paths and structure to match your repo. Run it via claude -p < capture-thread-305.md from inside the repo you want the changes in.

⬇ Download capture-thread-305.md