nforum teaches Mike when to say no

A new set of refusal rules tells the assistant to clam up on personal data, resist prompt-leak tricks, and stay inside its lane on tool use.

securitycompliance

nforum added three guardrail sections to Mike's core instructions. The first stops the assistant from quoting or even acknowledging its own hidden instructions, including when a user tries the old trick of pretending an earlier conversation was cut off mid-paste.

The second draws a clear line on personal data: the assistant will refuse to pull out things like Social Security numbers, bank details, medical history, or named individuals' settlement figures, no matter what's been uploaded. Ordinary legal work - contract terms, party names, business addresses - stays untouched; the block triggers on the type of request, not on what documents happen to be available. The third limits tool misuse, such as bulk-harvesting documents or quietly copying one client's data into another's matter.

Worth knowing: these are instructions, not hard walls, and the bulk-document rule could trip up legitimate multi-file review depending on phrasing.

So what Anyone weighing an AI tool against confidentiality duties should look at how refusals like these are drawn - and how easily they bend.

View this fork on GitHub →

Spotted something wrong? Or know the PR text has fresher detail than the writeup above?

Commits in this thread

2 commits from nforum/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA Subject Author Date
48c9f772 Security hardening: system prompt confidentiality, PII boundaries, and tool use guardrails Isaac Bang 2026-05-05 ↗ GitHub
commit body
Adds three security sections to SYSTEM_PROMPT in chatTools.ts:

CONFIDENTIALITY: instructs Mike to never reveal, quote, or acknowledge its
system instructions, including fake-prior-context social engineering patterns.

PRIVACY BOUNDARIES: enumerates PII categories always refused on intent (not
on document availability): SSNs, bank accounts, passports, addresses, phone,
DOB, medical, genetic, biometrics, protected class attributes, compensation
details, criminal history, and settlement amounts tied to named individuals.
Preserves normal legal document work (contract terms, party identification).

TOOL USE BOUNDARIES: adds intent-based refusal for bulk document/workflow
enumeration, cross-client data replication, silent edits without review,
injection payloads, and external forwarding clauses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
b00a72aa Merge PR #38: Security hardening - system prompt, PII, guardrails Bojan Plese 2026-05-07 ↗ GitHub

Capture this thread into my fork

Download a single Markdown prompt that tells Claude how to port every commit above into your working tree — adapting paths and structure to match your repo. Run it via claude -p < capture-thread-53.md from inside the repo you want the changes in.

⬇ Download capture-thread-53.md