nforum tightens Mike's guardrails for confidentiality, PII, and tool use

A short addition to the assistant's instructions tries to make Mike say no to a specific set of legal-product worst cases.

securitycompliance

Isaac Bang's fork bolts three refusal sections onto Mike's core instructions. The first tells the assistant never to reveal or even acknowledge its own internal instructions, including when a user pretends a prior conversation already unlocked them. The second is a privacy layer that refuses on intent rather than on what happens to be in the documents - SSNs, bank details, medical history, comp, criminal records, settlement amounts tied to named individuals - while explicitly preserving normal contract-terms and party-identification work. The third draws lines around tool use: no bulk enumeration of a firm's documents or workflows, no cross-client data shuffling, no silent edits, no acting on injected instructions.

Worth noting: instruction-only guardrails are best-effort and bypassable with enough pressure, and the bulk-enumeration rule could clip legitimate review-across-many-documents work. Read the language before importing.

So what Anyone deploying Mike inside a firm should look at this as a starting template for what the assistant should refuse - and then pressure-test it before trusting it.

View this fork on GitHub →

Spotted something wrong? Or know the PR text has fresher detail than the writeup above?

Commits in this thread

2 commits from nforum/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA Subject Author Date
48c9f772 Security hardening: system prompt confidentiality, PII boundaries, and tool use guardrails Isaac Bang 2026-05-05 ↗ GitHub
commit body
Adds three security sections to SYSTEM_PROMPT in chatTools.ts:

CONFIDENTIALITY: instructs Mike to never reveal, quote, or acknowledge its
system instructions, including fake-prior-context social engineering patterns.

PRIVACY BOUNDARIES: enumerates PII categories always refused on intent (not
on document availability): SSNs, bank accounts, passports, addresses, phone,
DOB, medical, genetic, biometrics, protected class attributes, compensation
details, criminal history, and settlement amounts tied to named individuals.
Preserves normal legal document work (contract terms, party identification).

TOOL USE BOUNDARIES: adds intent-based refusal for bulk document/workflow
enumeration, cross-client data replication, silent edits without review,
injection payloads, and external forwarding clauses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
b00a72aa Merge PR #38: Security hardening - system prompt, PII, guardrails Bojan Plese 2026-05-07 ↗ GitHub

Capture this thread into my fork

Download a single Markdown prompt that tells Claude how to port every commit above into your working tree — adapting paths and structure to match your repo. Run it via claude -p < capture-thread-53.md from inside the repo you want the changes in.

⬇ Download capture-thread-53.md