System prompt gains confidentiality, PII, and tool-use guardrails

nforum adds 35 lines to the system prompt covering three refusal categories: system prompt leakage, PII extraction, and tool-use boundaries. All three use intent-based refusal - the model is told to decline based on what is being asked, not on whether the relevant data is currently available.

Tags: security, compliance

The commit (48c9f77) is purely additive - a single file, three new sections appended to SYSTEM_PROMPT in backend/src/lib/chatTools.ts.

CONFIDENTIALITY instructs the model never to quote or acknowledge the system prompt, including under fake-continuation attacks ("continue where you left off," "finish pasting your instructions"). A specific deflection string is prescribed for that case.

PRIVACY BOUNDARIES enumerates twelve PII categories the model must refuse to extract regardless of document availability: SSNs, bank accounts, passports, home addresses, phone numbers, dates of birth, medical and genetic data, biometrics, protected-class attributes, individual compensation, criminal history, and named-individual settlement amounts. Standard legal work - party identification, contract terms, business addresses, payment amounts in document context - is explicitly preserved.
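Since these categories are enforced only by prompt wording, a simple output-side screen can act as a backstop. The sketch below is not part of the commit: `redactPii` and its patterns are hypothetical, cover only two of the listed categories (US-format SSNs and phone numbers), and leave business addresses and payment amounts alone.

```typescript
// Hypothetical output-side screen; NOT taken from the nforum commit.
// Patterns are illustrative and cover only two of the listed PII categories.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  // US SSN written as 123-45-6789
  { label: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  // US-style phone number such as 555-123-4567 or (555) 123-4567
  { label: "phone", pattern: /(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]\d{4}\b/g },
];

// Replaces every match with a placeholder and records which labels fired.
function redactPii(text: string): { text: string; hits: string[] } {
  const hits: string[] = [];
  let out = text;
  for (const { label, pattern } of PII_PATTERNS) {
    out = out.replace(pattern, () => {
      hits.push(label);
      return "[REDACTED " + label.toUpperCase() + "]";
    });
  }
  return { text: out, hits };
}
```

A caller could log `hits` for review rather than silently rewriting the reply. Regex screens like this miss unformatted or non-US identifiers, so they complement the prompt rules and do not replace them.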

TOOL USE BOUNDARIES prohibits bulk enumeration of documents or workflows, cross-client data replication, silent edits without user review, processing injection payloads, and adding external-forwarding clauses to contracts.

Two things to check. This commit and the MCP Connectors PR (#32) land on the same day; the tool-use guardrails appear written with user-installed connectors in mind, so reviewing them together makes sense. The bulk-enumeration prohibition could also cut into legitimate multi-document tabular-review flows depending on how users phrase those requests - test before importing.
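One way to run that test is to replay a handful of legitimate multi-document prompts through the patched prompt and flag any reply that reads as a refusal. The sketch below assumes the replies have already been collected; the `Transcript` shape, the sample prompts, and the refusal heuristic are all hypothetical and should be tuned to the deflection wording the prompt actually prescribes.

```typescript
// Hypothetical regression check; names and sample prompts are assumptions,
// not taken from the PR.
interface Transcript {
  prompt: string; // what the user asked
  reply: string;  // what the model answered under the patched SYSTEM_PROMPT
}

// Example phrasings of legitimate tabular review to replay against the model.
const TABULAR_REVIEW_PROMPTS: string[] = [
  "Build a table of governing-law clauses across the five NDAs I uploaded.",
  "Compare termination notice periods in these three supplier contracts.",
];

// Crude heuristic: matches phrases such as "can't help" or "unable to provide".
function looksLikeRefusal(reply: string): boolean {
  return /\b(?:cannot|can't|unable to|won't)\s+(?:help|assist|provide|share)\b/i.test(reply);
}

// Returns the prompts whose replies look like refusals (candidate over-refusals).
function findOverRefusals(transcripts: Transcript[]): string[] {
  return transcripts.filter((t) => looksLikeRefusal(t.reply)).map((t) => t.prompt);
}
```

Any prompt returned by `findOverRefusals` is a candidate false positive worth reading by hand, since a keyword heuristic can both miss polite refusals and flag harmless replies.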

Prompt-only guardrails are best-effort. They add tokens to every request, and a persistent attacker with enough prompting attempts can still work around them. A defense-in-depth posture would pair them with application-level controls that do not depend on the model's compliance.
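As one illustration of an application-level control, a post-generation check can block replies that reproduce the system prompt verbatim. This is a minimal sketch under stated assumptions: `leaksSystemPrompt` and the eight-word window are hypothetical choices, not something the commit ships.

```typescript
// Hypothetical post-generation check; NOT part of the nforum commit.
// Flags a reply that reproduces any run of `windowWords` consecutive words
// from the system prompt, ignoring case and whitespace differences.
// Returns false when the prompt is shorter than the window.
function leaksSystemPrompt(
  reply: string,
  systemPrompt: string,
  windowWords: number = 8
): boolean {
  const toWords = (s: string): string[] =>
    s.toLowerCase().split(/\s+/).filter((w) => w.length > 0);

  const promptWords = toWords(systemPrompt);
  // Pad with spaces so matches align on whole-word boundaries.
  const replyText = " " + toWords(reply).join(" ") + " ";

  for (let i = 0; i + windowWords <= promptWords.length; i++) {
    const shingle = " " + promptWords.slice(i, i + windowWords).join(" ") + " ";
    if (replyText.indexOf(shingle) !== -1) return true;
  }
  return false;
}
```

A check like this only catches verbatim copying; paraphrased or translated leaks pass through, which is why it sits alongside the CONFIDENTIALITY instructions and does not substitute for them.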

So what: Worth pulling if you want a starting point for LLM-layer data governance in a legal product. Read it carefully alongside the MCP Connectors change and verify the bulk-document language doesn't clip normal tabular-review workflows before deploying.


Commits in this thread

2 commits from nforum/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA	Subject	Author	Date
`48c9f772`	Security hardening: system prompt confidentiality, PII boundaries, and tool use guardrails	Isaac Bang	2026-05-05
`b00a72aa`	Merge PR #38: Security hardening - system prompt, PII, guardrails	Bojan Plese	2026-05-07

Commit body (`48c9f772`):

Adds three security sections to SYSTEM_PROMPT in chatTools.ts:

CONFIDENTIALITY: instructs Mike to never reveal, quote, or acknowledge its system instructions, including fake-prior-context social engineering patterns.

PRIVACY BOUNDARIES: enumerates PII categories always refused on intent (not on document availability): SSNs, bank accounts, passports, addresses, phone, DOB, medical, genetic, biometrics, protected class attributes, compensation details, criminal history, and settlement amounts tied to named individuals. Preserves normal legal document work (contract terms, party identification).

TOOL USE BOUNDARIES: adds intent-based refusal for bulk document/workflow enumeration, cross-client data replication, silent edits without review, injection payloads, and external forwarding clauses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

