mglynnhenley makes Mike show you which words it might be making up

A new colour overlay tints the parts of an answer the AI is least sure about - so you can see the risky bits before you trust them.

chat-uicontract-review

Every AI assistant invents things occasionally; the hard part is knowing which sentence to double-check. mglynnhenley's fork adds a confidence overlay that scores the assistant's answer word by word and tints the shakiest parts directly in the reply. A slider lets you set how strict the highlighting is. The same treatment carries over to document-review tables, so flagged cells light up as soon as the scores land - the text appears instantly, and the risk shading fades in a moment later.

The scoring runs through a separate model the fork talks to over the network, and it's strictly opt-in: if that service isn't wired up, nothing breaks - the highlights simply don't appear. The team also points a demo endpoint at a live test deployment so you can try the whole flow without standing up your own.

So what Anyone relying on AI answers in legal work should watch this: a built-in 'don't trust this part' signal is exactly the guardrail cautious practitioners have been asking for.

View this fork on GitHub →

Spotted something wrong? Or know the PR text has fresher detail than the writeup above?

Commits in this thread

3 commits from mglynnhenley/mikehasprobes, oldest first. Source extracted verbatim from the harvested git log.

SHA	Subject	Author	Date
`e7549126`	Add hallucination-probe scoring across chat + tabular review	Matilda	2026-05-06	↗ GitHub
commit body Wire Mike to a Modal-hosted, OpenAI-compatible probe service. After each Claude/Gemini response, send the completion as a prefilled assistant turn to the probe and stream per-token scores onto the existing SSE channel. Persist scores on `chat_messages.probe_scores` and `tabular_cells.probe_scores`. UI fades a heat-strip + risk badge under cells/messages as scores arrive. Also: local mock probe at /mock-probe for development without the Modal service, and a "Think" toggle on the chat input so users can opt into adaptive thinking per turn (off by default - Sonnet 4.6 was rejecting the unconditional flag). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`ef3d78ce`	Keep 000 base schema clean; probe columns live in 001/002	Matilda	2026-05-06	↗ GitHub
commit body Per review: the one-shot base schema should stay vanilla. Probe score columns are additive and belong only in 001_probe_scores.sql (tabular_cells) and 002_chat_probe_scores.sql (chat_messages), which already exist as incremental migrations. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`4dcbc056`	Consolidate probe migrations into single 001	Matilda	2026-05-06	↗ GitHub
Merge 002's chat_messages.probe_scores into 001 alongside the tabular_cells columns. One migration covers the entire probe schema extension; 002 deleted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Capture this thread into my fork

Download a single Markdown prompt that tells Claude how to port every commit above into your working tree — adapting paths and structure to match your repo. Run it via claude -p < capture-thread-279.md from inside the repo you want the changes in.

⬇ Download capture-thread-279.md