mglynnhenley straps a hallucination meter onto Mike

Every assistant reply now gets shadow-scored for hallucination risk, with the doubt rendered as a fading heat strip under the text.

chat-uicompliance

After each chat turn or table cell from Mike, mglynnhenley's fork quietly replays the assistant's answer to an external scoring service - hosted on Modal, a platform for running AI workloads - that grades the output token by token for how likely it is to be made up. The text shows up at normal speed; a heat strip and a risk badge fade in a beat later, so the model still feels as fast as before.

The plumbing is pragmatic. A local mock endpoint lets the team keep building when the scoring service is down, and a circuit breaker means an unreachable probe just hides the highlights rather than breaking the chat. There's also a small new "Think" toggle next to the model picker, off by default, because one of the underlying models was refusing requests when adaptive thinking was sent on every turn.

So what For legal teams flirting with generative AI on real matters, a visible, persistent confidence layer is closer to what review workflows actually need than a raw chat box.

View this fork on GitHub →

Spotted something wrong? Or know the PR text has fresher detail than the writeup above?

Commits in this thread

3 commits from mglynnhenley/mikehasprobes, oldest first. Source extracted verbatim from the harvested git log.

SHA Subject Author Date
e7549126 Add hallucination-probe scoring across chat + tabular review Matilda 2026-05-06 ↗ GitHub
commit body
Wire Mike to a Modal-hosted, OpenAI-compatible probe service. After
each Claude/Gemini response, send the completion as a prefilled
assistant turn to the probe and stream per-token scores onto the
existing SSE channel. Persist scores on `chat_messages.probe_scores`
and `tabular_cells.probe_scores`. UI fades a heat-strip + risk badge
under cells/messages as scores arrive.

Also: local mock probe at /mock-probe for development without the
Modal service, and a "Think" toggle on the chat input so users can
opt into adaptive thinking per turn (off by default - Sonnet 4.6 was
rejecting the unconditional flag).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ef3d78ce Keep 000 base schema clean; probe columns live in 001/002 Matilda 2026-05-06 ↗ GitHub
commit body
Per review: the one-shot base schema should stay vanilla. Probe
score columns are additive and belong only in 001_probe_scores.sql
(tabular_cells) and 002_chat_probe_scores.sql (chat_messages),
which already exist as incremental migrations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4dcbc056 Consolidate probe migrations into single 001 Matilda 2026-05-06 ↗ GitHub
Merge 002's chat_messages.probe_scores into 001 alongside the
tabular_cells columns. One migration covers the entire probe schema
extension; 002 deleted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Capture this thread into my fork

Download a single Markdown prompt that tells Claude how to port every commit above into your working tree — adapting paths and structure to match your repo. Run it via claude -p < capture-thread-279.md from inside the repo you want the changes in.

⬇ Download capture-thread-279.md