mglynnhenley straps a hallucination meter onto Mike
Every assistant reply now gets shadow-scored for hallucination risk, with the doubt rendered as a fading heat strip under the text.
After each chat turn or table cell from Mike, mglynnhenley's fork quietly replays the assistant's answer to an external scoring service - hosted on Modal, a platform for running AI workloads - that grades the output token by token for how likely it is to be made up. The text shows up at normal speed; a heat strip and a risk badge fade in a beat later, so the model still feels as fast as before.
The plumbing is pragmatic. A local mock endpoint lets the team keep building when the scoring service is down, and a circuit breaker means an unreachable probe just hides the highlights rather than breaking the chat. There's also a small new "Think" toggle next to the model picker, off by default, because one of the underlying models was refusing requests when adaptive thinking was sent on every turn.
Spotted something wrong? Or know the PR text has fresher detail than the writeup above?