nwhitehouse puts guardrails on the AI's tool-using loop

A small safety layer that stops the assistant from spinning its wheels when it gets stuck calling the same tools over and over.

infrastructure, workflow

Anyone who has watched an AI assistant get into a rut - fetching the same document three times, or chaining lookups for a full minute before answering - knows the failure mode. nwhitehouse's fork now ships a controller that watches for three of those patterns: too many tool calls in one turn, the same tool being called with identical inputs again and again, and a turn that has simply run too long.

When any of those trip, the assistant isn't cut off mid-thought. It still receives whatever it just looked up - it's just told, politely, to stop fetching and answer with what it has. Thresholds are tunable, and every escalation gets written to the fork's audit log so operators can see when the brakes were pumped.
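As a rough illustration of what "tunable" could look like, the sketch below reads the three thresholds from environment variables and falls back to the defaults given in the commit message further down this page (12 dispatches, 3 repeats, 60 seconds). Only the variable names and defaults come from that commit; the `Thresholds` type and the `readThresholds` and `readPositiveInt` helpers are hypothetical names invented for this example, and the fork's real parsing code may differ.

```typescript
// Hypothetical helper names; only the env variable names and default values
// are taken from the commit message.
interface Thresholds {
  maxSteps: number;
  maxRepeatedCalls: number;
  wallClockMs: number;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function readThresholds(env: Record<string, string | undefined>): Thresholds {
  return {
    maxSteps: readPositiveInt(env["OLAVA_MAX_STEPS"], 12),
    maxRepeatedCalls: readPositiveInt(env["OLAVA_MAX_REPEATED_CALLS"], 3),
    wallClockMs: readPositiveInt(env["OLAVA_WALL_CLOCK_MS"], 60_000),
  };
}
```

In a Node.js process this would typically be called as `readThresholds(process.env)`. Treating unparseable or non-positive values as "use the default" is a design choice of this sketch, not something the commit states.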

Why legal-ops leads evaluating in-house AI assistants should care: this is the kind of unglamorous reliability work that separates a demo from something a fee-earner will actually trust.

Commits in this thread

1 commit from nwhitehouse/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA:     22ab8a76
Subject: [feat-014] LoopController: bound tool dispatches per turn
Author:  Nick Whitehouse
Date:    2026-05-07

Commit body:
Wraps the runTools callback in runLLMStream with a small controller that
escalates on three triggers (first one wins):
  - MAX_STEPS_EXCEEDED   - total tool dispatches >= 12 (env: OLAVA_MAX_STEPS)
  - REPEATED_TOOL_CALL   - same name+args 3× (env: OLAVA_MAX_REPEATED_CALLS)
  - WALL_CLOCK_EXCEEDED  - > 60s since turn start (env: OLAVA_WALL_CLOCK_MS)

On escalation the controller appends a "stop calling tools and synthesise
the best answer you can" note to every tool result in the batch. The model
still receives the data it just fetched - we only ask it to stop reaching
for more. Combined with the existing maxIterations: 10 in streamChatWithTools
this is belt-and-suspenders against runaway loops.

Also emits a loop.escalated event to the feat-015 audit log so post-hoc
"why did this turn behave weirdly?" questions are answerable from SQL.

Class is independent of chat code - 7 unit tests in loopController.test.ts
cover each trigger + the negative case where args differ + currentStep
accounting + escalation note formatting. All 16 backend tests pass
(7 new + 9 existing security regressions).

Stacked on feat-015 (uses recordEvent for the audit row). Will rebase
cleanly onto main after feat-017 + feat-015 land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
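
The three triggers and the escalation note described in the commit body can be sketched as a small standalone class. This is an illustrative reconstruction under stated assumptions, not the fork's code: the class name `LoopControllerSketch`, the `record` and `annotate` methods, the injected clock, and the exact note formatting are all hypothetical. It also assumes "first one wins" means the triggers are checked in the order the commit lists them, and it compares arguments via `JSON.stringify`, which is sensitive to object key order.

```typescript
// Illustrative sketch only: names and structure are assumptions, not the fork's API.
type EscalationReason =
  | "MAX_STEPS_EXCEEDED"
  | "REPEATED_TOOL_CALL"
  | "WALL_CLOCK_EXCEEDED";

interface ToolCall {
  name: string;
  args: unknown;
}

interface LoopLimits {
  maxSteps: number;
  maxRepeatedCalls: number;
  wallClockMs: number;
}

// Defaults as stated in the commit message.
const DEFAULT_LIMITS: LoopLimits = { maxSteps: 12, maxRepeatedCalls: 3, wallClockMs: 60_000 };

class LoopControllerSketch {
  private steps = 0;
  private readonly seen = new Map<string, number>();
  private readonly limits: LoopLimits;
  private readonly now: () => number;
  private readonly startedAt: number;

  // The clock is injectable so tests can simulate elapsed time.
  constructor(limits: LoopLimits = DEFAULT_LIMITS, now: () => number = () => Date.now()) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  // Record one batch of tool dispatches; return a reason if a trigger trips.
  record(batch: ToolCall[]): EscalationReason | null {
    let repeated = false;
    for (const call of batch) {
      this.steps += 1;
      // Same name + same serialised args counts as the same call.
      const key = `${call.name}:${JSON.stringify(call.args)}`;
      const count = (this.seen.get(key) ?? 0) + 1;
      this.seen.set(key, count);
      if (count >= this.limits.maxRepeatedCalls) repeated = true;
    }
    if (this.steps >= this.limits.maxSteps) return "MAX_STEPS_EXCEEDED";
    if (repeated) return "REPEATED_TOOL_CALL";
    if (this.now() - this.startedAt > this.limits.wallClockMs) return "WALL_CLOCK_EXCEEDED";
    return null;
  }

  // Append the stop note to every result; the fetched data itself is kept.
  annotate(results: string[], reason: EscalationReason): string[] {
    const note = `\n\n[${reason}] Stop calling tools and synthesise the best answer you can.`;
    return results.map((result) => result + note);
  }
}
```

A caller wrapping a tool-running callback would invoke `record` on each batch and, when it returns a reason, pass the batch's results through `annotate` before handing them back to the model. Emitting the `loop.escalated` audit event mentioned in the commit is left out of this sketch.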