nwhitehouse puts a leash on the assistant's tool-calling

A new guardrail stops the AI from spinning in circles when a tool keeps failing, and steers it toward an answer instead of a timeout.

workflow, infrastructure

nwhitehouse added a watchdog that counts how many times the assistant reaches for a tool inside a single turn. If the assistant reaches 12 tool calls, makes the exact same call with the same arguments three times, or runs longer than 60 seconds (all defaults), the watchdog steps in.

What makes this thoughtful is how it intervenes. Rather than cutting the assistant off mid-thought, it hands back the data already gathered along with a nudge to stop fetching and write a final answer. So you get a partial-but-real response instead of an error screen. Every intervention is logged as an audit event, and the limits can be loosened for heavier research-style sessions that legitimately need to call more tools. The defaults are tuned for ordinary questions that touch one to three tools.

So what: anyone relying on a legal AI assistant in live work should care. This is the difference between a usable answer and a hung session when the model gets stuck.


Commits in this thread

1 commit from nwhitehouse/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA | Subject | Author | Date
22ab8a76 | [feat-014] LoopController: bound tool dispatches per turn | Nick Whitehouse | 2026-05-07
commit body
Wraps the runTools callback in runLLMStream with a small controller that
escalates on three triggers (first one wins):
  - MAX_STEPS_EXCEEDED   - total tool dispatches >= 12 (env: OLAVA_MAX_STEPS)
  - REPEATED_TOOL_CALL   - same name+args 3× (env: OLAVA_MAX_REPEATED_CALLS)
  - WALL_CLOCK_EXCEEDED  - > 60s since turn start (env: OLAVA_WALL_CLOCK_MS)
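The three triggers listed above can be illustrated with a minimal sketch. This is not the fork's LoopController: the class name, the `record` method, the injected clock, and the use of `JSON.stringify` to compare arguments are all assumptions made for illustration. Only the trigger names, their order, and the default thresholds come from the commit message.

```typescript
// Hypothetical sketch of the three triggers; not the code from the commit.
type EscalationReason =
  | "MAX_STEPS_EXCEEDED"
  | "REPEATED_TOOL_CALL"
  | "WALL_CLOCK_EXCEEDED";

interface LoopLimits {
  maxSteps: number;         // total tool dispatches per turn
  maxRepeatedCalls: number; // identical name+args calls before escalating
  wallClockMs: number;      // time budget since the turn started
}

// Defaults taken from the commit message.
const DEFAULT_LIMITS: LoopLimits = {
  maxSteps: 12,
  maxRepeatedCalls: 3,
  wallClockMs: 60_000,
};

class LoopControllerSketch {
  private steps = 0;
  private readonly seen = new Map<string, number>();
  private readonly limits: LoopLimits;
  private readonly now: () => number;
  private readonly startedAt: number;

  // The clock is injectable so that tests do not need to wait 60 seconds.
  constructor(
    limits: LoopLimits = DEFAULT_LIMITS,
    now: () => number = () => Date.now(),
  ) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  // Call once per tool dispatch; returns the first trigger that fires, or null.
  record(name: string, args: unknown): EscalationReason | null {
    this.steps += 1;

    // Assumption: identical calls are detected by serialising the arguments.
    // Objects with the same keys in a different order would count as different.
    const key = `${name}:${JSON.stringify(args)}`;
    const repeats = (this.seen.get(key) ?? 0) + 1;
    this.seen.set(key, repeats);

    // Checked in the order the commit lists them ("first one wins").
    if (this.steps >= this.limits.maxSteps) return "MAX_STEPS_EXCEEDED";
    if (repeats >= this.limits.maxRepeatedCalls) return "REPEATED_TOOL_CALL";
    if (this.now() - this.startedAt > this.limits.wallClockMs) {
      return "WALL_CLOCK_EXCEEDED";
    }
    return null;
  }
}
```

In this sketch a caller would invoke `record` for every tool dispatch and treat any non-null result as the signal to escalate. Calls whose arguments differ are counted separately, which matches the negative case the commit says its tests cover.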

On escalation the controller appends a "stop calling tools and synthesise
the best answer you can" note to every tool result in the batch. The model
still receives the data it just fetched - we only ask it to stop reaching
for more. Combined with the existing maxIterations: 10 in streamChatWithTools
this is belt-and-suspenders against runaway loops.
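The escalation behaviour described above, where every tool result in the batch keeps its data and gains a note, might look roughly like the sketch below. The `ToolResult` shape, the function name, and the exact wording and layout of the note are assumptions; the commit only says that a "stop calling tools" note is appended to each result.

```typescript
// Hypothetical sketch; the real tool-result shape and note format are not shown in the commit.
interface ToolResult {
  toolCallId: string;
  content: string;
}

const ESCALATION_NOTE =
  "Stop calling tools and synthesise the best answer you can.";

// Returns a new batch in which each result keeps its fetched data
// and carries the note at the end. The input batch is not mutated.
function appendEscalationNote(
  batch: ToolResult[],
  reason: string,
): ToolResult[] {
  return batch.map((result) => ({
    ...result,
    content: `${result.content}\n\n[${reason}] ${ESCALATION_NOTE}`,
  }));
}
```

Appending rather than replacing is the point of the design as the commit describes it: the model still sees what it just fetched and is only asked to stop reaching for more.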

Also emits a loop.escalated event to the feat-015 audit log so post-hoc
"why did this turn behave weirdly?" questions are answerable from SQL.

Class is independent of chat code - 7 unit tests in loopController.test.ts
cover each trigger + the negative case where args differ + currentStep
accounting + escalation note formatting. All 16 backend tests pass
(7 new + 9 existing security regressions).

Stacked on feat-015 (uses recordEvent for the audit row). Will rebase
cleanly onto main after feat-017 + feat-015 land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
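For readers who want to picture how the environment overrides named in the commit body could be applied, here is a minimal sketch. The helper names and the choice to fall back to the default on missing or invalid values are assumptions; only the variable names and default numbers come from the commit message.

```typescript
// Hypothetical sketch of reading the limits; not the code from the commit.
type Env = Record<string, string | undefined>;

// Parses a positive integer, falling back to the default when the
// variable is unset, not a number, or not greater than zero.
function readPositiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined) return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function limitsFromEnv(env: Env) {
  return {
    maxSteps: readPositiveInt(env, "OLAVA_MAX_STEPS", 12),
    maxRepeatedCalls: readPositiveInt(env, "OLAVA_MAX_REPEATED_CALLS", 3),
    wallClockMs: readPositiveInt(env, "OLAVA_WALL_CLOCK_MS", 60_000),
  };
}

// Example: a research-style session that raises the step and time budgets.
const researchLimits = limitsFromEnv({
  OLAVA_MAX_STEPS: "40",
  OLAVA_WALL_CLOCK_MS: "180000",
});
```

In a Node.js process the same function could be called with `process.env`. Taking the environment as a parameter keeps the sketch easy to test.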
