[bug-002] Recover tool calls when vLLM streaming drops the payload

Nick Whitehouse · 2026-05-03 · e772ac55

feat-001 assumed that the Olava LoRA's custom tool-call markup
(<tool_call><function=...><parameter=...>) would arrive in delta.content
during streaming, where parseCustomToolCall could extract it. Testing
against the live RunPod endpoint with the query "What's the latest court
opinion involving AI" showed otherwise: vLLM ends the stream with
finish_reason="tool_calls" but neither populates delta.tool_calls
(accCalls=0) nor includes the markup in delta.content (the raw text is
just "\n\n"). In streaming mode the tool-call payload is dropped
entirely for this LoRA.
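
To make the symptom concrete, here is a minimal sketch of how the
per-stream counters could be accumulated. The chunk shape follows the
OpenAI-compatible streaming format that vLLM serves; summarizeStream and
the StreamSummary type are hypothetical names for illustration, not the
code in this commit.

```typescript
// Only the chunk fields this commit message talks about.
interface StreamChunk {
  choices: Array<{
    delta: { content?: string | null; tool_calls?: Array<{ index: number }> };
    finish_reason?: string | null;
  }>;
}

// Hypothetical summary type; accCalls mirrors the counter named above.
interface StreamSummary {
  finishReason: string | null;
  accCalls: number; // distinct tool-call indexes seen in delta.tool_calls
  rawText: string; // concatenation of every delta.content fragment
}

function summarizeStream(chunks: StreamChunk[]): StreamSummary {
  const seenIndexes = new Set<number>();
  let rawText = "";
  let finishReason: string | null = null;
  for (const chunk of chunks) {
    for (const choice of chunk.choices) {
      rawText += choice.delta.content ?? "";
      for (const call of choice.delta.tool_calls ?? []) {
        seenIndexes.add(call.index);
      }
      if (choice.finish_reason) finishReason = choice.finish_reason;
    }
  }
  return { finishReason, accCalls: seenIndexes.size, rawText };
}

// Replays the observed failure: whitespace only, then finish_reason="tool_calls".
const observed = summarizeStream([
  { choices: [{ delta: { content: "\n\n" } }] },
  { choices: [{ delta: {}, finish_reason: "tool_calls" }] },
]);
// observed.finishReason === "tool_calls", observed.accCalls === 0,
// observed.rawText === "\n\n"
```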

Fix: when a stream ends with finish_reason="tool_calls" but no tool
call was extracted from either channel, re-issue that iter as a single
non-streaming request and parse the markup from message.content. This
costs one extra request, and only on iter 0 of a tool-using turn. Iter
1+ (the prose-answer iters that run after the tool returns) stream
normally, and that is where the streaming win actually lives.
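
The recovery path can be sketched roughly as follows. This is an
illustration under assumptions, not the committed code: the ToolCall
shape, the RecoveryDeps interface, needsRecovery, runIterWithRecovery,
and the signature given to parseCustomToolCall are all hypothetical. The
network calls are injected so the decision logic stands on its own.

```typescript
// Hypothetical tool-call shape; the real one is not shown in this commit message.
interface ToolCall {
  name: string;
  args: Record<string, string>;
}

interface StreamOutcome {
  finishReason: string | null;
  toolCalls: ToolCall[]; // calls found via delta.tool_calls or delta.content
}

// Hypothetical injected dependencies standing in for the real client code.
interface RecoveryDeps {
  streamOnce: () => Promise<StreamOutcome>;
  completeOnce: () => Promise<string>; // non-streaming; resolves to message.content
  parseCustomToolCall: (content: string) => ToolCall | null; // assumed signature
}

// True when vLLM reported a tool call but neither channel carried one.
function needsRecovery(outcome: StreamOutcome): boolean {
  return outcome.finishReason === "tool_calls" && outcome.toolCalls.length === 0;
}

async function runIterWithRecovery(
  deps: RecoveryDeps,
): Promise<{ toolCalls: ToolCall[]; recovered: boolean }> {
  const streamed = await deps.streamOnce();
  if (!needsRecovery(streamed)) {
    return { toolCalls: streamed.toolCalls, recovered: false };
  }
  // Payload was dropped: repeat the same iter without streaming and parse
  // the markup out of the complete message.
  const content = await deps.completeOnce();
  const call = deps.parseCustomToolCall(content);
  return { toolCalls: call ? [call] : [], recovered: true };
}
```

Because the fallback fires only when needsRecovery is true, tool-free
turns and the prose iters never pay for the second request.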

Net behaviour:
- Tool-free turns: stream tokens (feat-001 win preserved).
- Tool-using turns iter 0: ~2s of dead air to detect + recover the
  call. Same as the original always-non-stream behaviour.
- Tool-using turns iter 1+: stream prose tokens to the user.

OLAVA_FORCE_NONSTREAM_TOOLS=true escape hatch from feat-001 still
works if the recovery itself misbehaves.
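
For reference, a flag like this could be read as below. Only the
variable name and the "true" value come from feat-001; the helper name
and the strict string comparison are assumptions, and the real parsing
may differ.

```typescript
// Hypothetical helper: treats only the exact string "true" as enabled.
function forceNonStreamTools(env: Record<string, string | undefined>): boolean {
  return env["OLAVA_FORCE_NONSTREAM_TOOLS"] === "true";
}
```

In a Node process this would be called as
forceNonStreamTools(process.env).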

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repository nwhitehouse/mike
Author Nick Whitehouse <nick.whitehouse@mccarthyfinch.com>
Parents 1271d07c
Stats 1 file changed, +99
Part of Olava streaming + tool-call recovery
