[feat-001] Stream tokens during tool-using turns

Nick Whitehouse · 2026-05-03 · 6d2aac9e

Olava previously fell back to a non-streaming request whenever tools were
forwarded, because vLLM's tool-call streaming is broken for the LoRA's
custom <tool_call><function=...><parameter=...> markup: `delta.tool_calls`
arrives empty even when finish_reason is "tool_calls".

Fix it client-side: keep streaming on and accumulate the raw delta.content,
but filter the user-visible stream through a small state machine that hides
<think>...</think> blocks and everything after a <tool_call> open tag.
A held-back tail handles markup that spans chunk boundaries. After the
stream ends, run the existing parseCustomToolCall() on the raw buffer to
extract the call and dispatch it via runTools, the same path the
non-streaming branch already used.
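
A minimal sketch of the kind of filter described above, not the code from
this commit. The class name, method names, and the choice to hide the
<tool_call> tag itself are assumptions made for illustration; only the tag
strings come from the message. It keeps the raw text for later parsing and
holds back any suffix that could still turn out to be the start of a tag.

```typescript
// Hypothetical sketch; names and structure are assumptions, not the commit's code.
const THINK_OPEN = "<think>";
const THINK_CLOSE = "</think>";
const TOOL_OPEN = "<tool_call>";

type Mode = "visible" | "thinking" | "tool";

// Length of the longest suffix of `text` that is a proper prefix of any tag.
function partialTagLength(text: string, tags: string[]): number {
  let best = 0;
  for (const tag of tags) {
    const max = Math.min(tag.length - 1, text.length);
    for (let n = max; n > best; n--) {
      if (tag.startsWith(text.slice(text.length - n))) {
        best = n;
        break;
      }
    }
  }
  return best;
}

class VisibleStreamFilter {
  raw = ""; // full unfiltered output, kept for post-stream parsing
  private mode: Mode = "visible";
  private pending = ""; // unprocessed text, including any held-back tail

  // Feed one chunk; returns the text that is safe to show the user now.
  push(chunk: string): string {
    this.raw += chunk;
    if (this.mode === "tool") return ""; // everything after <tool_call> is hidden
    this.pending += chunk;
    let out = "";
    while (this.pending.length > 0) {
      if (this.mode === "thinking") {
        const end = this.pending.indexOf(THINK_CLOSE);
        if (end === -1) {
          // Drop hidden text, but keep a tail that may be a split </think>.
          const keep = partialTagLength(this.pending, [THINK_CLOSE]);
          this.pending = this.pending.slice(this.pending.length - keep);
          break;
        }
        this.pending = this.pending.slice(end + THINK_CLOSE.length);
        this.mode = "visible";
        continue;
      }
      const thinkAt = this.pending.indexOf(THINK_OPEN);
      const toolAt = this.pending.indexOf(TOOL_OPEN);
      const hits = [thinkAt, toolAt].filter((i) => i !== -1);
      if (hits.length === 0) {
        // Emit everything except a tail that may be a split open tag.
        const keep = partialTagLength(this.pending, [THINK_OPEN, TOOL_OPEN]);
        out += this.pending.slice(0, this.pending.length - keep);
        this.pending = this.pending.slice(this.pending.length - keep);
        break;
      }
      const at = Math.min(...hits);
      out += this.pending.slice(0, at);
      if (at === thinkAt) {
        this.mode = "thinking";
        this.pending = this.pending.slice(at + THINK_OPEN.length);
      } else {
        this.mode = "tool";
        this.pending = "";
      }
    }
    return out;
  }

  // Call once at end of stream: a held-back tail that never became a tag is visible text.
  flush(): string {
    const tail = this.mode === "visible" ? this.pending : "";
    this.pending = "";
    return tail;
  }
}
```

In this sketch the caller would forward each push() result to the UI, call
flush() when the stream ends, and hand `raw` to the tool-call parser.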

Also fixes a related bug: the no-tools "streaming" path was buffering the
entire response and emitting one giant onContentDelta at the end. Both
paths now emit genuinely per-token deltas.
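
To illustrate the difference, here is a small sketch of forwarding each
delta as it arrives instead of buffering until the end. The ContentDelta
shape and createDeltaForwarder are hypothetical; only onContentDelta is a
name from this commit, and its real signature may differ.

```typescript
// Hypothetical shape of one streamed chunk; only `content` matters here.
interface ContentDelta {
  content?: string | null;
}

// Returns a handler that forwards every non-empty delta immediately
// and still accumulates the full text for any post-stream work.
function createDeltaForwarder(onContentDelta: (text: string) => void) {
  let raw = "";
  return {
    // Call once per streamed chunk, as soon as it arrives.
    onChunk(delta: ContentDelta): void {
      if (!delta.content) return; // skip empty or role-only deltas
      raw += delta.content;
      onContentDelta(delta.content); // emit now, not at end of stream
    },
    // Full text seen so far.
    text(): string {
      return raw;
    },
  };
}
```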

Emergency rollback available via OLAVA_FORCE_NONSTREAM_TOOLS=true.
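
A sketch of how such a flag might gate the streaming path. The function
name and the rule that only the exact string "true" enables the rollback
are assumptions; the message only states the variable name and the value
`true`. The environment is passed in as a plain object so the check is easy
to test.

```typescript
// Hypothetical helper; the commit's actual check may be written differently.
function shouldStreamTurn(
  env: Record<string, string | undefined>,
  hasTools: boolean,
): boolean {
  // Assumption: only the literal value "true" forces the old behavior.
  const forceNonStream = env["OLAVA_FORCE_NONSTREAM_TOOLS"] === "true";
  // Turns without tools always stream; tool turns stream unless rolled back.
  return !(hasTools && forceNonStream);
}
```

In a Node.js process this would typically be called as
shouldStreamTurn(process.env, tools.length > 0), where `tools` stands in
for whatever list of forwarded tools the caller holds.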

Adds backlog.md to track the sprint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repository nwhitehouse/mike
Author Nick Whitehouse <nick.whitehouse@mccarthyfinch.com>
Authored 2026-05-03
Parents 325ff20c
Stats 2 files changed, +245, -16
Part of Olava streaming + tool-call recovery
