[feat-001] Stream tokens during tool-using turns
Olava previously fell back to a non-streaming request whenever tools were forwarded, because vLLM's tool-call streaming is broken for the LoRA's custom <tool_call><function=...><parameter=...> markup: `delta.tool_calls` arrives empty even when finish_reason is "tool_calls".

Fix it client-side: keep streaming on and accumulate raw delta.content, but filter the user-visible stream through a small state machine that hides <think>...</think> blocks and everything after a <tool_call> open tag. A held-back tail handles markup that spans chunk boundaries. After the stream ends, run the existing parseCustomToolCall() on the raw buffer to extract the call and dispatch it via runTools, the same path the non-streaming branch already used.

Also fixes a related bug: the no-tools "streaming" path was buffering the entire response and emitting one giant onContentDelta at the end. Streaming is now genuinely per-token in both paths.

Emergency rollback is available via OLAVA_FORCE_NONSTREAM_TOOLS=true.

Adds backlog.md to track the sprint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
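The filtering approach described in the message can be sketched roughly as follows. This is an illustrative reconstruction, not the code from this commit: the names `VisibleStreamFilter`, `push`, `flush`, and `partialTagSuffix` are hypothetical, and the sketch assumes the only markers that matter are `<think>`, `</think>`, and `<tool_call>`. It keeps the full raw text for later parsing and holds back any tail that could be the start of a marker split across chunks.

```typescript
// Hypothetical sketch of a visible-stream filter; names are illustrative only.
const THINK_OPEN = "<think>";
const THINK_CLOSE = "</think>";
const TOOL_OPEN = "<tool_call>";

type Mode = "visible" | "think" | "tool";

// Length of the longest suffix of `text` that is a proper prefix of any tag.
function partialTagSuffix(text: string, tags: string[]): number {
  let best = 0;
  for (const tag of tags) {
    const max = Math.min(tag.length - 1, text.length);
    for (let n = max; n > best; n--) {
      if (text.endsWith(tag.slice(0, n))) {
        best = n;
        break;
      }
    }
  }
  return best;
}

class VisibleStreamFilter {
  raw = ""; // full unfiltered text, kept for post-stream tool-call parsing
  private pending = ""; // held-back text that has not been classified yet
  private mode: Mode = "visible";

  // Feed one content chunk; returns the text that is safe to show the user.
  push(chunk: string): string {
    this.raw += chunk;
    this.pending += chunk;
    let out = "";
    for (;;) {
      if (this.mode === "tool") {
        // Everything after the <tool_call> open tag stays hidden.
        this.pending = "";
        return out;
      }
      if (this.mode === "think") {
        const end = this.pending.indexOf(THINK_CLOSE);
        if (end === -1) {
          // Keep only a tail that might be the start of </think>.
          this.pending = this.pending.slice(-(THINK_CLOSE.length - 1));
          return out;
        }
        this.pending = this.pending.slice(end + THINK_CLOSE.length);
        this.mode = "visible";
        continue;
      }
      const t = this.pending.indexOf(THINK_OPEN);
      const c = this.pending.indexOf(TOOL_OPEN);
      const hits = [t, c].filter((i) => i !== -1);
      if (hits.length === 0) {
        // No complete marker: emit all but a possible partial marker.
        const keep = partialTagSuffix(this.pending, [THINK_OPEN, TOOL_OPEN]);
        const cut = this.pending.length - keep;
        out += this.pending.slice(0, cut);
        this.pending = this.pending.slice(cut);
        return out;
      }
      const first = Math.min(...hits);
      out += this.pending.slice(0, first);
      if (first === t) {
        this.mode = "think";
        this.pending = this.pending.slice(first + THINK_OPEN.length);
      } else {
        this.mode = "tool";
        this.pending = "";
      }
    }
  }

  // Call once at end of stream: a held-back tail in visible mode was plain text.
  flush(): string {
    const out = this.mode === "visible" ? this.pending : "";
    this.pending = "";
    return out;
  }
}
```

In a setup like the one the message describes, each `delta.content` chunk would go through `push()` before reaching `onContentDelta`, and `raw` would be handed to `parseCustomToolCall()` once the stream ends. How the real implementation wires these together is not shown in this commit view.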
| Repository | nwhitehouse/mike |
|---|---|
| Author | Nick Whitehouse <nick.whitehouse@mccarthyfinch.com> |
| Authored | |
| Parents | 325ff20c |
| Stats | 2 files changed, +245, -16 |
| Part of | Olava streaming + tool-call recovery |