nwhitehouse/mike@eaef8912

What's in this commit:
- backend/.env.example       - OLAVA_THINKING_MODE (off|low|standard,
                               default standard), OLAVA_MAX_TOKENS
                               (default 8192, was 16384),
                               OLAVA_COMPLETION_MAX_TOKENS (2048).
- backend/src/lib/llm/olava.ts - Qwen3 thinking control via vLLM
                                 `chat_template_kwargs.enable_thinking`.
                                 In low/off mode also appends a /no_think
                                 hint to the system prompt. Caller-passed
                                 `enableThinking: false` forces low mode
                                 regardless of env (used by helper calls).
- backend/src/lib/chatTools.ts - adds a "REASONING BUDGET: keep internal
                                 analysis brief and targeted" line to the
                                 chat system prompt as soft guidance.
- backend/src/lib/research/{queryExpander,triage}.ts - non-interactive
                                 helper calls opt out of thinking
                                 (enableThinking: false) so a 5-word
                                 search-query rewrite doesn't burn 4000
                                 tokens reasoning first.
- frontend/.../AssistantMessage.tsx - thinking card collapsed by default,
                                      readable spacing, markdown-aware
                                      reasoning rendering, bounded scroll
                                      area so long reasoning doesn't
                                      dominate the message.
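
The thinking control described for olava.ts can be sketched roughly as
follows. This is a minimal illustration, not the code in this commit: the
function and type names are hypothetical, the fallback for unrecognised
env values is an assumption, and it assumes `enable_thinking` is true only
in standard mode. Only the env var, the three modes, the `enableThinking`
option, `chat_template_kwargs.enable_thinking`, and the /no_think hint
come from the description above.

```typescript
// Hypothetical sketch; names below are illustrative, not from olava.ts.
type ThinkingMode = "off" | "low" | "standard";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface CallOptions {
  enableThinking?: boolean;
}

type Env = Record<string, string | undefined>;

function resolveThinkingMode(env: Env, opts: CallOptions = {}): ThinkingMode {
  // A caller-passed `enableThinking: false` forces low mode regardless of env.
  if (opts.enableThinking === false) return "low";
  const raw = (env.OLAVA_THINKING_MODE ?? "").trim().toLowerCase();
  switch (raw) {
    case "off":
      return "off";
    case "low":
      return "low";
    default:
      // Assumption: unset or unrecognised values fall back to the default.
      return "standard";
  }
}

function buildChatRequest(messages: ChatMessage[], mode: ThinkingMode) {
  // Assumption: thinking is enabled only in standard mode.
  const thinking = mode === "standard";
  // In low/off mode, append the /no_think hint to each system message.
  const outgoing = thinking
    ? messages
    : messages.map((m) =>
        m.role === "system" ? { ...m, content: `${m.content}\n/no_think` } : m,
      );
  return {
    messages: outgoing,
    chat_template_kwargs: { enable_thinking: thinking },
  };
}
```

A helper call such as the query expander would then pass
`{ enableThinking: false }` to `resolveThinkingMode`, while interactive
chat calls pass nothing and follow the env setting.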

Defaults take effect immediately on deploy. To disable Qwen reasoning
entirely (snappier, no <think> block), set OLAVA_THINKING_MODE=low in
the Railway env. No code change needed.
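
For the token limits, reading the env with the stated defaults might look
like the sketch below. The helper names are hypothetical, and treating
2048 as a fallback for OLAVA_COMPLETION_MAX_TOKENS is an assumption (the
message only gives it as the .env.example value).

```typescript
// Hypothetical sketch; helper names are illustrative, not from this commit.
type Env = Record<string, string | undefined>;

function readPositiveInt(env: Env, name: string, fallback: number): number {
  // Missing, non-numeric, or non-positive values fall back to the default.
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function readTokenLimits(env: Env) {
  return {
    // Default 8192 (previously 16384), per the .env.example notes.
    maxTokens: readPositiveInt(env, "OLAVA_MAX_TOKENS", 8192),
    // Assumption: 2048 also serves as the fallback when the variable is unset.
    completionMaxTokens: readPositiveInt(env, "OLAVA_COMPLETION_MAX_TOKENS", 2048),
  };
}
```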

Removed from earlier draft: the OLAVA_REASONING_DISPLAY_CHAR_LIMIT cap +
"[Thought process truncated by display limit.]" marker. The collapsed-
by-default UI handles "hide so much of the read out" without a hard
backend truncation; the marker was ugly when it appeared.

Added backlog entries for bug-008 (assistant thinking output noisy) and
feat-019. Rebased onto main post-Sprint-3 so feat-017's
tool_call_id / tool_calls preservation in olava.ts stays intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Repository	nwhitehouse/mike
Author	Nick Whitehouse <nick.whitehouse@mccarthyfinch.com>
Authored	2026-05-07T15:57:41+12:00
Committed	2026-05-07T17:51:23+12:00
Parents	`0d9be7cb`
Stats	7 files changed, +203, -66
Part of	Thinking controls + collapsed reasoning UI (feat-019)

[feat-019] Thinking controls + collapsed reasoning UI
