nwhitehouse puts the AI's inner monologue on a diet

The fork now lets the underlying model think hard, think lightly, or skip thinking entirely, and it hides the messy reasoning by default.

chat-ui, infrastructure

Reasoning models like Qwen3, which nwhitehouse's fork serves through vLLM, can spend thousands of tokens "thinking out loud" before answering. That's useful for a hard legal question and wasteful when the system is just rewriting a five-word search query in the background. This update adds three dials: a global thinking mode (off, low, or standard), a tighter token budget (the default cap drops from 16,384 to 8,192 tokens), and a per-task override that lets small internal calls, such as query rewrites and triage, skip the reasoning step entirely.
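
A minimal TypeScript sketch of how the global mode and the per-call override could combine into a single vLLM request, assuming an OpenAI-compatible chat endpoint. The env names `OLAVA_THINKING_MODE` and `OLAVA_MAX_TOKENS`, their defaults, `chat_template_kwargs.enable_thinking`, the `/no_think` hint, and the rule that `enableThinking: false` forces low mode all come from the commit below. The function and type names are hypothetical, and treating both `low` and `off` as `enable_thinking: false` is an assumption drawn from the commit's note that `low` disables Qwen reasoning entirely.

```typescript
// Hypothetical sketch: these names are illustrative, not olava.ts's real exports.
type ThinkingMode = "off" | "low" | "standard";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  max_tokens: number;
  // vLLM passes these kwargs to the chat template; Qwen3's template reads enable_thinking.
  chat_template_kwargs: { enable_thinking: boolean };
}

type Env = Record<string, string | undefined>;

const MODES: readonly string[] = ["off", "low", "standard"];

// OLAVA_THINKING_MODE, falling back to "standard" when unset or unrecognized.
function envThinkingMode(env: Env): ThinkingMode {
  const raw = (env.OLAVA_THINKING_MODE ?? "").trim().toLowerCase();
  return MODES.includes(raw) ? (raw as ThinkingMode) : "standard";
}

// OLAVA_MAX_TOKENS, falling back to the new default of 8192.
function envMaxTokens(env: Env): number {
  const n = Number.parseInt(env.OLAVA_MAX_TOKENS ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : 8192;
}

// A caller passing enableThinking: false gets "low" regardless of the env setting.
function resolveThinkingMode(env: Env, enableThinking?: boolean): ThinkingMode {
  return enableThinking === false ? "low" : envThinkingMode(env);
}

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  env: Env,
  enableThinking?: boolean,
): ChatRequest {
  const mode = resolveThinkingMode(env, enableThinking);
  // Assumption: only "standard" turns thinking on; "low" and "off" both disable it.
  const thinking = mode === "standard";
  // Copy messages so the caller's array is never mutated.
  let out: ChatMessage[] = messages.map((m) => ({ ...m }));
  if (!thinking) {
    // Append the /no_think hint to the system prompt, or add a system
    // message carrying it if there is none (that fallback is an assumption).
    const sys = out.find((m) => m.role === "system");
    if (sys) {
      sys.content = `${sys.content}\n/no_think`;
    } else {
      out = [{ role: "system", content: "/no_think" }, ...out];
    }
  }
  return {
    model,
    messages: out,
    max_tokens: envMaxTokens(env),
    chat_template_kwargs: { enable_thinking: thinking },
  };
}
```

Under this sketch, a helper such as the query rewriter would call `buildChatRequest(model, messages, process.env, false)` and get a non-thinking request however the deployment is configured, which mirrors the override the commit describes for helper calls.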

On the user side, the long reasoning readout that used to dominate each assistant message is now tucked into a collapsed card. Click to expand if you want to audit how the model got there; otherwise the answer leads.
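
One framework-free way to get "answer first, reasoning on request" is the native HTML `<details>` element, which renders collapsed unless it carries the `open` attribute. The sketch below is hypothetical: it assumes the raw completion wraps reasoning in Qwen-style `<think>...</think>` tags (if the server already returns reasoning as a separate field, the split step is unnecessary), and it only escapes text and preserves line breaks, whereas the commit describes markdown-aware rendering in `AssistantMessage.tsx`. The 240px scroll cap is an arbitrary illustrative value.

```typescript
// Hypothetical sketch: splitReasoning and renderAssistantHtml are illustrative names.
interface SplitMessage {
  reasoning: string | null; // null when there is no non-empty <think> block
  answer: string;
}

const THINK_BLOCK = /<think>([\s\S]*?)<\/think>/;

// Separate the first <think>...</think> block from the visible answer.
function splitReasoning(raw: string): SplitMessage {
  const match = THINK_BLOCK.exec(raw);
  if (!match) return { reasoning: null, answer: raw.trim() };
  const reasoning = (match[1] ?? "").trim();
  const answer = (raw.slice(0, match.index) + raw.slice(match.index + match[0].length)).trim();
  return { reasoning: reasoning.length > 0 ? reasoning : null, answer };
}

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render reasoning as a collapsed, height-bounded card above the answer.
function renderAssistantHtml(raw: string, maxReasoningPx = 240): string {
  const { reasoning, answer } = splitReasoning(raw);
  const answerHtml = `<div class="answer">${escapeHtml(answer)}</div>`;
  if (reasoning === null) return answerHtml;
  // No `open` attribute, so the browser shows only the <summary> line until clicked.
  const card =
    `<details class="thinking-card"><summary>Thought process</summary>` +
    `<div style="max-height:${maxReasoningPx}px;overflow-y:auto;white-space:pre-wrap">` +
    `${escapeHtml(reasoning)}</div></details>`;
  return card + answerHtml;
}
```

Because the collapsed card occupies a single summary line, the answer is what the reader actually sees; the `max-height` plus `overflow-y: auto` pair keeps an expanded card from swallowing the message.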

So what: for anyone running a reasoning-model chat product, this is a clean pattern for keeping token costs and UI noise down without giving up deep reasoning on the questions that matter.

Commits in this thread

1 commit from nwhitehouse/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA      | Subject                                               | Author          | Date
eaef8912 | [feat-019] Thinking controls + collapsed reasoning UI | Nick Whitehouse | 2026-05-07
commit body
What's in this commit:
- backend/.env.example       - OLAVA_THINKING_MODE (off|low|standard,
                               default standard), OLAVA_MAX_TOKENS
                               (default 8192, was 16384),
                               OLAVA_COMPLETION_MAX_TOKENS (2048).
- backend/src/lib/llm/olava.ts - Qwen3 thinking control via vLLM
                                 `chat_template_kwargs.enable_thinking`.
                                 In low/off mode also appends a /no_think
                                 hint to the system prompt. Caller-passed
                                 `enableThinking: false` forces low mode
                                 regardless of env (used by helper calls).
- backend/src/lib/chatTools.ts - adds a "REASONING BUDGET: keep internal
                                 analysis brief and targeted" line to the
                                 chat system prompt as soft guidance.
- backend/src/lib/research/{queryExpander,triage}.ts - non-interactive
                                 helper calls opt out of thinking
                                 (enableThinking: false) so a 5-word
                                 search-query rewrite doesn't burn 4000
                                 tokens reasoning first.
- frontend/.../AssistantMessage.tsx - thinking card collapsed by default,
                                      readable spacing, markdown-aware
                                      reasoning rendering, bounded scroll
                                      area so long reasoning doesn't
                                      dominate the message.

Defaults take effect immediately on deploy. To disable Qwen reasoning
entirely (snappier, no <think> block), set OLAVA_THINKING_MODE=low in
the Railway env. No code change needed.

Removed from earlier draft: the OLAVA_REASONING_DISPLAY_CHAR_LIMIT cap +
"[Thought process truncated by display limit.]" marker. The collapsed-
by-default UI handles "hide so much of the read out" without a hard
backend truncation; the marker was ugly when it appeared.

Backlog entries for bug-008 (assistant thinking output noisy) and
feat-019 added. Rebased onto main post-Sprint-3 so feat-017's
tool_call_id / tool_calls preservation in olava.ts is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
