Five-pass research pipeline for a small reasoning model

nwhitehouse built a full multi-step research orchestrator on top of the Olava-001 vLLM endpoint - query expansion, parallel search fan-out, triage, per-result extraction, and streaming synthesis, all coordinated by a 302-line pipeline with hard caps of 25 Olava calls and 45 seconds of wall-clock time. In an end-to-end smoke test against live Olava, Brave and CourtListener, it produced a ~2KB synthesis with inline citations in about 30 seconds, where the single-pass path emitted only whitespace for the same prompt.
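As a rough illustration of how those five passes chain together, here is a minimal TypeScript sketch. `Passes`, `runPipeline` and `dedupeByUrl` are hypothetical names, not the fork's API. Each pass is injected as a function so it can be stubbed, and budget enforcement is left out to keep the control flow visible.

```typescript
// Hypothetical result shape; the fork's real types live in research/types.ts.
interface SearchResult {
  url: string;
  title: string;
}

// One function per pass, injected so each module can be swapped or stubbed.
interface Passes {
  expand(question: string): Promise<string[]>; // pass 1: 3-6 queries
  search(query: string): Promise<SearchResult[]>; // pass 2: one call per query
  triage(question: string, results: SearchResult[]): Promise<SearchResult[]>; // pass 3: top-N
  extract(question: string, result: SearchResult): Promise<string>; // pass 4: one per result
  synthesize(
    question: string,
    extracts: string[],
    onToken: (token: string) => void,
  ): Promise<void>; // pass 5: streamed answer
}

// Keep the first result seen for each URL, preserving order.
function dedupeByUrl(results: SearchResult[]): SearchResult[] {
  const seen = new Set<string>();
  const out: SearchResult[] = [];
  for (const r of results) {
    if (!seen.has(r.url)) {
      seen.add(r.url);
      out.push(r);
    }
  }
  return out;
}

async function runPipeline(
  question: string,
  passes: Passes,
  onToken: (token: string) => void,
): Promise<SearchResult[]> {
  const queries = await passes.expand(question);
  // Fan out every search concurrently, then merge and dedupe by URL.
  const batches = await Promise.all(queries.map((q) => passes.search(q)));
  const merged: SearchResult[] = [];
  for (const batch of batches) merged.push(...batch);
  const ranked = await passes.triage(question, dedupeByUrl(merged));
  // Extractions are independent of each other, so they also run in parallel.
  const extracts = await Promise.all(ranked.map((r) => passes.extract(question, r)));
  await passes.synthesize(question, extracts, onToken);
  return ranked;
}
```

Only passes 2 and 4 fan out; passes 1, 3 and 5 are single sequential calls, which matches the call counts the commit lists per pass.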

searchchat-ui

The implementation lives in eight new files under backend/src/lib/research/. Each pass is its own module: queryExpander.ts turns the user's question into 3-6 specialised queries; searchFanOut.ts runs legal and web searches in parallel and deduplicates by URL; triage.ts picks the top-N most relevant results; extractor.ts fires parallel Olava calls to produce a tailored 2-3 sentence summary of each result; and synthesizer.ts streams the final markdown answer. Around them, types.ts holds the shared event and result types, orchestrator.ts coordinates the passes and emits SSE events, and budget.ts enforces the call and wall-clock caps and reports which one tripped. When a cap trips, callers get a research.cap_hit SSE event and whatever synthesis has completed so far, rather than an abort.
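A minimal sketch of what budget.ts's job might look like, assuming a counter that is charged before each Olava call and a wall-clock check against a start time. `ResearchBudget`, `tryCharge` and `extractWithinBudget` are hypothetical names. The sketch only reports which cap tripped and returns the partial results; emitting something like research.cap_hit is left to the caller.

```typescript
// Which cap stopped the run (hypothetical names).
type CapHit = "calls" | "wall_clock";

class ResearchBudget {
  private calls = 0;
  private readonly maxCalls: number;
  private readonly maxWallMs: number;
  private readonly now: () => number;
  private readonly startedAt: number;

  // Defaults mirror the writeup's caps: 25 Olava calls, 45 s wall-clock.
  constructor(maxCalls = 25, maxWallMs = 45000, now: () => number = Date.now) {
    this.maxCalls = maxCalls;
    this.maxWallMs = maxWallMs;
    this.now = now;
    this.startedAt = now();
  }

  // Report the cap that has been reached, if any, without consuming budget.
  tripped(): CapHit | null {
    if (this.calls >= this.maxCalls) return "calls";
    if (this.now() - this.startedAt >= this.maxWallMs) return "wall_clock";
    return null;
  }

  // Reserve one model call. Returns null if allowed, otherwise the cap that blocks it.
  tryCharge(): CapHit | null {
    const hit = this.tripped();
    if (hit === null) this.calls += 1;
    return hit;
  }

  get used(): number {
    return this.calls;
  }
}

// Start one call per item until a cap trips; keep what was started instead of aborting.
async function extractWithinBudget<T, R>(
  items: T[],
  budget: ResearchBudget,
  extract: (item: T) => Promise<R>,
): Promise<{ results: R[]; capHit: CapHit | null }> {
  let capHit: CapHit | null = null;
  const pending: Promise<R>[] = [];
  for (const item of items) {
    const hit = budget.tryCharge();
    if (hit !== null) {
      capHit = hit;
      break;
    }
    pending.push(extract(item));
  }
  return { results: await Promise.all(pending), capHit };
}
```

Charging before the call, rather than after, means the cap is a hard ceiling even when calls are launched in parallel; injecting `now` keeps the wall-clock check testable.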

chat.ts auto-routes to the orchestrator whenever sources.legal is non-empty or sources.web is true. That trigger is tied to the source selector in nwhitehouse's UI, but the pipeline itself is more general and separates cleanly by concern.
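The routing trigger reduces to a small predicate. This sketch assumes sources.legal is an array of selected legal source IDs and sources.web a boolean; the `ResearchSources` type and `shouldUseResearchMode` name are illustrative, and the fork's real shapes live in shared/types.ts.

```typescript
// Assumed shape of the source selection sent with a chat request.
interface ResearchSources {
  legal?: string[]; // selected legal sources; assumed to be an array of IDs
  web?: boolean; // web search toggle
}

// Research mode whenever any research source is selected.
function shouldUseResearchMode(sources?: ResearchSources): boolean {
  if (!sources) return false;
  const hasLegal = (sources.legal?.length ?? 0) > 0;
  return hasLegal || sources.web === true;
}
```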

On the frontend, a new onit-status-icon.tsx component renders a 3×3 Drift Grid ripple loader that cross-fades to the Onit O logo when streaming stops, replacing MikeIcon in the assistant's ResponseStatus. AssistantMessage.tsx gains a ResearchStepBlock showing a dot, a label and a metadata detail per step (for example "top 5 selected"), and ReferenceBlock renders results numbered and indented under "Ranked results" with a continuous vertical connector. Two bug fixes landed in the same commit: research_step events are deduplicated by key on reload (so a refresh doesn't show both running and done states for the same step), and a new guard drops whitespace-only content events, which Olava sometimes emits between reasoning blocks and which were splitting the UI into duplicate "Completed in N steps" cards.
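Both fixes can be expressed as a single reducer over a message's event list. This is a sketch under assumptions: the `UiEvent` shapes, the `status` field and `applyEvent` are hypothetical, and it treats a repeated research_step key as an in-place update, so the latest state (for example "done") replaces "running" at the same position.

```typescript
// Hypothetical event shapes for this sketch; the real AssistantEvent union is in shared/types.ts.
type UiEvent =
  | { type: "research_step"; key: string; label: string; status: "running" | "done"; meta?: string }
  | { type: "content"; text: string };

// Fold one incoming event into the message's event list.
function applyEvent(events: UiEvent[], ev: UiEvent): UiEvent[] {
  if (ev.type === "content" && ev.text.trim() === "") {
    // Whitespace-only chunks (e.g. "\n\n" between reasoning blocks) are dropped
    // so they can't open a second content segment.
    return events;
  }
  if (ev.type === "research_step") {
    const key = ev.key;
    const i = events.findIndex((e) => e.type === "research_step" && e.key === key);
    if (i !== -1) {
      // Same step seen again (e.g. replayed on reload): update it in place.
      const next = events.slice();
      next[i] = ev;
      return next;
    }
  }
  return [...events, ev];
}
```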

The Olava adapter now forwards delta.reasoning and delta.reasoning_content through a new onReasoningDelta callback, so the UI can show a live "Thinking..." indicator during long model think-times instead of dead air. The reasoning text is persisted with chat_messages, the same scope as the response itself. The reasoning-forwarding note in the diff is worth reading: a prior security commit had dropped these fields from output because of PII concerns about backend logs, but nwhitehouse's read is that surfacing them in the user's own UI is within scope.
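A sketch of the forwarding logic for an OpenAI-compatible stream such as the one vLLM serves. It assumes SSE lines of the form `data: {...}` terminated by `data: [DONE]`; `StreamHandlers`, `dispatchChunk` and `handleSseLine` are hypothetical, and only the onReasoningDelta name comes from the commit. Reading `reasoning` first and falling back to `reasoning_content` avoids forwarding the same text twice if a server version populates both fields.

```typescript
// Minimal slice of an OpenAI-compatible streaming chunk.
interface StreamChunk {
  choices?: Array<{
    delta?: {
      content?: string | null;
      reasoning?: string | null;
      reasoning_content?: string | null;
    };
  }>;
}

// Hypothetical handler set; only the onReasoningDelta name comes from the commit.
interface StreamHandlers {
  onContentDelta(text: string): void;
  onReasoningDelta?(text: string): void;
}

function dispatchChunk(chunk: StreamChunk, h: StreamHandlers): void {
  const delta = chunk.choices?.[0]?.delta;
  if (!delta) return;
  // Use `reasoning` if present, else `reasoning_content`, so text is forwarded once.
  const reasoning = delta.reasoning ?? delta.reasoning_content;
  if (reasoning && h.onReasoningDelta) h.onReasoningDelta(reasoning);
  if (delta.content) h.onContentDelta(delta.content);
}

// Handle one SSE line; returns true once the stream signals completion.
function handleSseLine(line: string, h: StreamHandlers): boolean {
  if (!line.startsWith("data:")) return false; // ignore comments and blank lines
  const payload = line.slice("data:".length).trim();
  if (payload === "[DONE]") return true;
  dispatchChunk(JSON.parse(payload) as StreamChunk, h);
  return false;
}
```

A caller would split the response body into lines and feed each one to handleSseLine until it returns true.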

So what? Worth a close look if you're building a multi-step research agent over a small reasoning model. The per-pass file split is clean, and the budget and SSE patterns transfer directly. Skip it if you need an Anthropic or Gemini backend: the pipeline is tuned for Olava-001's reasoning token budget, and the `sources.legal`/`sources.web` trigger logic is specific to nwhitehouse's fork. Don't import it wholesale; port the pattern.


Commits in this thread

1 commit from nwhitehouse/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA	Subject	Author	Date
`a4adcbf3`	[feat-005] Multi-pass research orchestrator + UI integration	Nick Whitehouse	2026-05-04


commit body

Trigger: chat.ts auto-routes to the orchestrator when the user has any
research source selected (sources.legal non-empty OR sources.web=true).
Five-pass pipeline with budget enforcement (≤25 Olava calls, ≤45s wall),
designed fresh for Olava-001 (Qwen3.6 + LoRA) per the SLM-cost-advantage
strategy. work___'s services/{orchestrator,sub_agent,loop_controller}.py
served as architectural reference, not a port.

Backend (new)
- backend/src/lib/research/types.ts - shared event/result types
- backend/src/lib/research/budget.ts - call counter + wall-clock cap
- backend/src/lib/research/queryExpander.ts - pass 1: 1 Olava call → 3-6 specialised queries
- backend/src/lib/research/searchFanOut.ts - pass 2: parallel legal/web searches, dedupe by URL
- backend/src/lib/research/triage.ts - pass 3: 1 Olava call → top-N most relevant
- backend/src/lib/research/extractor.ts - pass 4: N parallel Olava calls → tailored extracts
- backend/src/lib/research/synthesizer.ts - pass 5: streaming Olava call → markdown answer
- backend/src/lib/research/orchestrator.ts - pipeline coordinator + SSE event emission

Backend (modified)
- routes/chat.ts: auto-detect research mode and route to runResearchOrchestrator
- lib/llm/olava.ts: forward delta.reasoning(_content) via onReasoningDelta
  so the UI shows a live "Thinking..." indicator during long Olava think-times
  (rather than dead air). Persisted as part of chat_messages - same scope as
  the response itself.

Frontend (new)
- src/components/chat/onit-status-icon.tsx - Drift Grid 3×3 ripple loader
  (ported from work___ UI System UPGRADE/loaders.jsx) cross-fades to the
  Onit O logo (#00112c) when streaming stops. Replaces MikeIcon in the
  assistant ResponseStatus.
- globals.css: gridPulse keyframe.

Frontend (modified)
- AssistantMessage.tsx
  - ResearchStepBlock: dot + label + meta detail (e.g. "top 5 selected"),
    rendered inline inside the existing PreResponseWrapper alongside other
    tool chatter. Wrapper auto-collapses to "Completed in N steps" once
    synthesis content arrives.
  - ReferenceBlock: numbered (1. 2. 3.) and indented under "Ranked results",
    with a continuous parent-x vertical line drawing through them so they
    visually nest under the parent step. No per-reference dots or in-between
    connectors. Click opens URL in new tab.
  - bug-003 obsoleted: reference_added events now flow through the wrapper
    again (single-pass mode no longer fires search tools - that's all
    research-mode now where synthesis is reliably non-empty).
  - Empty/whitespace-only content events no longer split wrappers (Olava
    sometimes emits "\n\n" between reasoning blocks; without this guard the
    UI breaks into two separate "Completed in N steps" cards).
- shared/types.ts: AssistantEvent extended with research_step + sources.web.
- hooks/useAssistantChat.ts: research_step SSE handler dedupes by key in
  the events array (so reload doesn't show running+done duplicates of the
  same step). Transient research.* events are kept out of persistence;
  research_step drives the UI.

Process
- backlog.md: full feat-005 design + feat-006 (citation reliability via
  add_citation tool - defer per testing notes 2026-05-04) appended.

Smoke-tested end-to-end against real Olava + Brave + CourtListener with
"What is the latest court case where a lawyer had misused AI in court?":
~30s wall, 16 events, 2KB synthesis with inline [Title](URL) citations to
the April 2026 Sullivan & Cromwell case (vs single-pass Olava emitting
just "\n\n" with the same prompt).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
