nwhitehouse builds a five-pass research engine for a small model

The biggest single feature in this fork: a research pipeline that fires automatically the moment you tick a legal database or the web as a source.

Instead of asking one big model one big question, nwhitehouse breaks legal research into five stages: expand the user's question into several sharper queries, fan out searches in parallel across legal databases and the web, triage the hits down to the most relevant, pull tailored extracts from each, then stitch the answer together with inline citations. Hard ceilings keep it honest: no more than 25 model calls and 45 seconds of wall-clock time per run.
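To make the two ceilings concrete, here is a minimal sketch of a budget guard that a pipeline like this could consult before each model call. The class name, method names, and injectable clock are assumptions for illustration; the fork's actual budget.ts may be structured quite differently.

```typescript
// Hypothetical budget guard. Class and method names are illustrative
// assumptions, not the contents of the fork's budget.ts.
class BudgetExceededError extends Error {}

class ResearchBudget {
  private calls = 0;
  private readonly startedAt: number;
  private readonly maxCalls: number;
  private readonly maxWallMs: number;
  private readonly now: () => number;

  constructor(
    maxCalls = 25, // ceiling on model calls, per the writeup
    maxWallMs = 45_000, // 45-second wall-clock ceiling
    now: () => number = () => Date.now(), // injectable clock, handy for tests
  ) {
    this.maxCalls = maxCalls;
    this.maxWallMs = maxWallMs;
    this.now = now;
    this.startedAt = now();
  }

  /** Call before every model request; throws once either ceiling is reached. */
  charge(): void {
    if (this.now() - this.startedAt >= this.maxWallMs) {
      throw new BudgetExceededError("wall-clock budget exhausted");
    }
    if (this.calls >= this.maxCalls) {
      throw new BudgetExceededError("model-call budget exhausted");
    }
    this.calls += 1;
  }

  /** Number of model calls charged so far (rejected calls are not counted). */
  get callsUsed(): number {
    return this.calls;
  }
}
```

A guard of this shape only refuses new calls; it does not cancel a request that is already in flight, so a real implementation would probably pair it with a timeout or abort signal on the synthesis call.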

The twist is that it's tuned for Olava-001, a small reasoning model - not a frontier one. A new loading animation and live "thinking" indicator paper over the latency, and the chat UI now shows each research step as it happens, with ranked sources threaded underneath. In the author's end-to-end smoke test, it returned a cited answer of about 2 KB in roughly thirty seconds; the same prompt sent to the bare model in a single pass produced only blank lines.

So what: if you're trying to get useful legal research out of a cheap or local model rather than a frontier one, this is one of the cleaner blueprints to crib from.

Commits in this thread

1 commit from nwhitehouse/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA: a4adcbf3
Subject: [feat-005] Multi-pass research orchestrator + UI integration
Author: Nick Whitehouse
Date: 2026-05-04

Commit body:
Trigger: chat.ts auto-routes to the orchestrator when the user has any
research source selected (sources.legal non-empty OR sources.web=true).
Five-pass pipeline with budget enforcement (≤25 Olava calls, ≤45s wall),
designed fresh for Olava-001 (Qwen3.6 + LoRA) per the SLM-cost-advantage
strategy. work___'s services/{orchestrator,sub_agent,loop_controller}.py
served as architectural reference, not a port.
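
The trigger condition described above can be sketched as a small predicate. The interface and function names are hypothetical; only the `legal` and `web` fields are taken from the commit text, and the element type of `legal` is an assumption.

```typescript
// Hypothetical shape of the user's source selection. Only the field names
// `legal` and `web` come from the commit message; string IDs are assumed.
interface SourceSelection {
  legal: string[]; // IDs of the selected legal databases
  web: boolean; // whether web search is ticked
}

/** True when the chat route should hand off to the research orchestrator. */
function isResearchMode(sources: SourceSelection | undefined): boolean {
  if (!sources) return false;
  return sources.legal.length > 0 || sources.web === true;
}
```

Keeping the check in one pure function makes the routing rule easy to unit-test apart from the chat route itself.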

Backend (new)
- backend/src/lib/research/types.ts        - shared event/result types
- backend/src/lib/research/budget.ts       - call counter + wall-clock cap
- backend/src/lib/research/queryExpander.ts - pass 1: 1 Olava call → 3-6 specialised queries
- backend/src/lib/research/searchFanOut.ts - pass 2: parallel legal/web searches, dedupe by URL
- backend/src/lib/research/triage.ts        - pass 3: 1 Olava call → top-N most relevant
- backend/src/lib/research/extractor.ts     - pass 4: N parallel Olava calls → tailored extracts
- backend/src/lib/research/synthesizer.ts   - pass 5: streaming Olava call → markdown answer
- backend/src/lib/research/orchestrator.ts  - pipeline coordinator + SSE event emission
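
As an illustration of pass 2 in the list above, here is a rough sketch of a parallel fan-out that tolerates individual provider failures and dedupes hits by URL. The types, function names, and the trailing-slash normalisation are all assumptions; the commit only states that searches run in parallel and are deduped by URL.

```typescript
// Hypothetical hit and provider types, for illustration only.
interface SearchHit {
  title: string;
  url: string;
  source: string; // e.g. "legal" or "web"
}
type SearchFn = (query: string) => Promise<SearchHit[]>;

/** Keep the first hit per URL. Trimming trailing slashes is an assumed normalisation. */
function dedupeByUrl(hits: SearchHit[]): SearchHit[] {
  const seen = new Set<string>();
  const unique: SearchHit[] = [];
  for (const hit of hits) {
    const key = hit.url.trim().replace(/\/+$/, "");
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(hit);
  }
  return unique;
}

/** Run every query against every provider concurrently; failed searches are skipped. */
async function fanOut(queries: string[], providers: SearchFn[]): Promise<SearchHit[]> {
  const tasks = queries.flatMap((q) => providers.map((search) => search(q)));
  const settled = await Promise.allSettled(tasks);
  const hits = settled.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
  return dedupeByUrl(hits);
}
```

Using `Promise.allSettled` rather than `Promise.all` means one slow or failing provider does not discard the results the others returned.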

Backend (modified)
- routes/chat.ts: auto-detect research mode and route to runResearchOrchestrator
- lib/llm/olava.ts: forward delta.reasoning(_content) via onReasoningDelta
  so the UI shows a live "Thinking..." indicator during long Olava think-times
  (rather than dead air). Persisted as part of chat_messages - same scope as
  the response itself.
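
The reasoning-forwarding change might look roughly like the sketch below. It assumes an OpenAI-compatible streaming delta in which reasoning text arrives under either `reasoning` or `reasoning_content`; `onReasoningDelta` is named in the commit, while `onContentDelta`, the interfaces, and `dispatchDelta` are hypothetical.

```typescript
// Assumed delta shape for an OpenAI-compatible streaming chunk.
interface StreamDelta {
  content?: string;
  reasoning?: string;
  reasoning_content?: string;
}

interface StreamHandlers {
  onContentDelta?: (text: string) => void; // hypothetical name
  onReasoningDelta?: (text: string) => void; // named in the commit message
}

/** Route one streamed delta to the matching callback(s). */
function dispatchDelta(delta: StreamDelta, handlers: StreamHandlers): void {
  // Accept whichever reasoning field the server happens to emit.
  const reasoning = delta.reasoning ?? delta.reasoning_content;
  if (reasoning) handlers.onReasoningDelta?.(reasoning);
  if (delta.content) handlers.onContentDelta?.(delta.content);
}
```

With a handler like this the UI can render a "Thinking..." indicator from reasoning deltas while the answer text is still empty.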

Frontend (new)
- src/components/chat/onit-status-icon.tsx - Drift Grid 3×3 ripple loader
  (ported from work___ UI System UPGRADE/loaders.jsx) cross-fades to the
  Onit O logo (#00112c) when streaming stops. Replaces MikeIcon in the
  assistant ResponseStatus.
- globals.css: gridPulse keyframe.

Frontend (modified)
- AssistantMessage.tsx
  - ResearchStepBlock: dot + label + meta detail (e.g. "top 5 selected"),
    rendered inline inside the existing PreResponseWrapper alongside other
    tool chatter. Wrapper auto-collapses to "Completed in N steps" once
    synthesis content arrives.
  - ReferenceBlock: numbered (1. 2. 3.) and indented under "Ranked results",
    with a continuous parent-x vertical line drawing through them so they
    visually nest under the parent step. No per-reference dots or in-between
    connectors. Click opens URL in new tab.
  - bug-003 obsoleted: reference_added events now flow through the wrapper
    again (single-pass mode no longer fires search tools - that's all
    research-mode now where synthesis is reliably non-empty).
  - Empty/whitespace-only content events no longer split wrappers (Olava
    sometimes emits "\n\n" between reasoning blocks; without this guard the
    UI breaks into two separate "Completed in N steps" cards).
- shared/types.ts: AssistantEvent extended with research_step + sources.web.
- hooks/useAssistantChat.ts: research_step SSE handler dedupes by key in
  the events array (so reload doesn't show running+done duplicates of the
  same step). Transient research.* events are kept out of persistence;
  research_step drives the UI.
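
Two of the frontend guards above, deduping `research_step` events by key and ignoring whitespace-only content, can be sketched as pure helpers. The event field names other than `research_step` and `key` are assumptions, as are the function names.

```typescript
// Hypothetical event shapes; only the `research_step` type and the idea of a
// per-step `key` come from the commit message.
interface ResearchStepEvent {
  type: "research_step";
  key: string;
  label: string;
  status: "running" | "done";
}
interface ContentEvent {
  type: "content";
  text: string;
}
type ChatEvent = ResearchStepEvent | ContentEvent;

/** Replace an existing step with the same key in place, otherwise append it. */
function upsertResearchStep(events: ChatEvent[], step: ResearchStepEvent): ChatEvent[] {
  const idx = events.findIndex((e) => e.type === "research_step" && e.key === step.key);
  if (idx === -1) return [...events, step];
  const next = events.slice();
  next[idx] = step;
  return next;
}

/** True for content events such as "\n\n" that should not split the wrapper. */
function isBlankContent(event: ChatEvent): boolean {
  return event.type === "content" && event.text.trim() === "";
}
```

Because the upsert returns a new array and keeps the step's original position, a "running" entry is swapped for its "done" version without reordering the timeline.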

Process
- backlog.md: full feat-005 design + feat-006 (citation reliability via
  add_citation tool - defer per testing notes 2026-05-04) appended.

Smoke-tested end-to-end against real Olava + Brave + CourtListener with
"What is the latest court case where a lawyer had misused AI in court?":
~30s wall, 16 events, 2KB synthesis with inline [Title](URL) citations to
the April 2026 Sullivan & Cromwell case (vs single-pass Olava emitting
just "\n\n" with the same prompt).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
