nwhitehouse rebuilds Mike's bulk-document engine for scale

Tabular review - the bit that runs the same question across thousands of documents - now survives tab closes, backend restarts, and the kind of proxy timeouts that used to kill a four-hour run.

The old setup held a single live connection open between browser and server for the entire run. If you closed the tab, lost your wifi, restarted the server, or hit a proxy timeout, the whole job died and you started over. That was workable for a couple of hundred documents (the commit says the old design held up at 200) but useless at the 5,000-10,000-document scale nwhitehouse is targeting.

The rewrite turns each run into a durable job stored in the database, with a pool of workers picking up documents one at a time and a lease mechanism so nothing gets dropped or double-processed if a worker dies mid-task. The frontend now polls for progress and reconnects automatically. One tradeoff the author flags openly: you no longer see cells fill in word-by-word - each document's row appears as a block when its worker finishes.
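
To make the lease idea concrete, here is a minimal in-memory sketch of how a lease-based claim can behave. It is not the commit's code: the commit does the claim atomically in the database via FOR UPDATE SKIP LOCKED, and the `JobItem` shape, the status values, and the `claimItem` name below are assumptions chosen for illustration.

```typescript
// Illustrative in-memory model of lease-based claiming (hypothetical names).
// The real claim runs inside the database; this only shows the semantics.
type ItemStatus = "pending" | "running" | "done" | "failed";

interface JobItem {
  id: number;
  status: ItemStatus;
  attemptCount: number;
  leaseExpiresAt: number | null; // epoch milliseconds, null until first claim
}

// Claim the first item that is pending, or running with an expired lease.
// Returns null when nothing is claimable right now.
function claimItem(items: JobItem[], leaseSeconds: number, now: number): JobItem | null {
  const item = items.find(
    (i) =>
      i.status === "pending" ||
      (i.status === "running" && i.leaseExpiresAt !== null && i.leaseExpiresAt <= now),
  );
  if (!item) return null;
  item.status = "running";
  item.attemptCount += 1;
  item.leaseExpiresAt = now + leaseSeconds * 1000;
  return item;
}

const items: JobItem[] = [
  { id: 1, status: "pending", attemptCount: 0, leaseExpiresAt: null },
  { id: 2, status: "pending", attemptCount: 0, leaseExpiresAt: null },
];

const first = claimItem(items, 300, 0); // worker A takes item 1
const second = claimItem(items, 300, 1000); // worker B takes item 2
const none = claimItem(items, 300, 2000); // both leases still live, nothing to claim
// Suppose worker A died without finishing: once its lease lapses, item 1 is claimable again.
const retry = claimItem(items, 300, 300000);
```

In this model a dead worker never has to report its own failure: the item simply becomes claimable again when the lease runs out, and the attempt counter records that it was retried.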

So what? Anyone running Mike against a real document set rather than a demo folder should be watching this - it's the difference between a tool that works on a deck and one that survives an actual matter.

Commits in this thread

1 commit from nwhitehouse/mike, oldest first. Source extracted verbatim from the harvested git log.

SHA | Subject | Author | Date
a663f8df | [bug-007] Tabular generate as durable job + worker pool (5K-10K-doc scale) | Nick Whitehouse | 2026-05-07
commit body
Replaces the previous SSE-streaming generate handler with a durable job
table + in-process worker pool + frontend polling. The previous design
worked at 200-doc scale but fell over at the actual product target
(5K-10K-doc tabular projects):
  - vLLM cannot serve N concurrent inference requests at that scale
  - the request handler tied up an Express worker for hours
  - SSE was a single point of failure (proxy idle, browser tab close,
    backend restart all killed the run with no recovery)
  - in-flight progress was lost on a restart

Schema (migration 005):
  - tabular_jobs(id, review_id, status, total_items, started_at,
    completed_at, cancel_requested_at, error, ...)
  - tabular_job_items(id, job_id, document_id, status, attempt_count,
    lease_expires_at, error, ...)
  - claim_tabular_job_item(lease_seconds) RPC: atomic worker claim via
    FOR UPDATE SKIP LOCKED. Multi-instance safe.
  - RLS via the existing can_access_review() predicate.

Backend:
  - lib/tabularJobs.ts: extraction + LLM helpers moved here so the
    worker doesn't import a route file (no circular deps); added
    createGenerateJob, claimNextItem, processOneJobItem,
    maybeFinalizeJob, TabularWorkerPool.
  - routes/tabular.ts: POST /generate now creates a job and returns
    immediately. New endpoints: GET /jobs/:id, GET /jobs/:id/cells,
    POST /jobs/:id/cancel, GET /reviews/:id/active-job.
  - index.ts: TabularWorkerPool started after app.listen(); SIGTERM/
    SIGINT shutdown stops the loops gracefully (in-flight items
    expire their lease and the next worker reclaims them).

Frontend:
  - mikeApi.ts: removed streamTabularGeneration; added
    startTabularGenerate, getTabularJob, getTabularJobCells,
    cancelTabularJob, getActiveTabularJob.
  - TabularReviewView.tsx: EventSource reader replaced with a
    pollJob loop that surfaces 12/200 progress live and resumes
    automatically on remount via getActiveTabularJob.

Env knobs: TABULAR_GENERATE_CONCURRENCY (workers, default 10),
TABULAR_JOB_LEASE_SECONDS (300), TABULAR_WORKER_IDLE_MS (500),
NEXT_PUBLIC_TABULAR_POLL_MS (1500).

Verified:
  - tsc --noEmit clean (backend + frontend)
  - all 16 backend tests pass (no regressions)
  - migration applied to local Supabase; 6 RLS policies + claim RPC
    + tables + indexes in place.

Known limitation: in-flight cells (worker mid-way through one doc's
columns) aren't surfaced to the frontend until the item reaches a
terminal state. Matches the user's mental model that a doc's row of
cells appears together when its turn finishes. If live per-cell
streaming becomes a requirement, add tabular_cells.updated_at + a
separate query path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
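
The Frontend notes in the commit describe a pollJob loop that surfaces progress and stops when the job finishes. A minimal sketch of that kind of loop follows. It is not taken from the repository: the `JobSnapshot` fields, the status names, and the `pollUntilDone` helper are assumptions, and the fetch function is injected so the sketch does not guess at the real getTabularJob signature. The 1500 ms default mirrors the NEXT_PUBLIC_TABULAR_POLL_MS value the commit lists.

```typescript
// Hypothetical response shape; the commit body does not list the real fields.
interface JobSnapshot {
  status: "queued" | "running" | "completed" | "failed" | "cancelled";
  doneItems: number;
  totalItems: number;
}

// Injected fetcher, standing in for whatever call returns the job's current state.
type FetchJob = (jobId: string) => Promise<JobSnapshot>;

const TERMINAL = new Set<JobSnapshot["status"]>(["completed", "failed", "cancelled"]);

// Poll until the job reaches a terminal status, reporting progress on every tick.
async function pollUntilDone(
  jobId: string,
  fetchJob: FetchJob,
  onProgress: (done: number, total: number) => void,
  pollMs = 1500,
): Promise<JobSnapshot> {
  for (;;) {
    const job = await fetchJob(jobId);
    onProgress(job.doneItems, job.totalItems);
    if (TERMINAL.has(job.status)) return job;
    // Wait before asking again so the backend is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Because the loop holds no state beyond the job id, a remounted view can start it again for the same job and pick up wherever the job currently is. A production version would probably also tolerate a failed poll and retry rather than reject on the first network error.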
