[bug-007] Tabular generate as durable job + worker pool (5K-10K-doc scale)

Nick Whitehouse · 2026-05-07 · a663f8df

Replaces the previous SSE-streaming generate handler with a durable job
table + in-process worker pool + frontend polling. The previous design
worked at 200-doc scale but fell over at the actual product target
(5K-10K-doc tabular projects):
  - vLLM cannot serve N concurrent inference requests at that scale
  - the request handler tied up an Express worker for hours
  - SSE was a single point of failure (a proxy idle timeout, closing the
    browser tab, or a backend restart all killed the run with no recovery)
  - in-flight progress was lost on a restart

Schema (migration 005):
  - tabular_jobs(id, review_id, status, total_items, started_at,
    completed_at, cancel_requested_at, error, ...)
  - tabular_job_items(id, job_id, document_id, status, attempt_count,
    lease_expires_at, error, ...)
  - claim_tabular_job_item(lease_seconds) RPC: atomic worker claim via
    FOR UPDATE SKIP LOCKED; safe across multiple backend instances.
  - RLS via the existing can_access_review() predicate.
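The lease semantics behind claim_tabular_job_item can be sketched in memory. This is a hedged illustration, not the migration's SQL: the item status names and the camelCase field names are assumptions, and a single-threaded loop stands in for the atomic Postgres claim (typically an UPDATE whose target row is picked by a subquery with FOR UPDATE SKIP LOCKED LIMIT 1). An item is treated as claimable when it is pending, or when it is running but its lease has expired because the worker holding it died or was stopped.

```typescript
// Assumed item statuses; the migration's actual values may differ.
type ItemStatus = "pending" | "running" | "done" | "failed";

interface JobItem {
  id: string;
  jobId: string;
  documentId: string;
  status: ItemStatus;
  attemptCount: number;
  leaseExpiresAt: number | null; // epoch ms; null when not leased
  error: string | null;
}

// Pending items, or running items whose worker let the lease lapse.
function isClaimable(item: JobItem, now: number): boolean {
  if (item.status === "pending") return true;
  return (
    item.status === "running" &&
    item.leaseExpiresAt !== null &&
    item.leaseExpiresAt <= now
  );
}

// In-memory stand-in for the RPC: take the first claimable item, mark it
// running, bump attempt_count and start a fresh lease.
function claimNextItem(
  items: JobItem[],
  leaseSeconds: number,
  now: number,
): JobItem | null {
  for (const item of items) {
    if (!isClaimable(item, now)) continue;
    item.status = "running";
    item.attemptCount += 1;
    item.leaseExpiresAt = now + leaseSeconds * 1000;
    return item;
  }
  return null;
}
```

Because a crashed worker never releases its lease explicitly, recovery relies only on the clock: the next claim after lease_expires_at picks the item up again, and attempt_count records how many times that has happened.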

Backend:
  - lib/tabularJobs.ts: extraction + LLM helpers moved here so the
    worker doesn't import a route file (no circular deps); added
    createGenerateJob, claimNextItem, processOneJobItem,
    maybeFinalizeJob, TabularWorkerPool.
  - routes/tabular.ts: POST /generate now creates a job and returns
    immediately. New endpoints: GET /jobs/:id, GET /jobs/:id/cells,
    POST /jobs/:id/cancel, GET /reviews/:id/active-job.
  - index.ts: TabularWorkerPool started after app.listen(); SIGTERM/
    SIGINT shutdown stops the loops gracefully (leases on in-flight
    items expire and the next worker reclaims them).
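A worker pool of this shape can be sketched as N independent claim/process loops. This is a hypothetical sketch, not the TabularWorkerPool in lib/tabularJobs.ts: the class name, the injected claim/handle functions and the grace period on stop() are assumptions. Each loop sleeps for idleMs when nothing is claimable (cf. TABULAR_WORKER_IDLE_MS); stop() waits at most graceMs, and anything still processing after that is simply abandoned so its lease can expire.

```typescript
type Claim<T> = () => Promise<T | null>;
type Handle<T> = (item: T) => Promise<void>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

class WorkerPool<T> {
  private running = false;
  private loops: Promise<void>[] = [];

  constructor(
    private readonly concurrency: number, // cf. TABULAR_GENERATE_CONCURRENCY
    private readonly idleMs: number, // cf. TABULAR_WORKER_IDLE_MS
    private readonly claim: Claim<T>,
    private readonly handle: Handle<T>,
  ) {}

  start(): void {
    if (this.running) return;
    this.running = true;
    for (let i = 0; i < this.concurrency; i++) this.loops.push(this.loop());
  }

  private async loop(): Promise<void> {
    while (this.running) {
      try {
        const item = await this.claim();
        if (item === null) {
          await sleep(this.idleMs); // queue empty: back off before polling again
          continue;
        }
        await this.handle(item);
      } catch (err) {
        // Keep the loop alive; a failed item's lease expires and it is retried.
        console.error("tabular worker error", err);
        await sleep(this.idleMs);
      }
    }
  }

  // Stop claiming new items; wait up to graceMs for loops to exit.
  async stop(graceMs: number): Promise<void> {
    this.running = false;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<void>((resolve) => {
      timer = setTimeout(resolve, graceMs);
    });
    await Promise.race([Promise.all(this.loops), deadline]);
    clearTimeout(timer);
    this.loops = [];
  }
}
```

Wiring it to shutdown would look roughly like `process.once("SIGTERM", () => pool.stop(5000).then(() => process.exit(0)))`, with the 5000 ms grace period being an arbitrary illustrative value.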

Frontend:
  - mikeApi.ts: removed streamTabularGeneration; added
    startTabularGenerate, getTabularJob, getTabularJobCells,
    cancelTabularJob, getActiveTabularJob.
  - TabularReviewView.tsx: EventSource reader replaced with a
    pollJob loop that shows live progress (e.g. 12/200) and resumes
    automatically on remount via getActiveTabularJob.
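The frontend polling loop can be sketched as below. This is a hedged sketch rather than the component's code: the JobSnapshot shape (in particular completedItems), the job status names and the injected fetchJob function standing in for getTabularJob are all assumptions. Polling stops when the job reaches a terminal status or when the AbortSignal fires, for example on unmount.

```typescript
// Assumed job statuses; the real tabular_jobs.status values may differ.
type JobStatus = "queued" | "running" | "completed" | "failed" | "cancelled";

interface JobSnapshot {
  id: string;
  status: JobStatus;
  totalItems: number;
  completedItems: number; // hypothetical field used for the "12/200" display
}

const TERMINAL: ReadonlySet<JobStatus> = new Set<JobStatus>([
  "completed",
  "failed",
  "cancelled",
]);

async function pollJob(
  jobId: string,
  fetchJob: (id: string) => Promise<JobSnapshot>, // stand-in for getTabularJob
  onProgress: (job: JobSnapshot) => void,
  pollMs: number, // cf. NEXT_PUBLIC_TABULAR_POLL_MS
  signal?: AbortSignal,
): Promise<JobSnapshot | null> {
  while (!signal?.aborted) {
    const job = await fetchJob(jobId);
    onProgress(job);
    if (TERMINAL.has(job.status)) return job;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  // Aborted (e.g. unmount). The job keeps running server-side, so a later
  // mount can find it via getActiveTabularJob and call pollJob again.
  return null;
}
```

Since the job lives in the database rather than in a connection, closing the tab only stops this loop; resuming is just another pollJob call with the id returned by getActiveTabularJob.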

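Env knobs:
  - TABULAR_GENERATE_CONCURRENCY: number of workers (default 10)
  - TABULAR_JOB_LEASE_SECONDS: item lease length in seconds (default 300)
  - TABULAR_WORKER_IDLE_MS: worker idle delay in ms (default 500)
  - NEXT_PUBLIC_TABULAR_POLL_MS: frontend poll interval in ms (default 1500)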
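Reading these knobs with their defaults might look like the following sketch. The helper names and the choice to fall back to the default on a missing, non-integer or non-positive value are assumptions, not taken from the commit. The poll interval is read separately because Next.js inlines NEXT_PUBLIC_* variables into the client bundle at build time, so the frontend has to reference process.env.NEXT_PUBLIC_TABULAR_POLL_MS by its literal name.

```typescript
interface TabularConfig {
  concurrency: number; // TABULAR_GENERATE_CONCURRENCY
  leaseSeconds: number; // TABULAR_JOB_LEASE_SECONDS
  workerIdleMs: number; // TABULAR_WORKER_IDLE_MS
}

// Hypothetical helper: use the fallback unless the value is a positive integer.
function positiveInt(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Backend knobs, e.g. loadTabularConfig(process.env).
function loadTabularConfig(env: Record<string, string | undefined>): TabularConfig {
  return {
    concurrency: positiveInt(env.TABULAR_GENERATE_CONCURRENCY, 10),
    leaseSeconds: positiveInt(env.TABULAR_JOB_LEASE_SECONDS, 300),
    workerIdleMs: positiveInt(env.TABULAR_WORKER_IDLE_MS, 500),
  };
}

// Frontend knob, e.g. pollIntervalMs(process.env.NEXT_PUBLIC_TABULAR_POLL_MS).
function pollIntervalMs(raw: string | undefined): number {
  return positiveInt(raw, 1500);
}
```

One design consequence worth noting: the lease (300 s by default) must comfortably exceed the time one worker spends on a single document's columns, otherwise a healthy worker's item could be reclaimed and processed twice.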

Verified:
  - tsc --noEmit clean (backend + frontend)
  - all 16 backend tests pass (no regressions)
  - migration applied to local Supabase; 6 RLS policies + claim RPC
    + tables + indexes in place.

Known limitation: in-flight cells (a worker midway through one
document's columns) aren't surfaced to the frontend until the item
reaches a terminal state. This matches the user's mental model that a
document's row of cells appears together when its turn finishes. If
live per-cell streaming becomes a requirement, add
tabular_cells.updated_at plus a separate query path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Repository nwhitehouse/mike
Author Nick Whitehouse <nick.whitehouse@mccarthyfinch.com>
Parents 22ab8a76
Stats 7 files changed, +1650, -475
Part of Tabular generate as durable job + worker pool
