[bug-007] Tabular generate as durable job + worker pool (5K-10K-doc scale)
Replaces the previous SSE-streaming generate handler with a durable job
table + in-process worker pool + frontend polling. The previous design
worked at 200-doc scale but fell over at the actual product target
(5K-10K-doc tabular projects):
- vLLM cannot serve N concurrent inference requests at that scale
- the request handler tied up an Express worker for hours
- SSE was a single point of failure (a proxy idle timeout, a closed
browser tab, or a backend restart each killed the run with no recovery)
- in-flight progress was lost on a restart
Schema (migration 005):
- tabular_jobs(id, review_id, status, total_items, started_at,
completed_at, cancel_requested_at, error, ...)
- tabular_job_items(id, job_id, document_id, status, attempt_count,
lease_expires_at, error, ...)
- claim_tabular_job_item(lease_seconds) RPC: atomic worker claim via
FOR UPDATE SKIP LOCKED. Multi-instance safe.
- RLS via the existing can_access_review() predicate.
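The lease semantics behind the claim RPC can be illustrated with a small in-memory sketch. This is not the migration's SQL: the real claim runs inside Postgres with FOR UPDATE SKIP LOCKED, which is what makes it safe across instances. The status values, the camelCase field names, and the function name claimItemSketch are assumptions made for illustration.

```typescript
// In-memory illustration of lease-based claiming. The actual claim is the
// claim_tabular_job_item RPC in Postgres; status values here are assumed.
type ItemStatus = "pending" | "running" | "done" | "failed";

interface JobItem {
  id: string;
  status: ItemStatus;
  attemptCount: number;
  leaseExpiresAt: number | null; // epoch milliseconds, null if never claimed
}

// Claim the first item that is pending, or running with an expired lease.
// Returns null when nothing is claimable.
function claimItemSketch(
  items: JobItem[],
  leaseSeconds: number,
  now: number
): JobItem | null {
  const item = items.find(
    (i) =>
      i.status === "pending" ||
      (i.status === "running" &&
        i.leaseExpiresAt !== null &&
        i.leaseExpiresAt <= now)
  );
  if (!item) return null;
  item.status = "running";
  item.attemptCount += 1;
  item.leaseExpiresAt = now + leaseSeconds * 1000;
  return item;
}
```

An item whose worker died mid-run stays in the running state until its lease passes, at which point the next claim picks it up again and increments its attempt count.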
Backend:
- lib/tabularJobs.ts: extraction + LLM helpers moved here so the
worker doesn't import a route file (no circular deps); added
createGenerateJob, claimNextItem, processOneJobItem,
maybeFinalizeJob, TabularWorkerPool.
- routes/tabular.ts: POST /generate now creates a job and returns
immediately. New endpoints: GET /jobs/:id, GET /jobs/:id/cells,
POST /jobs/:id/cancel, GET /reviews/:id/active-job.
- index.ts: TabularWorkerPool is started after app.listen(); on SIGTERM/
SIGINT the loops stop gracefully (the lease on any in-flight item
expires and the next worker reclaims it).
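As a rough picture of the in-process worker pool, the sketch below runs a fixed number of loops that claim an item, process it, and idle when the queue is empty. It is a hypothetical shape, not the TabularWorkerPool in lib/tabularJobs.ts: the class name, constructor arguments, and error handling are assumptions. Note one deliberate difference: stop() here waits for each loop to finish its current item, whereas the commit describes interrupted items being left for lease expiry.

```typescript
// Hypothetical worker pool shape: N loops that claim, process, and idle.
type ClaimFn<T> = () => Promise<T | null>;
type ProcessFn<T> = (item: T) => Promise<void>;

class WorkerPoolSketch<T> {
  private running = false;
  private loops: Promise<void>[] = [];
  private readonly claim: ClaimFn<T>;
  private readonly processItem: ProcessFn<T>;
  private readonly concurrency: number;
  private readonly idleMs: number;

  constructor(
    claim: ClaimFn<T>,
    processItem: ProcessFn<T>,
    concurrency: number,
    idleMs: number
  ) {
    this.claim = claim;
    this.processItem = processItem;
    this.concurrency = concurrency;
    this.idleMs = idleMs;
  }

  start(): void {
    if (this.running) return;
    this.running = true;
    for (let i = 0; i < this.concurrency; i++) {
      this.loops.push(this.loop());
    }
  }

  // Stop claiming new items and wait for every loop to exit.
  async stop(): Promise<void> {
    this.running = false;
    await Promise.all(this.loops);
    this.loops = [];
  }

  private async loop(): Promise<void> {
    while (this.running) {
      const item = await this.claim();
      if (item === null) {
        // Nothing to do: sleep for the idle interval before asking again.
        await new Promise<void>((resolve) => setTimeout(resolve, this.idleMs));
        continue;
      }
      try {
        await this.processItem(item);
      } catch {
        // A real worker would record the error on the item; omitted here.
      }
    }
  }
}
```

A shutdown hook in this style would call stop() from the SIGTERM/SIGINT handlers before the process exits.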
Frontend:
- mikeApi.ts: removed streamTabularGeneration; added
startTabularGenerate, getTabularJob, getTabularJobCells,
cancelTabularJob, getActiveTabularJob.
- TabularReviewView.tsx: EventSource reader replaced with a
pollJob loop that surfaces live progress (e.g. 12/200) and resumes
automatically on remount via getActiveTabularJob.
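The polling approach can be sketched as a loop that fetches the job, reports progress, and exits on a terminal status. This is a minimal illustration, not the pollJob in TabularReviewView.tsx: the JobSnapshot fields, the status values, and the injected fetchJob and sleep functions are assumptions.

```typescript
// Assumed shape of what a GET /jobs/:id response might carry.
interface JobSnapshot {
  status: "queued" | "running" | "completed" | "failed" | "cancelled";
  totalItems: number;
  doneItems: number;
}

// Poll until the job reaches a terminal status, reporting "done/total"
// after every fetch. fetchJob and sleep are injected to keep this testable.
async function pollJobSketch(
  fetchJob: () => Promise<JobSnapshot>,
  onProgress: (label: string) => void,
  sleep: (ms: number) => Promise<void>,
  pollMs: number
): Promise<JobSnapshot> {
  for (;;) {
    const job = await fetchJob();
    onProgress(`${job.doneItems}/${job.totalItems}`);
    if (
      job.status === "completed" ||
      job.status === "failed" ||
      job.status === "cancelled"
    ) {
      return job;
    }
    await sleep(pollMs);
  }
}
```

Because each poll is an independent request, a remounted component can simply start a fresh loop against the same job id, which is what makes resuming after a tab reload straightforward.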
Env knobs: TABULAR_GENERATE_CONCURRENCY (workers, default 10),
TABULAR_JOB_LEASE_SECONDS (300), TABULAR_WORKER_IDLE_MS (500),
NEXT_PUBLIC_TABULAR_POLL_MS (1500).
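For reference, the backend knobs and their defaults could be read along these lines. The variable names and default values come from the list above; the helper names and the rule of falling back on non-positive or non-numeric values are assumptions, not necessarily what the commit does.

```typescript
interface TabularWorkerConfig {
  concurrency: number;
  leaseSeconds: number;
  idleMs: number;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// env is passed in (e.g. process.env) so the function has no global dependency.
function loadTabularWorkerConfig(
  env: Record<string, string | undefined>
): TabularWorkerConfig {
  return {
    concurrency: readPositiveInt(env["TABULAR_GENERATE_CONCURRENCY"], 10),
    leaseSeconds: readPositiveInt(env["TABULAR_JOB_LEASE_SECONDS"], 300),
    idleMs: readPositiveInt(env["TABULAR_WORKER_IDLE_MS"], 500),
  };
}
```

NEXT_PUBLIC_TABULAR_POLL_MS is left out because it is read on the frontend rather than by the worker.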
Verified:
- tsc --noEmit clean (backend + frontend)
- all 16 backend tests pass (no regressions)
- migration applied to local Supabase; 6 RLS policies + claim RPC
+ tables + indexes in place.
Known limitation: in-flight cells (a worker midway through one doc's
columns) aren't surfaced to the frontend until the item reaches a
terminal state. This matches the user's mental model that a doc's row
of cells appears together when its turn finishes. If live per-cell
streaming becomes a requirement, add tabular_cells.updated_at + a
separate query path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Repository | nwhitehouse/mike |
|---|---|
| Author | Nick Whitehouse <nick.whitehouse@mccarthyfinch.com> |
| Authored | |
| Parents | 22ab8a76 |
| Stats | 7 files changed, +1650, -475 |
| Part of | Tabular generate as durable job + worker pool |