fix(tabular): enforce per-document access on review flows

✅ merged · #20 · manueljpconde/mikeEU ← manueljpconde/mikeEU · opened 16d ago by manueljpconde · merged 15d ago by manueljpconde · self · +432-27 across 2 files · ↗ on GitHub

From the PR description

Summary

  • Closes a confirmed cross-tenant data exposure path: a logged-in user could submit another user's documents.id to tabular routes and have the backend read those bytes and forward extracted content to an LLM.
  • Adds assertDocumentsAccessible(documentIds, userId, userEmail, db) - a route-local helper that fetches docs in one batched query, loops ensureDocAccess, and fail-closes on missing/denied/DB-error.
  • Wired at four tabular call sites (create, PATCH, regenerate-cell, generate). All denials → 404 (no existence leak).

Classification

trust-boundary - adds a new authorization gate to an existing trust-sensitive route family (backend/src/routes/tabular.ts). Routes accept user input that controls which document bytes the backend reads and forwards to an LLM.

Linked issue

Closes #15. Parent tracker: #12.

Verification

  • npm run lint --prefix frontend: ✗ - 43 errors / 67 warnings, all pre-existing in unrelated files (privacy/page.tsx, support/page.tsx, convert-courts-to-ts.js, etc.). PR is backend-only; net delta 0.
  • npm run build --prefix frontend: ✓ (sanity; no frontend changes)
  • npm run build --prefix backend: ✓ (tsc)
  • npm run test --prefix backend: ✓ - 44/44 (existing 34 + 10 new in tabular.test.ts)
  • npm run test --prefix frontend: ✓ 55/55 (sanity)
  • npm run lint:catalogs --prefix frontend: ✓ 158 × 5 (sanity)

Trust-boundary checks

  • Auth gate: all four affected routes already mount requireAuth; identity from res.locals.userId / res.locals.userEmail only. No new identity reads from req.body / req.query / req.headers.
  • Tenant scoping: helper loops ensureDocAccess(doc, userId, userEmail, db) from backend/src/lib/access.ts. Owner-of-doc passes immediately; otherwise falls through to checkProjectAccess. Existing ensureReviewAccess retained at every entry. The invariant - every document_id that becomes or remains a tabular_cell must pass ensureDocAccess for the current user - holds across create / PATCH (both branches) / regenerate / generate.
  • Storage paths: not touched. No reads of documents.storage_path (deprecated) or writes to document_versions.storage_path.
  • Download links: not touched. No buildDownloadUrl paths modified.
  • Frontend exposure: zero NEXT_PUBLIC_* additions. Backend-only PR. No model SDK imports added.
  • Model registry: backend/src/lib/llm/models.ts and frontend/src/app/components/assistant/ModelToggle.tsx not in diff.
  • Built-in workflows: not touched.
  • Provider quirks: LLM adapters not touched. Validation always precedes provider invocation on regenerate-cell and generate.
  • Citation grammars: not touched.
  • Rate limits: backend/src/index.ts not in diff.
  • Logging: count-only ({ count: docAccess.bad.length }) at all four new call sites. No IDs, no userId, no req.body content.

Behavioural changes

Endpoint New gate(s) before any DB write or LLM call
POST /tabular-review assertDocumentsAccessible(document_ids) → 404 on fail. Validates before the review row is inserted (no orphan reviews).
PATCH /tabular-review/:reviewId Restructured: resolve final documentIds (from body or defaulting branch), assertDocumentsAccessible, then the review metadata update, then cell delete (explicit replacement) + cell insert. A 404 on doc validation now leaves the review's display_name / shared_with / etc. untouched - no partial metadata write on access denial.
POST /tabular-review/:reviewId/regenerate-cell New cell-existence check on (review_id, document_id, column_index) (DB error → 500 "Failed to validate cell"; missing → 404 "Cell not found"). Then assertDocumentsAccessible([document_id]) → 404 "Document not found" on fail.
POST /tabular-review/:reviewId/generate Re-validates the final docs array (covers both the cells-derived branch and the project_id fallback). Runs before flushHeaders(), so a denial cleanly returns 404 without a dirty SSE stream. Catches pre-existing bad rows from before this fix landed.

Rollout

None required.

  • No schema change.
  • No env vars.
  • No .env.example change.
  • No new RLS policy or revoke grant.
  • No frontend change.
  • Backend-only PR, single deploy.
  • Rollback: revert the PR. No DB rows or storage objects modified.

Planning artifacts

Test plan (manual, owed before merge)

Run against a local backend pointed at a test Supabase project. Two test users (User A = doc owner, User B = stranger), plus optionally User C (review-shared but not project-shared) for the PATCH defaulting branch.

  • Create rejected: User B sends document_ids: [<A's doc>] → 404, no tabular_reviews row, count-only log line.
  • Create accepted: User B with own doc → 201.
  • PATCH explicit replacement rejected: User B's review, replacement IDs include A's doc → 404. Verify display_name / shared_with unchanged after the call (regression-protected).
  • PATCH defaulting branch rejected (if fixture setup permits): User C (review-shared, no project access). columns_config-only PATCH on project-bound review → 404 because defaulting docs aren't accessible.
  • PATCH columns-only with own docs → 200, new column-cells created.
  • Regenerate cell-doesn't-exist rejected: fabricated (document_id, column_index) pair → 404 "Cell not found".
  • Regenerate doc-revoked between insert and call → 404 "Document not found" (edge case; only run if your data model permits).
  • Generate rejects pre-existing bad row: SQL-insert a tabular_cells row referencing a foreign document_id, then POST .../generate → 404 before SSE opens, no LLM provider call (verify backend logs / provider dashboard).
  • Generate succeeds on clean review.
  • No regression on legitimate flow: User A end-to-end with own docs → all four routes work as today.

Out of scope (follow-ups)

  • One-time cleanup of pre-existing bad tabular_cells rows - only if evidence surfaces post-deploy. /generate's runtime check covers them otherwise.
  • /clear-cells defensive document_id validation (low priority; deletion can't leak).
  • Broader log-scrubbing across tabular.ts and other routers - tracked under #16.
  • Pre-existing frontend lint debt - separate cleanup PR.

Our analysis

Close cross-tenant document access on tabular review routes — read the full analysis →

Think the analysis missed something the PR description covers?

Capture this PR into my fork

Download a Markdown prompt that tells Claude how to port every commit in this PR into your working tree. Run it via claude -p < capture-pull-20.md from inside the repo you want the changes in.

⬇ Download capture-pull-20.md