Phase 5: Excel I/O - xlsx/xls/xlsm/csv ingestion + per-cell citations + XlsxView

Scott Rozen · 2026-05-15 · 2ac696ce

Spreadsheets are now first-class documents in GordonOSS.

## Backend
- Added `backend/src/lib/extractors/` per the CLAUDE.md deterministic-first
  rule: `xlsx.ts` (ExcelJS, numfmt-formatted values, formula preservation,
  merged ranges) and `csv.ts` (hand-written RFC 4180 parser). Both are pure and
  side-effect-free, covered by 10 new unit tests.
- `documents.ts`: accept xlsx/xls/xlsm/csv; xls→xlsx normalization via
  libreoffice-convert; spreadsheets skip PDF conversion; structure tree
  lists sheet names.
- `convert.ts`: `xlsToXlsx()` helper.
- `documentReading.ts`: xlsx/csv branch calls extractor+flattener; citation
  reminder appended with spreadsheet cell-address guidance when file_type is a
  spreadsheet.
- `chatTools.ts`: system prompt extended with spreadsheet citation form;
  `normalizeCitation` preserves `Sheet!Cell` strings in the `page` field.
- `models.ts`: Gemma 4 31B added as default (higher free-tier quota than
  Gemini Flash); `providerForModel` routes `gemma-*` through Gemini adapter.
- Removed `freeTierGuard.ts` and its test - guard was blocking real documents
  from free-tier Gemini. Data-privacy tier guard redesign deferred to CLAUDE.md
  "Future capabilities".
- Chat error routes now surface real `err.message` in dev instead of generic
  "Stream error".
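
For readers unfamiliar with what an RFC 4180 parser has to handle, here is a
minimal sketch of the technique. It is not the code in `csv.ts`; the function
name `parseCsv` and its return shape are assumptions made for illustration. It
covers quoted fields, doubled quotes, embedded newlines, and CRLF/LF line
endings, and is lenient about a stray quote inside an unquoted field.

```typescript
// Hypothetical sketch of an RFC 4180-style parser; NOT the commit's csv.ts.
// Pure function: string in, rows of string fields out.
function parseCsv(input: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let recordStarted = false; // true once the current record has any content

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (inQuotes) {
      if (ch === '"') {
        if (input[i + 1] === '"') {
          field += '"'; // doubled quote is an escaped literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and newlines are literal inside quotes
      }
      continue;
    }
    if (ch === '"') {
      inQuotes = true;
      recordStarted = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
      recordStarted = true;
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && input[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
      recordStarted = false;
    } else {
      field += ch;
      recordStarted = true;
    }
  }
  // Flush the last record when the input does not end with a line break.
  if (recordStarted || inQuotes) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

A character-by-character state machine keeps the function free of I/O and
regex backtracking, which fits the pure/side-effect-free constraint stated
above.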

## Frontend
- `XlsxView.tsx` (new): sheet tabs, sticky column-letter header + row-number
  gutter, read-only formula bar (cell address chip + formula/value), numfmt-
  formatted display, click-to-select, citation jump + 2.5s yellow highlight.
- `DocPanel.tsx`, `DocViewModal.tsx`, chat `page.tsx`: route xlsx/csv to
  XlsxView ahead of DocxView/DocView.
- `types.ts`: `CitationQuote.cellRef`; `expandCitationToEntries` routes
  Sheet!Cell strings; `formatCitationPage` shows cell ref verbatim.
- `exportToExcel.ts`: per-cell ExcelJS comments containing citation list.
- Upload `accept` extended to xlsx/xls/xlsm/csv in all five upload sites.
- `ModelToggle.tsx`: Gemma 4 31B added at top of Google group; set as default.
- `DocxView.tsx`: childNodes crash demoted to warn + inline fallback.
- `CLAUDE.md`: editable formula bar, generate_xlsx tool, data-privacy tier
  guard added as future capabilities.
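
The `Sheet!Cell` citation form mentioned above can be illustrated with a small
parser. This is a sketch under assumptions, not the logic in `types.ts` or
`chatTools.ts`: the `CellRef` shape and the `parseCellRef` name are
hypothetical. It splits a citation string into sheet name and cell address, and
returns `null` for anything that is not a cell reference, so a caller could
fall back to page-based handling.

```typescript
// Hypothetical types and helper for illustration; not taken from the commit.
interface CellRef {
  sheet: string;   // sheet name with any surrounding quotes removed
  address: string; // normalized A1-style address, e.g. "B7"
  col: number;     // 1-based column index (A = 1, AA = 27)
  row: number;     // 1-based row index
}

function parseCellRef(ref: string): CellRef | null {
  const bang = ref.lastIndexOf("!");
  if (bang <= 0) return null; // no sheet part, so not a Sheet!Cell string

  let sheet = ref.slice(0, bang);
  // Excel quotes sheet names that contain spaces: 'Q1 Sales'!B7
  if (sheet.length >= 2 && sheet.startsWith("'") && sheet.endsWith("'")) {
    sheet = sheet.slice(1, -1).replace(/''/g, "'");
  }

  // Drop absolute-reference markers and normalize case: $aa$10 becomes AA10.
  const address = ref.slice(bang + 1).replace(/\$/g, "").toUpperCase();
  const match = /^([A-Z]{1,3})([1-9][0-9]*)$/.exec(address);
  if (!match) return null;

  const letters = match[1] ?? "";
  let col = 0;
  for (let i = 0; i < letters.length; i++) {
    // Base-26 with A = 1: "AA" is 1 * 26 + 1 = 27.
    col = col * 26 + (letters.charCodeAt(i) - 64);
  }
  return { sheet, address, col, row: Number(match[2]) };
}
```

For example, `parseCellRef("Sheet1!B7")` yields sheet `Sheet1`, column 2,
row 7, while a plain page value such as `"12"` yields `null`.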

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Repository Archibald312/GordonOSS
Author Scott Rozen <scott.n.nunez@gmail.com>
Parents 8a33897e
Stats 42 files changed, +2066, -298
Part of Phase 5 - Excel I/O ingestion + per-cell citations + XlsxView
