LLM policy: admin-driven provider switches, curated local models, IPv6 rate-limit fix

Three commits that move LLM configuration from hardcoded defaults into `org_settings`, fix an IPv6 rate-limiter bug that logged an error on every backend boot, and raise the rate-limit defaults from levels tight enough to trip up one person debugging to values suited to normal firm use.

infrastructure · multi-tenant

Admin-driven model picker (76030d6f, migration 0019, +1013/-1036 across 23 files). Adds three columns to `org_settings`: `providers_enabled` (JSONB master switch per provider, defaults all-on), `local_llm_base_url` (runtime override; NULL falls back to the `LOCAL_LLM_BASE_URL` env var), and `local_llm_models` (curated `[{id,label}]` array). A new `/admin/llm` page lets an admin choose which providers are enabled, optionally override the local endpoint URL, and control which Ollama models appear in the user picker. Per-user API keys are removed - only env-supplied keys from install.sh / .env.compose are honoured. The hardcoded `assertProviderAllowed` check in the dispatcher is replaced by `assertModelAllowed`, which reads from an `LlmPolicy` loaded from `org_settings`. On backend boot, if `local_llm_base_url` is set, the backend copies it into `process.env.LOCAL_LLM_BASE_URL` so the local adapter uses the stored override instead of the env default.
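The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in lib/llmPolicy.ts: the type `LlmPolicySketch`, the function names, the `"local"` provider key, the example model ids, and the fail-closed handling of unknown providers are all assumptions made for the example.

```typescript
// Hypothetical shapes: the fields mirror the three org_settings columns,
// but the real LlmPolicy type is not shown in the source.
interface LocalModel {
  id: string;
  label: string;
}

interface LlmPolicySketch {
  providersEnabled: Record<string, boolean>; // org_settings.providers_enabled
  localLlmBaseUrl: string | null; // org_settings.local_llm_base_url
  localLlmModels: LocalModel[]; // org_settings.local_llm_models
}

// Assumption: the local provider is keyed as "local" in providers_enabled.
const LOCAL_PROVIDER = "local";

/** A NULL column value falls back to the env-supplied URL. */
function resolveLocalBaseUrl(
  policy: LlmPolicySketch,
  envBaseUrl: string | undefined,
): string | undefined {
  return policy.localLlmBaseUrl ?? envBaseUrl;
}

/** Throws unless the provider is switched on and, for local models, curated. */
function assertModelAllowedSketch(
  policy: LlmPolicySketch,
  provider: string,
  modelId: string,
): void {
  // Assumption: fail closed, so a provider missing from the map is disabled.
  if (policy.providersEnabled[provider] !== true) {
    throw new Error(`Provider "${provider}" is disabled by the admin`);
  }
  if (
    provider === LOCAL_PROVIDER &&
    !policy.localLlmModels.some((m) => m.id === modelId)
  ) {
    throw new Error(`Local model "${modelId}" is not in the curated list`);
  }
}

// Example policy with made-up values.
const examplePolicy: LlmPolicySketch = {
  providersEnabled: { local: true, openai: false },
  localLlmBaseUrl: null,
  localLlmModels: [{ id: "llama3:8b", label: "Llama 3 8B" }],
};
```

With `examplePolicy`, a call for the curated local model passes, while a call for an uncurated local model or for the disabled provider throws. Running the check in the dispatcher rather than in the UI matches the commit's note that backend rejection is now the only gate.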

Follow-ups (61689c39). The rate limiter's keyGenerator returned `req.ip ?? "anon"` when falling back to the IP address for pre-auth requests. Since the upgrade to express-rate-limit v8, that raw-IP fallback caused ERR_ERL_KEY_GEN_IPV6 to be logged on every backend boot. The fix: call `ipKeyGenerator(req.ip ?? "")` from the library, which canonicalises IPv6 addresses to a /64 prefix. This also closes a bypass path: a client that rotated the low 64 bits of its IPv6 address landed in a fresh bucket each time and could exceed the general IP limit. Migration 0020 drops `user_api_keys`, which migration 0019 had left in place. The table held per-user AES-256-GCM ciphertext that never escaped the encrypted-at-rest layer, so the commit message treats the hard drop as acceptable.
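To make the /64 idea concrete, here is a hand-rolled sketch of prefix-based keying. It is not the library's `ipKeyGenerator` implementation and is not a replacement for it; the function names, the `"anon"` fallback, and the handling of zone ids and IPv4-mapped addresses are assumptions for illustration, and the input is assumed to be a syntactically valid address.

```typescript
/** Expand an IPv6 address (with optional "::" compression) into 8 hextets. */
function expandIpv6(ip: string): string[] {
  const halves = ip.split("::");
  const head = halves[0] ?? "";
  const tail = halves[1] ?? "";
  const headParts = head === "" ? [] : head.split(":");
  const tailParts = tail === "" ? [] : tail.split(":");
  // "::" stands for however many zero groups are needed to reach 8.
  const missing = ip.includes("::")
    ? Math.max(0, 8 - headParts.length - tailParts.length)
    : 0;
  const zeros = new Array<string>(missing).fill("0");
  return [...headParts, ...zeros, ...tailParts].map((h) =>
    h.toLowerCase().padStart(4, "0"),
  );
}

/** Rate-limit key: full address for IPv4, the /64 prefix for IPv6. */
function rateLimitKeySketch(ip: string | undefined): string {
  if (!ip) return "anon"; // assumption: one shared bucket for unknown clients
  if (!ip.includes(":")) return ip; // plain IPv4
  const bare = ip.split("%", 1)[0] ?? ip; // drop a zone id such as %eth0
  if (bare.includes(".")) return bare.toLowerCase(); // IPv4-mapped IPv6
  // Keep the first four hextets (64 bits) and discard the interface id.
  return `${expandIpv6(bare).slice(0, 4).join(":")}::/64`;
}
```

Under this sketch, `2001:db8:abcd:12:1::1` and `2001:db8:abcd:12:ffff:ffff:ffff:ffff` both map to the key `2001:0db8:abcd:0012::/64`, so rotating the low 64 bits no longer yields a new bucket, while an address in `2001:db8:abcd:13::/64` gets a separate key.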

Rate-limit defaults (af2a0d89). General: 300 → 1500 per 15 min per IP. Chat: 30 → 200 per 15 min per user. Chat-create: 60 → 300 per 15 min per user. Upload: 50 → 300 per hour per user. All four are now documented as commented-out env vars in .env.compose.example so operators can tune without a code change.
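One way to keep such defaults tunable from the environment is sketched below. The default values come from the commit, but the env var names (`RATE_LIMIT_GENERAL_MAX` and so on) are hypothetical, since the source does not list the real ones, and the loader itself is an assumption rather than the project's code.

```typescript
interface LimitSpec {
  windowMs: number;
  limit: number;
}

type LimitName = "general" | "chat" | "chatCreate" | "upload";

const FIFTEEN_MIN_MS = 15 * 60 * 1000;
const ONE_HOUR_MS = 60 * 60 * 1000;

// Defaults after af2a0d89, as stated in the writeup.
const DEFAULT_LIMITS: Record<LimitName, LimitSpec> = {
  general: { windowMs: FIFTEEN_MIN_MS, limit: 1500 }, // per IP
  chat: { windowMs: FIFTEEN_MIN_MS, limit: 200 }, // per user
  chatCreate: { windowMs: FIFTEEN_MIN_MS, limit: 300 }, // per user
  upload: { windowMs: ONE_HOUR_MS, limit: 300 }, // per user
};

// Hypothetical env var names, chosen only for this example.
const ENV_NAMES: Record<LimitName, string> = {
  general: "RATE_LIMIT_GENERAL_MAX",
  chat: "RATE_LIMIT_CHAT_MAX",
  chatCreate: "RATE_LIMIT_CHAT_CREATE_MAX",
  upload: "RATE_LIMIT_UPLOAD_MAX",
};

/** Accept only a positive decimal integer; anything else keeps the default. */
function positiveIntOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || !/^\d+$/.test(raw.trim())) return fallback;
  const n = Number(raw.trim());
  return n > 0 ? n : fallback;
}

/** Build the effective limits from an env-like map (e.g. process.env). */
function loadRateLimits(
  env: Record<string, string | undefined>,
): Record<LimitName, LimitSpec> {
  const names: LimitName[] = ["general", "chat", "chatCreate", "upload"];
  const result = { ...DEFAULT_LIMITS };
  for (const name of names) {
    const base = DEFAULT_LIMITS[name];
    result[name] = {
      windowMs: base.windowMs,
      limit: positiveIntOr(env[ENV_NAMES[name]], base.limit),
    };
  }
  return result;
}
```

Passing the environment in as a parameter keeps the loader easy to test; a server would typically call `loadRateLimits(process.env)` once at startup. Falling back to the default on an unparseable value is a design choice of this sketch; failing fast at boot would be an equally reasonable alternative.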

So what: The IPv6 `ipKeyGenerator` fix is a one-liner with a real security implication - rotating low-order IPv6 bits bypassed naive per-IP buckets - and carries little risk to port. Moving provider config from per-user API keys into `org_settings` makes sense for a single-firm deployment but is the opposite shape from a product serving multiple tenants; decide which model fits before porting. The rate-limit numbers are firm-specific; don't copy the values, copy the principle that defaults should suit normal load rather than be a debugging convenience. The bar applied to the table drop, that the table held only ciphertext that never escaped its encrypted-at-rest layer, is a reasonable standard for when a hard drop is safe.


Commits in this thread

3 commits from cpatpa/PIP, oldest first. Source extracted verbatim from the harvested git log.

SHA	Subject	Author	Date
`76030d6f`	LLM policy: admin-driven providers + curated local models	Claude	2026-05-16

Commit body:

Replaces the hardcoded model picker with a server-driven list keyed on org_settings. Adds /admin/llm so an admin can pick which providers are enabled, override the local LLM base URL at runtime, and curate which Ollama models appear in the picker. The per-user API keys surface is removed; only env-supplied keys (set in install.sh / .env.compose) are honoured.

Schema (migration 0019):
- org_settings.providers_enabled JSONB (master switch per provider).
- org_settings.local_llm_base_url text (runtime override; NULL falls back to the env var).
- org_settings.local_llm_models JSONB (curated [{id,label}]).

Backend:
- New lib/llmPolicy.ts: loadLlmPolicy, availableModels, assertModelAllowed, orgApiKeys. Centralises gating.
- New lib/localDiscovery.ts: probes /api/tags on the OpenAI-compat host to list installed Ollama models.
- New routes: GET /me/models (filtered list for users), GET/PATCH /admin/llm, POST /admin/llm/refresh-local.
- llm/index.ts dispatcher now consults the policy on every streamChatWithTools / completeText. EXTERNAL_AI_DISABLED env still wins for the three external providers.
- Boot reads org_settings.local_llm_base_url and sets process.env.LOCAL_LLM_BASE_URL so the local adapter picks it up.
- /user/api-keys GET/PUT removed. user_api_keys table left in place for now; a follow-up migration can drop it once we are confident no encrypted data needs preservation.
- userSettings.getUserApiKeys now returns env-only keys.
- userApiKeys.ts deleted.

Frontend:
- ModelToggle fetches /me/models on mount, dropping the hardcoded catalogue. Empty list prompts the user to ask an admin.
- New /admin/llm page: per-provider toggles, base-URL field, refresh button, curated-model checkboxes.
- /account/models page, ApiKeyMissingModal, modelAvailability lib all removed. apiKeyStatus / apiKeys / saveApiKey stripped from pipApi.ts and UserProfileContext.
- ChatInput, TabularReviewView, TRChatPanel: drop apiKeys plumbing. Backend rejection is now the only gate.
- useSelectedModel: persist whatever the picker emits; ModelToggle reconciles against the live list on mount.
`61689c39`	Follow-ups: IPv6 rate-limit, drop user_api_keys, refresh testing doc	Claude	2026-05-16

Commit body:

- Rate limiter keyGenerator now calls ipKeyGenerator from express-rate-limit when falling back to IP, which canonicalises IPv6 addresses to a /64 prefix. Closes the ERR_ERL_KEY_GEN_IPV6 warnings printed on every backend boot since the multer 2 / v8 rate-limit upgrade and prevents IPv6 clients bypassing the IP bucket by rotating low-order bits.
- Migration 0020 drops the user_api_keys table. Migration 0019 moved provider configuration to org_settings and the backend no longer reads or writes it; the column held AES-256-GCM ciphertext that never escaped the encrypted-at-rest layer, so a hard drop is acceptable.
- docs/safe-local-testing.md rewritten to reflect the post-Supabase reality (Postgres + Auth.js, AES-encrypted local storage, Admin LLM panel, pip-uninstall.sh). The previous content was the upstream Mike doc and was misleading.
`af2a0d89`	Raise default rate limits for internal firm deployment	Claude	2026-05-16

Commit body:

The original defaults (300 general / 30 chat / 60 chat-create / 50 upload per 15-minute window) were tuned tight enough that one person debugging a flow could easily hit them and see the generic "Too many requests" message in the browser with no obvious correlation back to what triggered it.

Raise the defaults to numbers that suit normal multi-user firm use without rebuilding when an operator wants to bump them further:

  general 300 -> 1500 per IP, per 15 min
  chat 30 -> 200 per user, per 15 min
  chat-create 60 -> 300 per user, per 15 min
  upload 50 -> 300 per user, per hour

All four remain tunable via the same env var names so deployments that want stricter limits (or stricter for a specific window) can still set them. Documented in .env.compose.example.
