LLM dispatch bug hunt: local/* prefix ignored, stale model ids crash chat

Seven commits tracing one user-visible symptom - `POST /chat` throwing "External AI providers are disabled" on a deployment where only Ollama was provisioned - through five layers of fixes before landing on the real root cause: `resolveModel` in `lib/llm/models.ts` didn't recognise `local/*` model ids and silently reset them to the Gemini default.

chat-ui · infrastructure

The deployment had EXTERNAL_AI_DISABLED=true and no external provider keys. Users saw the external-disabled error despite the model picker visibly showing a local model.

Layer 1: diagnostics (953bd2a3). assertModelAllowed gets two extra pieces of context: the error message now includes the offending model id and the resolved provider. It is a two-line change in lib/llmPolicy.ts and the first commit in the thread; had it existed earlier, it would have shortened the whole investigation.
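A minimal TypeScript sketch of the Layer 1 change, assuming a simplified policy with a single `externalAllowed` flag and a hypothetical prefix-based `providerFor` helper (neither is taken from the fork). The error text follows the format quoted in the commit bodies, `(model='<id>', provider='<provider>')`.

```typescript
// Hypothetical types: the real LlmPolicy in lib/llmPolicy.ts is not shown in the source.
type Provider = "google" | "openai" | "anthropic" | "local";

interface LlmPolicy {
  externalAllowed: boolean; // assumed false when EXTERNAL_AI_DISABLED=true
}

// Hypothetical prefix-based lookup from model id to provider.
function providerFor(model: string): Provider {
  if (model.startsWith("local/")) return "local";
  if (model.startsWith("gemini")) return "google";
  if (model.startsWith("claude")) return "anthropic";
  return "openai";
}

function assertModelAllowed(model: string, policy: LlmPolicy): void {
  const provider = providerFor(model);
  if (provider !== "local" && !policy.externalAllowed) {
    // Naming the model id and provider lets one backend log line identify the culprit.
    throw new Error(
      "External AI providers are disabled by organisation policy. " +
        `(model='${model}', provider='${provider}')`,
    );
  }
}
```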

Layer 2: title and tabular policy gap (afc21cea). Title and tabular model selection were hardcoded to prefer Gemini/OpenAI-nano/Claude-Haiku regardless of org policy. On an externals-disabled install, every title generation and every tabular extraction threw, even when the main chat worked. resolveTitleModel and a new resolveTabularModel now consult LlmPolicy: if externals are allowed and a key is present, use the cheapest external; otherwise fall through to the first curated local/<id>; otherwise return null. Callers were updated: title-gen writes a truncated copy of the user message as the title and returns 200 (the title is best-effort); tabular returns 503 with "ask an admin" instead of 500.
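The fallback order for title and tabular model selection can be sketched as follows. The `LlmPolicy` fields, the shared `resolveCheapModel` helper, the placeholder `CHEAP_EXTERNAL_MODEL` id and the 60-character snippet length are all assumptions for illustration; only the external → local → null ordering and the snippet-as-title behaviour come from the commit.

```typescript
// Hypothetical policy shape; the fork's LlmPolicy fields are not shown in the source.
interface LlmPolicy {
  externalAllowed: boolean;     // false on an EXTERNAL_AI_DISABLED=true install
  hasExternalKey: boolean;      // at least one external provider key is configured
  curatedLocalModels: string[]; // Ollama model names, e.g. "llama3.2:3b"
}

// Placeholder standing in for the cheapest external model; not a real model id.
const CHEAP_EXTERNAL_MODEL = "cheap-external-placeholder";

// External when explicitly allowed AND keyed, else first curated local model, else null.
function resolveCheapModel(policy: LlmPolicy): string | null {
  if (policy.externalAllowed && policy.hasExternalKey) return CHEAP_EXTERNAL_MODEL;
  const firstLocal = policy.curatedLocalModels[0];
  return firstLocal !== undefined ? `local/${firstLocal}` : null;
}

// Assumed to share one helper; the fork may implement them separately.
const resolveTitleModel = resolveCheapModel;
const resolveTabularModel = resolveCheapModel;

// Best-effort title when no model is available: a trimmed snippet of the first message.
function fallbackTitle(message: string, maxLen = 60): string {
  const oneLine = message.replace(/\s+/g, " ").trim();
  if (oneLine.length <= maxLen) return oneLine;
  return oneLine.slice(0, maxLen - 3).trimEnd() + "...";
}
```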

Layer 3: frontend race (30fa8010). Clicking Send before /me/models resolved meant useSelectedModel still held a stale localStorage value (commonly gemini-3-flash-preview). The hook now fetches available models on mount, validates the stored selection against the live list, and only then sets a ready flag. ChatInput returns early on submit if !modelReady, and the Send button is disabled until the list resolves. ModelToggle's own fallback-onChange is removed to stop it fighting the hook's validation.
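The hook itself needs React, so this sketch keeps only its framework-free core: the state `useSelectedModel` might hold before and after /me/models resolves, and the guard ChatInput applies on submit. `SelectionState`, `initialSelection`, `validateSelection` and `canSend` are hypothetical names, and falling back to the first available id is an assumption about how the hook picks a replacement.

```typescript
// Hypothetical framework-free core of useSelectedModel; the React wiring is omitted.
interface SelectionState {
  model: string | null; // an id that is safe to send, or null while unknown
  ready: boolean;       // true only after the live /me/models list has been checked
}

// Before /me/models resolves: the stored value is never exposed as usable.
function initialSelection(): SelectionState {
  return { model: null, ready: false };
}

// After /me/models resolves: keep the stored id if it is still offered, else fall back.
function validateSelection(stored: string | null, available: readonly string[]): SelectionState {
  if (stored !== null && available.includes(stored)) return { model: stored, ready: true };
  return { model: available[0] ?? null, ready: true };
}

// ChatInput-style submit guard: the Send button stays disabled while this is false.
function canSend(state: SelectionState): boolean {
  return state.ready && state.model !== null;
}
```

In a component, `validateSelection` would run inside an effect once the fetch resolves, with `initialSelection()` as the starting state, so Send is disabled from the first render rather than after a stale id has already gone out.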

Layer 4: route-level tolerance (532738c5). resolveAllowedModel(requested, policy) added to llmPolicy.ts: returns the requested id if the policy allows it, the first allowed id otherwise, or null if nothing is enabled. POST /chat and POST /projects/:id/chat now resolve before dispatching and log substitutions. No-provider-at-all returns 503 with a clear message.
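A sketch of `resolveAllowedModel` and of how a chat route might act on its result, assuming the policy can be reduced to an ordered list of allowed ids. `decideChatModel`, the `RouteDecision` type and the exact log and 503 wording are hypothetical; the source only says substitutions are logged and the empty-policy case returns 503.

```typescript
// Hypothetical policy shape: an ordered list of ids the org currently permits.
interface LlmPolicy {
  allowedModels: string[]; // e.g. ["local/llama3.2:3b"] on an externals-disabled install
}

// Requested id if allowed, else the first allowed id, else null (nothing enabled).
function resolveAllowedModel(requested: string, policy: LlmPolicy): string | null {
  if (policy.allowedModels.includes(requested)) return requested;
  return policy.allowedModels[0] ?? null;
}

// Framework-agnostic stand-in for what POST /chat does before dispatching.
type RouteDecision =
  | { kind: "dispatch"; model: string }
  | { kind: "error"; status: 503; message: string };

function decideChatModel(requested: string, policy: LlmPolicy): RouteDecision {
  const model = resolveAllowedModel(requested, policy);
  if (model === null) {
    return { kind: "error", status: 503, message: "No AI provider is enabled; ask an admin." };
  }
  if (model !== requested) {
    // Approximate wording; the fork logs substitutions under a [chat/stream] tag.
    console.warn(`[chat/stream] substituting '${requested}' -> '${model}'`);
  }
  return { kind: "dispatch", model };
}
```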

Layer 5: dispatcher self-heal (dc31b293). The same tolerant resolution moves into streamChatWithTools / completeText in lib/llm/index.ts. Any future caller passing a stale id gets a substitution and a warning log; the only remaining hard error is an empty policy.
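At the dispatcher layer the same idea becomes a guard at the top of every entry point. This synchronous sketch assumes the same list-shaped policy; `healModel` is a hypothetical name, and the provider call is injected so the example runs without network access (the fork's `completeText` presumably is async and talks to a real provider). The warning text mirrors the `[llm] requested model ... not allowed; substituting ...` line quoted in commit ac3ff6a9.

```typescript
// Hypothetical policy shape: an ordered list of ids the org currently permits.
interface LlmPolicy {
  allowedModels: string[];
}

// Guard every dispatcher entry point calls first: substitute, warn, or hard-fail.
function healModel(requested: string, policy: LlmPolicy): string {
  if (policy.allowedModels.includes(requested)) return requested;
  const fallback = policy.allowedModels[0];
  if (fallback === undefined) {
    // The one remaining hard error: no provider is enabled at all.
    throw new Error("No AI provider is enabled by organisation policy; ask an admin.");
  }
  console.warn(`[llm] requested model '${requested}' not allowed; substituting '${fallback}'`);
  return fallback;
}

// Hypothetical synchronous stand-in for completeText with an injected provider call.
function completeText(
  model: string,
  prompt: string,
  policy: LlmPolicy,
  callProvider: (model: string, prompt: string) => string,
): string {
  const effective = healModel(model, policy);
  return callProvider(effective, prompt);
}
```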

The actual root cause (ac3ff6a9). With all five layers above in place, the original error still appeared on the live install, and no [chat/stream] substituting warning was logged. A trace log (4a5fa1f0) revealed why: runLLMStream in chatTools.ts re-resolved its incoming model via resolveModel in lib/llm/models.ts, which checked a hardcoded ALL_MODELS set containing only the external provider catalogue. local/* ids were not in the set, so even after the route correctly substituted gemini-3-flash-preview → local/llama3.2:3b, resolveModel silently reset it back to DEFAULT_MAIN_MODEL (Gemini). The fix is a one-line change to the membership check: `ALL_MODELS.has(id) || id.startsWith("local/")`. The trace log was removed in the same commit.
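The root-cause fix, reduced to a before/after pair. `ALL_MODELS` here holds a single illustrative entry rather than the fork's full external catalogue, `resolveModelBefore` is a name invented for the comparison, and treating an `undefined` id as "use the fallback" is an assumption about the signature.

```typescript
// Illustrative catalogue: the real ALL_MODELS lists claude/gemini/openai variants only.
const ALL_MODELS = new Set<string>(["gemini-3-flash-preview" /* other external ids omitted */]);
const DEFAULT_MAIN_MODEL = "gemini-3-flash-preview";

// Before: any id outside the external catalogue (including local/*) became the fallback.
function resolveModelBefore(id: string | undefined, fallback: string): string {
  return id !== undefined && ALL_MODELS.has(id) ? id : fallback;
}

// After: local/* ids are trusted as already validated upstream and passed through.
function resolveModel(id: string | undefined, fallback: string): string {
  return id !== undefined && (ALL_MODELS.has(id) || id.startsWith("local/")) ? id : fallback;
}
```

The prefix check deliberately trusts the caller: by the time a `local/` id reaches `resolveModel`, the route and dispatcher have already validated it against policy, so the catalogue check only needs to keep rejecting unknown external ids.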

So what

Worth porting as a bundle if your fork uses local models with the same `local/` prefix scheme. The `local/*` omission from `ALL_MODELS` is the kind of bug that hides behind correct-looking upstream code for months; the resolution layers added on the way to finding it (`resolveAllowedModel` at the route, dispatcher self-heal, picker `ready` flag) are useful even after the root cause is patched: they convert "500 on chat" into a logged substitution. The discipline of adding a temporary trace log, using it to find the cause, and removing it in the fix commit is the right approach. The title-gen graceful degradation (write a message snippet as the title, return 200) is worth keeping in any fork, on the externals-enabled path as well.


Commits in this thread

7 commits from cpatpa/PIP, oldest first. Source extracted verbatim from the harvested git log.

SHA	Subject	Author	Date
`953bd2a3`	assertModelAllowed: include the offending model id in error	Claude	2026-05-16
Commit body: When the chat dispatch rejects a request because the resolved provider is gated off, the error message was generic ("External AI providers are disabled by organisation policy.") with no indication of which model id triggered it. That makes "why isn't my chat working?" issues much harder to diagnose when the UI claims one model is selected but the request body carries another (stale localStorage, default-model fallback, etc.). Append the model id and resolved provider to the message so the backend log line names the actual culprit.
`afc21cea`	Title + tabular: fall back to local LLM when externals disabled	Claude	2026-05-16
Commit body: Title generation was hardcoded to pick from Gemini / OpenAI nano / Claude Haiku regardless of policy. On a deployment where EXTERNAL_AI_DISABLED=true and local Ollama is the only provider, every title call (and every tabular call) threw "External AI providers are disabled by organisation policy", surfacing as a 500 on POST /chat/<id>/generate-title even though the chat itself worked. userSettings.resolveTitleModel and a new resolveTabularModel now consult the LlmPolicy: prefer external when explicitly allowed AND a key is present, otherwise fall back to the first curated local model (`local/<id>`), otherwise return null. Callers updated:
- chat.generate-title: when title_model is null, write a trimmed message snippet as the title and return 200. Best-effort; the red error toast no longer fires when chat itself is fine.
- tabular.compose-column-prompt and the two cell-extraction paths: 503 with a clear "ask an admin" message instead of throwing.
- tabular chat-title generation: skip silently rather than crash the tabular chat exchange.
`30fa8010`	Fix race: chat dispatched stale model id before picker loaded	Claude	2026-05-16
Commit body: If a user clicked Send on a freshly-loaded chat before /me/models resolved, useSelectedModel still held the legacy default (`gemini-3-flash-preview` or a stale localStorage value). The backend's assertModelAllowed then rejected with "External AI providers are disabled" even though the picker UI was about to auto-fall-back to a valid local model. The picker just hadn't caught up yet. Move the validation into useSelectedModel itself: on mount it fetches the live available-models list, validates the stored selection against it, and only THEN emits a usable id. Returns a third element `ready` so callers can disable Send while we don't yet have a verified-valid selection. ChatInput now early-returns on submit when !modelReady, and the Send button is disabled until the list has resolved. ModelToggle's own fallback-onChange is removed since useSelectedModel handles it authoritatively (avoids double-emit churn).
`532738c5`	chat: tolerant model resolution instead of throwing on stale ids	Claude	2026-05-16
Commit body: A user-visible chat would hard-fail with "External AI providers are disabled by organisation policy. (model='gemini-3-flash-preview', ...)" when the request body carried a model id that's no longer allowed by the current policy. This happened most reliably on the auto-send path from InitialView -> /assistant/chat/<id>: the new chat's queued first message has whatever model the InitialView's ChatInput held at submit time, and that can be stale if the bundle was loaded before /admin/llm was configured. Add resolveAllowedModel(requested, policy) to llmPolicy: returns the requested id if allowed, otherwise the first allowed id, or null when no provider is enabled at all. POST /chat and POST /projects/:id/chat now resolve before dispatching. Substitutions are logged so admins can see them. A truly empty policy (no provider enabled) returns 503 with a clear "ask an admin" message rather than a generic 500. assertModelAllowed is still used by streamChatWithTools as the hard gate for direct calls (e.g. tabular extraction); the chat streaming path now never reaches it with a forbidden id.
`dc31b293`	LLM dispatcher: self-heal stale model ids at the deepest layer	Claude	2026-05-16
Commit body: The chat dispatch had been crashing with "External AI providers are disabled (model='gemini-3-flash-preview', provider='google')" even after the chat.ts route added a tolerant resolveAllowedModel substitution. That route-level fix only covers /chat and /projects/:id/chat; tabular and any future caller would still trip the original assertModelAllowed. Move the resolve logic INTO streamChatWithTools and completeText. Now any caller that hands the dispatcher a stale model id sees the dispatcher substitute the first allowed model, log a warning, and proceed. The only way to surface an error to the user is truly empty policy (no provider enabled), which throws with a clear "ask an admin" message. Net effect:
- /chat, /projects/:id/chat: route-level resolve runs first, no change for happy path.
- Tabular review and any other path that calls runLLMStream / streamChatWithTools / completeText: now also tolerant.
- Title generation: already nullable; falls through here too.
The assertModelAllowed helper stays in llmPolicy.ts for callers that genuinely want a hard gate, but the LLM dispatcher no longer uses it on the streaming chat path.
`4a5fa1f0`	chat: add a one-line trace log for model resolution	Claude	2026-05-16
Commit body: There is a deployment in the field where the route-level resolveAllowedModel demonstrably substitutes 'gemini-3-flash-preview' -> 'local/qwen3-next:80b' when invoked from a test harness against the same policy, yet the same request flow still throws "External AI providers are disabled (model='gemini-3-flash-preview')" at runtime with no [chat/stream] substituting warning in the log. Add a single console.log that prints the body model, parsed model, resolved model, and the two key policy booleans on every chat stream. One log line per request, so the disconnect surfaces unambiguously the next time it bites. Remove the log once the cause is identified.
`ac3ff6a9`	resolveModel: accept local/* model ids	Claude	2026-05-16
Commit body: runLLMStream in chatTools.ts re-resolved its incoming model via resolveModel(model, DEFAULT_MAIN_MODEL). resolveModel checked the hardcoded ALL_MODELS set, which only contained the external provider catalogue (claude, gemini, openai variants) -- not the local/* prefix used for Ollama models. So even after chat.ts had correctly substituted the request body's stale 'gemini-3-flash-preview' into 'local/llama3.2:3b', runLLMStream's internal resolveModel silently kicked it BACK to 'gemini-3-flash-preview' (the DEFAULT_MAIN_MODEL fallback), which then tripped the deepest-layer self-heal we added in llm/index.ts and produced "[llm] requested model 'gemini-3-flash-preview' not allowed; substituting 'local/qwen3-next:80b'" -- not the model the user actually picked. The fix is local-aware: treat any id starting with `local/` as a valid model the caller already validated, so resolveModel passes it through instead of forcing the gemini fallback. Also drop the debug trace log from chat.ts now that the cause is identified.
