Fourth LLM provider: NVIDIA API Catalog with Kimi K2.6 as new default

rmerk adds `backend/src/lib/llm/nvidia.ts` as a fourth provider alongside Claude, Gemini, and OpenAI, targeting NVIDIA's OpenAI-compatible endpoint at `integrate.api.nvidia.com/v1`. Kimi K2.6 becomes the new default model for chat, titles, and tabular review - replacing the prior Gemini default across the board.

Tags: infrastructure, chat-ui

The provider file is 327 LOC and implements streaming and multi-turn tool calls against the Chat Completions endpoint. Routing uses a slash-based heuristic in `providerForModel`: NVIDIA catalog IDs have a `vendor/name` shape that no existing provider uses, so `model.includes("/")` dispatches to the new branch. The three main NVIDIA models registered are `moonshotai/kimi-k2.6`, `meta/llama-3.3-70b-instruct`, and `deepseek-ai/deepseek-r1`, plus mid and low tiers with Llama 3.1 70B and 8B.
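The routing heuristic can be sketched as follows. This is a minimal illustration, not the code from the commit: only the `model.includes("/")` check comes from the writeup, while the `Provider` union and the prefix checks for the other three providers are assumptions made so the example runs on its own.

```typescript
// Hypothetical provider tags; the real union in the repo may differ.
type Provider = "claude" | "gemini" | "openai" | "nvidia";

// Sketch of slash-based routing. Only the "/" check is taken from the writeup;
// the prefix checks below are illustrative assumptions.
function providerForModel(model: string): Provider {
  // NVIDIA catalog IDs look like "vendor/name", e.g. "moonshotai/kimi-k2.6".
  if (model.includes("/")) return "nvidia";
  if (model.startsWith("claude")) return "claude";
  if (model.startsWith("gemini")) return "gemini";
  // Assumed fallback: everything else goes to OpenAI.
  return "openai";
}

providerForModel("moonshotai/kimi-k2.6");   // "nvidia"
providerForModel("gemini-3-flash-preview"); // "gemini"
```

The heuristic only holds while no other provider's model IDs contain a slash. If one is added later, it would be routed to the NVIDIA branch unless the check is tightened.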

`NVIDIA_BASE_URL` is overridable via env, so the same provider file works against any OpenAI-compatible server: a self-hosted NIM, a local Ollama instance, or vLLM. That's a practical escape hatch if you want the abstraction without a hard dependency on NVIDIA's hosted endpoint.
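A sketch of how such an override might resolve, assuming the provider falls back to the hosted endpoint when `NVIDIA_BASE_URL` is unset. The helper names `resolveBaseUrl` and `chatCompletionsUrl` are hypothetical, and the env object is passed in explicitly so the example does not depend on Node's `process`.

```typescript
// Hosted endpoint named in the writeup, used here as the assumed default.
const DEFAULT_NVIDIA_BASE_URL = "https://integrate.api.nvidia.com/v1";

type Env = Record<string, string | undefined>;

// Hypothetical helper: pick the override if set, else the hosted default,
// and strip trailing slashes so path joining stays predictable.
function resolveBaseUrl(env: Env): string {
  const raw = env.NVIDIA_BASE_URL?.trim();
  const base = raw ? raw : DEFAULT_NVIDIA_BASE_URL;
  return base.replace(/\/+$/, "");
}

// Hypothetical helper: OpenAI-compatible servers expose Chat Completions
// at <base>/chat/completions.
function chatCompletionsUrl(env: Env): string {
  return `${resolveBaseUrl(env)}/chat/completions`;
}

// Hosted default.
chatCompletionsUrl({});
// Local Ollama, which serves its OpenAI-compatible API under /v1 on port 11434.
chatCompletionsUrl({ NVIDIA_BASE_URL: "http://localhost:11434/v1/" });
```

In a real backend the argument would typically be `process.env`.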

Two things are worth checking before importing. First, the default model swap is global: `DEFAULT_MAIN_MODEL`, `DEFAULT_TITLE_MODEL`, and `DEFAULT_TABULAR_MODEL` all move to `moonshotai/kimi-k2.6`. Any environment without an `NVIDIA_API_KEY` will fail on those defaults, where previously it might have worked with a Gemini key. Second, per-user NVIDIA key storage doesn't work yet. The `ApiKeyProvider` union gains `"nvidia"` in TypeScript, but the DB CHECK constraint on `user_api_keys` still rejects it, so the UI path for per-user NVIDIA keys produces an error until a migration relaxes that constraint. For now, `NVIDIA_API_KEY` is env-only.
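One way to surface the first issue before deploying is a small preflight check. The sketch below is an assumption-laden illustration and not part of the commit: the three constant names and the model ID come from the writeup, while `missingKeysForDefaults` and the shape of the report are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Defaults as described in the writeup: all three now point at Kimi K2.6.
const DEFAULTS: Array<[string, string]> = [
  ["DEFAULT_MAIN_MODEL", "moonshotai/kimi-k2.6"],
  ["DEFAULT_TITLE_MODEL", "moonshotai/kimi-k2.6"],
  ["DEFAULT_TABULAR_MODEL", "moonshotai/kimi-k2.6"],
];

// Hypothetical preflight: list every default that needs an NVIDIA key
// the environment does not provide.
function missingKeysForDefaults(env: Env): string[] {
  const problems: string[] = [];
  for (const [name, model] of DEFAULTS) {
    // Same slash heuristic the writeup describes for NVIDIA catalog IDs.
    const needsNvidia = model.includes("/");
    if (needsNvidia && !env.NVIDIA_API_KEY) {
      problems.push(`${name}=${model} requires NVIDIA_API_KEY`);
    }
  }
  return problems;
}
```

Running this at startup against the process environment and logging any returned entries would make the swap visible in an environment that has only a Gemini key, instead of failing on the first chat request.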

`modelAvailability.ts` gains an `"nvidia"` branch so the frontend picker recognises NVIDIA models instead of greying them out no matter which env keys are present. Without that companion change, the UI would show the models as unavailable even when the backend could serve them.
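The effect of that branch can be illustrated with a small availability check. How the real `modelAvailability.ts` decides is not shown in the source, so everything here is an assumption: the `ApiKeyState` shape, the function name `isModelAvailable`, and the prefix checks for the non-NVIDIA providers are all hypothetical.

```typescript
// Hypothetical shape: whether a usable key exists for each provider.
type ApiKeyState = {
  claude: boolean;
  gemini: boolean;
  openai: boolean;
  nvidia: boolean;
};

// Sketch of an availability check with an explicit NVIDIA branch.
function isModelAvailable(model: string, keys: ApiKeyState): boolean {
  // The new branch: vendor/name IDs are judged by the NVIDIA key.
  if (model.includes("/")) return keys.nvidia;
  if (model.startsWith("claude")) return keys.claude;
  if (model.startsWith("gemini")) return keys.gemini;
  if (model.startsWith("gpt")) return keys.openai;
  // Unrecognised IDs are treated as unavailable (greyed out).
  return false;
}
```

Deleting the first `if` shows the failure mode the writeup describes: `moonshotai/kimi-k2.6` matches none of the remaining prefixes, falls through to `false`, and is greyed out even with an NVIDIA key configured.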

So what

Worth importing if you want cheaper or open-weight model access through NVIDIA's catalog, or if you need an Ollama/vLLM option via `NVIDIA_BASE_URL`. The provider plug-in is clean and follows the existing pattern. Check the default-model swap before deploying to an environment with existing users - the change is silent and broad.


Commits in this thread

1 commit from rmerk/mike, oldest first. Source extracted verbatim from the harvested git log.

| SHA | Subject | Author | Date |
| --- | --- | --- | --- |
| `8535292c` | feat: add NVIDIA API Catalog provider with Kimi K2.6 as default | Ryan Choi | 2026-05-11 |

Commit body:

Adds a fourth LLM provider that targets build.nvidia.com's OpenAI-compatible Chat Completions endpoint, enabling Kimi K2.6, Llama 3.3 70B, and DeepSeek R1 through a single env key.

Backend: new lib/llm/nvidia.ts handles streaming + multi-turn tool calls against integrate.api.nvidia.com/v1. ApiKeyProvider extended with "nvidia" (env-only - DB CHECK constraint still blocks per-user storage).

Frontend: model picker now shows an NVIDIA group at the top with Kimi K2.6 selected by default. ApiKeyState and modelAvailability extended to recognise nvidia so the picker doesn't grey out available models.

All gemini-3-flash-preview fallbacks swapped to moonshotai/kimi-k2.6 so the app boots usefully without a Gemini key.

