brauliogusmao wires Mike up to run on models that never leave the building
This fork can now point at locally-hosted AI models instead of only calling out to remote providers.
brauliogusmao added support for Ollama - a tool for running open-source language models on your own machine or server - as a fully supported option alongside the remote services Mike already talks to. In practice that means a firm could run the assistant against a model sitting on hardware it controls, rather than sending every prompt and document out to a third-party API. A new menu groups these local models separately and only shows them when a local instance is actually reachable.
There's a thoughtful wrinkle for the smaller models realistic to run locally: they're less reliable at deciding when to go fetch a document on their own, so the fork quietly loads the relevant document content up front instead of waiting to be asked. It's a deliberate accommodation, and it does make local models behave a little differently from the remote ones.
Spotted something wrong? Or know the PR text has fresher detail than the writeup above?