Conversation

@PalmerAL (Collaborator)

This PR sets up the basic infrastructure to run an LLM inside Min using node-llama-cpp inside a utility process. Any llama.cpp-formatted model file should work; the model can be configured by updating modelPath inside llmService.mjs. My testing so far has been with either this model or this one.
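For anyone who wants to poke at this, here's a minimal sketch of how the pieces fit together, assuming the node-llama-cpp v3 API (getLlama / loadModel / LlamaChatSession, with dispose() on contexts) and Electron's utilityProcess / parentPort messaging. The message shape and paths below are illustrative, not the actual contents of llmService.mjs:

```js
// llmService.mjs — illustrative sketch of the utility-process service (not the PR's exact code)
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const modelPath = "/path/to/model.gguf"; // placeholder; the PR configures this via modelPath
let modelPromise = null;

// Lazily load the model the first time a request arrives, then keep it resident.
function loadModel() {
  if (!modelPromise) {
    modelPromise = getLlama().then((llama) => llama.loadModel({modelPath}));
  }
  return modelPromise;
}

// Respond to prompt requests from the main process.
process.parentPort.on("message", async (e) => {
  const {id, prompt} = e.data; // hypothetical message shape
  const model = await loadModel();
  const context = await model.createContext();
  const session = new LlamaChatSession({contextSequence: context.getSequence()});
  const text = await session.prompt(prompt);
  await context.dispose();
  process.parentPort.postMessage({id, text});
});
```

The main process would then fork this file as an Electron utility process and exchange messages with it, roughly:

```js
// Main process (sketch): fork the service and send it a prompt.
import {fileURLToPath} from "node:url";
import {utilityProcess} from "electron";

const servicePath = fileURLToPath(new URL("./llmService.mjs", import.meta.url));
const llmProcess = utilityProcess.fork(servicePath);

llmProcess.on("message", ({id, text}) => console.log(`result ${id}:`, text));
llmProcess.postMessage({id: 1, prompt: "Summarize this page: ..."});
```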

My original intent with this was to see if it was possible to generate high-quality page summaries to display in the searchbar. Unfortunately, with llama-3.2-1b, the quality of the summaries seems quite poor. llama-3.2-3b does much better, but keeping the model loaded requires around 5GB of memory. I think this means that any use case that requires the model to continually be loaded in the background is infeasible, but it might work in a situation where the user explicitly requests to use it, which would allow us to load the model for a brief period of time and then immediately unload it. I'm planning to experiment with language translation (replacing the current cloud-based version) and with an explicit "summarize page" command, but if anyone has additional ideas for where this could be useful, I'd be happy to test them.
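As a rough illustration of that load-on-demand pattern (a hypothetical helper, assuming node-llama-cpp's dispose() methods on the model and context free their memory as documented), the service could do something like this when the user explicitly asks for a summary:

```js
// Sketch: load the model only for an explicit request, then free the memory immediately.
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const modelPath = "/path/to/model.gguf"; // placeholder

async function summarizeOnce(pageText) {
  const llama = await getLlama();
  const model = await llama.loadModel({modelPath});
  const context = await model.createContext();
  const session = new LlamaChatSession({contextSequence: context.getSequence()});
  try {
    return await session.prompt(`Summarize the following page in a few sentences:\n\n${pageText}`);
  } finally {
    // Unload everything so the multi-GB model doesn't stay resident between requests.
    await context.dispose();
    await model.dispose();
  }
}
```

The tradeoff is a model-load delay on every request, which seems acceptable for an explicit "summarize page" or translate command but not for anything that runs continuously in the background.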

@codacli commented Jul 25, 2025

Hello, thank you for your work. Would it be possible to take inspiration from Brave's Leo AI or [Ollama Client - Chat with Local LLM Models]? Is leaving the choice of model to the user a good strategy?
