-
I had originally opened an issue, but since it is not really a bug, a discussion seemed more appropriate. Hi! First of all, congratulations on the project! I saw that the only options for loading models are HuggingFace and the OpenAI API. However, I would like to run the model locally on my poor, bullied GPU (an RTX 2060 with 6 GB of VRAM). I have seen that projects like GPTQ use quantization to fit an LLM onto a small GPU. In your documentation and in the issues, however, I could not find any information about loading such a model (which I have already downloaded) and applying its optimizations. (The model I want to use is LLaMA 7B 4-bit, downloaded following https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model)
Replies: 2 comments
-
Hi, thanks! The spec for local/custom models is currently in development. You will choose "CustomLLMConfig" in the admin panel and just tell the Cat where your LLM endpoint is. Then the community will release a few GPU-ready LLMs, or you can make your own.
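In the meantime, here is a minimal sketch of the idea, assuming you run llama-cpp-python's OpenAI-compatible server locally with a 4-bit quantized model in the format llama.cpp expects, and then point the custom LLM config at that URL (the port and model path below are just examples, not the final spec):

```python
# Start a local OpenAI-compatible server first, e.g.:
#   pip install "llama-cpp-python[server]"
#   python -m llama_cpp.server --model ./models/llama-7b-q4.gguf
#
# Any client (the Cat included) then only needs the endpoint URL.
import requests

LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed default port

response = requests.post(
    LOCAL_ENDPOINT,
    json={
        # With a single loaded model, the server does not need a model name here.
        "messages": [{"role": "user", "content": "Hello from my RTX 2060!"}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```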
-
We now support both llama-cpp-python and Ollama.
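For reference, a quick sketch of what running either backend locally can look like before pointing the Cat at it (model names and paths are just examples):

```python
# Option A: llama-cpp-python, loading a quantized model directly.
# n_gpu_layers offloads part of the model to the GPU, useful on a 6 GB card.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-7b-q4.gguf", n_gpu_layers=20)
out = llm("Q: What is the Cheshire Cat? A:", max_tokens=64)
print(out["choices"][0]["text"])

# Option B: Ollama, which serves models over a local HTTP API.
#   ollama pull llama2   (then the server listens on port 11434 by default)
import requests

r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "What is the Cheshire Cat?", "stream": False},
    timeout=120,
)
print(r.json()["response"])
```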