Self hosted LLM

Hello internet users. I have tried GPT4All and like it, but it is very slow on my laptop. Does anyone here know of a solution I could run on my server (Debian 12, AMD CPU, Intel Arc A380 GPU) and use through a web interface? Has anyone found a good way to do this?

18 comments
  • From what I've seen, text-generation-webui is more or less the standard way to run a model behind a web UI, but the VRAM comments here are accurate: text LLMs need a lot of VRAM to keep a conversation going.

  • Ollama is a nice server base; there are lots of projects that plug in on top of it.
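
For the Ollama suggestion above, here is a minimal sketch of what "plugging in on top" can look like: a plain HTTP call to Ollama's `/api/generate` endpoint using only the Python standard library. It assumes Ollama is running on its default port 11434 on the same machine and that a model has already been pulled; the model name `llama2` and the helper names `build_request` and `generate` are illustrative choices, not anything from this thread.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default port on this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama2", url=OLLAMA_URL):
    """Build a non-streaming generate request (model name is an assumption)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama2", url=OLLAMA_URL, timeout=300):
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_request(prompt, model, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `print(generate("Why is the sky blue?"))` should print the model's answer. The long timeout is deliberate, since CPU-only generation can take a while.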

  • Thanks to this post and the other comments in here, I've discovered that the ultimate UI for AI models may well be

    https://github.com/ParisNeo/lollms-webui

    and on HuggingFace (that name is awful: to me it is the creepy, horrible Facehugger from the movie Alien, which I saw so many decades ago) TheBloke has some models which are smaller

    https://huggingface.co/TheBloke/

    so you can choose a model that will actually work on your hardware.

    From what I've read, Llama-2 for brainstorming and CodeLlama-instruct for learning from programming examples seem to be the cleanest pair, and he's got GGUF versions with different quantizations, so you can choose what will actually fit on your hardware.

    There are other models on huggingface which seem very useful, like

    • whisper-large-v3 for speech-to-text,
    • whisperspeech for text-to-speech,
    • sdxl-turbo for image-making (for some copyright-free subjects to practice drawing with), and so on.

    Some models require GPU, not all.

    Damn, things moved fast!
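
On picking a GGUF quantization that fits your hardware, as mentioned above, a rough back-of-the-envelope sketch may help. It estimates memory as parameters times bits per weight, plus a flat overhead factor for context and runtime buffers. The bits-per-weight figures in the usage note and the 1.2 overhead are loose assumptions for illustration, not measured values, and the function names are made up here.

```python
def estimate_model_gib(n_params_billion, bits_per_weight, overhead=1.2):
    """Rough memory estimate in GiB for a quantized model.

    overhead is an assumed fudge factor for context and runtime buffers.
    """
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30


def fits(n_params_billion, bits_per_weight, budget_gib, overhead=1.2):
    """True if the estimate is within the given memory budget (VRAM or RAM)."""
    return estimate_model_gib(n_params_billion, bits_per_weight, overhead) <= budget_gib
```

For example, `fits(7, 4.5, budget_gib=6)` checks a 7B model at roughly 4.5 bits per weight (an assumed figure for a 4-bit quantization) against a 6 GiB budget; substitute whatever memory your own GPU or server actually has. Real usage depends on context length and the runtime, so treat the result as a first filter only.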
