• sunzu2
    19 days ago

    https://ollama.com/

    You can pick something that fits your GPU size. Works well on Apple silicon too. My favorites right now are the Qwen3 series. Probably the best performance for a local single-GPU setup.

    It will also run on CPU/RAM, just slower.
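
    Once the Ollama server is running, you can hit its local REST API from anything. A minimal Python sketch, assuming you've already pulled a model (the qwen3:8b tag is just an example, pick whatever fits your VRAM):

        import requests  # Ollama listens on localhost:11434 by default

        # Ask a locally pulled model for a single, non-streamed completion.
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "qwen3:8b",  # assumption: any model you've pulled beforehand
                "prompt": "Explain VRAM in one sentence.",
                "stream": False,      # return one JSON object instead of a stream
            },
            timeout=300,
        )
        resp.raise_for_status()
        print(resp.json()["response"])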

    If you're on Linux, I would put it in a Docker container. That might be too much for a first try, though. There are easier options, I think.

    • @tormeh@discuss.tchncs.de
      28 days ago

      Ollama is apparently going for lock-in and incompatibility, and they're forking llama.cpp for some reason. I'd use GPT4All or llama.cpp directly instead. Both support Vulkan, so your GPU will just work.
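
      If you go the llama.cpp route, the llama-cpp-python bindings are one easy way to load a GGUF file directly. A rough sketch, where the model path is a placeholder and n_gpu_layers=-1 offloads everything to the GPU (assuming you installed a GPU-enabled build):

          from llama_cpp import Llama  # pip install llama-cpp-python

          # Load a local GGUF model file; path and context size are placeholders.
          llm = Llama(
              model_path="./models/qwen3-8b-q4_k_m.gguf",  # assumption: any GGUF you've downloaded
              n_ctx=4096,        # context window
              n_gpu_layers=-1,   # offload all layers to the GPU; set 0 for CPU-only
          )

          out = llm("Q: What is Vulkan?\nA:", max_tokens=64, stop=["\n"])
          print(out["choices"][0]["text"])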