cross-posted from: https://sh.itjust.works/post/61139432

I seriously can’t believe how much progress he’s made for the FOSS community. He might actually take a bite out of the big 3’s profits with this.

    • Rhaedas@fedia.io
      1 day ago

      Most models are going to require CUDA. There are some AMD ones out there, but it’s totally different math and setup. As for the one I mentioned, it’s a pretty new idea, so there are only a few out there, maybe just one (Qwen-based). But I did get a 31B model to work on my 12GB card. I just had to move from Ollama to llama.cpp to get the control I needed over the parameters, and then fine-tune how much it put on the GPU, up to the max it would take. I had Claude help me along the way.
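
A rough sketch of what that kind of tuning can look like, not the exact setup described above. It estimates how many layers fit in VRAM and builds a llama.cpp command line using the `-ngl` (GPU layers), `-m` (model path), and `-c` (context size) options. The helper names, the model file name, and every number here are hypothetical, and the estimate assumes all layers are the same size, which real models only approximate.

```python
def layers_that_fit(vram_gib, model_gib, n_layers, reserve_gib=1.5):
    """Estimate how many layers fit in VRAM.

    Assumption: every layer is the same size, and reserve_gib is held back
    for the KV cache and scratch buffers. Both are simplifications.
    """
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))


def llama_cli_args(model_path, n_gpu_layers, ctx_size=4096):
    """Build an argument list for llama.cpp's llama-cli binary."""
    return [
        "llama-cli",
        "-m", model_path,           # path to the GGUF file (hypothetical here)
        "-ngl", str(n_gpu_layers),  # number of layers to offload to the GPU
        "-c", str(ctx_size),        # context size in tokens
    ]


if __name__ == "__main__":
    # Hypothetical numbers: a 16 GiB quantized model with 64 layers, 12 GiB card.
    n = layers_that_fit(vram_gib=12, model_gib=16, n_layers=64, reserve_gib=2.0)
    print(" ".join(llama_cli_args("model.gguf", n)))
```

In practice the estimate is only a starting point: raise or lower `-ngl` until the model loads without running out of memory.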

      It’s new enough that there aren’t any good abliterated/uncensored models yet.

      • Jayjader@jlai.lu
        14 hours ago

        I’m surprised that you’re talking about models being CUDA-specific or AMD-specific. I’ve had a bunch of models running on my AMD-only PC, using Ollama, Lemonade, and LM Studio, through either ROCm or Vulkan. None of these models were billed as AMD-specific. I had to do some config tweaking to get Ollama to use my graphics card, but that’s more because I have a weird in-between-generations card that also predates the LLM hype (6700 XT).

        However, I did generally need to look for the GGUF-format versions of things - accounts like unsloth usually have them uploaded on Hugging Face barely a day or two after the original version gets posted.
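
For anyone unsure whether a downloaded file really is GGUF, here is a minimal sketch of a header check. It relies on GGUF files starting with the 4-byte magic `GGUF` followed by a 32-bit version number; the function name is hypothetical, and it assumes the common little-endian layout.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_version(path):
    """Return the GGUF version number, or None if the header is not GGUF.

    Assumption: the file uses the little-endian layout.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

This only inspects the first eight bytes, so it says nothing about whether the rest of the file is complete or which quantization it uses.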