• 1 Post
  • 76 Comments
Joined 1 year ago
Cake day: March 22nd, 2024


  • A problem is volunteers and critical mass.

    Open source “hacks” need a big pool of people who want something, in order to seed the few brilliant souls who will develop it in their free time. That pool has to be at least proportional to the size of the problem.

    This kinda makes sense for robot vacuums: a lot of people have them, and the cloud service is annoying, relatively simple, and not life-critical.

    Teslas are a whole different deal. They are very expensive, and fewer people own them. Replicating even part of the cloud API calls is a completely different scope. The pool of Tesla owners willing to dedicate their time to that is just… smaller.

    Also, I think buying a Tesla, for many, was an implicit vote of trust in the company and its software. Someone cynical about its cloud dependence is much less likely to end up owning an entire luxury automobile in the first place.



  • brucethemoose@lemmy.world to Programmer Humor@programming.dev · “C++” · +146 / −2 · edited 19 days ago

    Meanwhile, Rust punches you in the face for the mere suggestion. Again. And again.

    Python happily nods, runs it one page at a time, very slowly, randomly handing things off to a C person standing to the side, then returns a long poem. You wanted a number.

    Assembly does no checking, and reality around you tears from an access violation.

    EDIT: Oh, and the CUDA/PyTorch person is holding a vacuum sucking money from your wallet, with a long hose running to Jensen Huang’s kitchen.




  • CachyOS is not immutable. It’s basically like a more optimized and fleshed-out EndeavourOS.

    Backups and rollbacks are done the way they are in vanilla Arch, and I’m not sure there’s any built-in analogue to an immutable distro’s rollback system.

    I did use Fedora Kinoite (which is immutable) for a hot minute, but… found it to be too much of a hassle? I dunno, I just keep anything important on a separate drive/mount that’s easy to back up, and there was too much fuss getting the apps I needed working, so I didn’t see the point. CachyOS is preconfigured really well, so even if I had to nuke the whole partition, reinstalling would only set me back like 30 minutes. But if you need an immutable distro, this is not the place to look.

    CachyOS is very much focused on and optimized for gaming, arguably more than any other distro. There are performance-tweaked versions of many popular packages in their repos, so there’s no need to reach for the AUR.




  • I’ve been using CachyOS for over two years and the same install/partition for over a year.

    100% recommend. It’s my distro to end all distros, and I can’t imagine anything else being (for me) easier or more performant/efficient, while still having the critical mass of the Arch community behind it and being relatively stable. It’s not like Manjaro, which screws everything up with its changes and throws away Arch’s work; if anything, on very rare occasions the CachyOS devs fix something that breaks in vanilla Arch.

    I’m not even gaming on it or following all the tweaks, but it’s amazing for dev work, servers, or anything really, and it’s configured great out of the box.


  • Honestly, most LLMs suck at the full 128K. Look up benchmarks like RULER.

    In my personal tests over API, Llama 70B is bad out there. Qwen (and any fine-tune based on Qwen Instruct, with maybe an exception or two) not only sucks but is impractical past 32K once its internal RoPE scaling kicks in. Even GPT-4 is bad out there, with Gemini and a few other very large models being the only usable ones I found.

    So, ask yourself… Do you really need 128K? Because 32K-64K is a boatload of code with modern tokenizers, and that is perfectly doable on a single 24 GB GPU like a 3090 or 7900 XTX, which is also where models actually perform well.
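
    If you want to sanity-check that for your own codebase, a quick token count makes the point. A minimal sketch, assuming the Hugging Face transformers library and the Qwen2.5-Coder tokenizer (swap in whichever model you actually run, and your own project path):

    ```python
    # Rough token count for a codebase, to see how far a 32K/64K/128K budget goes.
    # The tokenizer ID and "my_project" path are assumptions -- substitute your own.
    from pathlib import Path
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")

    total = 0
    for path in Path("my_project").rglob("*.py"):  # hypothetical project directory
        text = path.read_text(errors="ignore")
        total += len(tok.encode(text))

    print(f"~{total} tokens")  # compare against your target context length
    ```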


  • Late to this post, but shoot for an AMD Strix Halo or Nvidia Digits mini PC.

    Prompt processing is just too slow on Apple, and the Nvidia/AMD backends are so much faster with long context.

    Otherwise, your only sane option for 128K context is a server with a bunch of big GPUs.

    Also… what model are you trying to use? You can fit Qwen coder 32B with like 70K context on a single 3090, but honestly it’s not good above 32K tokens anyway.
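
    For a rough feel of why ~70K is about the ceiling on 24 GB, here’s a back-of-envelope sketch. The numbers are assumptions (Qwen2.5-32B’s published config of 64 layers, 8 KV heads, 128 head dim, a ~4.25 bpw weight quant, and a 4-bit KV cache); your exact quant and backend will shift them:

    ```python
    # Back-of-envelope VRAM estimate for Qwen2.5-Coder-32B on a 24 GB card.
    # All constants below are assumptions -- adjust for your model and quantization.
    PARAMS = 32.8e9                        # approximate parameter count
    WEIGHT_BPW = 4.25                      # bits per weight (~4-bit quant)
    LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128
    KV_BPW = 4                             # quantized KV cache
    CONTEXT = 70_000                       # target context length in tokens

    weights_gb = PARAMS * WEIGHT_BPW / 8 / 1e9
    # K + V per token: 2 * layers * kv_heads * head_dim values
    kv_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BPW / 8 * CONTEXT / 1e9

    print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_gb:.1f} GB, "
          f"total ~{weights_gb + kv_gb:.1f} GB (plus a GB or two of overhead)")
    ```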


  • Unfortunately, Nvidia is, by far, the best choice for local LLM coder hosting, and there are basically two tiers:

    • Buy a used 3090, limit the clocks to like 1400 MHz, and then host Qwen 2.5 Coder 32B.

    • Buy a used 3060, host Arcee Medius 14B.

    Both of these setups will expose an OpenAI-compatible endpoint.

    Run tabbyAPI instead of ollama, as it’s far faster and more VRAM-efficient.

    You can use AMD, but the setup is more involved: the kernel has to be compatible with the ROCm package, and you need a 7000-series card plus some extra hoops for TabbyAPI compatibility.

    Aside from that, an Arc B570 is not a terrible option for 14B coder models.
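
    Whichever tier you pick, once tabbyAPI (or whatever server you choose) is up, talking to it is just the standard OpenAI client pointed at localhost. A minimal sketch; the port, API key, and model name are assumptions, so match them to your own server config:

    ```python
    # Minimal client for a local OpenAI-compatible endpoint (e.g. tabbyAPI).
    # base_url, api_key, and model are assumptions -- use your server's actual values.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:5000/v1", api_key="your-tabby-key")

    resp = client.chat.completions.create(
        model="Qwen2.5-Coder-32B-Instruct",  # whatever model name your server reports
        messages=[{"role": "user", "content": "Write a Rust function that reverses a string."}],
        max_tokens=256,
    )
    print(resp.choices[0].message.content)
    ```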