For the gifs, are you talking about the keyboard or lemmy client?
It’s Jerboa with the “Black” Theme selected under look and feel.
Unexpected keyboard is just the best!
I’m curious, how do you run the 4x3090s? The FE cards would need 4x3=12 PCIe slots and 4x16=64 PCIe lanes… Did you NVLink them? What about transient power spikes? Any clock or even VBIOS mods?
I’m also on p2p 2x3090 with 48GB of VRAM. Honestly it’s a nice experience, but still somewhat limiting…
I’m currently running deepseek-r1-distill-llama-70b-awq with the aphrodite engine, though the same applies to llama-3.3-70b. It works great and is way faster than ollama, for example. But my max context is around 22k tokens. More VRAM would allow me more context; even more VRAM would allow for speculative decoding, CUDA graphs, …
Maybe I’ll drop down to a 35b model to get more context and a bit of speed. But I don’t really want to justify the possible decrease in answer quality.
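To put a rough number on why context eats VRAM, here is a back-of-the-envelope sketch of the KV cache size. The architecture values (80 layers, 8 KV heads, head dimension 128) are my assumptions for a Llama-3-style 70B model, and the cache is assumed to be stored in fp16. Engine overhead and activations are ignored, so treat the result as a lower bound rather than what aphrodite actually allocates.

```python
def kv_cache_bytes(tokens, layers=80, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for a given number of context tokens.

    All defaults are assumptions for a Llama-3-style 70B model with an
    fp16 cache; they are not read from any engine or model config.
    """
    # Factor 2: each layer stores one key and one value vector per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


per_token = kv_cache_bytes(1)
ctx_22k = kv_cache_bytes(22_000)

print(f"per token:  {per_token / 1024:.0f} KiB")
print(f"22k tokens: {ctx_22k / 1024**3:.2f} GiB")
```

Under these assumptions, 22k tokens of context cost several GiB on top of the quantized weights, which fits with running out of room on 48GB.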
I’m running such a setup!
This is my nixos config, though feel free to ignore it, since it’s optimized for me and not others.
How did I achieve your described setup?
The added info from pv is also nice ^^
It’s gotten to a point where I get “Sign-In to confirm you’re not a bot” on my residential ip(s).
Which I won’t, therefore no yt for me.
Luckily I’ve got better and more entertaining things to do.
It’s affected by the write-hole phenomenon. In BTRFS’s case that can mean that perfectly good old data might get corrupted without any notice.
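A toy sketch of how a write hole can damage data that was never touched, assuming the usual parity-RAID picture (two data blocks plus one XOR parity block). The block contents and the helper name `xor` are made up for illustration; this is not how BTRFS lays out data on disk.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks, as a RAID5-style parity would."""
    return bytes(x ^ y for x, y in zip(a, b))


# A consistent stripe: two data blocks and their parity.
d0_original = b"old data A"
d1 = b"old data B"
parity = xor(d0_original, d1)

# In a consistent stripe, a lost block can be rebuilt from the rest.
consistent_rebuild = xor(d1, parity)

# Torn write: d1 is updated, but power is lost before the parity
# block is rewritten, so the parity is now stale.
d1 = b"new data B"

# Later the disk holding d0 fails. Rebuilding d0 from the new d1 and
# the stale parity yields garbage, although d0 was never written to.
rebuilt_d0 = xor(d1, parity)

print(consistent_rebuild == d0_original)
print(rebuilt_d0 == d0_original)
```

The second comparison comes out false: the untouched block is the one that gets corrupted, and nothing flags it until a rebuild is needed.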
That’s true, sr.ht is not a drop-in replacement, but rather a full-on alternative.
I really like radicle though.
I use sourcehut, specifically because I like their web gui!
Do you know how access rights management works on radicle?
Last time I checked I could just add commits to any open PR…
Luckily, the main repo is different, having a canonical version.
Thanks for the writeup! So far I’ve been using ollama, but I’m always open for trying out alternatives. To be honest, it seems I was oblivious to the existence of alternatives.
Is your post suggesting that the same models with the same parameters generate different results when run on different backends?
I can see how the backend would have an influence on handling concurrent API calls, RAM/VRAM efficiency, supported hardware/drivers and general speed.
But going as far as having different context windows and quality degrading issues is news to me.
Is there an inherent benefit to using NVLINK? Should I specifically try out Aphrodite over the other recommendations when having 2x 3090 with NVLINK available?
I use sourcehut.
dd if=/dev/zero of=image.png bs=1k count=1024 conv=notrunc
Good question: https://github.com/styluslabs/Write/commits/master/LICENSE
Yes: sntx.space, check out the source button in the bottom right corner.
I’m building/running it the homebrewed, unconventional route. That is, I have just a bit of html/css and other files I want to serve, then I use nix to build that into a usable website and serve it on one of my homelab machines via nginx. That is made available through a VPS running HA-Proxy and its public IP. The Nebula overlay network (VPN) connects the two machines.
Something like this: https://apps.gnome.org/en/Komikku/ ?