Another reason to self host your own AI

SuspiciousCarrot78@aussie.zone · 1 day ago

brucethemoose@lemmy.world · 4 hours ago

MoE models can be very fast with hybrid (CPU + GPU) inference. I run Xiaomi Mimo 2.5 (a 310B model, 116GB of weights) on a single 3090 plus a 7800 CPU, and it generates text faster than I can read it.
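
For anyone wondering how those numbers fit together, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above plus the 3090's 24 GB of VRAM. The helper names are made up for illustration, "GB" is assumed to mean 10^9 bytes, and the estimate ignores KV cache and runtime overhead, so treat the result as a floor rather than a measurement.

```python
# Rough sizing for hybrid CPU + GPU inference.
# Assumptions: GB = 1e9 bytes; KV cache and runtime overhead are ignored.

def bits_per_weight(weight_bytes: float, n_params: float) -> float:
    """Average storage cost per parameter, in bits."""
    return weight_bytes * 8 / n_params


def min_cpu_resident_gb(weights_gb: float, vram_gb: float) -> float:
    """Lower bound on the weights that cannot fit in VRAM and must sit in system RAM."""
    return max(0.0, weights_gb - vram_gb)


WEIGHTS_GB = 116   # weight size quoted in the comment
N_PARAMS = 310e9   # parameter count quoted in the comment
VRAM_GB = 24       # RTX 3090

bpw = bits_per_weight(WEIGHTS_GB * 1e9, N_PARAMS)
cpu_gb = min_cpu_resident_gb(WEIGHTS_GB, VRAM_GB)

print(f"{bpw:.2f} bits per weight")
print(f"at least {cpu_gb:.0f} GB of weights in system RAM")
```

So the weights work out to roughly 3 bits per parameter, and most of them have to live in system RAM no matter how the layers are split. That is workable for an MoE because only a fraction of the experts is used for each token.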

It’s also easier to fit long context, if you need that.

It’s best to use the ik_llama.cpp fork for that, though. It gives a huge boost to hybrid MoE speeds.