Recent post re: AI as utility
Myself, I’m a fan of local LLM / self hosted ML… but if you ever needed a clarion call that a hard pivot is coming (soon) for online/ cloud based AI…Altman et al are making some concerning mouth noises (to say nothing of broader concerns with OAI, Anthropic etc).
Right now, I’m sketching out a plan where my Raspberry Pi (always on, 2-3w) uses a magic packet to wake up my modest AI server (Lenovo P330 with Tesla P4) if/when needed (Qwen 3.6-35B-A3B); no point in chugging down 80-100w, 24/7 for no good reason.
If the trend continues the direction it appears to be (increasing costs, environmental impacts etc) then I’d feel a lot better hosting my own as port of first call and replacing simpler tasks with more traditional programs. YMMV.


MoEs can be very fast with hybrid inference. I run Xiaomi Mimo 2.5 (a 310B model, 116GB weights) on my single 3090 + 7800 CPU, and it outputs faster than I can read it.
It’s also easier to fit long context, if you need that.
It’s best to use the ik_llama.cpp fork for that, though. It gives a huge boost to hybrid MoE speeds.