• 1 Post
  • 80 Comments
Joined 2 years ago
Cake day: June 10th, 2023



  • My latest project runs on a VM that I edit directly through VS Code’s SSH remote editing feature. I edit the only copy of the file in existence (no backup, no version control) and then I restart the systemd service.

    So what if I mess it up? Big deal. The Discord bot goes down for a few minutes and I fix it.

    Same goes for the machine configs. Ideally the machines are stable, the critical ones get backups, and if they aren’t stable then I suppose the best place to fix them is in prod anyway (my VMs run Debian, they’re stable).


  • I feel I’m kinda vaccinated against the junior feeling, because in week 2 of my first job out of college I crashed both sides of a cluster, leaving the client’s factory (responsible for half of their European production) dead for 3 days.

    I panicked for a few days, then they asked me to write an incident report, I thought I was cooked, and then literally nothing happened to me. Nowadays if shit hits the fan at 16h59 I’m gone at 17h00 anyway, and so should be everybody that’s bothered by the smell.





  • It seems your assessment is correct. You’d be surprised at the speeds you can get on poor wifi when you don’t care about latency; there’s a rough back-of-the-envelope sketch below. The average speed marching up with your download is a dead giveaway too. The fact that the maximum over 5 minutes exceeds it is a bit weird, but it could be explained by some networking equipment in the middle (probably at your ISP, if I had to guess) interfering with the traffic for whatever reason. A common cause is a misconfigured subscriber speed cap, where the limit on your local side is set correctly but the outgoing one is set to the maximum speed of the link.
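    To put a number on the latency point: a single bulk TCP download is roughly bounded by window size divided by round-trip time, so a laggy link can still sustain a high average speed as long as the window is big enough. A tiny illustration (the window and RTT figures here are made-up examples, not measurements from your link):

    ```python
    # Window-limited TCP throughput is roughly window_size / RTT.
    # The numbers below are illustrative assumptions, not measurements.

    def max_tcp_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
        """Upper bound on single-stream TCP throughput, in Mbit/s."""
        return (window_bytes * 8) / rtt_seconds / 1_000_000

    # A 4 MiB receive window over an 80 ms (high-latency) wifi path:
    window = 4 * 1024 * 1024
    rtt = 0.080
    print(f"{max_tcp_throughput_mbps(window, rtt):.0f} Mbit/s ceiling")  # ~419 Mbit/s
    ```

    Latency hurts interactive use and short transfers far more than it hurts a long download’s average speed.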




  • SNMP does what you want; there’s a minimal polling sketch at the bottom of this comment. You just need a good monitoring solution that’s not as involved as Prometheus+Grafana (I feel you, I’ve been there).

    I really enjoy PRTG, but it’s way too expensive for a home lab; still throwing it out there in case you feel like you have money to burn.

    I hear good things about LibreNMS; it’s next on my list for when my PRTG licence runs out.

    Be warned that monitoring is ultimately a fickle thing: whatever you would otherwise spell out in YAML config for Grafana, with other tools you get to dig out of obscure SNMP MIBs and libraries instead (though I personally find that easier, ymmv).

    I recommend against: Nagios (I like it, but if you hate Prometheus it’s definitely not for you), Checkmk (throw Checkmk into the sun please, it just fucking sucks), Cacti (NO!), and SolarWinds (why?).

    If you feel like you want to become a datacenter admin: Zabbix scales very well, both in performance and in ease of administering hundreds of servers, but it’s overkill for a home lab and can get you lost in configs for hours.
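    For what it’s worth, polling a device over SNMP is only a few lines even before you pick a monitoring stack. A minimal sketch, assuming pysnmp’s classic synchronous hlapi (4.x/5.x); the device address, community string, and OID are placeholders:

    ```python
    # Minimal SNMP GET using pysnmp's classic synchronous hlapi.
    # Address, community, and OID below are placeholders for the example.
    from pysnmp.hlapi import (
        getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
        ContextData, ObjectType, ObjectIdentity,
    )

    error_indication, error_status, error_index, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData('public', mpModel=1),        # SNMPv2c, default community
            UdpTransportTarget(('192.0.2.10', 161)),   # hypothetical switch/router
            ContextData(),
            ObjectType(ObjectIdentity('1.3.6.1.2.1.1.1.0')),  # sysDescr.0
        )
    )

    if error_indication:
        print(error_indication)                        # e.g. request timed out
    elif error_status:
        print(f'{error_status.prettyPrint()} at index {error_index}')
    else:
        for var_bind in var_binds:
            print(' = '.join(x.prettyPrint() for x in var_bind))
    ```

    Every tool above is essentially doing that in a loop and storing the results; the painful part is figuring out which OIDs are worth asking for.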




  • I’m not fully familiar with all the overhead that goes through a chipset, but it’s not unreasonable to think the result is influenced by this workload, plus whatever the chipset itself has to do (mostly hardware-management tasks), plus the CPU’s other traffic on similar interfaces that might saturate the I/O die/controller.

    B350 isn’t a very fast chipset to begin with, and I’m willing to bet the CPU in such a motherboard isn’t exactly current-gen either. Are you sure you’re even running at PCIe 3.0 speeds? There are PCIe 2.0-only CPUs available for AM4 (one quick way to check the negotiated link speed on Linux is sketched below).
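    If the box runs Linux, sysfs exposes the negotiated and maximum link speed for every PCI device, so a few lines are enough to check. A rough sketch; the PCI address is a placeholder, substitute the one `lspci` shows for your card:

    ```python
    # Print negotiated vs. maximum PCIe link speed/width from sysfs (Linux).
    # The PCI address is a placeholder; find the real one with `lspci`.
    from pathlib import Path

    DEVICE = "0000:01:00.0"  # hypothetical address of the card in question
    base = Path("/sys/bus/pci/devices") / DEVICE

    for attr in ("current_link_speed", "max_link_speed",
                 "current_link_width", "max_link_width"):
        path = base / attr
        print(f"{attr}: {path.read_text().strip()}" if path.exists()
              else f"{attr}: not exposed for this device")
    ```

    8 GT/s corresponds to PCIe 3.0 and 5 GT/s to PCIe 2.0, so a 5 GT/s reading would mean the link itself is the bottleneck before the chipset even enters the picture.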