

I like to code, garden and tinker
Centralization is a weakness. These services can be targeted by governments that want to limit communication. Free speech is a commodity, and servers host this free speech. If a hostile organization, such as a government, targets a channel of free speech, such as the instances hosted on a platform that makes it easy to set up a Mastodon server, that platform becomes an easy target whose takedown affects a large portion of users. If you are serious about freedom, you have the freedom to self-host your own platforms.
Edit: I realize my post doesn’t answer the question posed; it’s more of an argument against such services. I would argue self-hosting shouldn’t rely on paying third parties to host your software, but I guess that is in the eye of the hoster.
Our studies show this lower literacy-higher receptivity link is strongest for using AI tools in areas people associate with human traits, like providing emotional support or counselling.
This is really dangerous, as subjective matters can easily steer people in vulnerable positions to think and act a certain way. Depending on the training data and safeguards put in place, this could easily lead to AIs telling users to do horrible things to themselves or others.
When I say residential IP addresses, I mostly mean proxies using residential IPs, which allow scrapers to mask themselves as organic traffic.
Edit: Your point stands that there are a lot of services without these protections in place, but a lot of services do protect against scraping.
I don’t think either is good, but it’s funny how it’s bad when the other guy is doing it.
Foreign propaganda bad. Native propaganda good.
Edit: To add to this, isn’t this capitalism? These people are being paid to do a job; shouldn’t that be celebrated? /s
The only way I can think of is to require users to authenticate themselves, but this isn’t much of a hurdle.
To get into the details of it, what do you define as an AI bot? Are you worried about scrapers grabbing the contents of your website? What are the activities of an “AI bot”? Are you worried about AI bots registering and using your platform?
The real answer is that not even Cloudflare will fully defend you from this. If anything, Cloudflare is just making sure they get paid for AI scrapers’ access to your website. As someone who has worked around bot protections (albeit in a different context than web scraping), it’s a game of cat and mouse. If you, or some company you hire, are not actively working against automated access, you lose, because the other side is active.
Just think of your point that they are using residential IP addresses. How do they get these addresses? They provide addons/extensions for browsers that offer some service (generally free VPNs) in exchange for access to your PC, and therefore your internet connection, in the contract you agree to. The same can be done by any addon: if it has permission to read any website, it can scrape those sites through legitimate users for whatever purpose it wants. The recent exposure of the Honey scam highlights this, as it’s very easy to get users to install addons by telling them they might save a small amount of money (or earn money from other programs). There will be users compromised by addons/extensions, or even plain viruses, who will be able to extract the data you are trying to protect.
I’d just skip OpenVPN altogether and get started with WireGuard or Headscale/Tailscale.
This one was huge for me. OpenVPN carries a lot of CPU overhead, whereas WireGuard is almost free. I was getting throttled due to the overhead of OpenVPN, which was roasting the CPU on my Netgear R6350 (it’s what I had lying around). With WireGuard I get nearly the same speeds as without a VPN, and my loads are very reasonable.
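If you want to try it, here’s a minimal sketch of bringing up one side of a WireGuard tunnel (wg0, the 10.0.0.0/24 range, and vps.example.com are placeholders for your own values):

umask 077
wg genkey | tee privatekey | wg pubkey > publickey

cat > /etc/wireguard/wg0.conf <<EOF
[Interface]
PrivateKey = $(cat privatekey)
Address = 10.0.0.2/24

[Peer]
PublicKey = <other-side-public-key>
Endpoint = vps.example.com:51820
AllowedIPs = 10.0.0.0/24
PersistentKeepalive = 25
EOF

wg-quick up wg0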
Also, with weaker routers like mine, be wary of trying to use QoS; it will probably not help network congestion and will instead become a bottleneck (like it did for me). This is where a beefy dedicated router really shines.
Semi-cold? That’s extra; you’ll be lucky to afford it. The affordable water has been sitting out on the pavement for a few weeks.
My question would be: why do you need a more powerful server? Are you monitoring your load and seeing it’s overloaded often? Are you just looking to hook more drives to it? Do you need to re-encode video on the fly for other devices? Giving some more details would help someone give a more insightful answer. I’m personally using a Raspberry Pi 4, a Chromebox with an i7, an old HP rack server, and an old desktop PC for my self-hosting needs, as this is cheaper than buying all-new hardware (though the electricity bill isn’t the greatest, haha, but oh well). If you are just looking for more storage, you can add a couple of extra SSDs to the Raspberry Pi 4B’s USB 3.0 ports using NVMe-to-USB 3.0 enclosures. The speeds will be fine for most applications.
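If you want to check whether your current box is actually struggling before spending money, a few standard Linux commands give a quick picture (nothing here is specific to my setup):

uptime                    # load averages; compare against your core count (nproc)
free -h                   # memory and swap pressure
df -h                     # disk usage per filesystem
top -b -n 1 | head -20    # snapshot of the busiest processes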
As for SSD vs HDD: SSD, hands down. The only reason you’d pick an HDD is if you’re trying to get more storage cheaply and don’t mind a higher rate of failure. If your data is at all valuable, and it almost always is, redundancy should be added as well.
And as for running Linux, if it can’t run Linux I wouldn’t want to own it.
Edit: Fixed typo
This might help, sorry if it doesn’t, but here is a link to Cloudflare’s 5xx error code page on error 521. If you’ve done everything in the resolution list, your ISP might be actively blocking you from hosting websites, as hosting on residential service lines is generally against an ISP’s ToS. This is why I personally rent a VPS and have a WireGuard VPN set up so I can host from the VPS, which is basically a roll-your-own version of Tailscale using any VPS provider. This way you don’t need to expose anything via your ISP’s router/WAN, and they can’t see what you are sending or which ports you are sending on (other than the encrypted VPN traffic to your VPS, of course).
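The VPS side of that roll-your-own setup is mostly just forwarding public ports into the tunnel. A rough sketch, assuming the VPS’s public interface is eth0, the tunnel is wg0, and the home server’s tunnel address is 10.0.0.2 (all placeholders):

sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 10.0.0.2:443
iptables -t nat -A POSTROUTING -o wg0 -j MASQUERADE
# depending on your default policies, matching FORWARD accept rules may also be needed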
I use my own router running DD-WRT between the ISP’s router/modem and my LAN, on a different subnet. I haven’t had any issues with this myself; my router just sees the ISP’s router/modem as the WAN.
I’ve never run this program, but I skimmed the documentation. You should be able to use the SHIORI_DIR environment variable (or a custom database table, following those instructions) along with the -p argument for launching the web interface. A simple bash script that should work:
export SHIORI_DIR=/path/to/shiori-data-dir
shiori serve -p 8081
To run multiple versions, I’d suggest setting up each instance as a service on your machine in case of reboots and/or crashes.
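As a sketch, one unit per instance could look something like this (the unit name and binary path are assumptions; adjust to your install):

cat > /etc/systemd/system/shiori-8081.service <<EOF
[Unit]
Description=Shiori instance on port 8081
After=network.target

[Service]
Environment=SHIORI_DIR=/path/to/shiori-data-dir
ExecStart=/usr/local/bin/shiori serve -p 8081
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now shiori-8081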
Now for serving them, you have two options. The first is to just let users connect to the port directly, but this is generally not done for outward-facing services (not that you can’t). The second is to set up a reverse proxy and route the traffic through subdomains or subpaths. Nginx is my go-to solution for this, and I’ve also heard good things about Caddy. You’ll most likely have to use subdomains, as lots of apps assume they are at the root path without some tinkering.
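As a rough sketch of the subdomain approach with nginx (shiori.example.com and port 8081 are stand-ins for your own values):

cat > /etc/nginx/conf.d/shiori.conf <<'EOF'
server {
    listen 80;
    server_name shiori.example.com;

    location / {
        proxy_pass http://127.0.0.1:8081;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF

nginx -t && systemctl reload nginx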
Edit: Corrected incorrect cli arguments and a typo.
TL;DR: The bot is configured to condense certain instances and communities. At the moment, only beehaw.org is marked to be condensed.
Quickly looking at the source code, it seems ReplyToPostsCommand uses a SummaryTextWrapper, which contains an iterable over both CondensedSummaryTextWrapperProvider and DefaultSummaryTextWrapperProvider. The DefaultSummaryTextWrapperProvider has a priority of -1_000 (so it’s always checked last) and is set to always return true from supports(Community $community): bool. CondensedSummaryTextWrapperProvider references config/services.yaml for its supports(Community $community): bool call, which lists 0 condensed communities and 1 condensed instance: beehaw.org.
The best I could find is here, an article by Randall Munroe (the xkcd artist), which states:
davean (the xkcd sysadmin) wrote the patch
This blog post links to another Wayback Machine page (thank you, archive.org!) here, which explains the sorting algorithm and states its original author:
Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson.
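For reference, that quantity is the lower bound of the Wilson score interval. With \hat{p} the observed fraction of upvotes, n the total number of votes, and z the normal quantile for the chosen confidence level (about 1.96 for 95%):

\text{lower bound} = \left( \hat{p} + \frac{z^2}{2n} - z \sqrt{ \frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2} } \right) \bigg/ \left( 1 + \frac{z^2}{n} \right)

Sorting by this instead of the raw average keeps a post with a single upvote from outranking one with hundreds of mostly positive votes.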
Sounds like some QoS software is also limiting LAN traffic, seeing as it still works when the internet is disconnected. I would check whether your router has “Adaptive QoS” or something similar enabled.
In most setups I have seen, the nginx instance provided by Lemmy is used, since the routing between lemmy and lemmy-ui is handled in that nginx config. Your reverse proxy can then point at the Lemmy nginx instance to expose Lemmy.
Why not a Raspberry Pi? The supply chain issues are clearing up.
In the US, it seems the supply chain issues are as alive as ever. Most of the official resellers are sold out of everything but the Pico and Zero boards. Some do have 4B boards for sale if you buy their starter kit with them, which increases the price by $65 on CanaKit. The supply issues are definitely not resolved for home users, no matter what the CEO wants to say.
It seems up to me. Do you use Cloudflare’s 1.1.1.1 DNS? If so, archive.is blocks their DNS queries.
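Easy to check from a terminal by comparing resolvers:

dig +short @1.1.1.1 archive.is    # Cloudflare; often returns nothing useful here
dig +short @8.8.8.8 archive.is    # another resolver, for comparison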
Basically, data storage is cheap. Currently it can cost as little as $0.01/day for other instances to store the content being sent by the largest communities. Otherwise, instances can clear out old data and let the larger instances be the archive hosts.
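Back-of-the-envelope, using an assumed block-storage price of $0.05/GB-month (many providers are around or below that): $0.01/day is $0.30/month, which buys about 6 GB-months, and federated text content is tiny next to that.

echo "scale=1; (0.01 * 30) / 0.05" | bc    # ≈ 6.0 GB-months per $0.01/day at $0.05/GB-month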