• 0 Posts
  • 160 Comments
Joined 2 years ago
Cake day: June 15th, 2023




  • SEO (search engine optimization) has dominated search results for almost as long as search engines have existed. The entire field of SEO is about gaming the system at the expense of users, and often also at the expense of search platforms.

    The audience for an author’s gripping life story in every goddamn recipe was never humans, either. That was just for Google’s algorithm.

    Slop is not new. It’s just more automated now. There are two new problems for users, though:

    1. Google no longer gives a shit. They used to play the cat-and-mouse game, and while their victories were never long-lasting, at least their defeats were not permanent. (Remember ExpertsExchange? It took years before Google brought down the hammer on that. More recently, think of how many results you’ve seen from Pinterest, Forbes, or Medium, and think of how few of those deserved even a second of your time.)
    2. Companies that still do give a shit face a much more rapid exploitation cycle. The cats are still plain ol’ cats, but the mice are now Borg.

  • Well I’m sorry, but most PDF distillers since the 90s have come with OCR software that can extract text from the images and store it in a way that preserves the layout AND the meaning

    The accuracy rate of even the best OCR software is far, far too low for a wide array of potential use cases.

    Let’s say I have an archive of a few thousand scientific papers. These are neatly formatted digital documents, not even scanned images (though “scanned images” would be within scope of this task and should not be ignored). Even for that, there’s nothing out there that can produce reliably accurate results. Everything requires painstaking validation and correction if you really care about accuracy.

    Even ArXiv can’t do a perfect job of this. They launched their “beta” HTML converter a couple years ago. Improving accuracy and reliability is an ongoing challenge. And that’s with the help of LaTeX source material! It would naturally be much, much harder if they had to rely solely on the PDFs generated from that LaTeX. See: https://info.arxiv.org/about/accessible_HTML.html

    As for solving this problem with “AI”…uh…well, it’s not like “OCR” and “AI” are mutually exclusive terms. OCR tools have been using neural networks for a very long time already; it just wasn’t a buzzword back then, so nobody called it “AI”. However, in the current landscape of “AI” in 2025, “accuracy” is usually just a happy accident. It doesn’t need to be that way, and I’m sure the folks behind commercial and open-source OCR tools are hard at work implementing new technology in a way that Doesn’t Suck.

    I’ve played around with various VL models and they still seem to be in the “proof of concept” phase.
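To put a number on the validation problem described above, one common approach is a character error rate: the edit distance between a hand-checked reference text and the OCR output, divided by the reference length. The sketch below is only an illustration; the function names are made up for this example and are not taken from any particular OCR tool.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance normalized by the length of the trusted reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

Even an error rate that sounds small adds up: at 1%, a 40,000-character paper would contain roughly 400 wrong characters, and any one of them could land in a formula or a citation.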


  • I’ve been using cryptpad.fr (the “flagship instance” of CryptPad) for years. It’s…fine. Really, it’s fine. I’m not thrilled with the experience, but it is functional and I’m not aware of any viable alternatives that are end-to-end encrypted.

    It’s based on OnlyOffice, which is basically a heavyweight web-first Microsoft Office clone. Set your expectations accordingly.

    No mobile apps, and the web UI is not optimized for mobile. I mean, it works, but does using the desktop MS Office UI on a smartphone sound like fun to you?

    Performance is tolerable but if you’re used to Google Sheets, it’s a big downgrade. Some of this is just the necessary overhead involved in an end-to-end encrypted cloud service. Some of it is because, again, this is a heavyweight desktop UI running in a web browser. It’s functional, but it’s not fast and it’s not pretty.


  • For instance, Mozilla said it may have removed blanket claims that it never sells user data because the legal definition of “sale of data” is now “broad and evolving,” Mozilla’s blog post stated.

    Uh huh.

    The company pointed to the California Consumer Privacy Act (CCPA) as an example of why the language was changed, noting that the CCPA defines “sale” as the “selling, renting, releasing, disclosing, disseminating, making available, transferring, or otherwise communicating orally, in writing, or by electronic or other means, a consumer’s personal information by [a] business to another business or a third party” in exchange for “monetary” or “other valuable consideration.”

    Yes. That’s what “sale of data” means. Everybody understood that. That’s exactly what we don’t want you to do.






  • DNS over HTTPS (DoH). It allows encrypted DNS lookup with a URL, which allows for URL-based customizations not possible with traditional DNS lookups (e.g. the server could have /ads or /trackers endpoints so you can choose what to block).

    DNS over TLS (DoT) is similar, but it doesn’t use URLs, just IP addresses like traditional DNS. Both are encrypted.
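As a rough illustration of the "lookup with a URL" part, here is a minimal sketch that builds a DoH GET URL in the style of RFC 8484: the DNS query is encoded in the normal DNS wire format, then base64url-encoded without padding and passed as the `dns` parameter. The endpoint `https://dns.example/ads` is hypothetical, standing in for a provider that exposes a separate path per blocklist; the sketch only constructs the URL and does not send anything.

```python
import base64
import struct


def build_dns_query(hostname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query in wire format (qtype 1 = A record)."""
    # Header: ID=0, flags=0x0100 (recursion desired), one question,
    # and zero answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS=1 (IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(endpoint: str, hostname: str) -> str:
    """Return a DoH GET URL for an A-record lookup of hostname."""
    query = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(query).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


# Hypothetical endpoint: the path selects which blocklist the server applies.
url = doh_get_url("https://dns.example/ads", "www.example.com")
```

Because the resolver is addressed by a full URL, switching from `/ads` to `/trackers` is just a different path on the same server. With DoT there is no path to vary, so that kind of per-endpoint choice has to be expressed some other way.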


  • I don’t think there’s any simple answer to what’s beginner-friendly, because so much is hardware-dependent. They mentioned obscure laptop hardware, and at that point I wouldn’t even make a recommendation to someone beyond “see if any distros have a wiki page about that specific hardware, and search for forum threads about it”.

    I’m sure there are cases when Arch is a lot easier than Mint. I’m not sure why they dismissed Fedora out of hand, though. What’s wrong with Fedora?




  • In my experience, this is more a problem if you are fully running your own mail servers, not so much if you are using an established email service. My MX record reflects my email provider, and my outgoing mail goes through their servers. So I’m as trusted as they are, in general. Your mail provider should have instructions on how to set up DNS for verification.
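One piece of that DNS setup is usually an SPF record: a TXT record on your domain that names the servers allowed to send mail for it. A minimal sketch of checking whether such a record delegates to your provider is below. It assumes the provider tells you to add an `include:` mechanism, and the provider name `_spf.mailprovider.example` is hypothetical. It only inspects a record string you already have; it does not query DNS or evaluate the full SPF rules.

```python
def spf_authorizes(spf_record: str, provider_include: str) -> bool:
    """Return True if the SPF record passes mail via include:<provider_include>."""
    terms = spf_record.split()
    # A valid SPF record starts with the version tag.
    if not terms or terms[0].lower() != "v=spf1":
        return False
    wanted = f"include:{provider_include.lower()}"
    for term in terms[1:]:
        # A leading "+" is the default "pass" qualifier and is optional.
        # Terms qualified with "-", "~", or "?" are not a pass, so they
        # are deliberately not matched here.
        if term.lstrip("+").lower() == wanted:
            return True
    return False


# Hypothetical record, as a provider's setup instructions might specify it.
record = "v=spf1 include:_spf.mailprovider.example ~all"
ok = spf_authorizes(record, "_spf.mailprovider.example")
```

The exact record to publish comes from the provider's own instructions, so treat the string above as a placeholder rather than something to copy.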


  • If you’re willing to pay money for it, you can get your own domain for $2-$15 per year, then use it with pretty much any commercial email service. That way you can change email providers without changing your address.

    This is my plan going forward. I’m going to suffer the inconvenience of changing my address, but only one more time, not every time I want to change providers.



  • If you think this isn’t related to human rights, then you’ve missed the point.

    People have the right to use technology, and indeed we effectively need technology to exercise our right to free speech. You cannot have one without the other. Not anymore.

    The right way to think about this is that they are arbitrarily banning a topic of discussion simply because it is not dead-center average. This isn’t even a legal issue, and the justification is utter nonsense (Facebook itself runs on Linux, like >90% of the internet). No government has officially asked them to do this, though the timing suggests that it is unofficially from the Trump administration.

    This is about exerting control, establishing precedent, and applying a chilling effect to anything not directly aligned with their interests. This obviously extends to human rights issues. This is a test run.