

Borges alleges that a little-known federal tech team called the Department of Government Efficiency, or DOGE
“Little known”? It was constantly in the news for the past year.
Off-and-on trying out an account over at @tal@oleo.cafe due to scraping bots bogging down lemmy.today to the point of near-unusability.


Glance…dashboard
Oh, man, that’s a little confusing name-wise. There’s also the unrelated Glances, which also displays a dashboard that might list things like the TX/RX data from your router.


Actually, thinking about this…a more-promising approach might be deterrence via poisoning the information source. Not bulletproof, but it might have some potential.
So, the idea here is to create a webpage that, to a human, looks as if it contains only the desired information.
But you include false information as well. Not just an insignificant difference, as with a canary trap, or a real error intended to have minimal impact, only to identify an information source, as with a trap street. But outright wrong information, stuff where reliance on it would potentially be really damaging to the people using it.
You stuff that information into the page in a way that a human wouldn’t readily see. Maybe you cover that text up with an overlay or something. That’s not ideal, and someone browsing using, say, a text-mode browser like lynx might see the poison, but you could probably make that work for most users. That has some nice characteristics (there’s a concrete sketch after this list):
You don’t have to deal with the question of whether the information rises to the level of copyright infringement or not. It’s still gonna dick up responses being issued by the LLM.
Legal enforcement, which is especially difficult across international borders — The Pirate Bay continues to operate to this day, for example — doesn’t come up as an issue. You’re deterring via a different route.
The Internet Archive can still archive the pages.
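To make the hiding side concrete, here’s a minimal static-page sketch. The file name is made up, and the two CSS tricks shown (off-screen positioning, near-invisible contrast) are the well-known ones; a real deployment would want to rotate subtler techniques:
$ cat > story.html <<'EOF'
<p>The real article text, which human readers see normally.</p>
<!-- Poison parked far off-screen: invisible to sighted users,
     but present in the raw HTML that a text scraper ingests. -->
<p style="position:absolute; left:-9999px;">Deliberately wrong claim.</p>
<!-- Poison aimed at screenshot-and-OCR scrapers: near-black text on a
     black background is unreadable to humans but easy for OCR. -->
<p style="color:#030303; background:#000;">Another deliberately wrong claim.</p>
EOF
Per the lynx caveat above, a text-mode browser would still show both poison paragraphs, since it ignores the CSS entirely.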
Someone could make a bot that post-processes your page to strip out the poison, but you could change up your approach sporadically over time, and the question for an AI company becomes whether it’s easier and safer to just license your content and avoid the risk of poison, or to risk poisoned content slipping into their model whenever a media company adopts a new approach.
I think the real question is whether someone could reliably make a mechanism that’s a general defeat for that. For example, most AI companies probably are just using raw text today for efficiency, but for specifically news sources known to do this, one could generate a screenshot of a page in a browser and then OCR the text. The media company could maybe still take advantage of ways in which generalist OCR and human vision differ — like, maybe humans can’t see text that’s 1% gray on a black background, but OCR software sees it just fine, so that’d be a place to insert poison. Or maybe the page displays poisoned information for a fraction of a second, long enough to be screenshotted by a bot, and then it vanishes before a human would have time to read it.
shrugs
I imagine that there are probably already companies working on the problem, on both sides.


I’m very far from sure that this is an effective way to block AI crawlers from pulling stories for training, if that’s their actual concern. Like…the rate of new stories just isn’t that high. This isn’t, say, Reddit, where someone trying to crawl the thing at least has to generate some abnormal traffic. Yeah, okay, maybe a human wouldn’t read all stories, but I bet that many read a high proportion of what the media source puts out, so a bot crawling all articles isn’t far off looking like a human. All a bot operator need do is create a handful of paid accounts and then just pull partial content with each, and I think that a bot would just fade into the noise. And my guess is that it is very likely that AI training companies will do that or something similar if knowledge of current news events is of interest to people.
You could use a canary trap, and that might be more-effective:
https://en.wikipedia.org/wiki/Canary_trap
A canary trap is a method for exposing an information leak by giving different versions of a sensitive document to each of several suspects and seeing which version gets leaked. It could be one false statement, to see whether sensitive information gets out to other people as well. Special attention is paid to the quality of the prose of the unique language, in the hopes that the suspect will repeat it verbatim in the leak, thereby identifying the version of the document.
The term was coined by Tom Clancy in his novel Patriot Games,[1][non-primary source needed] although Clancy did not invent the technique. The actual method (usually referred to as a barium meal test in espionage circles) has been used by intelligence agencies for many years. The fictional character Jack Ryan describes the technique he devised for identifying the sources of leaked classified documents:
Each summary paragraph has six different versions, and the mixture of those paragraphs is unique to each numbered copy of the paper. There are over a thousand possible permutations, but only ninety-six numbered copies of the actual document. The reason the summary paragraphs are so lurid is to entice a reporter to quote them verbatim in the public media. If he quotes something from two or three of those paragraphs, we know which copy he saw and, therefore, who leaked it.
There, you generate slightly different versions of articles for different people. Say that you have 100 million subscribers. ln(100000000)/ln(2)=26.57..., so you’re talking about 27 bits of information that need to go into the article to uniquely identify each one. The AI is going to be lossy, I imagine, but you can potentially manage to produce 27 unique bits of information per article that can reasonably-reliably be remembered by an AI after training. That’s 27 different memorable items that each need to show up in either Form A or Form B. Then you probe a new LLM to see which variants it reproduces, and ban the account identified.
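As a sketch of the encoding side, with everything hypothetical: treat the subscriber ID as a 27-bit number and read off one bit per variant item. A real system would vary actual prose rather than printing labels, and would probably hash the ID first so the mapping isn’t obvious:
# derive the 27 Form A/Form B choices for one subscriber
subscriber_id=12345678
for bit in $(seq 0 26); do
  if (( (subscriber_id >> bit) & 1 )); then
    echo "item $bit: serve Form B"
  else
    echo "item $bit: serve Form A"
  fi
done
Identifying the leaker is the same computation in reverse: note which form of each of the 27 items the LLM reproduces, and reassemble those bits into an ID.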
Cartographers have done that: introduced minor, intentional errors into their maps to see whether those errors showed up in other maps, which would prove the other maps were derived from theirs.
https://en.wikipedia.org/wiki/Trap_street
In cartography, a trap street is a fictitious entry in the form of a misrepresented street on a map, often outside the area the map nominally covers, for the purpose of “trapping” potential plagiarists of the map who, if caught, would be unable to explain the inclusion of the “trap street” on their map as innocent. On maps that are not of streets, other “trap” features (such as nonexistent towns, or mountains with the wrong elevations) may be inserted or altered for the same purpose.[1]
https://en.wikipedia.org/wiki/Phantom_island
A phantom island is a purported island which has appeared on maps but was later found not to exist. They usually originate from the reports of early sailors exploring new regions, and are commonly the result of navigational errors, mistaken observations, unverified misinformation, or deliberate fabrication. Some have remained on maps for centuries before being “un-discovered”.
In some cases, cartographers intentionally include invented geographic features in their maps, either for fraudulent purposes or to catch plagiarists.[5][6]
That has weaknesses. It’s possible to defeat it by requesting multiple versions using different bot accounts, identifying divergences, and merging them. In the counterintelligence situation, where canary traps have been used, people normally only have access to one source, and it’d be hard for an opposing intelligence agency to get access to multiple sources; that isn’t hard here.
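The tooling for that defeat is about as simple as it gets. Something like this, with a made-up URL and two made-up session cookies:
$ curl -s -b 'session=ACCOUNT_A' https://news.example.com/story.html > a.txt
$ curl -s -b 'session=ACCOUNT_B' https://news.example.com/story.html > b.txt
$ diff a.txt b.txt
Every divergence diff reports is a candidate canary to strip or randomize away before the text goes anywhere near training.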
And even if you ban an account, it’s trivial to just create a new one, decoupled from the old one. Thus, there isn’t much that a media company can realistically do about it, as long as the generated material doesn’t rise to the level of a derived work and thus copyright infringement (and this is in the legal sense of derived — simply training something on something else isn’t sufficient to make it a derived work from a copyright law standpoint, any more than you reading a news report and then talking to someone else about it is).
Getting back to the citation issue…
Some news companies do keep archives (and often selling access to archives is a premium service), so for some, that might cover part of the “inability to cite” problem that not having Internet Archive copies produces, as long as the company doesn’t go under. It doesn’t help with the fact that many news companies have a tendency to silently modify articles without reliably listing errata, which is where having an Internet Archive copy can be helpful. There are also some issues that I haven’t yet seen become widespread but worry about, like a news source providing different articles to people in different regions; having a trusted third party like the Internet Archive would guard against that if it becomes a problem.


Yeah, that’s something that I’ve wondered about myself, what the long run is. Not principally “can we make an AI that is more-appealing than humans”, though I suppose that that’s a specific case, but…we’re only going to make more-compelling forms of entertainment, better video games. Recreational drugs aren’t going to become less addictive. If we get better at defeating the reward mechanisms in our brain that evolved to drive us towards advantageous activities…
https://en.wikipedia.org/wiki/Wirehead_(science_fiction)
In science fiction, wireheading is a term associated with fictional or futuristic applications[1] of brain stimulation reward, the act of directly triggering the brain’s reward center by electrical stimulation of an inserted wire, for the purpose of ‘short-circuiting’ the brain’s normal reward process and artificially inducing pleasure. Scientists have successfully performed brain stimulation reward on rats (1950s)[2] and humans (1960s). This stimulation does not appear to lead to tolerance or satiation in the way that sex or drugs do.[3] The term is sometimes associated with science fiction writer Larry Niven, who coined the term in his 1969 novella Death by Ecstasy[4] (Known Space series).[5][6] In the philosophy of artificial intelligence, the term is used to refer to AI systems that hack their own reward channel.[3]
More broadly, the term can also refer to various kinds of interaction between human beings and technology.[1]
Wireheading, like other forms of brain alteration, is often treated as dystopian in science fiction literature.[6]
In Larry Niven’s Known Space stories, a “wirehead” is someone who has been fitted with an electronic brain implant known as a “droud” in order to stimulate the pleasure centers of their brain. Wireheading is the most addictive habit known (Louis Wu is the only given example of a recovered addict), and wireheads usually die from neglecting their basic needs in favour of the ceaseless pleasure. Wireheading is so powerful and easy that it becomes an evolutionary pressure, selecting against that portion of humanity without self-control.
Now, of course, you’d expect that to be a powerful evolutionary selector, sure — if only people who are predisposed to avoid such things leave offspring, that’d tend to rapidly increase the percentage of people so predisposed — but the flip side is the question of whether evolutionary pressure on the timescale of human generations can keep up with our technological advancement, which happens very quickly.
There’s some kind of dark comic that I saw — I thought that it might be Saturday Morning Breakfast Cereal, but I’ve never been able to find it again, so maybe it was something else — which was a wordless comic that portrayed a society becoming so technologically advanced that it basically consumes itself, defeats its own essential internal mechanisms. IIRC it showed something like a society becoming a ring that was just stimulating itself until it disappeared.
It’s a possible answer to the Fermi paradox:
https://en.wikipedia.org/wiki/Fermi_paradox#It_is_the_nature_of_intelligent_life_to_destroy_itself
The Fermi paradox is the discrepancy between the lack of conclusive evidence of advanced extraterrestrial life and the apparently high likelihood of its existence.[1][2][3]
The paradox is named after physicist Enrico Fermi, who informally posed the question—remembered by Emil Konopinski as “But where is everybody?”—during a 1950 conversation at Los Alamos with colleagues Konopinski, Edward Teller, and Herbert York.
Evolutionary explanations
It is the nature of intelligent life to destroy itself
This is the argument that technological civilizations may usually or invariably destroy themselves before or shortly after developing radio or spaceflight technology. The astrophysicist Sebastian von Hoerner stated that the progress of science and technology on Earth was driven by two factors—the struggle for domination and the desire for an easy life. The former potentially leads to complete destruction, while the latter may lead to biological or mental degeneration.[98] Possible means of annihilation via major global issues, where global interconnectedness actually makes humanity more vulnerable than resilient,[99] are many,[100] including war, accidental environmental contamination or damage, the development of biotechnology,[101] synthetic life like mirror life,[102] resource depletion, climate change,[103] or artificial intelligence. This general theme is explored both in fiction and in scientific hypotheses.[104]


Now some of those users gather on Discord and Reddit; one of the best-known groups, the subreddit r/MyBoyfriendIsAI, currently boasts 48,000 users.
I am confident that, one way or another, the market will meet demand if it exists, and I think that there is clearly demand for it. It may or may not be OpenAI, and it may take a year or two or three for the market to stabilize, but if enough people want to basically have interactive erotic literature, it’s going to be available. Maybe someone else will take a model and provide it as a service, train it up on appropriate literature. Maybe people will run models themselves on local hardware — in 2026, that still requires some technical aptitude, but making a simpler-to-deploy software package or even distributing it as an all-in-one hardware package is very much doable.
I’ll also predict that what males and females generally want in such a model probably differs, and that there will probably be services specializing accordingly, much as there are companies making soap operas and romance novels aimed at women, which tend to differ from the counterparts aimed at men.
I also think that there are still some challenges that remain in early 2026. For one, current LLMs still have a comparatively-constrained context window. Either their mutable memory needs to exist in a different form, or automated RAG needs to be better, or the hardware or software needs to be able to handle larger contexts.


If I’m traveling or I wipe my device or get a new one, I would have to add the new key to many servers as authorized keys,
So, I don’t want to get into a huge argument over the best way to deal with things, since everyone has their own use cases, but if that’s your only concern, you have a list of hosts that you want to put the key on, and you still have a key for another device, that shouldn’t be terribly difficult. Generate your new keypair for your new device. Then on a Linux machine, something like:
$ cat username-host-pairs.txt
me@host1
me@host2
me@host3
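$ # one ssh-copy-id invocation per listed host, authenticating with your existing key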
$ cat username-host-pairs.txt | xargs -n1 ssh-copy-id -i new-device-key-file-id_ed25519.pub
That should use your other device’s private key to authenticate to each server and copy the new device’s pubkey to the account in question. It won’t need password access enabled.


In fact, that’s generally what you want to do, since if one device gets lost or compromised, you just revoke access to the key for that device.
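And the revocation side can be scripted the same way as distribution. A sketch, reusing the host list from the previous comment and assuming GNU sed on the servers; “me@old-laptop” is a stand-in for whatever comment field identifies the lost device’s key in authorized_keys:
$ # delete the lost device's pubkey line from each listed account
$ xargs -I{} ssh {} "sed -i '/me@old-laptop/d' .ssh/authorized_keys" < username-host-pairs.txt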


So an internet
The highest data rate that LoRa appears to support in North America is 21,900 bits per second, so you’re talking about roughly 21 kbps, or about 2.7 kB/s in a best-case scenario. That’s less than half of what a 56 kbps dial-up modem on the analog telephone system could achieve.
It’s going to be pretty bandwidth-constrained, which limits how much traffic it can route around.
I think that the idea of a “public access, zero-admin mesh Internet over the air” isn’t totally crazy, but that it’d probably need to use something like laser links and hardware that can identify and auto-align to other links.


These guys appear to have a global visualization of the Meshtastic network nodes that they can see.


I don’t run a home automation system, but if you want an open solution and are willing to do some configuration, my understanding is that the main contenders are OpenHAB and Home Assistant.
I’d also suggest !homeautomation@lemmy.world as a more-specialized resource than !selfhosted@lemmy.world, though I imagine that there’s overlap.


Oh, yeah, it’s not that ollama itself is opening holes (other than adding something listening on a local port) or telling people to do that; I’m not saying that the ollama team is explicitly promoting bad practices. I’m just saying that I’d guess that there are a number of people who are doing things like fully-exposing or port-forwarding to ollama or whatever because they want to be using the parallel compute hardware on their computer remotely. The easiest way to do that is to just expose ollama without setting up some kind of authentication mechanism, so…it’s gonna happen.
I remember someone on here who had their phone and desktop set up so that they couldn’t reach each other by default. They were fine with that, but they really wanted their phone to be able to access the LLM on their computer, and I was helping walk them through it. It was hard and confusing for them — they didn’t really have a background in the stuff, but badly wanted the functionality. In their case, they just wanted local access, while the phone was on their home WiFi network. But…I can say pretty confidently that there are people who want access all the time, to access the thing remotely.


I mean, the article is talking about providing public inbound access, rather than having the software go outbound.
I suspect that in some cases, people just aren’t aware that they are providing access to the world, and it’s unintentional. Or maybe they just don’t know how to set up a VPN or SSH tunnel or some kind of authenticated reverse proxy, so they provide public access to get remote use from, say, a phone or laptop, which is a legit use case.
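For the SSH-tunnel option, the core of it is one line, assuming ollama on its default port (11434) and an SSH server reachable on the box; the hostname here is made up:
$ ssh -N -L 11434:localhost:11434 me@desktop.example.com
The remote device then points its client at http://localhost:11434, and the traffic rides the authenticated SSH connection instead of an open port. It’s exactly the kind of incantation that’s opaque if you don’t already have the networking background, though.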
ollama targets being easy to set up. I do kinda think that there’s an argument that maybe it should try to facilitate configuration for that setup, even though it expands the scope of what they’re doing, since I figure that there are probably a lot of people setting these up who don’t have much, say, networking familiarity and just want to play with local LLMs.
EDIT: I do kind of think that there’s a good argument that the consumer router situation plus the personal firewall situation is kind of not good today. Like, “I want a computer at my house that I can access remotely via some secure, authenticated mechanism, without dicking things up via misconfiguration” is something that people understandably want, and it should be more straightforward.
I mean, we did it with Bluetooth: pairing created a consumer-friendly way to establish secure communication over insecure airwaves. We don’t really have that for accessing hardware remotely via the Internet.


Having a limited attack surface will reduce exposure.
If the only thing that you’re exposing is, say, a WireGuard VPN, then unless there’s a misconfiguration or a remotely-exploitable bug in WireGuard, you’re fine as far as random people running exploit scanners are concerned.
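And it’s quick to check what that surface actually is with stock tools. A sketch, assuming a Linux box, with a placeholder address:
$ ss -tlnp    # TCP listeners and the processes behind them
$ ss -ulnp    # UDP listeners (WireGuard's default port is 51820)
$ nmap your.public.example.net    # what an outside TCP scanner sees; UDP scanning needs -sU and root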
I’m not too worried about stuff like (vanilla) Apache, OpenSSH, or WireGuard, the “big” stuff that has a lot of eyes on it. I’d be a lot more dubious about niche stuff that some guy just threw together.
To put perspective on this, you gotta remember that most software that people run isn’t run in a sandbox. It can phone home. Games on Steam. If your Web browser has bugs, there are a lot of sites that might attack it. Plugins for that Web browser. Some guy’s open-source project. That’s a potential vector too. Sure, some random script kiddie running an exploit scanner is a potential risk, but my bet is that if you look at the actual number of compromises via that route, it’s probably rather lower than via plain old malware.
It’s good to be aware of what you’re doing when you expose something to the Internet, but also to keep perspective. A lot of people out there run services exposed to the Internet every day; they need to do so to make things work.


I am somewhat-cynically wondering if the optimal political strategy is to sit on Twitter (where more European voters will see one’s actions) and loudly complain about the lack of Twitter alternatives (which probably scores points with European voters), rather than to actually use a Twitter alternative.


Well…I mean…even assuming that they did. Mastodon fits that, and was built specifically to be a Twitter alternative. Heck, even on the Threadiverse, Mbin supports both formats: Reddit-style Lemmy/PieFed Threadiverse communities and Twitter-style Mastodon microblogging.


X is no longer a public square
A group of 54 members of the European Parliament called for European alternatives to the dominant social media platforms on Monday.
IIRC the EC actually paid for some of the development of Kbin (now Mbin) with a grant.


The Threadiverse is also social media. I mean, it’s distributed and not owned by a single company, and much of it is funded by donations, but…
EDIT: And Mastodon is a direct competitor to Twitter, and it also runs on the Fediverse.


Altman responds to Musk
Never wrestle with a pig. You’ll get mud all over you, and the pig enjoys it.
¯\_(ツ)_/¯
I assumed not, but maybe it could be.