r00ty

r00ty@kbin.life · 5 days ago

This was exactly what came to mind when I read the post.

r00ty@kbin.life · 8 days ago

Well, I’d expect that if they allowed 5V through but with a low current limit (I think the default 5V standard states quite a low current allowance), they could catch anything drawing too much and shut the port off until it detects a disconnection or some other reset.

I mean, if they’re thinking about protecting a downstream device, adding this logic would make more sense than just not supplying any power unless a negotiation is made.

In any case, since standard USB ports on a computer will output 5v without anything being negotiated, then it’s really no less safe than any other USB port in that regard.

r00ty@kbin.life · 8 days ago

But wait. Doesn’t this make them both dumb? I’d expect a modern USB-C charger to still support basic 5v low current lazy devices too. If there’s a USB-A to C cable that works, it must also still be possible to send the basic 5v down a C-C cable.

I also think there’s always going to be a balance: a device has to make enough money, and/or need fast charging badly enough, for it to make sense to add the charging circuit for PD/PPS. Even $1-2 on top of the cost can ruin margins in the current electronics market.

r00ty@kbin.life · 2 months ago

Yeah, it shouldn’t happen in a release. But if I had a penny for every time I’ve seen a last-minute development that wasn’t tested yet, and wasn’t even due for the current release, squeezed in, I’d literally have a pound, or a dollar, or whatever else has 100 pennies in it.

r00ty@kbin.life · 2 months ago

In a professional sense my experience is that they’re more often the result of under-staffing and rigid, fixed release schedules.

r00ty@kbin.life · 3 months ago

Yeah. I didn’t understand what they meant by the wtf there. Seemed to me someone wondered if the Action would have a localised version of i (making this stay lowercase on a phone was harder than it should be) or if it used the same i. So made a simple test for it.

Not really sure it’s a wtf unless they expected a different result.

r00ty@kbin.life · 3 months ago

Didn’t have the link to hand. But a search turned this one up: https://reggiodigital.com/blog/nginx-rule-blocking-bad-bots/ it looks to be the same list, and you can see the ones I’ve added to the end of that list.

r00ty@kbin.life · 3 months ago

Hmm, I took an original list and added to it. You got a website I can check? If so I’ll happily remove. I don’t mind slow web crawlers at all.

r00ty@kbin.life · 3 months ago

My mbin instance is behind Cloudflare, so I filter the AS numbers there and they don’t even reach my server.

On the sites that aren’t behind Cloudflare, yep, it’s at the nginx level. I did consider doing it at the firewall level, maybe with a specific chain for it. But since I was already blocking at the nginx level I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if they change their tactics.

You need to block the whole ASN too. Those that are using Chrome/Firefox UAs change IP every 5 minutes to another random one from their huuuuuge pools.

r00ty@kbin.life · 3 months ago

Yeah, I probably should look to see if there are any good plugins that do this on some community submission basis. Because yes, it’s a pain to keep up with whatever trick they’re doing next.

And unlike web crawlers that generally check a url here and there, AI bots absolutely rip through your sites like something rabid.

r00ty@kbin.life · 3 months ago

If you’re running nginx I am using the following:

if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }

That will block those that actually use recognisable user agents. I add any I find as I go on. It will catch a lot!
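If you want to check a user agent against the list before reloading nginx, a rough sketch in Python follows. It assumes that Python's `re.IGNORECASE` search behaves like nginx's `~*` for a plain alternation such as this one; the shortened pattern and the `is_blocked` helper are only for illustration and are not part of my config.

```python
import re

# Shortened, illustrative subset of the alternation used in the nginx rule.
BLOCKED_UA = re.compile(
    r"SemrushBot|AhrefsBot|MJ12bot|ClaudeBot|Bytespider|python",
    re.IGNORECASE,  # assumed equivalent of nginx's case-insensitive `~*`
)


def is_blocked(user_agent: str) -> bool:
    """Return True if any listed token appears anywhere in the user agent."""
    return BLOCKED_UA.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "python-requests/2.31.0",
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

Because the match is an unanchored substring search, short tokens such as `python` or `Java` will also hit any user agent that merely contains them.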

I also have a huuuuuge IP based block list (generated by adding all ranges returned from looking up the following AS numbers):

AS45102 (Alibaba cloud)
AS136907 (Huawei SG)
AS132203 (Tencent)
AS32934 (Facebook)

Since these guys run or have run bots that impersonate real browser agents.

There are various tools online to return prefix/ip lists for an autonomous system number.

I put both into a single file and include it into my web site config files.
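As a rough idea of how the IP part of that file could be generated, here is a minimal sketch. It assumes you already have the prefixes for an AS as plain CIDR strings from whichever lookup tool you use; the `build_deny_lines` helper and the sample prefixes (documentation ranges) are made up for illustration. It merges adjacent ranges and prints nginx `deny` lines.

```python
import ipaddress


def build_deny_lines(prefixes):
    """Collapse CIDR prefixes and return one nginx `deny` line per network."""
    v4, v6 = [], []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix.strip(), strict=False)
        # collapse_addresses() cannot mix IPv4 and IPv6, so keep them apart.
        (v4 if net.version == 4 else v6).append(net)
    merged = list(ipaddress.collapse_addresses(v4))
    merged += list(ipaddress.collapse_addresses(v6))
    return [f"deny {net};" for net in merged]


if __name__ == "__main__":
    # Placeholder prefixes from the documentation ranges, not real AS data.
    sample = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24", "2001:db8::/32"]
    for line in build_deny_lines(sample):
        print(line)
```

Collapsing first keeps the include file shorter, which matters when an AS announces thousands of small adjacent prefixes.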

EDIT: Just to add, keeping on top of this is a full time job!

EDIT 2: Removed Mojeek bot as it seems to be a normal web crawler.

r00ty@kbin.life · 3 months ago

The sun always shines on pc.

r00ty@kbin.life · 3 months ago

I think it’ll be a “we’ll see” situation. This was the main concern for y2k. And I don’t doubt there’s some stuff that was partially patched from y2k still around that is still using string dates.

But the vast majority of software now works with timestamps, and of course some things will need work. With y2k the vast majority of business software needed changing. I think in this case the vast majority will be working correctly already, and it’ll be the job of developers (probably in a panic less than a year before, as is the custom) to catch the few outliers. And yes, some will slip through the cracks. But that was also the case last time round.

r00ty@kbin.life · 3 months ago

You’re right on every point. But, I’m not sure how that goes against what I said.

Most applications now use the epoch for date and time storage, and for the 2038 problem the issues will be down to making sure either time_t or 64-bit long values (and matching storage) are used, which will be a much smaller change than was the case for y2k. Since more people also use libraries for date and time handling, it’s also likely this will be handled for them.

Most databases have datetime types which again are almost certainly already ready for 2038.

I just don’t think the scale is going to be close to the same.

r00ty@kbin.life · 3 months ago

Not really processor based. The timestamp needs to be ulong (not advised; as an unsigned 32-bit value it is good for dates up to early 2106, but it cannot express dates before 1970) or llong (long long). I think it’s a bad idea, but I bet some people too lazy to change their database schema will just do this internally.

The type time_t in Linux is now 64bit regardless. So, compiling applications that used that will be fine. Of course it’s a problem if the database is storing 32bit signed integers. The type on the database can be changed too and this isn’t hard really.
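To put numbers on those limits, here is a small sketch. The helper names are made up for illustration. It shows the last second a signed and an unsigned 32-bit timestamp can hold, and what a signed 32-bit field wraps to one second later.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1   # largest signed 32-bit value
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit value


def utc_from_epoch(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value: int) -> int:
    """Simulate storing an integer in a signed 32-bit field (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    print(utc_from_epoch(INT32_MAX))                  # 2038-01-19 03:14:07+00:00
    print(utc_from_epoch(UINT32_MAX))                 # 2106-02-07 06:28:15+00:00
    print(utc_from_epoch(wrap_int32(INT32_MAX + 1)))  # 1901-12-13 20:45:52+00:00
```

The last line is the actual 2038 failure mode: the counter does not stop, it jumps back to 1901.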

As for the Y10K problem. It will almost entirely only be formatting problems I think. In the 80s and 90s, storage was at a premium, databases were generally much simpler and as such dates were very often stored as YYMMDD. There also wasn’t so much use of standard libraries. So this meant that to fix the Y2K problem required quite some work. In some cases there wasn’t time to make a proper solution. Where I was working there was a two step solution.

One team made the interim change to adjust where all dates were read and evaluate anything <30 (it wasn’t 30, it was another number but I forget which) to be 2000+number and anything else 1900+number. This meant the existing product would be fine for another 30 years or so.
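That interim fix is usually called date windowing. A minimal sketch of the idea, with 30 as the pivot purely as a placeholder (as noted, the real number was something else) and a made-up function name:

```python
def expand_two_digit_year(yy: int, pivot: int = 30) -> int:
    """Map a two-digit year to four digits using a fixed pivot window.

    Values below the pivot are read as 20xx, everything else as 19xx.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    return 2000 + yy if yy < pivot else 1900 + yy


if __name__ == "__main__":
    for yy in (5, 29, 30, 99):
        print(yy, "->", expand_two_digit_year(yy))
```

The stored data stays as YYMMDD; only the read path changes, which is why it was quick to ship and also why it expires once real dates reach the pivot.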

The other team was writing the new version of the software, which used MSSQL server as a back-end, with proper datetime typed columns and worked properly with years before and after 2000.

I suspect this wasn’t unusual in terms of approach and most software is using some form of epoch datatype which should be fine in terms of storing, reading and writing dates beyond Y10K. But some hard-coded date format strings will need to be changed.

Source: I was there, 3000 years ago.

r00ty@kbin.life · 3 months ago

To be fair, I bought a really good mini PC for a home server. The problem is, it was marketed as a gaming PC, but with the on-chip Intel HD graphics.

I can understand why this would upset people. For my uses though it was perfect. Sits in the TV cabinet, is quiet yet still quite powerful (intel 12900).

r00ty@kbin.life · 4 months ago

It’s not how ActivityPub (at least Lemmy/*bin servers) works. There isn’t so far as I’ve ever seen an API that allows for this within ActivityPub (now specific to Lemmy/*bin implementations there’s the API the browser/apps use that must provide this, but that’s not ActivityPub). It actually looks to be cleverly designed to prevent it. It might look like backfilling is happening because old stuff appears, but there are reasons for this.

How it works, from my experience (I did some work on the federation in kbin a year or so ago):

Instance A subscribes to community B hosted on Instance C.
Instance C notes this and does nothing. No previous content is sent, only future activities will be.
User on Instance D already subscribed to community B upvotes a comment on a post in community B.
Instance D sends the activity to Instance C.
Instance C sends the activity to Instance A.
Instance A gets the notice of the upvote, but realises it has no context for the upvote. But luckily the upvote has the comment ID of the comment that it was related to. So, now Instance A makes a request for the comment from Instance C.
Instance A receives the response from Instance C. But it turns out that comment was in reply to another comment. But the comment contains the ID of the parent comment. So Instance A requests that comment (and any parent comments until it gets the parent post).
By now Instance A has the information about the like, all comments from the liked comment to the post. These are saved to the database and will appear on the local system.
For each of the likes, comments and posts, if the user isn’t known locally the profile will also be fetched from their instance and stored locally.
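The steps above can be sketched roughly as follows. This is a toy simulation, not kbin or Lemmy code: the `REMOTE` dict stands in for HTTP requests to Instance C, and the `object` and `parent` keys are simplified stand-ins for the real activity fields. It only shows the "walk up the parent chain" behaviour.

```python
# Hypothetical objects hosted on Instance C, keyed by a simplified ID.
REMOTE = {
    "post/1": {"type": "Post", "parent": None},
    "comment/10": {"type": "Comment", "parent": "post/1"},
    "comment/11": {"type": "Comment", "parent": "comment/10"},
    "comment/12": {"type": "Comment", "parent": "post/1"},  # never referenced
}


def fetch_remote(object_id):
    """Stand-in for requesting one object from the instance that hosts it."""
    return REMOTE[object_id]


def handle_like(activity, local_db, fetch=fetch_remote):
    """Store the liked object and every ancestor not yet known locally.

    Returns the IDs that had to be fetched, in the order they were requested.
    """
    fetched = []
    current = activity["object"]
    while current is not None and current not in local_db:
        obj = fetch(current)
        local_db[current] = obj
        fetched.append(current)
        current = obj["parent"]  # parents are reachable, children are not
    return fetched


if __name__ == "__main__":
    db = {}
    print(handle_like({"type": "Like", "object": "comment/11"}, db))
    print("comment/12" in db)
```

The sibling comment never arrives, because nothing that was delivered points down to it. That is why old content appears piecemeal rather than as a full backfill.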

And so old posts and comments will begin to appear as activities linked to them happen. But there isn’t a method to ask for “all the posts in community X” using ActivityPub. I remember because I was specifically looking for this a year or so ago. It lets you see the parent object but not any children.

Maybe Mastodon etc. does it differently? No idea.

And all of this is moot, because if I block a User Agent, or an AS number/IP block, they’re not getting anything either by ActivityPub or scraping unless they change User Agent, AS number, or both.

r00ty@kbin.life · 4 months ago

I don’t think they’re optimising much at all. I think it’s likely just a modified web crawler but without the kind of throttling normal search engine crawlers use. They’re following links recursively. Then probably some basic parsing or even parsing with AI to prepare the data to make another AI model.

r00ty@kbin.life · 4 months ago

But, they aren’t. They’re not after ActivityPub specifically. They’re scraping the whole internet, most of them using clear bot User Agents. So, I routinely block their bots because the AI ones are usually hitting you multiple times a second non-stop. If they started making fake ActivityPub nodes they would not be scraping as a bot, and they would want specifically fediverse data. Important to note here though: an ActivityPub node doesn’t “collect” data. It subscribes (to Mastodon users/hashtags or communities) and then gets new data delivered to it. So they wouldn’t get the old stuff.

Having said that, I’ve seen some obvious bots using genuine browser user agents on IP addresses from certain very large Chinese companies. For those I just blocked their whole AS number.

r00ty@kbin.life · 4 months ago

I mean personally I did block all the AI scrapers I could find on my instance, around a month or so ago. There were a lot, mostly unscrupulous, some big names included. Probably should look at the logs to see what’s new.

The amount of traffic was quite significant too. I have a theory that they expect legislation soon, so are not playing nice and slow like crawlers do, but are vacuuming as fast as they can.

But you’re right. Everyone would need to do it, to make a difference.