Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 6 months ago

Wonder how the survey was sent out and whether that affected sampling.

Regardless, with ~3-4k responses, that’s disappointing, if not concerning.

I only have a more personal sense for Lemmy. Do you have a source for Lemmy gender diversity?

Anyway, what do you think are the underlying issues? And what would be some suggestions to the community to address them?

inspxtr@lemmy.world · edit-2 1 year ago

Hold up, are you sure you can’t view Discussions or Wiki? On which sites can you not view them?

I’m fine viewing them for public repos that I usually visit.

Asking to make sure that GitHub is not slowly rolling out this lockdown.

inspxtr@lemmy.world · 1 year ago

Isn’t Zipf’s law supposed to be shown on a log-log plot?
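To illustrate why log-log axes are the usual choice: under an ideal Zipf distribution, frequency is proportional to 1/rank^s, so log(frequency) against log(rank) is a straight line with slope -s. A minimal sketch with synthetic frequencies (the exponent `s=1.0` and the 1000 ranks are made-up values for illustration, not data from the plot being discussed):

```python
import math

def zipf_frequencies(n_ranks, s=1.0):
    """Ideal Zipf frequencies f(r) proportional to r**-s, normalised to sum to 1."""
    weights = [r ** -s for r in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic example: the fitted slope recovers -s on log-log axes.
freqs = zipf_frequencies(1000, s=1.0)
slope = loglog_slope(freqs)
```

On linear axes the same data looks like a sharp hyperbola, which makes it hard to judge whether the tail follows the power law.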

inspxtr@lemmy.world · 1 year ago

the whole premise of OP is that this monitors people, and many organizations use TOTP, which one could also use without internet connections or phones AFAIK.

I’m in academia and I wish this were implemented more. Data breaches are getting quite common, and GitHub is so entwined in software engineering that it is critical to increase security measures.
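On the TOTP point: a code depends only on a shared secret and the current time, which is why no network connection or phone is needed once the secret is enrolled. A minimal sketch of the RFC 6238 computation with HMAC-SHA1, using only the standard library (the function name `totp` is my own, and the secret is the ASCII test key from the RFC, not a real credential):

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) using HMAC-SHA1."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

# RFC 6238 test vector: ASCII secret, time = 59 seconds, 8 digits.
code = totp(b"12345678901234567890", unix_time=59, digits=8)
```

Everything here runs locally; the server performs the same computation and compares results, so nothing about the user's activity needs to be transmitted to produce a code.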

inspxtr@lemmy.world · 1 year ago

or maybe most of them in a folder? and one file that defines their locations for environment variables
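One possible reading of this idea, sketched under assumptions: the secrets live as files in a folder, and each environment variable holds only the path to a file rather than the secret itself. The `_FILE` suffix, the helper `read_secret`, and the name `DB_PASSWORD` are hypothetical choices for illustration.

```python
import os
import tempfile
from pathlib import Path

def read_secret(name: str) -> str:
    """Read a secret from the file whose path is stored in <name>_FILE."""
    path = os.environ[f"{name}_FILE"]
    return Path(path).read_text().strip()

# Demonstration with a temporary folder standing in for the secrets folder.
with tempfile.TemporaryDirectory() as folder:
    secret_file = Path(folder) / "db_password"
    secret_file.write_text("example-value\n")
    os.environ["DB_PASSWORD_FILE"] = str(secret_file)  # path only, not the secret
    password = read_secret("DB_PASSWORD")
    del os.environ["DB_PASSWORD_FILE"]
```

With this layout the environment only exposes file locations, and access to the values themselves can be restricted with ordinary file permissions.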

inspxtr@lemmy.world · 1 year ago

what are the alternatives to ENV that are preferred in terms of security?

inspxtr@lemmy.world · 1 year ago

These are most, if not all, positive. I’d be curious if you can somehow make it list a mixture of positive and non-positive things before 2021 and then see where it goes from there.

inspxtr@lemmy.world · edit-2 1 year ago

yeah I guess maybe the formatting and the verbosity seem a bit annoying? Wonder what the alternative solutions could be to better engage people from Mastodon, which is what this bot is trying to address.

edit: just to be clear, I’m not affiliated with the bot or its creator. This is just my observation from multiple posts I’ve seen this bot comment on.

inspxtr@lemmy.world · 1 year ago

I’m curious, why is this bot currently being downvoted for almost every comment it makes?

inspxtr@lemmy.world · edit-2 1 year ago

Thanks for the suggestions! I’m actually also looking into llamaindex for more conceptual comparison, though I haven’t gotten to building an app yet.

Any general suggestions for locally hosted LLM with llamaindex by the way? I’m also running into some issues with hallucination. I’m using Ollama with llama2-13b and bge-large-en-v1.5 embedding model.

Anyway, aside from conceptual comparison, I’m also looking for more literal comparison. AFAIK, the choice of embedding model will affect how the similarity is defined. Most of the current LLM embedding models are usually abstract and the similarity will be conceptual, like “I have 3 large dogs” and “There are three canines that I own” will probably be very similar. Do you know which embedding model I should choose to get a more literal comparison?
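For reference, a literal comparison does not strictly need an embedding model. A minimal sketch, assuming word-bigram shingles compared with Jaccard similarity is literal enough for the use case (this is a swapped-in classical technique, not something llamaindex provides, and the helper names are my own):

```python
import re

def shingles(text: str, n: int = 2) -> set:
    """Set of lowercase word n-grams in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str, n: int = 2) -> float:
    """Shared n-grams divided by all distinct n-grams in either text."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# A paraphrase shares no bigrams; a near-copy shares most of them.
paraphrase = jaccard("I have 3 large dogs", "There are three canine that I own")
near_copy = jaccard("I have 3 large dogs", "I have 3 very large dogs")
```

Here `paraphrase` is 0.0 and `near_copy` is 0.5, which is the opposite of how a conceptual embedding would be expected to rank the first pair.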

That aside, like you indicated, there are some issues. One of them involves length. I hope to find something that can build up to find similar paragraphs iteratively from similar sentences. I can take a stab at coding it up but was just wondering if there are some similar frameworks out there already that I can model after.
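A rough sketch of what building up from sentences to paragraphs could look like, under several assumptions: a naive regex sentence splitter, `difflib.SequenceMatcher` as the sentence-level similarity, and an arbitrary threshold of 0.8. The function names and sample texts are invented for illustration.

```python
import re
from difflib import SequenceMatcher

def split_sentences(paragraph: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

def best_match(sentence: str, candidates: list) -> float:
    """Highest character-level similarity ratio against any candidate."""
    return max(
        (SequenceMatcher(None, sentence.lower(), c.lower()).ratio()
         for c in candidates),
        default=0.0,
    )

def paragraph_overlap(paragraph: str, source: str, threshold: float = 0.8) -> float:
    """Fraction of sentences in `paragraph` with a close match in `source`."""
    sentences = split_sentences(paragraph)
    source_sentences = split_sentences(source)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences
               if best_match(s, source_sentences) >= threshold)
    return hits / len(sentences)

source = "The cat sat on the mat. It was a sunny day. Birds sang in the trees."
draft = "The cat sat on the mat. Nothing here matches at all."
score = paragraph_overlap(draft, source)
```

The paragraph score is just the share of sentences that found a close match, so one copied sentence out of two gives 0.5. Comparing every sentence against every source sentence is quadratic, so a larger corpus would need an index or a pre-filter first.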

inspxtr@lemmy.world · edit-2 1 year ago

Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 1 year ago

How about baserow.io or nocodb cloud? Haven’t used them but I think they’re open source. But AFAIK they don’t have mobile apps for editing.

inspxtr@lemmy.world · 1 year ago

Interesting data but in terms of viz, this is an odd choice of color scale - the max and background being very similarly dark-toned. It may look better with a flipped color scale, or maybe the background should be lighter. Plus, I’m not sure whether Greenland being very dark means there’s no data or it’s the max.

inspxtr@lemmy.world · 2 years ago

i’m leaning towards “skull” tho

inspxtr@lemmy.world · 2 years ago

Great explanation. Two questions: what’s the likelihood of an SSO page being spoofed? This seems like an all-eggs-in-one-basket sitch, so what are the potential threats to this?

inspxtr@lemmy.world · 2 years ago

This is actually an interesting question. First thing to note is that any estimation is by accounts, not by actual people (one person can have multiple alts on both). Honestly I don’t think it’s possible to have a meaningful estimate.

That said, I think the first task is to figure out if we can estimate the number of accounts deleted on Reddit during the controversial period (let’s say April when the API change was starting) up until now.

I’m not aware whether there’s public daily data on it from Reddit, but there have been attempts at archiving Reddit during this time and of course before. So one can theoretically use the archives to find out “all” existing users, then check the links now via browser (or curl) to see if they still exist, and treat that as a good-enough proxy for a deleted account.

One may get an estimate of when they were deleted by checking the links in the archives if possible. If not, there’s also the Wayback Machine that we may use to get a sense, though that has limitations.
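The link check could be scripted roughly as follows. This is only a sketch under stated assumptions: that profiles live at `https://www.reddit.com/user/<name>`, and that a 404 response is an acceptable proxy for a deleted account (Reddit may answer differently, rate-limit, or block scripted requests). The function names and the `User-Agent` string are made up.

```python
import urllib.error
import urllib.request

def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "account-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code

def find_missing(usernames, fetch_status=http_status):
    """Return usernames whose profile URL answers 404 (assumed: deleted)."""
    missing = []
    for name in usernames:
        url = f"https://www.reddit.com/user/{name}"  # assumed URL pattern
        if fetch_status(url) == 404:
            missing.append(name)
    return missing

# Offline demonstration with a stand-in fetcher; the names are placeholders.
fake_statuses = {
    "https://www.reddit.com/user/alice": 200,
    "https://www.reddit.com/user/bob": 404,
}
missing = find_missing(["alice", "bob"], fetch_status=fake_statuses.get)
```

Passing the fetcher in as a parameter keeps the counting logic testable without network access; a real run would use the default `http_status` and should pause between requests.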

Lemmy tracks account registration daily, I believe. I don’t know what stats one needs to run but maybe if we can line up the time series of account creation on Lemmy and account deletion on Reddit, we might have some sense of what a lower bound is for those who jumped ship forever.

inspxtr@lemmy.world · 2 years ago

Here are some options:

crypt.ee: I tried this before. I don’t think it’s selfhostable, but it’s quite usable and has a nice UI. Encryption available. Ghost folders if you want them. Multimedia available, not sure about storage.
joplin: you can use Nextcloud (or many other options like Dropbox) for sync, so storage depends on your cloud solution. E2EE, has plugins, and simple enough to use.
anytype.io and logseq: I’ve seen these mentioned in many places but I haven’t used either. They seem to have very rich features, not sure about selfhosting though.

inspxtr@lemmy.world · 2 years ago

do you know how this compares with other file transfer/sync like syncthing?

inspxtr@lemmy.world · 2 years ago

off topic about the site: does anyone have weird scrolling with it? It kept jumping to different pages for me.

anw, the tool looks really cool. Been looking for something that supports different mobile options like this.

inspxtr@lemmy.world · 2 years ago

One needs to note that not all matrix bridges offer E2EE options yet. But anw, that shouldn’t deter anyone from testing and using these.

inspxtr@lemmy.world · 2 years ago

what would you have done differently to communicate the data then? assuming the numbers are correct.

inspxtr@lemmy.world · 2 years ago

Suggestion for Airtable alternative with mobile options?

inspxtr@lemmy.world · 2 years ago

Comment systems for static pages (Jekyll)?

inspxtr@lemmy.world · 2 years ago

[Question] Most of the current reddit drama stems from spez, where's the other guy?