@AndrasKrigare

AndrasKrigare@beehaw.org · 1 month ago

Has science gone too far?

AndrasKrigare@beehaw.org · 1 month ago

Alternate take: I want something that does B, so I research methods of doing B and find one that’s good. Good thing I’m a smart boy that doesn’t make purchasing decisions based on what the marketing department says things do.

There’s plenty of good reasons to criticize or be concerned about LLMs. You don’t need to make up dumb ones.

AndrasKrigare@beehaw.org · 1 month ago

Sure, but false advertising has nothing to do with how good an invention is, that’s a marketing problem.

AndrasKrigare@beehaw.org · 1 month ago

No? I have a pair of shoes that advertise as being great for running and walking. I love walking in them, but they suck for running. Are you saying the shoes suck and I shouldn’t use them at all, even though I like walking in them?

Tools don’t care about intent, and neither should you. Only things that work and things that don’t. And if it doesn’t work, you should use a different tool.

AndrasKrigare@beehaw.org · 2 months ago

I don’t disagree with your conclusion, but I think part of why it sucks now is all the Search Engine Optimization, of people trying to game Google into showing you their website, and only necessarily the one most pertinent to your search

AndrasKrigare@beehaw.org · 3 months ago

I’m always reminded of https://youtu.be/ZI0w_pwZY3E for Skype

AndrasKrigare@beehaw.org · 3 months ago

I mean, I guess, but that’s only a selling point to the small number of people without smartphones, which isn’t a large enough group to make it a sound business strategy.

AndrasKrigare@beehaw.org · 3 months ago

Also, the “(after federal incentives)” is doing a lot of heavy lifting here. The basic option for the 2023 Bolt comes out to about $20K after federal incentives, but you get way more range and a bunch of those “luxury” features this is missing. Considering how cheap low-end smart phones are, I have a hard time imagining that infotainment systems actually add more than 1-2% of the cost of the vehicle. Feels more like a type of virtue signal than a real cost-saving measure.

AndrasKrigare@beehaw.org · 3 months ago

Corporations cannot create nontoxic social media, the incentives will always be there to make it toxic.

I don’t know that’s true. The incentives to make it toxic come from engagement being the goal, which is a function of advertising being the income. I’m not advocating for it, but if there were a flat subscription and no ads, I don’t think they’d have any economic pressures for toxicity.

AndrasKrigare@beehaw.org · 4 months ago

I haven’t heard of that being what threading is, but that threading is about shared resourcing and memory space and not any special relationship with the scheduler.

Per the wiki:

On a multiprocessor or multi-core system, multiple threads can execute in parallel, with every processor or core executing a separate thread simultaneously; on a processor or core with hardware threads, separate software threads can also be executed concurrently by separate hardware threads.

https://en.m.wikipedia.org/wiki/Thread_(computing)

I also think you might be misunderstanding the relationship between concurrency and parallelism; they are not mutually exclusive. Something can be concurrent through parallelism, as the wiki page has (emphasis mine):

Concurrency refers to the ability of a system to execute multiple tasks through simultaneous execution or time-sharing (context switching), sharing resources and managing interactions.

https://en.m.wikipedia.org/wiki/Concurrency_(computer_science)

AndrasKrigare@beehaw.org · 4 months ago

Correct, which is why before I had said

I think OP is making a joke about python’s GIL, which makes it so even if you are explicitly multi threading, only one thread is ever running at a time, which can defeat the point in some circumstances.

AndrasKrigare@beehaw.org · 4 months ago

If what you said were true, wouldn’t it make a lot more sense for OP to be making a joke about how even if the source includes multi threading, all his extra cores are wasted? And make your original comment suggesting a coding issue instead of a language issue pretty misleading?

But what you said is not correct. I just did a dumb little test

import threading 
import time

def task(name):
  time.sleep(600)

t1 = threading.Thread(target=task, args=("1",))
t2 = threading.Thread(target=task, args=("2",))
t3 = threading.Thread(target=task, args=("3",))

t1.start()
t2.start()
t3.start()

And then ps -efT | grep python and sure enough that python process has 4 threads. If you want to be even more certain of it you can strace -e clone,clone3 python ./threadtest.py and see that it is making clone3 syscalls.

AndrasKrigare@beehaw.org · 4 months ago

I think OP is making a joke about python’s GIL, which makes it so even if you are explicitly multi threading, only one thread is ever running at a time, which can defeat the point in some circumstances.

AndrasKrigare@beehaw.org · 9 months ago

One note on “sick” being slang for “good”: that particular slang started in the 80s, and some of the younger generation consider it to be old person slang.

AndrasKrigare@beehaw.org · 9 months ago

I’d say it’s not just misleading but incorrect if it says “integer” but it’s actually floats.

AndrasKrigare@beehaw.org · 10 months ago

I think to some extent it’s a matter of scale, though. If I advertise something as a calculator capable of doing all math, and it can only do one problem, it is so drastically far away from its intended purpose that the meaning kinda breaks down. I don’t think it would be wrong to say “it malfunctions in 99.999999% of use cases” but it would be easier to say that it just doesn’t work.

Continuing (and torturing) that analogy, if we did the disgusting work of precomputing all 2 number math problems for integers from -1,000,000 to 1,000,000 and I think you could say you had a (really shitty and slow) calculator, which “malfunctions” for numbers outside that range if you don’t specify the limitation ahead of time. Not crazy different from software which has issues with max_int or small buffers.

If it were the case that there had only been one case of a hallucination with LLMs, I think we could pretty safely call that a malfunction (and we wouldn’t be having this conversation). If it happens 0.000001% of the time, I think we could still call it a malfunction and that it performs better than a lot of software. 99.999% of the time, it’d be better to say that it just doesn’t work. I don’t think there is, or even needs to be, some unified understanding of where the line is between them.

Really my point is there are enough things to criticize about LLMs and people’s use of them, this seems like a really silly one to try and push.

AndrasKrigare@beehaw.org · 10 months ago

We’re talking about the meaning of “malfunction” here, we don’t need to overthink it and construct a rigorous proof or anything. The creator of the thing can decide what the thing they’re creating is supposed to do. You can say

hey, it did X, was that supposed to happen?

no, it was not supposed to do that, that’s a malfunction.

We don’t need to go to

Actually you never sufficiently defined its function to cover all cases in an objective manner, so ACTUALLY it’s not a malfunction!

Whatever, it still wasn’t supposed to do that

AndrasKrigare@beehaw.org · 10 months ago

The purpose of an LLM, at a fundamental level, is to approximate text it was trained on.

I’d argue that’s what an LLM is, not its purpose. Continuing the car analogy, that’s like saying a car’s purpose is to burn gasoline to spin its wheels. That’s what a car does, the purpose of my car is to get me from place to place. The purpose of my friend’s car is to look cool and go fast. The purpose of my uncle’s car is to carry lumber.

I think we more or less agree on the fundamentals and it’s just differences between whether they are referring to a malfunction in the system they are trying to create, in which an LLM is a key tool/component, or a malfunction in the LLM itself. At the end of the day, I think we can all agree that it did a thing they didn’t want it to do, and that an LLM by itself may not be the correct tool for the job.

AndrasKrigare@beehaw.org · 10 months ago

Where I don’t think your argument fits is that it could be applied to things LLMs can currently do. If I have an insufficiently trained model which produces a word salad to every prompt, one could say “that’s not a malfunction, it’s still applying weights.”

The malfunction is in having a system that produces useful results. An LLM is just the means for achieving that result, and you could argue it’s the wrong tool for the job and that’s fine. If I put gasoline in my diesel car and the engine dies, I can still say the car is malfunctioning. It’s my fault, and the engine wasn’t ever supposed to have gas in it, but the car is now “failing to function in a normal or satisfactory manner,” the definition of malfunction.

AndrasKrigare@beehaw.org · 10 months ago

It implies that, under the hood, the LLM is “malfunctioning”. It is not - it’s doing what it is supposed to do, to chain tokens through weighted probabilities.

I don’t really agree with that argument. By that logic, there’s really no such thing as a software bug, since the software is always doing what it’s supposed to be doing: giving predefined instructions to a processor that performs some action. It’s “supposed to” provide a useful response to prompts, anything other than is it not what it should be and could be fairly called a malfunction.