  • I think “I don’t know” might sometimes be found in the training data. But, I’m sure they optimize the meta-prompts so that it never shows up in a response to people. While it might be the “honest” answer a lot of the time, the makers of these LLMs seem to believe that people would prefer confident bullshit that’s wrong over “I don’t know”.
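
    As a rough illustration of what that meta-prompt tuning can look like (a made-up sketch; this is not any vendor's actual system prompt), the difference can be as small as one system message:

```python
# Hypothetical sketch: the same user question sent under two different
# "meta-prompts" (system prompts). These prompts are invented for
# illustration; they are not any vendor's real instructions.

confident_system = {
    "role": "system",
    "content": "You are a helpful assistant. Always give a clear, direct "
               "answer. Never say you are unsure or that you don't know.",
}

honest_system = {
    "role": "system",
    "content": "You are a helpful assistant. If you are not confident in "
               "an answer, say 'I don't know' rather than guessing.",
}

question = {"role": "user", "content": "Who won the 1904 Olympic marathon?"}

# With a chat-completions style API, swapping the system message is the
# entire difference between the two behaviors described above, e.g.:
#   client.chat.completions.create(model=..., messages=[confident_system, question])
#   client.chat.completions.create(model=..., messages=[honest_system, question])
```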


  • No, I’m sure you’re wrong. There’s a certain cheerful confidence that you get from every LLM response. It’s this upbeat “can-do attitude” brimming with confidence mixed with subservience that is definitely not the standard way people communicate on the Internet, let alone Stack Overflow. Sure, sometimes people answering questions are overconfident, but it’s often an arrogant kind of confidence, not the subservient kind you get from LLMs.

    I don’t think an LLM can sound like it lacks confidence for the right reasons, but it can definitely pull off a lack of confidence if it’s prompted correctly. To actually lack confidence, it would have to have an understanding of the situation. But to imitate a lack of confidence, all it needs to do is draw on the training data it has where the response to a question is one where someone lacks confidence.

    Similarly, it’s not like it actually has confidence normally. It’s just been trained / meta-prompted to emit an answer in a style that mimics confidence.


  • No, I don’t think so. It’s true that many of the earliest programmers were female, but there were very few of them, and that was a long time ago.

    In a way, Ada Lovelace was the first programmer, but she never even touched a computer. The first programmers who did anything similar to today’s programming were from Grace Hopper’s era in the 1950s.

    In the late 1960s there were a lot of women working in computer programming relative to the size of the field, but the field was still tiny, only tens of thousands of people globally. By the 1970s it was already a majority-male profession, with the share of women down to only about 22.5%.

    That means that for 50 years, a time when the number of programmers increased by orders of magnitude, the programmers were mostly male.


  • Saying we can solve the fidelity problem is like Jules Verne in 1867 saying we could get to the moon with a cannon because of “what progress artillery science has made during the last few years”.

    Do rockets count as artillery science? The first rockets basically served the same purpose as artillery, and were operated by the same army groups. The innovation was to attach the propellant to the explosive charge and have it burn gradually rather than detonate all at once. Even the shape of a rocket is a refinement of the shape of an artillery shell.

    Verne wasn’t able to imagine artillery without the cannon barrel, but I’d argue he was right: it was basically “artillery science” that got humankind to the moon. The first “rocket artillery” in that lineage were the V1 and V2. You could probably argue that the V1 wasn’t really artillery (it was closer to a cruise missile), and that’s fair, but it also wasn’t what the moon missions were based on. The moon missions were a refinement of the V2, which delivered a warhead by launching it on a ballistic path.

    As for generative AI, it doesn’t have zero fidelity, it just has relatively low fidelity. What makes that worse is that it’s trained to sound extremely confident, so people trust it when they shouldn’t.

    Personally, I think it will take a very long time, if ever, before we get to the stage where “vibe coding” actually works well. OTOH, a more reasonable goal is a GenAI tool that you basically treat as an intern. You don’t trust it, you expect it to do bone-headed things frequently, but sometimes it can do grunt work for you. As long as you carefully check over its work, it might save you some time/effort. But, I’m not sure if that can be done at a price that makes sense. So far the GenAI companies are setting fire to money in the hope that there will eventually be a workable business model.


  • If you use it basically like you’d use an intern or junior dev, it could be useful.

    You wouldn’t allow them to check anything in themselves. You wouldn’t trust anything they did without carefully reading it over. You’d have to expect that they’d occasionally completely misunderstand the request. You’d treat them as someone completely lacking in common sense.

    If, with all those caveats, you can get this assistance for free or nearly free, it might be worth it. But, right now, all the AI companies are basically setting money on fire to try to drive demand. If people had to pay enough that the AI companies were able to break even, it might be so expensive it was no longer worth it.


  • The privacy issues are nasty, but a smart toilet could actually be an incredibly useful device.

    Can you imagine if every time you went to the bathroom, your toilet could do some of the basic stool / urine tests you get at the doctor’s office? Certain diseases could be caught extremely early, and you wouldn’t have to do anything different.

    And then there are bidet functions. Forget smearing poop all over your ass with paper; wash it off with nice warm water every time.

    I wouldn’t want to have to use a smartphone app for that, but there’s no reason you couldn’t have a simple set of buttons on the toilet itself. You could keep the manual flush lever and only use that if you preferred, but if you wanted an even better experience and a better clean, that option would be available.


  • I believe that, because test scripts tend to involve a lot of very repetitive code, and it’s normally pretty easy to read that code.

    Still, I would bet that out of 1000 tests it writes, at least 1 will introduce a subtle logic bug (there’s a sketch of what that can look like after this comment).

    Imagine you hired an intern for the summer and asked them to write 1000 tests for your software. The intern doesn’t know the programming language you use, doesn’t understand the project, but is really, really good at Googling stuff. They search online for tests matching what you need, copy what they find and paste it into their editor. They may not understand the programming language you use, but they’ve read the style guide back to front. They make sure their code builds and runs without errors. They are meticulous when it comes to copying over the comments from the tests they find and they make sure the tests are named in a consistent way. Eventually you receive a CL with 1000 tests. You’d like to thank the intern and ask them a few questions, but they’ve already gone back to school without leaving any contact info.

    Do you have 1000 reliable tests?
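
    To make the “subtle logic bug” concrete, here’s a minimal sketch of the kind of generated test that builds, runs, and passes while verifying nothing. The function and test names are made up for illustration:

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return price * (1 - percent / 100)

class TestApplyDiscount(unittest.TestCase):
    def test_ten_percent_discount(self):
        # A genuine test: checks the result against an independently
        # computed expected value.
        self.assertEqual(apply_discount(100.0, 10.0), 90.0)

    def test_fifty_percent_discount(self):
        # The subtle bug: the "expected" value is computed by calling the
        # implementation itself, so this assertion can never fail. It
        # checks that the code agrees with the code, not that it's correct.
        self.assertEqual(apply_discount(80.0, 50.0), apply_discount(80.0, 50.0))

if __name__ == "__main__":
    unittest.main()
```

    Both tests are consistently named, commented, and green, and the second one is exactly the kind of thing you only catch by reading carefully.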


  • That’s the problem. Maybe it is.

    Maybe the code the AI wrote works perfectly. Maybe it just looks like how perfectly working code is supposed to look, but doesn’t actually do what it’s supposed to do.

    To get to the train tracks on the right, you would normally have dozens of engineers working over probably decades, learning how the old system worked and adding to it. If you’re a new engineer and you have to work on it, you might be able to talk to the people who worked on it before you and find out how their design was supposed to work. There may be notes or designs generated as they worked on it. And so on.

    It might take you months to fully understand the system, but whenever there’s something confusing you can find someone and ask questions like “Where did you…?” and “How does it…?” and “When does this…?”

    Now, imagine you work at a railroad and show up to work one day and there’s this whole mess in front of you that was laid down overnight by some magic railroad-laying machine. Along with a certificate the machine printed that says that the design works. You can’t ask the machine any questions about what it did. Or, maybe you can ask questions, but those questions are pretty useless because the machine isn’t designed to remember what it did (although it might lie to you and claim that it remembers what it did).

    So, what do you do, just start running trains through those tracks, assured that the machine probably got things right? Or, do you start trying to understand every possible path through those tracks from first principles?