  • I think “I don’t know” might sometimes be found in the training data. But, I’m sure they optimize the meta-prompts so that it never shows up in a response to people. While it might be the “honest” answer a lot of the time, the makers of these LLMs seem to believe that people would prefer confident bullshit that’s wrong over “I don’t know”.
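
    As a rough illustration of what that meta-prompt tuning can look like (a made-up sketch; this is not any vendor's actual system prompt), the difference can be as small as one system message:

```python
# Hypothetical sketch: the same user question sent under two different
# "meta-prompts" (system prompts). These prompts are invented for
# illustration; they are not any vendor's real instructions.

confident_system = {
    "role": "system",
    "content": "You are a helpful assistant. Always give a clear, direct "
               "answer. Never say you are unsure or that you don't know.",
}

honest_system = {
    "role": "system",
    "content": "You are a helpful assistant. If you are not confident in "
               "an answer, say 'I don't know' rather than guessing.",
}

question = {"role": "user", "content": "Who won the 1904 Olympic marathon?"}

# With a chat-completions style API, swapping the system message is the
# entire difference between the two behaviors described above, e.g.:
#   client.chat.completions.create(model=..., messages=[confident_system, question])
#   client.chat.completions.create(model=..., messages=[honest_system, question])
```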


  • No, I’m sure you’re wrong. There’s a certain cheerful confidence that you get from every LLM response. It’s this upbeat “can-do attitude” brimming with confidence mixed with subservience that is definitely not the standard way people communicate on the Internet, let alone Stack Overflow. Sure, sometimes people answering questions are overconfident, but it’s often an arrogant kind of confidence, not the subservient kind you get from LLMs.

    I don’t think an LLM can sound like it lacks confidence for the right reasons, but it can definitely pull off a lack of confidence if it’s prompted correctly. To actually lack confidence, it would have to have an understanding of the situation. But to imitate a lack of confidence, all it needs to do is draw on the training data it has where the response to a question is one where someone lacks confidence.

    Similarly, it’s not like it actually has confidence normally. It’s just been trained / meta-prompted to emit an answer in a style that mimics confidence.


  • No, I don’t think so. It’s true that many of the earliest programmers were female, but there were very few of them, and that was a long time ago.

    In a way, Ada Lovelace was the first programmer, but she never even touched a computer. The first programmers who did anything similar to today’s programming were from Grace Hopper’s era in the 1950s.

    In the late 1960s there were a lot of women working in computer programming relative to the size of the field, but the field was still tiny, only tens of thousands of people globally. By the 1970s it was already a majority-male profession, with the share of women down to only about 22.5%.

    That means that for 50 years, a time when the number of programmers increased by orders of magnitude, the programmers were mostly male.


  • Saying we can solve the fidelity problem is like Jules Verne in 1867 saying we could get to the moon with a cannon because of “what progress artillery science has made during the last few years”.

    Do rockets count as artillery science? The first rockets basically served the same purpose as artillery, and were operated by the same army groups. The innovation was to attach the propellant to the explosive charge and have it burn gradually rather than detonate all at once. Even the shape of a rocket is a refinement of the shape of an artillery shell.

    Verne wasn’t able to imagine artillery without the cannon barrel, but I’d argue he was right: it was basically “artillery science” that got humankind to the moon. The first “rocket artillery” in that lineage were the V1 and V2. You could probably argue that the V1 wasn’t really artillery (it was closer to a cruise missile), and that’s fair, but it also wasn’t what the moon missions were based on. The moon missions were a refinement of the V2, which delivered a warhead by launching it on a ballistic path.

    As for generative AI, it doesn’t have zero fidelity, it just has relatively low fidelity. What makes that worse is that it’s trained to sound extremely confident, so people trust it when they shouldn’t.

    Personally, I think it will take a very long time, if ever, before we get to the stage where “vibe coding” actually works well. OTOH, a more reasonable goal is a GenAI tool that you basically treat as an intern. You don’t trust it, you expect it to do bone-headed things frequently, but sometimes it can do grunt work for you. As long as you carefully check over its work, it might save you some time/effort. But, I’m not sure if that can be done at a price that makes sense. So far the GenAI companies are setting fire to money in the hope that there will eventually be a workable business model.


  • If you use it basically like you’d use an intern or junior dev, it could be useful.

    You wouldn’t allow them to check anything in themselves. You wouldn’t trust anything they did without carefully reading it over. You’d have to expect that they’d occasionally completely misunderstand the request. You’d treat them as someone completely lacking in common sense.

    If, with all those caveats, you can get this assistance for free or nearly free, it might be worth it. But, right now, all the AI companies are basically setting money on fire to try to drive demand. If people had to pay enough that the AI companies were able to break even, it might be so expensive it was no longer worth it.


  • The privacy issues are nasty, but a smart toilet could actually be an incredibly useful device.

    Can you imagine if every time you went to the bathroom, your toilet could do some of the basic stool / urine tests you get at the doctor’s office? Certain diseases could be caught extremely early, and you wouldn’t have to do anything different.

    And then there are bidet functions. Forget smearing poop all over your ass with paper; wash it off with nice warm water every time.

    I wouldn’t want to have to use a smartphone app for that, but there’s no reason you couldn’t have a simple set of buttons on the toilet itself. You could keep the manual flush lever and only use that if you preferred, but if you wanted an even better experience and a better clean, that option would be available.


  • I believe that, because test scripts tend to involve a lot of very repetitive code, and it’s normally pretty easy to read that code.

    Still, I would bet that out of 1000 tests it writes, at least 1 will introduce a subtle logic bug (there’s a sketch of what that can look like after this comment).

    Imagine you hired an intern for the summer and asked them to write 1000 tests for your software. The intern doesn’t know the programming language you use, doesn’t understand the project, but is really, really good at Googling stuff. They search online for tests matching what you need, copy what they find and paste it into their editor. They may not understand the programming language you use, but they’ve read the style guide back to front. They make sure their code builds and runs without errors. They are meticulous when it comes to copying over the comments from the tests they find and they make sure the tests are named in a consistent way. Eventually you receive a CL with 1000 tests. You’d like to thank the intern and ask them a few questions, but they’ve already gone back to school without leaving any contact info.

    Do you have 1000 reliable tests?
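
    To make the “subtle logic bug” concrete, here’s a minimal sketch of the kind of generated test that builds, runs, and passes while verifying nothing. The function and test names are made up for illustration:

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return price * (1 - percent / 100)

class TestApplyDiscount(unittest.TestCase):
    def test_ten_percent_discount(self):
        # A genuine test: checks the result against an independently
        # computed expected value.
        self.assertEqual(apply_discount(100.0, 10.0), 90.0)

    def test_fifty_percent_discount(self):
        # The subtle bug: the "expected" value is computed by calling the
        # implementation itself, so this assertion can never fail. It
        # checks that the code agrees with the code, not that it's correct.
        self.assertEqual(apply_discount(80.0, 50.0), apply_discount(80.0, 50.0))

if __name__ == "__main__":
    unittest.main()
```

    Both tests are consistently named, commented, and green, and the second one is exactly the kind of thing you only catch by reading carefully.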


  • That’s the problem. Maybe it is.

    Maybe the code the AI wrote works perfectly. Maybe it just looks like how perfectly working code is supposed to look, but doesn’t actually do what it’s supposed to do.

    To get to the train tracks on the right, you would normally have dozens of engineers working over probably decades, learning how the old system worked and adding to it. If you’re a new engineer and you have to work on it, you might be able to talk to the people who worked on it before you and find out how their design was supposed to work. There may be notes or designs generated as they worked on it. And so on.

    It might take you months to fully understand the system, but whenever there’s something confusing you can find someone and ask questions like “Where did you…?” and “How does it…?” and “When does this…?”

    Now, imagine you work at a railroad and show up to work one day and there’s this whole mess in front of you that was laid down overnight by some magic railroad-laying machine. Along with a certificate the machine printed that says that the design works. You can’t ask the machine any questions about what it did. Or, maybe you can ask questions, but those questions are pretty useless because the machine isn’t designed to remember what it did (although it might lie to you and claim that it remembers what it did).

    So, what do you do, just start running trains through those tracks, assured that the machine probably got things right? Or, do you start trying to understand every possible path through those tracks from first principles?