• 0 Posts
  • 311 Comments
Joined 2 years ago
Cake day: August 9th, 2023







  • Some time ago, I heard a story of CS and Econ professors having lunch together. The Econ professor was excited that Excel was going to release a version that blew out the 64k row limit. The CS professor nearly choked on his lunch.

    Dependence on Excel has definitely caused bad papers to be published in the Econ space, and has had real world consequences. There was a paper years ago claiming that once a country’s debt rises above 90% of GDP, its economic growth falls off a cliff. It was passed around as established fact by the sorts of politicians who justify austerity. Problem was, nobody could reproduce the results. Then an Econ grad student asked the original authors for their Excel spreadsheet and found a coding error in the formulas. Once corrected, the conclusion largely disappeared.











  • The argument here is that checking complex validation is a fool’s errand. Yes, you can write a fully validating regex for RFC email. In fact, it should be possible to write a regex shorter than the one that has been passed around since the 90s, because some regular expression engines (PCRE and Perl, for example) support recursive patterns now. (Part of the reason that old regex is so complicated is that email allows nested comments (which is insane (how insane? (Lisp levels insane)))).

    However, it doesn’t get you much of anywhere. What you really want to know is if it’s a valid email or not, and the only way to do that is to send an email to that address with a confirmation. The only point of the regex is to throw away obviously bad addresses. For that, checking that there’s an @ symbol and something for the user and domain portions is sufficient. I’d add needing a dot in the domain portion, but it’s not that important.

    Classically, it was argued that emails don’t even need a domain portion when things are done for internal systems, or that internal domains don’t need a TLD. In my personal experience, this is rarely done anymore and can be safely ignored. Maybe some very, very old legacy systems still do it, and if you’re working on one of those, then sure. For everyone else, don’t worry about it. You’re probably working on publicly accessible systems, and even if you’re not, most users are going to prefer using their fully spec’d out email address, anyway.
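
As a rough illustration of the "throw away obviously bad addresses" check described above, here is a minimal Python sketch. The function name `looks_like_email` and the `require_dot` parameter are made up for this example. It only checks for an @ with something on either side and, optionally, a dot in the domain. It is not an RFC validator, and a confirmation email is still the real test.

```python
def looks_like_email(address: str, require_dot: bool = True) -> bool:
    """Cheap sanity filter, not RFC validation.

    Accepts anything shaped like user@domain. The split happens on the
    LAST '@' because a quoted local part may itself contain '@'.
    """
    user, at, domain = address.rpartition("@")
    if not at or not user or not domain:
        # No '@' at all, or nothing before/after it.
        return False
    if require_dot and "." not in domain:
        # Optional: reject dotless domains such as 'localhost'.
        return False
    return True
```

Passing `require_dot=False` covers the internal-system case where a domain has no TLD. Anything that gets through still has to survive the confirmation email.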




  • A whole lot. Too much to cover in one post in any kind of detail.

    A modern relational database management system (RDBMS) is a highly optimized beast. How it accesses storage is very carefully considered. It has a whole mini language (SQL) for defining and querying relations between data. There are tools for debugging specific queries to make them faster. It indexes data, with tradeoffs between read and write speeds. There are sophisticated locking mechanisms so multiple users can read and write at the same time. It has transactions, where many alterations can be packed up together and written efficiently at once. Those transactional alterations are atomic, meaning there is a guarantee that either all of them happen or none of them do. The entire thing is based on set theory, and it has survived decades of attacks by many pretenders to the throne.

    And if you’re using Oracle, you can get all that while paying a highly optimized pricing model set up by the best financial advisors Larry Ellison can find to maximize value extraction from your company.
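
To make the "all of them happen or none of them happen" point concrete, here is a small sketch using Python's built-in `sqlite3` module. SQLite is standing in for a full RDBMS here, and the `accounts` table, the `transfer` helper, and the `balances` helper are invented for the example. A transfer is two updates inside one transaction. If the second update violates a constraint, the first is rolled back too.

```python
import sqlite3

# In-memory database; the schema is hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically. Returns True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit breaks the CHECK constraint, the credit above
            # is undone as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` on the starting data returns `False` and leaves both balances untouched, even though the credit to bob ran before the failing debit. A transfer of 30 succeeds and changes both rows together.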