• 0 Posts
  • 36 Comments
Joined 2 years ago
Cake day: June 13th, 2023

  • Ugh this just reminded me that I ran into this exact issue a couple years ago. We were running jobs every hour to ingest data from an API into our data warehouse. Eventually we got reports from users about having gaps in our data. We dug into it for days trying to find a pattern, but couldn’t pinpoint anything. We were just missing random pieces of data, but our jobs never reported any failures.

    Eventually we were able to determine the issue. HTTP 200 with “error: true” in the response. Fml







  • It’s kind of funny, but we all do this to some extent. I used to think most people on Reddit were super smart. If someone says stuff with authority, then it’s easy to believe what they’re saying and assume they know what they’re talking about.

    But then every once in a while, I’d come across a topic that I know deeply, and the comment would just be blatantly wrong but still have tons of upvotes. It really made me start second-guessing all the other comments I had read and thought were smart, but it’s an easy trap to fall into.

    I guess what I’m really saying is that you all are a bunch of morons, probably.



  • Same. Especially since I’ve been building EDWs for most of my career. People are always surprised that it actually takes time to integrate with different systems.

    “What do you mean you can’t just pull all the data out of this system that we don’t have database access to and are still building out the APIs?”

    I kid… The people asking for stuff don’t know what backend databases and APIs are.



  • I got pulled into a meeting with a team from AWS. I was told they were looking to implement a new solution, so I had to explain in detail how our data lake and data warehouse solution worked. I showed them how we pull data from all these different sources, how we have different integration patterns, etc.

    At the end of my presentation, I asked “does that give you what you guys need? Or do I need to go into any more detail about anything specific? I don’t know what you all are actually building, so I’d be happy to provide more detail where you need it.”

    Their response was “yeah that was all great info. We’re looking to build an app using AI and ML that allows you to run the business with a click of a button.”

    I’m glad it was a remote meeting without cameras, because I literally face palmed. They didn’t have an actual use case or problem they were trying to solve. They were literally just selling a solution built on AI and ML. They didn’t know what it was gonna do, but by God they were committed to selling it.


  • The problem I ran into was that the response returned a JSON body, but with an “error” attribute in it that held the error details. We were parsing the JSON and loading elements into our database. We were hitting the API and passing in the datetime of the last successful job run, so basically saying “give me everything that’s changed since I last called you.”

    So yeah, eventually we noticed we were missing small chunks of data. It turned out that every time the API errored out, we’d get a valid JSON response that contained the error message, but it didn’t have the attributes we were looking for. So we didn’t load anything, but we still updated our timestamp to say when our last successful call was.

    Huge pain in the ass to troubleshoot, because the missing data was scattered with no distinguishable pattern.
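
    A minimal sketch of one way to guard against that failure mode, not the setup described above. The payload shape (`error` and `records` keys) and the `fetch`/`load` callables are hypothetical stand-ins. The idea is to treat an error reported in the body as a failure even on HTTP 200, and to move the "last success" timestamp forward only after the data has actually been loaded.

```python
from datetime import datetime, timezone
from typing import Callable


class ApiReportedError(Exception):
    """The body signalled a failure even though the HTTP status was 200."""


def extract_records(payload: dict) -> list:
    # Assumed payload shape: {"error": ..., "records": [...]}.
    # Both key names are hypothetical; adjust to the real API.
    if payload.get("error"):
        raise ApiReportedError(str(payload["error"]))
    if "records" not in payload:
        raise ApiReportedError("response is missing the expected 'records' attribute")
    return payload["records"]


def run_incremental_load(
    fetch: Callable[[datetime], dict],
    load: Callable[[list], None],
    last_success: datetime,
) -> datetime:
    """Load everything changed since last_success and return the new watermark.

    Any error raised here propagates to the caller, so the job fails visibly
    and the caller never persists a new watermark for a run that loaded nothing.
    """
    started_at = datetime.now(timezone.utc)
    records = extract_records(fetch(last_success))  # raises before anything is loaded
    load(records)
    return started_at
```

    The caller would persist the returned timestamp only when the function returns normally, so a failed window gets requested again on the next run instead of being skipped.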









  • I just discovered that while the ServiceNow APIs return all times in UTC, they use the user’s default time zone for all times passed in as a parameter.

    So if your account is set up in PDT and you say “give me this item that I just created”, it will say “here’s your item, this was created at 17:00”.

    But if you say, “cool let me see all items created in the last hour, so anything greater than 16:00”, then it will respond “got nothing for ya, chief.”
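
    A rough sketch of the workaround this implies, assuming the behavior is as described: compute the cutoff in UTC, then convert it to the account's time zone before passing it as a parameter. The helper name, the `America/Los_Angeles` zone, and the timestamp format are assumptions for illustration, not taken from the ServiceNow docs.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def to_account_local(cutoff_utc: datetime, account_tz: str) -> str:
    """Render a timezone-aware cutoff in the account's zone for use as a query parameter."""
    if cutoff_utc.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    local = cutoff_utc.astimezone(ZoneInfo(account_tz))
    # The output format is an assumption; use whatever the API expects.
    return local.strftime("%Y-%m-%d %H:%M:%S")


# The item came back as created at 17:00 UTC; "the last hour" starts at 16:00 UTC.
created_utc = datetime(2023, 6, 13, 17, 0, tzinfo=timezone.utc)
cutoff = created_utc - timedelta(hours=1)
print(to_account_local(cutoff, "America/Los_Angeles"))  # 2023-06-13 09:00:00
```

    Passing 09:00 instead of 16:00 makes the filter cover the same instant the response timestamps refer to.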