A Brief History of Screwing Up Software


Unless you live in a cave, you have heard by now about the recent massive AWS outage and how it kind of broke the Internet for a lot of people.  Amazon posted an account of what went wrong, and the root cause is the sort of thing that makes you cringe.  One typo in one command was all it took to take a huge number of customers and sites offline.  If you have been a software developer or administrator, or have otherwise had your hands on important production systems, you can’t help but feel some sympathy for the person responsible.  Leaving aside the wisdom of giving one person the power and responsibility for such a thing, I think we have all lived in fear of that moment.  We’ve all done our fair share of dumb things during our tech careers.  In the interest of commiserating with that poor AWS engineer, here are some of the dumbest things I’ve done during my life in tech:

  1. Made several bad routing table choices, adding four more layers of duct tape to the “infrastructure” that holds the Internet together.
  2. Had my personal site hacked and turned into a spam-spewing menace.  Twice.  Pay attention to those Joomla and Drupal security advisories, folks.  Those who would do you harm sure do!
  3. Relied on turning it off and then back on again to fix a deadlock I couldn’t find a root cause for.  Embedded systems watchdog FTW.
  4. Wrote my own implementation of an HTTP server.  I recommend everybody do this at least once just so you can see how good you have it.  Mine ended up being vulnerable to a directory traversal attack.  Thankfully a friend caught it before somebody evil did.
  5. Used VB6 for a real project that ended up serving 100x as many users as it was intended to.  Actually, let’s just expand that to “used VB6”.
  6. Done many “clever” things in my projects that came back to bite me later.  Nothing like writing code and then finding out a year later that you can’t understand what you did.  Protip:  Don’t try to be clever; be clear instead.
  7. Ran a query with a bad join that returned a Cartesian product.  On a production database that was already underpowered.  With several million rows in each table.
  8. Ran another query that inadvertently updated every row in a huge table when I only actually needed to update a handful.  Where’s that WHERE clause again?  Backups to the rescue!
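
For anyone who wants a seatbelt against number 8, one cheap option is to run the UPDATE inside a transaction and refuse to commit if it touches more rows than you expected.  Here is a minimal sketch in Python, assuming SQLite and a hypothetical `users` table; `guarded_update` and `make_demo_db` are names invented for this illustration, not part of any library.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with five active users (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, active INTEGER)")
    conn.executemany(
        "INSERT INTO users (id, active) VALUES (?, 1)",
        [(i,) for i in range(1, 6)],
    )
    conn.commit()
    return conn


def guarded_update(conn, sql, params, max_rows):
    """Run an UPDATE, but roll back if it affects more than max_rows rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        if cur.rowcount > max_rows:
            raise RuntimeError(
                f"refusing to commit: {cur.rowcount} rows affected, "
                f"expected at most {max_rows}"
            )
        conn.commit()
        return cur.rowcount
    except Exception:
        # Undo whatever the statement changed before re-raising.
        conn.rollback()
        raise


if __name__ == "__main__":
    conn = make_demo_db()
    try:
        # Oops: no WHERE clause, so this would hit every row.
        guarded_update(conn, "UPDATE users SET active = 0", (), max_rows=1)
    except RuntimeError as err:
        print(err)
    # The intended statement goes through.
    print(guarded_update(conn, "UPDATE users SET active = 0 WHERE id = ?", (3,), max_rows=1))
```

It won’t replace backups, but it turns “every row in a huge table” into an error message instead of a restore.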

Anybody who spends decades monkeying around with servers and code will have their own list just like this one.  If they tell you they don’t, they are either too arrogant to realize it or lying to you.  I’m happy to say that I learned something valuable from each and every example above, and it’s made me better at my job.

What’s your most memorable mistake?