Learning to Fail Forward


A few weeks ago the Amazon S3 cloud service had some issues. OK, maybe a lot of issues. According to The Guardian, “The Amazon Simple Storage Solution (S3) is used by tens of thousands of web services for hosting and backing up data….” What this means is that a lot of websites and mobile apps were not working for a good portion of the day. That is far from ideal when having a site up and running is vital to your existence as a company. Assumptions were made all over the place about what was happening on the inter-webs, which was easy when people couldn’t use the internet to actually do work.

Without getting extremely technical, what happened the day the internet crashed (at least that is what it felt like) was that an authorized Amazon S3 team member made a keystroke error that created a domino effect across Amazon’s servers. I don’t know what happened to the employee; I hope they were kept and not fired. What interested me most, however, was what happened as a result. Check out a portion of the statement from Amazon Web Services on this issue:

“….We are making several changes as a result of this operational event. While removal of capacity is a key operational practice, in this instance, the tool used allowed too much capacity to be removed too quickly. We have modified this tool to remove capacity more slowly and added safeguards to prevent capacity from being removed when it will take any subsystem below its minimum required capacity level. This will prevent an incorrect input from triggering a similar event in the future. We are also auditing our other operational tools to ensure we have similar safety checks. We will also make changes to improve the recovery time of key S3 subsystems….We will do everything we can to learn from this event and use it to improve our availability even further.”
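For the technically curious, the kind of safeguard described in that statement might look something like the rough sketch below. This is not Amazon’s actual tool; the function name, the numbers, and the limits are all hypothetical and exist only to illustrate the idea of refusing a request that removes too much at once or drops below a minimum.

```python
class CapacityError(Exception):
    """Raised when a removal request would breach a safety limit."""


def remove_capacity(current, requested, minimum, max_per_step):
    """Return the remaining server count after a guarded removal.

    Hypothetical illustration only: every name and limit here is an
    assumption, not part of any real Amazon tool.
    """
    if requested < 0:
        raise ValueError("requested must not be negative")
    # Safeguard 1: remove capacity slowly, a small batch at a time.
    if requested > max_per_step:
        raise CapacityError(
            f"refusing to remove {requested} at once (limit {max_per_step})"
        )
    # Safeguard 2: never drop below the minimum the system needs.
    if current - requested < minimum:
        raise CapacityError(
            f"removing {requested} would leave fewer than {minimum} servers"
        )
    return current - requested


# A normal request goes through.
remaining = remove_capacity(current=100, requested=5, minimum=80, max_per_step=10)

# A typo (50 instead of 5) is stopped before it can do damage.
try:
    remove_capacity(current=100, requested=50, minimum=80, max_per_step=10)
    typo_blocked = False
except CapacityError:
    typo_blocked = True
```

The point is that the check lives in the tool itself, so a single mistyped number gets rejected instead of being carried out.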

Because someone made a typo, a process was reviewed, modified, and improved.

Sometimes it is through our mistakes, our failures, and our mess-ups that we learn to make what we do even better.

So, what can we learn from the keystroke error by an Amazon employee that took down half the internet?

First: We aren’t perfect; we are human. Because of this there will be days we make a mistake on a post or tweet. We will add the wrong image, target the wrong audience, and forget to include the link that people need. It’s OK. We aren’t perfect; we are human. Learn to be OK with that and move forward.

Second: Review. For better or worse, Amazon was forced to review the methods and procedures its team used while debugging the S3 billing process. This review led to an immediate modification. Sometimes when we have a post that doesn’t go as planned, a campaign that doesn’t get the reach we wanted, or a strategy that doesn’t seem to be gaining traction, we fail to review the process. Instead we often just do the same old thing over and over again. When something doesn’t go as planned or desired, figure out why: review the process, see where changes can be made, then modify.

Third: Learn. If we look at the case study from Amazon, they learned in the process of modifying their system. They had to. They had to learn in order to create a new system and process that would fix what went wrong. Be willing to learn. Sometimes this is the hard part. When we learn, we are openly admitting to ourselves that we don’t know it all yet. This can be a humbling process, but it is a crucial one. So much changes in the world of digital communications that if we do not learn, we will fall behind quickly. Be willing to learn how and why. Be willing to learn to improve. Be willing to learn in order to figure out what didn’t work.

Fourth: Make Changes. One of the worst things you can do is to review something, learn from it, and then never do anything about it. To move from good to great, successful companies constantly review, learn, and make changes. This process is necessary for success. Amazon immediately made changes to their process so that the error that occurred would be less likely to happen in the future. We need to do the same. If we have a communication mishap, we should review, learn, and then make a change so that it is less likely to happen again.

Fifth: Repeat. The likelihood that you will be free of errors, mistakes, typos, wrong audiences, posts from the wrong Twitter handle, and so on is pretty low. We all make mistakes; it is a part of life. When we do, we cannot forget the process. We aren’t perfect. We need to review. We must learn. We need to make changes. Repeat.

It is important that we not fear failure or the what-ifs, but rather accept that mistakes will happen and learn from them.

Have you made mistakes before? If so, how did you learn from them and move forward?





Meghan is one of our regular contributors and is the Associate Pastor of Fairborn UMC in Fairborn, Ohio where she oversees all digital and social media communications. She also works as a digital communications consultant for churches, local businesses, and non-profits.
