People tend to ignore how something works until it suddenly stops working. After a four-hour outage, Amazon’s AWS cloud computing division came under scrutiny from the major players it affected. After a stretch of blind trust, Amazon’s cloud services now carry an indelible question mark.
Grammarly, Medium, Slack, and Trello were all collaterally affected by the outage. Users flocked to Twitter for signs of life: some mildly curious, others increasingly panicked. One of the most interesting aspects of the ordeal was how some companies handled customer relations.
Some companies played the finger-pointing game. Rather than showing the prototypical “Sorry, we are experiencing technical issues” error page, a few of them decided to shift the blame. Grammarly, for example, stated: “Grammarly runs on Amazon Web Services, and they are currently experiencing an outage.”
This shift in responsibility was a watershed moment. For as long as I can remember, organizations have struggled with the responsibilities of hosting services in-house. Now many CEOs are paying to unload that burden onto someone else, which makes it easier to point the finger when services are compromised. If responsibility really does rest with the third-party provider, does that mean the provider must foot the bill?
An analysis by Cyence estimates that Tuesday’s four-hour disruption cost S&P 500 companies roughly $150 million to $160 million. That’s a wake-up call for cloud service providers, and the fiasco is sure to be a major talking point as executives convene at conferences to discuss it.
Overreliance on Others
The narrative of collaboration has generated tremendous benefits. Cloud services are an extension of that collaborative culture, facilitated by an increasingly networked market landscape. But the outage also highlighted a fundamental market problem: the perils of overreliance.
A handful of companies handed over the keys to the kingdom when they signed up for AWS, assuming things would always work swimmingly. The outage proved them wrong. Amazon fixed the issue as quickly as it could, but now it has a marketing crisis on its hands. What’s more, Amazon’s own site was among those unaffected by the outage, which raised eyebrows among the 54 of the top 100 internet retailers that went offline at the same time.
The Hype and the Myth
It is not uncommon for a site to crash once in a while; no site works 100% of the time. Nevertheless, any major outage draws widespread media attention and prompts deeper analysis. I don’t know whether that’s because people like to see the mighty fall or because they expect technology to work 100% of the time, but this event was a bit overhyped.
The cost-versus-performance balance is predicated on the understanding that nothing is perfect. There will be failures; how often they happen, and how big they are, is a separate matter. Bottom line: when those 54 companies signed on the dotted line, they accepted 99% availability.
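To put an availability percentage in perspective, the short Python sketch below converts it into the downtime it permits, and converts an observed outage back into a percentage. It assumes a 365-day year and a 30-day month; real SLAs define their own measurement windows and exclusions, so treat these as rough figures rather than the terms any provider actually offers.

```python
# Convert availability percentages into allowed downtime and back.
# Assumes a 365-day year and a 30-day month; real SLAs define their own windows.

HOURS_PER_YEAR = 365 * 24   # 8,760 hours
HOURS_PER_MONTH = 30 * 24   # 720 hours


def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of downtime permitted over a period at the given availability."""
    return period_hours * (1 - availability_pct / 100)


def observed_availability_pct(downtime_hours: float, period_hours: float) -> float:
    """Availability actually delivered over a period with the given downtime."""
    return 100 * (1 - downtime_hours / period_hours)


if __name__ == "__main__":
    print(f"99% over a year allows {allowed_downtime_hours(99, HOURS_PER_YEAR):.1f} h of downtime")
    print(f"99% over a month allows {allowed_downtime_hours(99, HOURS_PER_MONTH):.1f} h of downtime")
    print(f"A 4-hour outage in one month is {observed_availability_pct(4, HOURS_PER_MONTH):.2f}% availability")
```

Under these assumptions, 99% availability allows about 87.6 hours of downtime a year, or about 7.2 hours in a 30-day month, so a single four-hour outage (roughly 99.44% for that month) still falls within a 99% target.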
Amazon boasts a slim margin of error, and few companies can match its reliability. That said, what I take from this incident is that if AWS can go down, comparable services surely can as well.
That is why anyone who relies on a third-party service should have a backup plan. With one in place, there would be no need to point fingers, a habit that makes an outage look like a surprising anomaly when it is simply a fact of digital business.
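A backup plan can be as simple as trying a second source when the first one fails. The Python sketch below is a minimal, hypothetical illustration of that failover idea: `fetch_with_fallback`, `cloud_store`, and `in_house_store` are invented names, the "outage" is simulated with an exception rather than a real network call, and no AWS API is involved.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is a (name, fetch_function) pair; fetch_function raises on failure.
Backend = Tuple[str, Callable[[str], bytes]]


class AllBackendsFailed(Exception):
    """Raised when every backend in the chain has failed."""


def fetch_with_fallback(key: str, backends: Sequence[Backend]) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend_name, data) from the first that succeeds."""
    errors: List[str] = []
    for name, fetch in backends:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc!r}")
    raise AllBackendsFailed("; ".join(errors))


# Hypothetical stand-ins for a third-party cloud store and an in-house server.
def cloud_store(key: str) -> bytes:
    raise ConnectionError("provider outage")  # simulate the provider being down


IN_HOUSE_FILES = {"logo.png": b"logo bytes"}


def in_house_store(key: str) -> bytes:
    return IN_HOUSE_FILES[key]  # raises KeyError if the file is missing


if __name__ == "__main__":
    source, data = fetch_with_fallback(
        "logo.png", [("cloud", cloud_store), ("in-house", in_house_store)]
    )
    print(f"served {len(data)} bytes from {source}")
```

Listing the cheaper provider first keeps the savings in the normal case while the in-house copy absorbs an outage. A real setup would also need timeouts, health checks, and a way to keep both copies in sync, none of which this sketch attempts.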
The most sensible approach is to balance internal and external servers to create a robust infrastructure. At least, that’s the ideal. The truth is that the cost-versus-performance argument will likely encourage continued risk-taking for the sake of savings, which will only bring more finger-pointing in the future.