Cloud network outages are wrecking balls, and when one hits a dominant market player like Amazon Web Services (AWS), it raises a lot of eyebrows (and temperatures). The latest in line is an extensive outage brought on by human error at an AWS data center in Virginia. Many consider it the worst in four years, given the mammoth customer base and the disparate service providers that depend on the cloud offering.
An outage on this scale during a provider's formative years would be understandable, but not when AWS enjoys a privileged position of dominance and respect in the public cloud arena. AWS had a rough start, with its public cloud crashing at least two or three times a year. Though outages are infrequent today, the scale and impact of this one is unacceptable, and the sooner AWS fixes these prickly issues, the better for its leadership position.
We revisit some of the cloud failures that AWS has suffered over the years.
- June, 2016: The storms that battered Sydney in June 2016 also shook AWS services. An extensive power outage led to the failure of a number of Elastic Compute Cloud (EC2) instances and Elastic Block Store (EBS) volumes, many of which hosted critical workloads for big brands. As a result, a number of prominent websites and online services went down for 10 hours over the weekend, hitting businesses severely.
- November, 2014: A failure of the AWS CloudFront DNS server for a period of two hours in November 2014 led to some websites and online services being disabled. The reason was that the content delivery network failed to fulfil DNS requests.
- September, 2013: Infamously called the “Friday the 13th outage,” this incident stemmed from a load balancing issue that hit some regional customers for two hours in one availability zone in Virginia. This time, though, the AWS response was quick: the company resolved the issues and increased provisioning times to prevent a recurrence.
- December, 2012: The Christmas of 2012 was not so merry after all, especially for those affected by the much-talked-about AWS failure. As a result of the outage, Netflix was down on Christmas Eve, depriving many Americans of the much-needed Christmas cheer brought on by live streaming of entertainment. Netflix, not surprisingly, laid the unavailability of its services during such primetime squarely on AWS’s shoulders.
- June, 2012: The Virginia data center appeared to be jinxed, with yet another outage hitting services in this availability zone. A service disruption halted operations for about six hours, causing businesses considerable pain. This served as one of the first wake-up calls about the impact of a cloud failure.
- August, 2011: This was among the first, and although not as impactful, it was an eye-opener to the downside of overreliance on the cloud. Thirty minutes of downtime affected high-traffic sites such as Netflix, Quora, Reddit, and Foursquare, as well as social networks. The cause was traced to connectivity issues between three of Amazon’s availability zones and the Internet. The damage was done to EC2 and the Relational Database Service (RDS), as well as to AWS’s reputation.
- April, 2011: A major disruption that forced many of Amazon’s bigger customers to remain offline for days, this again was among the first rude awakenings to the reality of the cloud. This may go down in AWS’s history as its blackest day, since AWS chose to hide behind silence rather than face the challenge head-on. The public cloud major took a painfully long week to awaken from its stupor and issue a public statement: a highly technical and wordy explanation blaming the outage on a “remirroring storm.” A half-hearted apology followed, but this poor handling of the issue left the company in a bad light.
Some of these failures were technical in nature, while others stemmed from grossly mishandled disaster management and communication. Either way, they should serve as a lesson to cloud service providers on what not to do if they want the world to embrace the cloud.