Lessons on IT Resilience from a Hurricane

essidsolutions

Besides battening down the hatches in the face of a hurricane or any other disaster, organizations need to worry about the day after – getting back online if they experience an outage. Here are some ideas on how to accomplish that.

It’s hurricane season once again (just ask the victims in the Carolinas who recently bore the brunt of an angry and wet Florence), and the big storms are not the only inevitability of the season you can bank on. For example, the media will trot out its stock footage of crowds at Walmart grabbing everything they can off the shelves to stock up before the worst arrives.

Scenes of crushing inland-bound traffic will be broadcast from coast to coast as people try to get away from the storm. And there will always be holdouts who refuse to leave their homes despite dire warnings from authorities to evacuate.

Outages are another inevitability of hurricane season. Power plants, ISPs, cloud service providers, banks, insurance companies and airlines are all potential victims of a storm that could flood their facilities, knock out their electricity, or create a personnel vacuum, with workers unable to reach their offices to maintain facilities.

The inevitable result of a big storm, then, is the inconvenience, losses – and suffering – of the victims, whether they are residents of affected areas or companies that lie in the path of the storm. For individuals or families, those losses could be in the form of a house that needs major repairs due to flooding or wind damage – while for companies, the losses could amount to millions of dollars.

According to research by IDC, infrastructure failure can cost large companies as much as $100,000 per hour, while the failure of critical applications could cost as much as a million dollars an hour.

Question: If those losses and that suffering are inevitable (that is, we know they are going to happen), why aren’t we better prepared for them? Why do companies risk losing millions of dollars when they know a storm is coming, or will inevitably come at some point? What should they be doing to better prepare for an IT emergency that is all but guaranteed to hit?

“Why” is a question we can’t really answer; the reason is no doubt different for each organization. We do know that organizations invest in business continuity plans and various disaster recovery technologies. The real question is how effective those plans are and what they entail. Many issues need to be considered, and anticipating them all in advance is a science unto itself.

“Prepping” for disaster recovery means different things depending on the situation. To prepare for a power failure, organizations might invest in generators to keep their core operations going. Other strategies include off-site backups and data replication, a good DR (disaster recovery) plan for restoring data, replication of services off-site, and IT resilience validation tools that can automatically determine where a point of failure lies.

It’s true that these measures cost money and require dedicated time and personnel, but that spending will be a drop in the bucket compared with the cost of the event (probably an inevitable one) in which an organization is offline for hours or even days. At the IDC figure of $100,000 per hour, for example, a single day of infrastructure downtime comes to $2.4 million.

While many outage-causing disasters can’t be avoided, some can. To avoid the configuration errors that can cause outages, organizations should implement tools that automate resilience validation, scanning their systems to determine where points of failure lie.
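To make the idea of automated resilience validation more concrete, here is a minimal sketch of one check such a tool might run: walking a service's dependency graph and flagging every component it relies on that runs as a single instance. The `Component` record, the `single_points_of_failure` function and the sample inventory are hypothetical illustrations, assuming dependency and replica data are available (for example, from a CMDB export); a real tool would check many more layers and failure modes than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical inventory record (assumed to come from a CMDB export)."""
    name: str
    instances: int  # number of independent replicas of this component
    depends_on: list[str] = field(default_factory=list)


def single_points_of_failure(inventory: dict[str, Component], service: str) -> list[str]:
    """Return every component that `service` depends on, directly or
    transitively, and that runs as a single instance."""
    seen: set[str] = set()
    stack = [service]
    # Depth-first walk over the dependency graph; `seen` guards against cycles.
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(inventory[name].depends_on)
    # Anything with fewer than two instances has no redundancy; sort for stable reports.
    return sorted(n for n in seen if inventory[n].instances < 2)


# Hypothetical example inventory.
inventory = {
    "web": Component("web", 3, ["app"]),
    "app": Component("app", 2, ["db", "cache"]),
    "db": Component("db", 1, ["storage"]),
    "cache": Component("cache", 2),
    "storage": Component("storage", 1),
}

print(single_points_of_failure(inventory, "web"))  # ['db', 'storage']
```

Because the check follows transitive dependencies, it surfaces `db` and `storage` even though the service being checked, `web`, itself runs three instances.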

Key capabilities to look for in these tools include support for a wide set of IT layers and technology stacks; the ability to connect to existing ITSM and CMDB tools and to provide business awareness; and built-in libraries of industry best practices.

Based on how such systems are used at some of the most sophisticated IT firms in the world, the most critical considerations are:

  • A clear and realistic resilience strategy
  • Clear KPIs for everything related to resilience and data protection
  • Automation of the quality-check process, not just of deployment
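As one illustration of a measurable data-protection KPI, the sketch below assumes a recovery point objective (RPO), meaning the maximum acceptable age of a system's most recent backup, and reports which systems exceed it and what share of systems comply. The function names, system names and the 24-hour target are hypothetical; a check like this could run automatically as part of the quality-check process instead of being verified by hand.

```python
from datetime import datetime, timedelta, timezone


def rpo_violations(last_backup: dict[str, datetime], rpo: timedelta,
                   now: datetime) -> dict[str, timedelta]:
    """Map each system whose newest backup is older than the RPO target
    to how far past the target it is."""
    return {
        system: (now - taken) - rpo
        for system, taken in last_backup.items()
        if now - taken > rpo
    }


def rpo_compliance(last_backup: dict[str, datetime], rpo: timedelta,
                   now: datetime) -> float:
    """KPI: fraction of systems whose newest backup is within the RPO target."""
    if not last_backup:
        return 1.0  # nothing to protect counts as fully compliant
    ok = sum(1 for taken in last_backup.values() if now - taken <= rpo)
    return ok / len(last_backup)


# Hypothetical data; a fixed "now" keeps the example deterministic.
now = datetime(2018, 9, 14, 12, 0, tzinfo=timezone.utc)
last_backup = {
    "billing-db": now - timedelta(hours=2),
    "crm-db": now - timedelta(hours=30),
}
target = timedelta(hours=24)

print(rpo_violations(last_backup, target, now))  # crm-db is 6 hours over target
print(rpo_compliance(last_backup, target, now))  # 0.5
```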

Taking a holistic approach like this to resilience and quality will help ensure that an organization can survive the next hurricane, or any other disaster.