10 Best Practices for Disaster Recovery Planning (DRP)

essidsolutions

Disaster recovery planning is the process of creating a comprehensive plan that helps your organization resume work after the loss of data or IT equipment due to natural or human-made disasters. A good disaster recovery plan ensures that this happens with minimal business disruption. This article introduces disaster recovery planning, the key steps involved in creating a plan, and ten best practices for developing and implementing a DRP template.

What Is a Disaster Recovery Plan (DRP)?

A disaster recovery plan is a documented set of actions that helps an organization recover its technology and operations in line with its business policies. It is a component of security planning and a subset of business continuity planning.

If there is anything 2021 has taught the world, it is that disaster strikes with no warning. Come pandemic or wildfires, businesses must be equipped to provide the services they have committed to, with little to no disruption. One way of doing this is planning: figuring out which resources are essential and how they can be protected and backed up.

Importance of having a stable DRP:

  • Disaster management: No business can run successfully without substantial tech-based infrastructure. To put things in perspective, a 2018 BCI Survey Report says that the top supply chain disruptors are IT outages, cyberattacks, and transport network disruption, which are caused by natural or human-made disasters such as hurricanes, floods, wildfires, cyberattacks, power outages, and even acts of terrorism.
  • Cost of disruption: According to Dell’s 2021 GDPI snapshot, cyberattacks and disruptive events are rising sharply. 82% of organizations reported an unplanned disruption in the past year, up from 76% in 2018. These disruptions cost an estimated total of $810,018, up from $526,845 the previous year. Hefty figures aside, business continuity is also a matter of reputation and trust for customers and stakeholders.

Now that it is clear why every business needs a disaster recovery plan, let us take a look at the steps involved in creating a viable disaster recovery plan template.

8 Key Steps for a Disaster Recovery Plan

Let’s look at the step-by-step breakdown of the tasks required to build a robust and adaptive DRP.

1. Gather a team of experts and stakeholders

Creating a disaster recovery plan is not a one-person job. It involves input from various internal employees and external vendors. A good DRP team consists of the following roles:

    • Infrastructure SMEs: Creating a DRP requires an in-depth knowledge of all hardware, software, data, and network connectivity. This means that the corresponding domain experts from the organization’s IT department should be a part of the DRP team.
    • Individual department heads: While every business unit has its set of critical assets and functionalities, these are governed by compliance and legal regulations. It is, therefore, important to have someone representing each business unit.
    • Senior management: Since DRP is a part of business continuity planning (BCP), the organization’s business objectives and strategies are essential to setting DRP goals. Senior management must be involved to make these policy-level decisions.
    • Human resources: An HR representative must be present to enable smooth internal communication in case of work disruption.
    • Public relations officers: Having PROs in the team would be a plus for positive media outreach. This is important for keeping customers and stakeholders informed. 

Apart from these internal members, contact details for property managers, law enforcement, and emergency responders must be added to the final disaster recovery plan. These details change over time and need to be reviewed at regular intervals.

2. Take inventory and analyze business impact

Business impact analysis (BIA) is the foundation of good DRP. In this step, the business is broken down into individual assets, services, and functions. Each asset and service is then evaluated based on how long the company can run without facing financial losses, reputational losses, or regulatory penalties if this asset fails. 

Inventory typically covers the individual assets that drive the functioning of the organization, such as hardware, software, data, and network connectivity.

This step produces an inventory list that records, for each asset, its cost, its legal and regulatory requirements, technical details such as operating system, configuration settings, version numbers, and license keys, and its criticality. Mission-critical assets, whose failure can bring significant services of the company to a halt, are marked.
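As a rough illustration, an inventory entry could be modeled as a small record. This is only a sketch: the `Asset` fields, the one-hour criticality threshold, and the sample assets and figures are hypothetical, not values from any particular BIA.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """One line of a (hypothetical) BIA inventory list."""
    name: str
    kind: str                            # e.g. "hardware", "software", "data", "network"
    version: str
    max_tolerable_downtime_hours: float  # how long the business can run without it
    hourly_downtime_cost: float          # estimated loss per hour of outage
    regulations: tuple = ()              # applicable legal or regulatory regimes

    def is_mission_critical(self, threshold_hours: float = 1.0) -> bool:
        # Assumption: an asset is mission-critical if the business cannot
        # tolerate more than `threshold_hours` without it.
        return self.max_tolerable_downtime_hours <= threshold_hours


# Illustrative data only.
inventory = [
    Asset("checkout-db", "data", "PostgreSQL 13", 0.25, 12000.0, ("PCI DSS",)),
    Asset("marketing-wiki", "software", "2.4", 48.0, 50.0),
]

critical = [asset.name for asset in inventory if asset.is_mission_critical()]
print(critical)
```

Keeping the criticality rule in one method makes it easy to revisit the threshold when the BIA is repeated.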

3. Identify the disaster recovery planning metrics

Once the BIA is done, the business’s IT infrastructure and processes are broken down and quantified in terms of the cost of downtime and criticality. We can create formal and tangible goals of recovery for each function of the business.

    • Goal 1 — Determine the recovery time objective (RTO)

This is the amount of time a particular service can be offline without a significant business impact. For example, for an e-commerce website, the ‘Add to Cart’ functionality cannot be down for more than a few minutes. But the ‘Customer Care chat history’ option can be down for a couple of hours without significant impact.

    • Goal 2 — Determine the recovery point objective (RPO)

When we talk about addressing vulnerabilities during disasters, we are usually talking about security changes or data backup. The best way to prevent data loss would be to back up critical information into tiered servers or the cloud. The RPO determines how frequently this needs to be done for each asset or function. This essentially tells you how outdated your data can afford to be when an unplanned incident occurs.

For example, marketing and sales data can be more than 24 hours old without causing any real damage. But banking transactions need to be as recent as five minutes ago.

Keep in mind that these metrics do not depend on business impact alone. Industry regulations need to be taken into account too. For instance, hospitals that lose patient electronic health records are subject to HIPAA penalties.
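To make the RPO concrete, the check below compares the age of the latest backup against the RPO for each data set. It is a minimal sketch: the data set names, timestamps, and the `meets_rpo` helper are assumptions made for illustration, with the RPO values taken from the examples above.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """Return True if the newest backup is no older than the RPO allows."""
    return now - last_backup <= rpo


# RPOs from the examples above: sales data may be 24 hours old,
# banking transactions no more than five minutes old.
rpo_by_dataset = {
    "sales-data": timedelta(hours=24),
    "bank-transactions": timedelta(minutes=5),
}

# Hypothetical backup timestamps at the moment an incident occurs.
now = datetime(2021, 6, 1, 12, 0)
last_backup = {
    "sales-data": datetime(2021, 6, 1, 0, 0),           # 12 hours old
    "bank-transactions": datetime(2021, 6, 1, 11, 50),  # 10 minutes old
}

violations = [
    name for name, rpo in rpo_by_dataset.items()
    if not meets_rpo(last_backup[name], now, rpo)
]
print(violations)
```

In this scenario only the banking data set is out of bounds, which suggests its backup interval needs to be shortened.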

4. Conduct a risk assessment and identify the scope of the DRP

The BIA stage takes stock of what the business has to lose. The risk assessment stage looks into possible reasons for the loss. During risk assessment, make sure that you: 

    1. Analyze all potential threats to the functioning of the business. These threats include natural disasters, national emergencies and shutdowns, regional disasters, regulatory changes, application failures, data center disasters, communication breakdowns, and cyberattacks. To tackle these, make sure your contingency management includes hardware and other maintenance, protection from power outages, and security from ransomware.
    2. Evaluate business vulnerability for each threat. Quantify each threat with the time and resources it would take to address each threat. The potential cost of leaving each risk unaddressed must also be considered.
    3. Come up with a response plan for each vulnerability. These are preventive measures taken to minimize the damage caused by each threat. These preventive measures include:
      • Upgrading hardware and software
      • Putting security controls in place
      • Improving security policies
    4. Create a risk management plan based on associated costs and potential losses. Also, consider the frequency and probability of each threat. One way of documenting risk assessment is by using the risk assessment matrix. This strategy allows you to rank each disaster based on the likelihood of occurrence, how much it would impact business, and how prepared you are to face it. Based on these numbers, you can prioritize which risks to focus on while creating your disaster recovery plan template. 
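A risk assessment matrix can be approximated in a few lines. The sketch below assumes 1-to-5 scales and a simple multiplicative score in which low preparedness raises the priority. Both the scales and the formula are illustrative choices rather than a standard, and the sample risks are made up.

```python
from typing import NamedTuple


class Risk(NamedTuple):
    name: str
    likelihood: int    # 1 (rare) .. 5 (almost certain)
    impact: int        # 1 (minor) .. 5 (severe)
    preparedness: int  # 1 (unprepared) .. 5 (well prepared)


def score(risk: Risk) -> int:
    # Assumed formula: likelihood x impact, weighted up when preparedness is low.
    return risk.likelihood * risk.impact * (6 - risk.preparedness)


# Illustrative entries only.
risks = [
    Risk("ransomware", likelihood=4, impact=5, preparedness=2),
    Risk("power outage", likelihood=3, impact=3, preparedness=4),
    Risk("regional flood", likelihood=1, impact=5, preparedness=1),
]

# Highest score first: these are the risks to address first in the DRP.
ranked = sorted(risks, key=score, reverse=True)
for risk in ranked:
    print(risk.name, score(risk))
```

Note how an unlikely event that the organization is unprepared for can outrank a more frequent one that is already well covered.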

5. Decide on the type of disaster recovery plan

No single disaster recovery plan template fits every business. Based on the results of the previous steps and the DRP budget, you can opt for one of the following types of DRP:

  1. Data center disaster recovery plan: A data center DRP involves investing in and maintaining a separate data center building as a backup. This is usually called a disaster recovery site. When the primary operation goes down, this site is expected to take over. There are three types of disaster recovery sites:
    • Cold site: Cold sites are infrastructural backups — office spaces with power, cooling, and communication systems. They do not house any hardware or have a network configured. In the case of primary system failure, the operational teams will need to migrate servers and set everything up from scratch. It is the least expensive option. However, it requires extra labor after the fact and may not meet the organization’s RTO goals if not executed properly.
    • Hot site: A hot site is the exact copy of the primary data center setup. It has all the necessary hardware, software, and network configured. Data is backed up based on RPO goals. In case of outages, the operations connect to the hot site without delay and continue with minimal downtime. Since this requires a constantly functioning setup, this is the most expensive option. It is also the most effective.
    • Warm site: A warm site is one that houses the necessary hardware with some pre-installed software and network configuration. Only mission-critical assets are backed up at less frequent intervals. This is a good option for organizations with less critical data and higher RPOs. A cost-benefit analysis may be required to decide between a hot site and a warm site. 

2. Virtualization-based DRP: Virtualization-based DRP works with virtual machines rather than actual hardware and recovery sites. Images of the primary infrastructure are stored and updated at regular intervals. A virtual machine can be that of the database, server, or application setup. While virtualization-based DRPs are considerably cheaper than the first option, a recovery strategy is essential for them to work. Identifying recovery software and the backup medium is crucial. This type of DRP requires extensive testing.

3. Cloud-based DRP: Cloud-based DRP involves backing up critical assets or even the entire primary setup with a cloud provider. This type of planning requires extensive coordination with the cloud managers in terms of security, testing, and meeting the RTO and RPO goals. It is best to pick a cloud provider that allows you to pick the location of the physical and virtual servers. This option is cheaper than data center recovery planning but can be more expensive than virtualization-based DRP.

4. Disaster recovery as a service (DRaaS): If an organization lacks the expertise and resources to create its own DRP, it can enlist the services of a third-party service provider. These providers are referred to as DRaaS companies. It is important to make sure that the service level agreement (SLA) with these companies is in line with the organization’s DRP vision. DRaaS costs vary based on disaster recovery planning goals. Some DRaaS solutions also offer disaster recovery plan templates based on artificial intelligence, machine learning, and predictive analysis. These help with pre-emptive action by automatically detecting ransomware and predicting data loss, hardware failure, and application downtime.

6. Create a disaster recovery playbook

A disaster recovery plan must contain an RTO and RPO for each service and a step-by-step recovery plan based on the type of disaster recovery plan chosen. A complete disaster recovery playbook does not end there. Other mandatory information includes:

    1. List of employees in charge of each service, along with their contact information.
    2. Information packets for each person in charge, with required passwords, access grants, and other configuration information gathered during inventory analysis.
    3. Point of contact who oversees the smooth transition of operations after the disaster occurs, and for troubleshooting the DRP in case of issues.
    4. Contact information of software vendors and third-party services. If a DRaaS vendor is involved, include their contact information and the steps to trigger their services.
    5. Information about emergency responders.
    6. Contact information of facility owners and property managers.
    7. In case of data center DRP, a diagram of the entire IT infrastructure, with recovery sites and directions to access them.
    8. In case of virtualization-based DRP, information of the VMs’ storage medium and recovery steps.

7. Test the disaster recovery plan

By this stage, the first draft of the disaster recovery plan is ready. A good DRP is defined by how well-tested it is. Considering the magnitude of this operation, this can be tricky and time-consuming. It might also be an expensive affair, so make sure to include this while budgeting for your DRP efforts.

There are many ways of testing a disaster recovery plan:

      1. Walk-through test: Sit with the DRP team members and stakeholders, and just read through the playbook. Make any corrections or updates necessary. This does not disrupt existing business operations in any way.
      2. Simulation test: Simulate the disaster and see how well the DRP executes. This does not disrupt existing operations either.
      3. Parallel test: Recreate the setup for the key services using the backed-up assets and see if they process real-world transactions. This is done in parallel to the actual system, which continues to process data as normal.
      4. Full interruption test: This test assumes that the primary system is completely down, and all of the incoming load is directed to the failover systems created as part of the DRP. This completely disrupts the existing system by making it go offline.

Like every other testing activity, DRP testing must be carried out at regular, scheduled intervals. Keep in mind that all of these tests need to be carried out in every testing cycle. Different tests can be carried out at different points of time in the cycle.

It is also not necessary to test the entire system in every cycle. Individual components can be tested based on any changes made in the system or routine maintenance. Make sure the person in charge is in the loop. Combining multiple components for a narrow test run is also an option.

Success metrics are how you conclude if a DRP was a success or failure. A successful test isn’t just a playbook implementation that runs without errors. Any holes captured during the testing and marked to be fixed without delay are considered successes too. Success metrics need to be detailed in the DRP too. In the case of DRaaS, testing frequency and success metrics are included in the SLAs.
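One way to record success metrics is to compare the measured recovery time with the RTO and to log any gaps found. The sketch below is one interpretation of the paragraph above, not a prescribed method: the `TestResult` record and the rule that a test counts as useful when it either meets the RTO or surfaces gaps to fix are assumptions.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class TestResult:
    """Outcome of one (hypothetical) DRP test run for a single service."""
    service: str
    rto: timedelta
    measured_recovery: timedelta
    gaps_found: list = field(default_factory=list)  # issues logged for fixing

    @property
    def rto_met(self) -> bool:
        return self.measured_recovery <= self.rto

    @property
    def is_success(self) -> bool:
        # Assumed rule: a clean run within the RTO is a success, and so is
        # a run that exposes gaps which are recorded for follow-up.
        return self.rto_met or bool(self.gaps_found)


# Illustrative test runs.
clean = TestResult("add-to-cart", timedelta(minutes=5), timedelta(minutes=3))
gappy = TestResult(
    "chat-history",
    timedelta(hours=2),
    timedelta(hours=3),
    gaps_found=["restore script points at a retired backup bucket"],
)

for result in (clean, gappy):
    print(result.service, result.rto_met, result.is_success)
```

A run that misses the RTO without recording why would count as a failure under this rule, because it leaves nothing to act on.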

8. Establish a communication plan

Apart from automated tests, employee awareness training sessions must be conducted by the HR department. The people in charge of different services in the DRP must be walked through the different scenarios covered in the playbook at different intervals. Their contact information and their roles and responsibilities must be readily available in case of emergencies. Disaster recovery exercises and drills need to be carried out at regular intervals.

Since an outage can cause panic and outrage, it is prudent to have a PR team in place. Exact information about how long it would take for the system to come up and the cause of failure will be easy to gather, thanks to the DRP. This makes stakeholder appeasement easier.

Following these eight steps goes a long way toward a resilient disaster recovery plan. There are multiple checklists available online to make sure that you do not skip any of them. Remember that a good DRP focuses on managing the crisis, restoring business-critical functions, and recovering, all while communicating with your stakeholders, as explained by Tom Roepke and Steven Goldman in the Disaster Recovery Journal.

Top 10 Best Practices To Create and Implement a Disaster Recovery Plan (DRP) in 2021

1. Focus on the assets and vulnerabilities, rather than the disaster

Picking particular disasters and focusing only on the risks associated with them can draw attention away from other threats. A better approach is to identify core assets and services and then work up to the associated vulnerabilities.

2. Keep iterating the process

Disaster recovery planning is not a one-time process. Business requirements keep changing, new infrastructure is added every day and industry regulations are updated all the time. This means that the DRP also needs to keep changing. It is best to have scheduled sessions, ideally three to four times a year. It can also be based on certain milestones or triggers — like adding a new service or making major changes in an existing one. A good DRP grows with the business.

3. Maintain a readily accessible disaster recovery playbook

A disaster recovery playbook is meant for multiple stakeholders at different business levels and in different professions. It must be written in clear and concise language understood by all. Once a playbook has been approved and tested, a hard copy must be placed in a readily accessible area, while a soft copy is stored in the cloud or on a portable medium. A DRP must also be easily modifiable since it is subject to change with every iteration. Any changes made to the plan must be reflected in every stored copy and communicated to all stakeholders and team members.

4. Do not forget the processes

DRP is not just about the hardware and the software. There are people and processes involved in each step too. It is important to make sure that the recovery team has a backup work location to operate from. If employees are logging in from home, do they have secure access points to reach your systems? Remember to include these work-process solutions in the playbook.

5. Have a testing schedule and stick to it

A disaster recovery plan is only as good as its testing schedule. A 2014 Global Benchmark Study showed that poor planning, testing, and technological deficiencies led to losses of more than $5 million through critical application failures, data center outages, and data loss. An untested plan leads to a false sense of security. Usually, DRP tests are scheduled three to four times a year, though some bigger enterprises with complex systems carry them out monthly.

6. Create comprehensive post-test reports

A testing activity must always result in a comprehensive report detailing the following points:

    • The type of tests carried out
    • Frequency of testing
    • Success Factors — predetermined details that help evaluate the testing. A successful test isn’t just one that comes up error-free. A successful test is also one that catches an error that might have made it to the final cut.
    • Test procedures followed
    • Post-test analytics

7. Keep up employee awareness, training, and drills

All concerned people must always be kept in the loop, and DRP drills need to become part of the company culture, just like fire drills. Training must be frequent and contact information updated.

8. Supplement your DRP with security and data protection solutions

Replicating a whole new secondary setup means replicating security concerns as well. Any cyberattack or ransomware infection must be contained within the primary system and must not be allowed to spread across the WAN while data is being duplicated for backup.

9. Protect the everyday software

Any SaaS applications in use, like MS Office or Salesforce, need to be considered in the inventory logging stage. While they might not be directly involved with the company’s services, losing the contact information of potential clients might have a long-term effect. Even email suites come into play here because the loss of important communication can be a major business impediment.

10. Ensure good reporting

On-ground reporting is just as important as test reports, if not more so. When a disaster actually strikes and the DRP is set in motion, provisions must be made for documenting each step. It is the best way to figure out what works and what needs tweaking.

With the number of natural and human-made threats increasing daily, creating, adopting, and maintaining a well-thought-out disaster recovery plan makes good business sense. A good DRP goes a long way in creating a confident and resilient business.
