Business Continuity vs. Disaster Recovery
Although the two are often conflated, a business continuity plan and a disaster recovery plan are distinct exercises. A business continuity plan (BCP) is a process in which an environment is evaluated for points of failure and, where appropriate, those points of failure are mitigated. The expected result is that the environment can maintain continuous operations. A good example is installing a redundant piece of equipment that can take over if the primary equipment fails. A disaster recovery plan (DRP), on the other hand, is the process of restoring operations after an event has happened. A good example is restoring a server from a backup after it has been infected with ransomware.
Why Does a Business Need One?
It is naive to believe that events won’t happen. The question is, once an event happens, what is the impact to the business? A simple system failure could escalate to a catastrophic situation very quickly. An outage is not the time to realize your hardware is out of warranty and a replacement part is a week away. By understanding critical data paths for the business and incorporating resilience into those systems, a business can reduce or avoid penalties and other costs.
It is important that the DRP is communicated to all stakeholders and routinely practiced, reviewed, and updated. Every person should be able to follow a well-documented workflow to minimize the impact of an event. Panic, confusion, and miscommunication are the enemy. Create communication templates that are vetted by your legal team and cyber insurance provider in advance. Know who is authorized to declare an “emergency” and what criteria are used for that decision. Again, costly delays can be avoided with some proactive measures.
What Factors Need to Be Considered?
Not all data is created equal. When designing a BCP and DRP, it is unrealistic to protect everything at the same level. Doing so is not only cost prohibitive but often unnecessary. A business needs to understand the types of data it processes and stores, and how to rate each type against its confidentiality, integrity, and availability requirements. For example, log data may be important, but it may not have the same availability requirement as transaction data. If a log were unavailable for a period of time, the company could still function. Customer transaction data, however, could be the company's primary revenue stream, and its loss can be quantified by the minute. Other areas of consideration when designing the BCP and DRP could include:
- Customer SLAs (service level agreements) or other contractual obligations
- Cyber Insurance “due diligence” and “due care” expectations
- Data Criticality (confidentiality, integrity, and availability)
- Retainers, standby resources, and equipment costs
- ITIL – maturity and culture of the organization (ability to execute the plan)
- The risk tolerance of the company
- Risk Assessment (What events are being considered?)
Designing and building your BCP/DRP should not be done in a silo. Engage stakeholders and leadership teams from the beginning. This will avoid unnecessary roadblocks and push-back during the implementation. It is also advisable to seek external support. Although designing the BCP and DRP does not have to be difficult, it is important that certain elements are considered. By reaching out to people or organizations that have implemented these plans, you can learn from their experiences.
At its most basic, a DRP rests on two imperative figures: the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). The RPO is the amount of data, measured as a window of time, that can be lost without significant impact to the company. It drives decisions such as the frequency of backups. If a company would not be able to recover after losing four hours of transactional data, then the RPO may be two hours, with backups performed every hour so that a full restoration from backup loses only about an hour of data.
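The backup-frequency example above can be checked with a little arithmetic. The following is a minimal sketch, not a standard formula: it assumes each backup captures the state at the moment it starts, so the worst case is a failure just before the next backup finishes. The function names and the optional `backup_duration` parameter are illustrative assumptions.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case window of lost data.

    Assumption: a backup captures the state at the moment it starts. If a
    failure hits just before the next backup completes, everything written
    since the previous backup started is lost.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta,
              rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the backup schedule keeps worst-case loss within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Figures from the text: a two-hour RPO with hourly backups.
rpo = timedelta(hours=2)
print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups fit the RPO
print(meets_rpo(timedelta(hours=4), rpo))   # four-hourly backups do not
```

Running a check like this whenever the backup schedule or the RPO changes is a cheap way to notice when the two have drifted apart.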
The Recovery Time Objective (RTO) is the amount of time that operations can be down without significant impact to the company. If possible, this figure should be based on hard data. Accounting should be able to help quantify the impact of a one-hour outage. Don’t forget any SLA penalties or fines that would be incurred. From a qualitative perspective, brand damage and customer “ill will” should also be considered.
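One way to ground the RTO in hard data is to model what an outage costs per hour and ask how long the company can be down before the loss crosses a ceiling that leadership has agreed on. The sketch below is illustrative only: the linear cost model, the SLA grace period, and every figure in it are assumptions, and it leaves out qualitative costs such as brand damage.

```python
from dataclasses import dataclass


@dataclass
class OutageCostModel:
    """Hypothetical linear cost model; the real inputs come from accounting."""
    revenue_loss_per_hour: float
    sla_penalty_per_hour: float = 0.0
    sla_grace_hours: float = 0.0  # outage time allowed before penalties start

    def cost(self, outage_hours: float) -> float:
        revenue = self.revenue_loss_per_hour * outage_hours
        penalized_hours = max(0.0, outage_hours - self.sla_grace_hours)
        return revenue + self.sla_penalty_per_hour * penalized_hours


def max_tolerable_outage_hours(model: OutageCostModel,
                               loss_ceiling: float,
                               step: float = 0.25,
                               limit_hours: float = 72.0) -> float:
    """Longest outage, in `step` increments, whose cost stays within the ceiling."""
    hours = 0.0
    while hours + step <= limit_hours and model.cost(hours + step) <= loss_ceiling:
        hours += step
    return hours


# Made-up figures for illustration.
model = OutageCostModel(revenue_loss_per_hour=10_000,
                        sla_penalty_per_hour=5_000,
                        sla_grace_hours=1)
print(model.cost(2))                              # 25000.0
print(max_tolerable_outage_hours(model, 50_000))  # 3.5
```

The result is a candidate RTO, not a final answer. Stakeholders may still shorten it to account for the qualitative costs the model ignores.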
Part of the BCP/DRP process should also include a risk assessment to determine which risks should be protected against (mitigated). A comprehensive list of what could go wrong will help to implement adequate and appropriate controls. A risk assessment also includes a Business Impact Analysis (BIA) that quantifies the severity (or impact) should one of the risks be realized, as well as the probability of the event happening. You can’t protect against everything, so make the most of a limited budget and resources by protecting against the most likely risks with the highest impact.
How Should a BCP or DRP Be Constructed?
Fortunately, you do not need to re-invent the wheel. There are several online templates for business continuity and disaster recovery plans that are available for free. One of the more popular and widely referenced documents is from the U.S. Government National Institute of Standards and Technology (NIST):
A word of caution, though: Don’t try to force your organization to comply with every part of the template. One size does not fit all. Although some changes to processes are inevitable, it is unrealistic for most public companies to meet every government requirement. Modify the template to reflect reality and your company's risk tolerance and ability to process change. Take an iterative approach. The first version of your BCP/DRP does not have to be perfect.
When and How Should Employees Be Trained on the Plan?
A business continuity plan is more of a shift in culture than a point in time. A BCP is an ongoing process that is tied to the business requirements of any new hardware or software. A system’s data criticality rating, backup requirements, system owner, and dependencies need to be considered prior to implementation. This information should be stored within a central repository and made available during change control requests, maintenance windows, patching, and other activities.
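As a rough illustration of what such a central repository might hold, the sketch below models one record per system and answers a typical change-control question: which other systems are affected if this one goes down? The field names, the three-level rating scale, and the example systems are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Rating(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class SystemRecord:
    """Hypothetical inventory entry captured before a system goes live."""
    name: str
    owner: str
    confidentiality: Rating
    integrity: Rating
    availability: Rating
    backup_interval_hours: float
    depends_on: list[str] = field(default_factory=list)


def impacted_by(inventory: dict[str, SystemRecord], target: str) -> set[str]:
    """Systems that depend on `target`, directly or through other systems."""
    impacted: set[str] = set()
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for record in inventory.values():
            if current in record.depends_on and record.name not in impacted:
                impacted.add(record.name)
                frontier.append(record.name)
    impacted.discard(target)  # a dependency cycle should not list the target itself
    return impacted


inventory = {
    r.name: r
    for r in [
        SystemRecord("orders-db", "data team", Rating.HIGH, Rating.HIGH, Rating.HIGH, 1),
        SystemRecord("checkout-api", "web team", Rating.MEDIUM, Rating.HIGH, Rating.HIGH, 24,
                     depends_on=["orders-db"]),
        SystemRecord("reporting", "finance", Rating.MEDIUM, Rating.MEDIUM, Rating.LOW, 24,
                     depends_on=["checkout-api"]),
    ]
}
print(sorted(impacted_by(inventory, "orders-db")))  # ['checkout-api', 'reporting']
```

In practice this information usually lives in a CMDB or asset-management tool rather than in code. The point is that the ratings, owner, backup requirements, and dependencies are recorded in one place and can be queried during a change request.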
A disaster recovery plan, on the other hand, should be exercised as a formal event at least annually. A DR exercise can vary from a “tabletop exercise” to a full-blown failover to backup systems. The important thing to emphasize is that a DR exercise is not just an IT thing. It involves participation from almost every department within a company. Having the CEO sit in on the exercise not only shows executive support but also emulates reality. Having executives involved also reduces the possibility of them undermining the process in a real event. A well-meaning CEO who is not confident that the DRP is solid, and who doesn't trust the process, can derail recovery and prolong an outage with “knee-jerk” reactions.
A critical step within a DRP exercise is the “retrospective” to go over lessons learned. Every exercise will have “hiccups” that should be addressed and reflected in an updated plan. In addition, environments are dynamic. Hardware and software change, as do people in critical positions. Keeping the DRP updated ensures that the necessary information is available when a real event occurs.
In summary, a Business Continuity Plan (BCP) and a Disaster Recovery Plan (DRP) are separate items that rely heavily on each other. Every company should go through the process of identifying the critical data flows to its organization and how to apply the appropriate protections. These plans don’t need to be huge or overly complex, but they certainly need to be in place.