We just finished talking about redundancy, and that leads into talking about disaster recovery and business continuity. In the event that a disaster strikes, redundancy is going to be key to getting back online, back up and running, and enabling the continuity of the business.
We hear the terms BCP for business continuity planning and DRP for disaster recovery planning.
We often hear them together, sometimes used interchangeably. We want to make sure that we know the difference between them. A business continuity plan is an overarching, umbrella sort of document that includes many other plans that help sustain the organization in case of a disaster. The DRP is more of a short-term document that is focused on the immediacy of the disaster. I've heard people say the DRP is "the sky is falling."
The BCP is "the sky's fallen, how do we keep going?" Disaster recovery is really focused on restoring IT services to operation, based on their criticality, as quickly as possible. When we talk about criticality, we mean time sensitivity. There are certain services where, while they're offline, you lose money.
If we have an e-commerce site, for instance, the longer the e-commerce site is unavailable, the less money I'm generating. That would be a very critical service. There are seven stages or phases of a business continuity plan, and lots of different organizations have their own documents they use. This is NIST 800-34. ISO 27031 also has a framework for business continuity.
There are various plans available, and they're all performing the same functions. We start out with project initiation. Writing a business continuity plan is a project, and it should be managed as such.
You start the project, then we move into the business impact analysis.
This is probably the most critical step because it is where we identify what elements are critical and how critical they are. That's going to be the driver for what we recover, what we recover first, and how quickly we do it. Then we identify our recovery strategies, then get into design and development. We look to implement the plan, we test it, and we maintain it. Those are the seven phases. Again, this goes back to NIST 800-34. There are other frameworks out there that support business continuity planning.
If we look at project initiation, you're going to manage this as a project, and we have to have support and buy-in from senior management. A business continuity plan isn't something that you write one afternoon over margaritas at Gilley's. This is a lengthy process that needs funding and support. Senior management is going to put their buy-in in writing, and they're going to sign off. That's committing to support and funding. The project manager should also be named.
That's going to be the person who coordinates the business continuity planning process. We figure out the scope of the plan. We select members of the BCP team. The business continuity planning team should come from diverse backgrounds. You should have representation from throughout the organization, including senior management.
On to our next phase. This is the big one, because this is the business impact analysis. This is where we do our research and identify and prioritize all of our business processes based on their criticality. Again, criticality is time sensitivity.
This document is going to give us metrics to determine how quickly these critical devices need to be back up and online.
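To make that prioritization a little more concrete, here's a rough sketch in Python of how a BIA might rank processes by time sensitivity. The process names, the tolerable downtime figures, and the loss-per-hour numbers are all made up for illustration. A real BIA gets those from research inside the organization.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    tolerable_downtime_hours: float  # hypothetical figure a BIA would supply
    loss_per_hour: float             # hypothetical revenue lost per hour offline


def prioritize(processes):
    """Order processes for recovery: least tolerable downtime first.

    Ties are broken by the higher hourly loss.
    """
    return sorted(
        processes,
        key=lambda p: (p.tolerable_downtime_hours, -p.loss_per_hour),
    )


def downtime_cost(process, hours_offline):
    """Estimated money lost while the process is offline."""
    return process.loss_per_hour * hours_offline


# Illustrative data only; none of these numbers come from a real organization.
processes = [
    BusinessProcess("payroll", 72, 500),
    BusinessProcess("e-commerce site", 1, 20000),
    BusinessProcess("internal wiki", 168, 50),
]

recovery_order = [p.name for p in prioritize(processes)]
print(recovery_order)
print(downtime_cost(processes[1], 2))
```

The e-commerce site ends up first in the list because it can tolerate the least downtime, which is the sense in which criticality means time sensitivity.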
We'll talk about things like recovery point objectives and service level objectives. We've already talked about MTBF and MTTR. Let me just take a minute and talk about service level objectives: SLOs, not to be confused with SLAs, service level agreements. The idea is that if we're in some sort of disaster operations, we're not going to be providing 100% of our normal service to our customers.
What we might say is, in the event that these services are unavailable, we'd at least like to operate at 80%. That's a service level objective. It takes into consideration that you can't operate at 100%. What are we striving for at a reduced capacity? A recovery point objective is our tolerance for data loss.
How current does the data have to be? If I say I have an RPO of one hour, you need to restore all files up until an hour ago. How much data am I willing to lose? Then there's the recovery time objective. RTO and MTD are sometimes used interchangeably: recovery time objective, maximum tolerable downtime. This asks, what's the maximum amount of time we can be without the service before we suffer a loss that's unacceptable? What's our maximum time?
We've already talked about mean time between failures, the amount of time that the device will run before it fails. We repair it, then it fails, then we repair it, then it fails. MTTR is, again, the mean time to repair, just what it sounds like. Also, we need to determine minimum operating requirements for when we restore these devices. For instance, if I have software that has to be up and running within nine minutes, you'd better make sure I have the hardware that will run that software, so to speak. Any sort of environmental or application-type requirement should also be in the BIA.
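Here's a small sketch of how those metrics can be checked, written in Python. The function names and the sample times are assumptions for illustration. The availability formula, MTBF divided by MTBF plus MTTR, is the standard steady-state calculation and isn't something specific to this course.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup, disaster_time, rpo):
    """True if the data lost since the last good backup is within the RPO."""
    return disaster_time - last_backup <= rpo


def meets_rto(outage_start, service_restored, rto):
    """True if the service came back within the recovery time objective."""
    return service_restored - outage_start <= rto


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the device is running."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical disaster at noon, with an RPO of one hour.
disaster = datetime(2024, 1, 1, 12, 0)
one_hour = timedelta(hours=1)

print(meets_rpo(datetime(2024, 1, 1, 11, 30), disaster, one_hour))  # True
print(meets_rpo(datetime(2024, 1, 1, 10, 0), disaster, one_hour))   # False

# Service restored three hours later, against a four-hour RTO.
print(meets_rto(disaster, datetime(2024, 1, 1, 15, 0), timedelta(hours=4)))  # True

# A device that runs 990 hours between failures and takes 10 hours to repair.
print(round(availability(990, 10), 3))  # 0.99
```

A backup from 11:30 satisfies a one-hour RPO for a noon disaster, because only thirty minutes of data is lost. A backup from 10:00 does not.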
The next phase: identify my recovery strategies. In the event of a disaster, assuredly there's been some loss. Let me just say, and it should go without saying, that we always place the physical safety of our employees above anything else. If there were ever a decision to make where human life may be at risk, we have to choose something different. Always. After human life, we start to think about our facility, because that would be an area that would cost us a great loss. If our facility is damaged or is unavailable for a period of time, we may need somewhere to work. Maybe our employees can work from home, but maybe not. If not, we generally lease an off-site facility. We might lease a cold site. A cold site is really just a bare-bones facility that has heating and air conditioning.
There's nothing beyond somewhere to work.
It's an empty building or an empty space.
Obviously, coming into a cold site, it's going to take a while to get back up and rolling. Cold sites are the cheapest option.
With a warm site, there are the basics, but there's also furniture, there are computer systems, there are telephones. Again, that's just generic equipment, nothing of my own, so it will still take a bit of time to get back up and rolling.
Speaking of rolling, there's a rolling hot site. Sometimes you see these. They pull up in the event of a disaster, like a little mobile home on wheels containing computer equipment, perhaps, something where we can process some of our data center operations. It's really kind of a short-term solution.
We could pay for a hot site. That's a location that is under our, well, not ownership, but we have exclusive use of it. It's fully configured and has my equipment, and we just really need to come in and restore from the latest backup. You can get back up and running pretty quickly.
A mirrored site is usually under our ownership. It's a branch office. We can switch operations to the Northwest region. They've got access to our data. They're staffed. They have all the equipment that they need. Making sure that it's fully redundant in every way can be very expensive.
There are certainly some recovery strategies in relation to our facilities.
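One way to think about the tradeoff between those sites is to pick the cheapest one that still gets you back within your recovery time objective. Here's a minimal sketch in Python. The hours and cost ranks are hypothetical placeholders that only preserve the ordering described above, cold being cheapest and slowest and mirrored being fastest and most expensive. They aren't real benchmarks.

```python
from dataclasses import dataclass


@dataclass
class SiteOption:
    name: str
    hours_to_operational: float  # hypothetical; real figures vary widely
    relative_cost: int           # hypothetical rank; 1 = cheapest


# Illustrative values only, chosen to reflect the ordering in the lecture.
SITES = [
    SiteOption("cold", 168, 1),
    SiteOption("warm", 48, 2),
    SiteOption("hot", 4, 3),
    SiteOption("mirrored", 0.25, 4),
]


def cheapest_site_within(rto_hours, sites=SITES):
    """Return the least expensive site that can be running within the RTO.

    Returns None when no site is fast enough.
    """
    candidates = [s for s in sites if s.hours_to_operational <= rto_hours]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.relative_cost)


print(cheapest_site_within(72).name)  # warm
print(cheapest_site_within(1).name)   # mirrored
print(cheapest_site_within(0.1))      # None
```

With a 72-hour objective, a warm site is enough in this sketch. With a one-hour objective, only the mirrored site qualifies, which is why the tighter the recovery time, the more the facility strategy costs.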
We also have to think about personnel where job rotation and training would help in any of our processes as well.