Business Continuity and Disaster Recovery Part 2 and Wrap-Up
9 hours 49 minutes
So after we've initiated the project, done our business impact analysis, identified recovery strategies and written the plan, the next thing we have to do is test it.
What we're checking here is the accuracy and completeness of the plan itself, to figure out if it's practical, if it will work, and if we thought of everything.
When we talk about testing, we're evaluating the plan.
When you conduct exercises and drills, you're focusing on employee response. But here we're looking at the plan.
We need to maintain this plan by revisiting and testing it at least once per year or in the event of a major change.
We want to keep the plan up to date and make sure we're capable of managing a disaster.
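As a rough illustration of that maintenance rule, here is a minimal sketch in Python. The function name `plan_review_due` and the choice to model "once per year" as 365 days are assumptions made for this example, not something the course prescribes.

```python
from datetime import date, timedelta

# Assumption: "at least once per year" is modeled as 365 days.
REVIEW_INTERVAL = timedelta(days=365)


def plan_review_due(last_tested: date, today: date, major_change: bool = False) -> bool:
    """Return True if the plan should be revisited and tested again.

    The plan is due when a major change has occurred, or when a year
    or more has passed since the last test.
    """
    return major_change or (today - last_tested) >= REVIEW_INTERVAL
```

A major change triggers a review right away, no matter how recently the plan was last tested.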
Senior management must sign off on the results of the test because they are ultimately responsible for ensuring the safety and well-being of our staff, as well as protecting the assets of our organization.
There are different types of tests, starting with the checklist test, which is very basic. This is paper based. We create a checklist, pass it out to department managers and ask if we thought of everything.
They check yes, or they let us know what we forgot.
We don't get a lot of information from a checklist test, but it's a place to start.
The next step is to bring the managers in with the checklist for a discussion.
This gives us a better understanding of interdependencies and how things work as a part of the big plan.
It's still a paper-based test, but this is called a tabletop test.
It can also be called a structured walkthrough.
Where we actually do get up and go through the motions is our simulation test.
In a simulation test, we go through things like: Can we get to the HVAC system? Do we have keys to the generator? Are the doors unlocked as they should be for evacuation?
We're moving through the phases of the plan to get a better idea of whether it will work.
We then may go to a parallel test.
Not all organizations go through this test.
We'll set aside a certain amount of processing to take place at the off-site facility.
The majority will still happen at our primary facility, but we're doing a live test in parallel.
In a full interruption test, we shut down our main facility and bring up operations at the off site facility.
This is by far the most risky test.
Often we perform these tests in sequence, going from checklist to tabletop to simulation.
We may or may not go on to parallel and full interruption tests because they are risky.
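The escalation just described can be sketched as an ordered enumeration. This is only an illustration: the names `PlanTest`, `OPTIONAL_TESTS` and `planned_sequence` are hypothetical, and treating the two riskier tests as opt-in is an assumption drawn from the "may or may not" above.

```python
from enum import IntEnum
from typing import List


class PlanTest(IntEnum):
    """Plan test types, ordered from least to most risky."""
    CHECKLIST = 1
    TABLETOP = 2            # also called a structured walkthrough
    SIMULATION = 3
    PARALLEL = 4
    FULL_INTERRUPTION = 5   # the most risky test


# Assumption: the riskier tests are opt-in for this sketch.
OPTIONAL_TESTS = {PlanTest.PARALLEL, PlanTest.FULL_INTERRUPTION}


def planned_sequence(include_risky: bool = False) -> List[PlanTest]:
    """Return the tests to run, in escalating order."""
    return [t for t in PlanTest if include_risky or t not in OPTIONAL_TESTS]
```

By default, `planned_sequence()` stops at the simulation test. Passing `include_risky=True` adds the parallel and full interruption tests at the end.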
Once the plan has been tested, it's our job to maintain it.
Again, you come back once a year, or in the event of a major change, to keep it up to date.
Here are our key takeaways from this module.
We talked a lot about network operations, which are the day-to-day things we have to do to keep the network up and running.
We have to maintain network diagrams and documentation so that we know where the various elements are and how they're configured.
We discussed policies and best practices, everything from change and configuration management to separation of duties.
Our policies have to be kept up to date, and we have to make sure they're providing the administrative control they're set up to provide.
Then there's scanning, monitoring and patching.
This is the maintenance and evaluation of our network.
Are our systems performing as they should? Do we have rogue systems on the network? Are we operating in a way that's out of the norm? Are we in compliance with our baselines?
We should get all this information from monitoring and scanning.
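To make two of those questions concrete, here is a small hedged sketch. It assumes scan results arrive as a list of host addresses and that a baseline is a simple dictionary of settings. The function names and data shapes are hypothetical and are not taken from any particular scanning tool.

```python
from typing import Dict, Iterable, List, Tuple


def find_rogue_systems(scanned_hosts: Iterable[str], known_inventory: Iterable[str]) -> List[str]:
    """Return hosts seen on the network that are not in our documented inventory."""
    return sorted(set(scanned_hosts) - set(known_inventory))


def baseline_deviations(observed: Dict[str, str], baseline: Dict[str, str]) -> Dict[str, Tuple[str, object]]:
    """Return each baseline setting whose observed value differs.

    Each entry maps the setting name to (expected, observed). A setting
    missing from the observed data is reported with an observed value of None.
    """
    return {
        key: (expected, observed.get(key))
        for key, expected in baseline.items()
        if observed.get(key) != expected
    }
```

For example, comparing a scan of `["10.0.0.5", "10.0.0.9"]` against an inventory containing only `"10.0.0.5"` would flag `"10.0.0.9"` as a rogue system.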
When it comes to patching, we know vendors frequently release updates to their operating systems and to their applications.
We need to make sure we're patched because that will help us secure systems.
We then moved into fault management and discussed redundancy, data backups, clusters and web servers.
When we implement fault management, it does have to be all inclusive.
It doesn't do me any good to back up my data if the server fails. So we want to be thoughtful about where we implement redundancy.
Then we moved into discussing what happens when we have disasters.
These are notable, sizable reductions in operations. We need a disaster recovery plan to respond to the immediacy of a disaster.
The business continuity plan allows us to continue operations long after the disaster.