Business Continuity and Disaster Recovery

Course
Time
8 hours 25 minutes
Difficulty
Advanced
CEU/CPE
9
Video Transcription
00:00
>> Now our next section very obviously
00:00
goes hand in hand with risk management.
00:00
We're going to look at business continuity
00:00
and disaster recovery.
00:00
I always feel like in information security,
00:00
you start with risk management,
00:00
and then you're marching towards business continuity.
00:00
We're going to keep the organization going
00:00
despite disruptions of any type or scale.
00:00
That's the real challenge here.
00:00
All types of disruptions,
00:00
we have to plan for,
00:00
in order to address how we're going to
00:00
respond to each of those.
00:00
Now, just to start out with the difference between
00:00
business continuity planning and
00:00
disaster recovery planning,
00:00
business continuity planning, first of all,
00:00
is for the organization as a whole.
00:00
It's not IT specific,
00:00
and it's all of those processes that we have to
00:00
have in place so that we can sustain the organization,
00:00
until normal business conditions are restored.
00:00
This says after a disaster,
00:00
I would say after a major disruption
00:00
or a disaster, absolutely,
00:00
this can be something
00:00
depending on the scale of disruption,
00:00
this may be something that we're six months in and still
00:00
working underneath the heading of
00:00
a business continuity plan.
00:00
If you take a look at everything
00:00
that's gone on with COVID.
00:00
This hasn't been a short-term disaster.
00:00
This has been a long-term pandemic,
00:00
and very few organizations are working at
00:00
100 percent permanent capacity as in
00:00
the way they're going to begin working
00:00
or continue working forever.
00:00
We're all operating in some form of reduced capacity.
00:00
The quality of this business continuity plan,
00:00
the ability to sustain us until those operations are
00:00
secured are really essential for us being
00:00
able to withstand this type and scale of a disaster.
00:00
Now, the DRP,
00:00
the disaster recovery plan,
00:00
that is more specific to IT.
00:00
The DRP is short-term focused.
00:00
It's the immediacy of the disaster.
00:00
It is the sky is falling,
00:00
what do we do about it?
00:00
Our real focus and priority with
00:00
disaster recovery planning is going to be to get
00:00
those most critical services,
00:00
processes back up and running as quickly as we can.
00:00
With criticality, we have to
00:00
remember that criticality is about time sensitivity.
00:00
Disaster recovery planning is about
00:00
getting the most time-sensitive
00:00
systems, processes, and elements back online.
00:00
Those are the systems without which we lose money.
00:00
We have to have some way of
00:00
prioritizing what our most critical systems are,
00:00
and that's going to come up in
00:00
the business impact analysis in just a few minutes.
00:00
Now, the BCP process starts
00:00
out with scope and plan initiation.
00:00
This is going to be managed as a project,
00:00
so we begin it as such.
00:00
Then we have the business impact assessment or analysis.
00:00
You'll hear it called both.
00:00
I'll probably call it
00:00
assessment one moment
00:00
and analysis another moment.
00:00
For our purposes, that's fine.
00:00
Business impact assessment is
00:00
the same thing as business impact analysis.
00:00
Then we have the actual development of
00:00
the plan based on the previous two steps,
00:00
and then after the plan is written,
00:00
we take it to senior management for sign-off,
00:00
and then we go through a testing process.
00:00
If we look at this first phase, the project initiation,
00:00
writing a business continuity plan
00:00
is very much a project.
00:00
Now business continuity planning is not a project,
00:00
but writing a BCP is.
00:00
Good project management requires
00:00
that I have a project sponsor.
00:00
In this case, it's likely going to be senior management.
00:00
I want them to put their commitment
00:00
and support in writing.
00:00
I want them to,
00:00
in writing, name me as
00:00
the BCP coordinator or whoever that's going to be.
00:00
I think for the exam,
00:00
assume you're the business
00:00
continuity planning coordinator.
00:00
But it needs to be named specifically in
00:00
the policy that the senior management provides.
00:00
They may provide that policy in a project charter.
00:00
If you've gone through project management classes,
00:00
you know how important a project charter is,
00:00
it's what really authorizes the project.
00:00
That's where they commit to support and money
00:00
and lay out the high level requirements of the project.
00:00
Also in initiation, we determine the scope of the plan.
00:00
We may very well
00:00
have an overarching business continuity plan,
00:00
but usually individual departments
00:00
will have their own DRPs.
00:00
The disaster recovery plans are
00:00
just part of the business continuity plan.
00:00
But with any plan that we're writing,
00:00
we need to figure out what the scope
00:00
is for the organization.
00:00
Is it for a branch office,
00:00
for a department, what's that going to be?
00:00
Also, we're going to choose the members of
00:00
our business continuity planning team.
00:00
We should make sure that all departments
00:00
within the organization are represented on the team.
00:00
We want the people that are going to be carrying out
00:00
the work to be involved on the team,
00:00
so that we get buy-in.
00:00
We have a much better chance of our strategies being
00:00
accurate and being able to be carried out
00:00
if we have those folks
00:00
actually writing the policies.
00:00
Now once we have initiated this project,
00:00
we've gotten started, we've gotten support,
00:00
the first real action item
00:00
that we're going to have is the BIA,
00:00
the business impact analysis or assessment.
00:00
This is going to be initiated by
00:00
the business continuity planning committee.
00:00
Again, unless you hear differently on the exam,
00:00
I would assume you're the BCP coordinator.
00:00
You're the project manager of
00:00
the business continuity planning project
00:00
and you'll have a team that you work with.
00:00
Now the job of the BIA,
00:00
that's the most important document.
00:00
This is what is going to identify and
00:00
prioritize all business processes based on criticality.
00:00
It's not going to be a system by system,
00:00
it's not going to be IT focused,
00:00
we're going to focus on the business.
00:00
For instance, if I'm a large organization that has
00:00
an online storefront that brings in most of my income,
00:00
then we're going to prioritize
00:00
our web presence as being the most critical,
00:00
and then we're going to
00:00
take that all the way down to systems.
00:00
But we're going to start by
00:00
focusing on the business processes.
00:00
Again, we're looking for those processes
00:00
that cause the greatest impact when they're unavailable.
00:00
Impact: we're talking about financial impact,
00:00
impact to our reputation,
00:00
impact on our customers,
00:00
impact comes from a lot of different directions.
00:00
Now, in the business impact analysis,
00:00
there are also going to be some metrics that are defined.
00:00
Remember, the BIA is going to be
00:00
the basis for our disaster recovery plan.
00:00
We're going to specify some terms in
00:00
the BIA and some metrics so that we can
00:00
make sure we get these critical services back up
00:00
and running within these timeframes.
00:00
The first is an RPO,
00:00
a recovery point objective.
00:00
This is the point to which the data must be current.
00:00
You can think of it as how much tolerance
00:00
you have for data loss.
00:00
Now of course, immediately we think,
00:00
well, we can't lose any data.
00:00
Well, you can, and if you can't,
00:00
it's going to cost a whole lot of money to have
00:00
a recovery point objective of a second or less.
00:00
So we have to think about it realistically.
00:00
What is the point to which we need to be
00:00
able to recover our data?
00:00
It may be an hour, may be a day.
00:00
If you think about it, if an organization
00:00
is only doing a nightly backup,
00:00
backup every night, what they're really saying
00:00
is they're willing to lose a day's worth of data.
00:00
So that would be their recovery point objective.
00:00
Now, most organizations are not
00:00
willing to lose a day's worth of data,
00:00
so they're going to have other means
00:00
in place than just a nightly backup.
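
To put numbers on that nightly backup point, here is a minimal Python sketch. The function names and the sample intervals are hypothetical, and it assumes the worst case, where the failure hits just before the next backup would have run.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Everything written after the last good backup is lost on restore."""
    return failure - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, the loss equals the full backup interval,
    so the interval itself must fit inside the RPO."""
    return backup_interval <= rpo


nightly = timedelta(hours=24)

# A nightly backup only satisfies an RPO of a day or more.
print(meets_rpo(nightly, rpo=timedelta(hours=24)))  # True
print(meets_rpo(nightly, rpo=timedelta(hours=1)))   # False
```

An organization with a one-hour RPO would need something more frequent than the nightly job, which is the point made above about having other means in place.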
00:00
Another term, maximum tolerable downtime.
00:00
You could also hear this loosely called recovery time objective, though strictly the RTO is the restore target and has to fit within the MTD.
00:00
This is the maximum amount of
00:00
time that I can withstand a loss of
00:00
this service or system before
00:00
the loss becomes unacceptable.
00:00
Long story short, how
00:00
quickly do I have to have it back up and running?
00:00
RPO is about the data,
00:00
MTD is about the process.
00:00
How quickly does the process
00:00
have to be back up and running?
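
Since criticality is about time sensitivity, one way to picture the prioritization is to sort processes by their maximum tolerable downtime. This is only a sketch: the `BusinessProcess` class, the process names, and the hour figures are all made up for illustration, and a real BIA would weigh more than a single number.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    name: str
    mtd_hours: float  # maximum tolerable downtime (hypothetical figure)


def recovery_order(processes):
    """Shortest MTD first: the most time-sensitive process is restored first."""
    return sorted(processes, key=lambda p: p.mtd_hours)


processes = [
    BusinessProcess("payroll", 72),
    BusinessProcess("online storefront", 2),
    BusinessProcess("internal wiki", 168),
]

for process in recovery_order(processes):
    print(f"{process.name}: must be back within {process.mtd_hours} hours")
```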
00:00
Now, other metrics that
00:00
are important to understand when you're developing
00:00
your plans deal with
00:00
the risks associated with specific devices.
00:00
How likely is one device to fail over another?
00:00
If it fails, can I get it back up and running quickly?
00:00
That's where your MTBF and MTTR come in.
00:00
Your mean time between failures.
00:00
How long do I expect
00:00
this component to run before failing?
00:00
Once it fails, how
00:00
quickly can I get it back up and running?
00:00
That's your mean time to repair.
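
These two metrics are commonly combined into a steady-state availability figure, MTBF / (MTBF + MTTR). The lecture does not state that formula, so treat the sketch below as a standard reliability calculation added for illustration; the function names and the sample hours are assumptions.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is expected to be up:
    time running divided by time running plus time being repaired."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_annual_downtime_hours(mtbf_hours: float, mttr_hours: float,
                                   hours_per_year: float = 8760.0) -> float:
    """Expected hours of downtime per year at that availability."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * hours_per_year


# Hypothetical device: fails about every 990 hours, takes 10 hours to repair.
print(f"availability: {availability(990, 10):.2%}")
print(f"downtime per year: {expected_annual_downtime_hours(990, 10):.1f} hours")
```

Comparing two candidate devices this way gives a rough answer to "how likely is one device to fail over another, and how quickly can I get it back".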
00:00
Then minimum operating requirements.
00:00
Like for instance, let's say I
00:00
have a database application.
00:00
Well, if I need to restore that on a different server,
00:00
what requirements must that server meet
00:00
before I know it's going to be
00:00
capable of hosting that application?
00:00
Does it have to have so much RAM,
00:00
such a size hard drive,
00:00
what type of operating system?
00:00
Those would all be part of
00:00
the minimum operating requirements.
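
A check like that can be written down quite directly. The sketch below is illustrative only: the `MinimumRequirements` and `Server` classes, the field names, and the sample values are hypothetical, and a real requirements list would likely cover more than RAM, disk, and operating system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimumRequirements:
    ram_gb: int
    disk_gb: int
    supported_os: frozenset


@dataclass(frozen=True)
class Server:
    name: str
    ram_gb: int
    disk_gb: int
    operating_system: str


def unmet_requirements(server: Server, minimum: MinimumRequirements) -> list:
    """Return a list of gaps; an empty list means the server can host the app."""
    gaps = []
    if server.ram_gb < minimum.ram_gb:
        gaps.append(f"RAM: {server.ram_gb} GB < {minimum.ram_gb} GB")
    if server.disk_gb < minimum.disk_gb:
        gaps.append(f"disk: {server.disk_gb} GB < {minimum.disk_gb} GB")
    if server.operating_system not in minimum.supported_os:
        gaps.append(f"OS: {server.operating_system} not supported")
    return gaps


# Hypothetical database application and standby server.
database_minimum = MinimumRequirements(ram_gb=64, disk_gb=400,
                                       supported_os=frozenset({"linux"}))
standby = Server("standby-01", ram_gb=32, disk_gb=500, operating_system="linux")

print(unmet_requirements(standby, database_minimum))
```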
00:00
We want to keep in mind also that we
00:00
need to be aware of our facility strategies as well.
00:00
Our facility may actually be damaged in
00:00
this disaster or we may
00:00
not be able to continue operations there.
00:00
It may be very short-term
00:00
or we may have to plan in the longer-term.
00:00
Here, when we think about recovering the facility,
00:00
we can lease hot, warm,
00:00
and cold sites from a provider.
00:00
The cold site is just a bare-bones facility,
00:00
the hot site is already up and operational,
00:00
you just need to come in and bring your data,
00:00
and warm sites are, of course,
00:00
somewhere there in the middle.
00:00
Now quite honestly, if my organization has
00:00
a branch office and most of
00:00
our resources are Cloud-based anyway,
00:00
then I may no longer really need to
00:00
consider leasing these off-site facilities.
00:00
You also have business continuity and
00:00
disaster recovery as a service out in the Cloud.
00:00
So we've got the Cloud
00:00
changing the way we think about disaster recovery.
00:00
Now, once I have the plan developed,
00:00
basically I've looked at the BIA,
00:00
I've looked at my strategies
00:00
and I've put them in writing,
00:00
I have defined and documented
00:00
what are the strategies for recovery,
00:00
then the next thing I need to do
00:00
is pass them off to senior leadership.
00:00
Now, quite honestly, senior leadership should have
00:00
been involved all along the process.
00:00
We should have representation from
00:00
senior leadership on our team and there should
00:00
be an open door communication
00:00
as far as the BCP is concerned.
00:00
But at this point in time,
00:00
we're ready, we've written the plan out,
00:00
we have created a plan
00:00
that will operate within the metrics provided to us,
00:00
now senior management signs off,
00:00
and at this point,
00:00
they accept all risks
00:00
associated with the business continuity plan.
00:00
Obviously, this isn't something they're going to
00:00
glance at and sign off on.
00:00
At that point in time,
00:00
they've read the plan,
00:00
they've analyzed the plan,
00:00
there will have been a degree of
00:00
testing prior to that, which they can review,
00:00
and we'll talk about the types of
00:00
tests in just a moment,
00:00
and ultimately, when they sign off,
00:00
they are accepting the risks,
00:00
they are accepting the plan,
00:00
they are accepting the metrics like RPOs and RTOs,
00:00
and they're saying, yes,
00:00
this meets our needs,
00:00
we will move forward and implement this plan.
00:00
Now, as I mentioned, we have to have
00:00
some testing, and senior management makes sure that there
00:00
have been tests and that they've reviewed
00:00
the tests and the tests fall within the acceptable ranges.
00:00
So when we talk about testing,
00:00
we're talking about verifying the plan
00:00
for accuracy and completeness.
00:00
This should happen at least once a year,
00:00
or if there are major changes that go on.
00:00
We redo the network infrastructure,
00:00
we have a realignment,
00:00
we merge with another organization, we demerge.
00:00
Any of those, of course,
00:00
would be a major change,
00:00
and so as the environment changes,
00:00
we have to be flexible with our plan.
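
The "at least once a year, or after a major change" rule can be expressed as a simple check. This is a minimal sketch with a hypothetical function name; what counts as a major change is a judgment call that the code just takes as a flag.

```python
from datetime import date, timedelta


def plan_test_due(last_test: date, today: date,
                  major_change_since_test: bool) -> bool:
    """A test is due if a major change happened since the last test,
    or if more than a year has passed since the plan was last tested."""
    overdue = (today - last_test) > timedelta(days=365)
    return major_change_since_test or overdue


# Tested four months ago, but the organization has since merged: test again.
print(plan_test_due(date(2024, 1, 15), date(2024, 5, 15),
                    major_change_since_test=True))   # True

# Tested four months ago with no major changes: not yet due.
print(plan_test_due(date(2024, 1, 15), date(2024, 5, 15),
                    major_change_since_test=False))  # False
```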
00:00
Now, when we talk about testing,
00:00
we're always looking to improve.
00:00
It's never about finding fault or blame,
00:00
though we may document what didn't work,
00:00
our goal is always going to be to get better each time.
00:00
Senior management again has
00:00
that responsibility to make sure the plans are
00:00
tested and that they
00:00
review the results before signing off.
00:00
Now, lots of different types of tests.
00:00
We have a checklist test,
00:00
and the checklist is paper-based,
00:00
and it's exactly what it sounds like.
00:00
Here's a list of everything; I think I've thought of it all.
00:00
I'm going to pass that checklist
00:00
out to department heads,
00:00
and say, hey,
00:00
did I get it? They're going to check off.
00:00
Yeah, got this, missed something.
00:00
It's a way that they can give some input.
00:00
This is a very basic assessment.
00:00
Just because we're okay on a checklist test doesn't by
00:00
any means mean that I'm ready to go live,
00:00
but it's a starting point.
00:00
Now, we take those managers and
00:00
their checklists and bring them in,
00:00
we sit them around the table and we discuss.
00:00
That's called a structured walkthrough
00:00
or a tabletop test.
00:00
Now, I got to tell you,
00:00
I don't love the name
00:00
structured walkthrough because anytime
00:00
you use the word walkthrough to me,
00:00
I'm imagining we're walking through the motions.
00:00
This is a paper-based test, we're still discussing.
00:00
Sometimes in my mind,
00:00
I call it a structured
00:00
talkthrough because that just seems more accurate.
00:00
But anyway, now we're getting a little closer.
00:00
We're talking about interdependencies
00:00
and how things might look.
00:00
But we really don't go through
00:00
the motions till we hit the simulation test.
00:00
With the simulation test,
00:00
this is where we walk through and we say, hey,
00:00
can I figure out
00:00
where we need to go to turn off the HVAC system?
00:00
Is our generator available?
00:00
Can we get to it or is it under lock and
00:00
key and nobody's seen the key in six months?
00:00
We're walking through to make sure that
00:00
the plan can actually be carried out.
00:00
Now, this is different than an exercise or a drill.
00:00
What we're doing again,
00:00
we're verifying the plan,
00:00
we're making sure the plan is accurate.
00:00
When we talk about conducting exercises or drills,
00:00
we're testing the response of our team.
00:00
We're testing, hey, do
00:00
our people have the capacity to carry out this plan?
00:00
We've got to start by making sure the plan is accurate,
00:00
and then another time,
00:00
we'll conduct exercises and drills.
00:00
Now, after the simulation test,
00:00
we have the parallel test.
00:00
The parallel test means that we're going to perform
00:00
a portion of our operations at the offsite facility.
00:00
Most of the operations are at our permanent facility,
00:00
but we're going to attempt to
00:00
bring up the offsite facility,
00:00
send a portion of transactions there.
00:00
This starts to bring in risk,
00:00
because if the offsite facility isn't ready,
00:00
then we may lose some of our processes.
00:00
But where we really get
00:00
risky is with the full interruption test.
00:00
We take down the original site,
00:00
bring up the offsite,
00:00
the backup facility,
00:00
and perform all operations at the backup facility.
00:00
Now that's not something I'm going to try frequently.
00:00
Again, depending on criticality and
00:00
depending on my other backup strategies in place,
00:00
but I've seen a lot of companies shut down on Friday and
00:00
then come up at the offsite facility
00:00
on Monday and resume operations.
00:00
But really, the extent to which we test depends on
00:00
how quickly we have to be up and running.
00:00
With business continuity planning,
00:00
we're trying for the long-term
00:00
sustainability of the organization,
00:00
we're trying to plan for that.
00:00
Disaster recovery planning is
00:00
more short-term and it's
00:00
a part of business continuity planning.
00:00
We start out by initiating our project,
00:00
then we get a business impact analysis,
00:00
then we write and develop the plan,
00:00
then we get senior management sign-off,
00:00
and then we move to testing.