Now we are ready to explore Risk Management Best Practices, and we start with Business Impact Analysis. You'll learn how and why it's essential to identify mission-critical systems, and the impact on the business should one of those systems fail. For example, we discuss power sources to and within your business operations. You'll look at what happens when power outages occur, how extended outages adversely affect business operations, and why planning for such events, particularly those outside your control, is a top-level risk management best practice that ensures your business suffers minimal harm.
[toggle_content title="Transcript"] Now we will be dealing with section 2.8. This section summarizes risk management best practices.
The first thing we look at is the business impact analysis. Organizations should do a business impact analysis to identify critical systems and components within their enterprise. When you identify the systems or processes that are critical, you are better able to understand how to prioritize repair or restoration should those components be attacked or fail. Without understanding the impact on your business, you have no knowledge of what the loss would be should you lose one of these critical components or systems. A business impact analysis also lets us ask, where certain components are automated processes, whether we have manual methods of doing the same work. Do we know how to carry out those processes manually? Without that knowledge, in the event of downtime for a server or some other component, nobody might know how to carry out the automated processes manually. A business impact analysis allows us to see that criticality and put measures in place to address it. We could also identify and ensure that multiple people know how to carry out a specific function. A business impact analysis could also show us that certain individuals are critical.
Their absence might introduce a vacuum in your operations, so you make sure that multiple people know how to carry out those roles. These are measures we put in place to limit risk to an organization.
Another best practice is to remove single points of failure. By removing a single point of failure you are providing redundancy, such that the failure of one component does not bring down the entire system. Say you are a web hosting organization, and you rely on one service provider for the connection you use to host web pages for multiple organizations. If you lose the signal from that service provider, you are down. No more hosting. But if you have redundancy, for example service from a second provider, you have removed that single point of failure. The topic says "remove a single point of failure", but in reality you are putting something in place: you put in redundancy to eliminate the single point of failure. This is very important to ensure availability on our networks.
Some organizations have a domain controller within a network. We know that authentication takes place on the domain controller. If you have only one domain controller, what if it were to crash? What if it were to fail? With multiple domain controllers, you have one as your primary domain controller and the other as your backup domain controller. If the primary domain controller fails, the backup domain controller takes over, and users are still able to authenticate into the network. If you have only one domain controller, that is a single point of failure. By putting in the backup domain controller, we have removed the single point of failure. It is best practice to avoid single points of failure by putting in redundancy, so that you always have alternatives to support the primary technology.
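The primary and backup domain controller arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Server` class, the `authenticate` function, and the names `dc-primary` and `dc-backup` are hypothetical and do not correspond to any real directory service API.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for a domain controller."""
    name: str
    healthy: bool = True


def authenticate(user: str, controllers: list[Server]) -> str:
    """Return the name of the first healthy controller that can serve the user.

    Controllers are tried in order, so the first entry acts as the primary
    and later entries act as backups.
    """
    for controller in controllers:
        if controller.healthy:
            return controller.name
    raise RuntimeError(f"no domain controller available to authenticate {user}")


primary = Server("dc-primary")
backup = Server("dc-backup")

# Normal operation: the primary handles the request.
served_by_normal = authenticate("alice", [primary, backup])

# The primary crashes: the backup takes over and users can still log in.
primary.healthy = False
served_by_failover = authenticate("alice", [primary, backup])

# With only one controller, its failure is a single point of failure.
try:
    authenticate("alice", [primary])
    single_controller_ok = True
except RuntimeError:
    single_controller_ok = False
```

In this sketch the failed primary is simply skipped. With the backup in the list, authentication still succeeds, and with the primary alone it does not.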
Next is business continuity planning and testing. Naturally, within our business environments, disasters will happen, so we should create business continuity plans. A business continuity plan, as the name indicates, is a plan with which we continue the business after a disaster shows up. We can come up with a plan but not test the plan. A plan that has not been tested could be a plan for failure: you can plan and not succeed, or you can plan and succeed. The best way to ensure a successful business continuity plan is to document the plan and then test it periodically, to see that it meets the requirements to continue the business after an incident, an event or a disaster.
Your business continuity plan is a living document. Should there be any changes to the facility, any modification to your procedures, or any increase or decrease in the number of people that access the facility, those changes must be reflected in the plan so that the plan accommodates everybody. Maybe the facility has been remodeled, and an exit at the rear has been taken out and put on the side. You have to change the plan so people know that, in the case of a disaster, nobody goes that way. Otherwise you might have a stampede where nobody can get out, and it results in loss of life. Any such modification should be carried into your business continuity plan. The plan should be sufficient and robust enough to cater for the needs of everybody after the incident has taken place.
Periodic testing should be carried out. We don't just test once and forget about it; we test periodically. We could use drills, such as fire drills and evacuation drills, so that people understand what to do and how to do it, and so that we can measure their responses to see whether the plan is sufficient for the enterprise.
Periodically we should also do risk assessment.
We could do risk assessment for our facility, for the network, and for the personnel. Over time, risk will show up in the environment, on the networks, and among our personnel. If we do risk assessment, we are being proactive: we find the weaknesses before threat agents exploit them. A risk assessment could involve checking your personnel by sending simulated attack attempts to them, so that you can identify who requires training. You could do the assessment to find where risk exists, or where it could possibly exist, in your facility. It could be a risk to your servers, which may be exposed to certain networks. It could be a risk to your personnel, your facility or your networks. When you do a risk assessment, you measure how much risk these things are exposed to. The assessment lets you generate reports on how you want to respond to that risk. Do you want to mitigate the risk? Do you want to deter the risk? Avoid the risk, or transfer the risk? These are measures you could then take in response, but you cannot choose the right response if you have not measured the risk. You need to do a risk assessment to assess the state of the risk.
Next we talk about continuity of operations. Disaster will show up, so how do you continue operations after a disaster? To ensure continuity of operations, certain technologies could be put in place. First, we talk about the UPS, the uninterruptible power supply. A UPS is simply a box containing batteries, about the size of a shoe box. They can also be bigger; they come in different sizes and different capacities. When you receive them, the best practice is to let them charge, because you don't know how long they have been in the store.
Let them charge for about 8 to 10 hours and then deploy them to the network. That way, intermittent power outages do not shut down your machines, because the UPS provides the power to support them. This is very important so that your users can save their work and gracefully shut down their computers. Without the UPS, even a one-minute outage can shut down your server, and this could be damaging to the server. For extended outages, we could also have generators, so that we can support the users, computers and servers on the network through a longer loss of power. These are some technologies we could have for continuity of operations.
We also need to do disaster recovery. With disaster recovery planning and disaster recovery strategies, the objective is to put a plan in place for how we are going to recover from the disaster. This plan should be in place before the disaster shows up. If we try to plan after the disaster shows up, it could be too late. The disaster recovery effort results in a disaster recovery plan, and this plan should be documented, tested, and updated whenever there are changes within the organization. This is a document that individuals within the organization know to follow should a disaster take place. How are we going to recover from this disaster if it happens? We must have a plan in place that tells us what to do, who is responsible for what, and where else business operations can take place should we lose the primary location.
We should also do IT contingency planning. When we do IT contingency planning we are asking "what if?" over and over. What if the server fails? Do we have a backup server? What if Joe fails to show up for work? Is there somebody else to do his job? What if a network component fails?
By asking ourselves these "what if" questions, we are able to suggest alternatives. We are able to suggest solutions that could be put in place to mitigate the effect of a risk or a threat materializing on the network. This is what we mean by contingency planning: we are putting other measures in place to ensure availability of resources on the network.
The next item is succession planning. Organizations should ensure that they do proper succession planning. Individuals within the organization should be trained over time, such that should one person decide to resign, retire, or just leave the organization, there is somebody else to step into their shoes. By carrying out succession planning, we leave no vacuum at the helm of affairs. Somebody else will always be available to take over the management of operations within the company.
Next is high availability. Organizations put strategies in place to ensure high availability of their network resources. The availability we talk about here is not just availability as in the confidentiality, integrity and availability triad of security. With high availability, you put in strategies and technologies to ensure that whatever happens, your systems are up and running. Solutions like RAID, redundancy and other measures are there to ensure high availability. If a problem happens, your systems are still up and running. A simple problem cannot bring down your systems or your networks. This is what we mean when we talk about high availability.
When we talk about redundancy, it means that we have a spare. We don't rely on only one form of technology. We always have a spare, and it could be a spare server, a spare staff member, or a spare solution. By putting in redundancy we are able to eliminate single points of failure. If you lose one component, there is always another one to serve as a backup. This is what we mean by redundancy.
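To put a rough number on the benefit of a spare, here is a small Python sketch. It assumes that components fail independently of one another, which is a simplification, and the 99% figure is an invented example rather than a measurement. Under that assumption, a group of redundant components is unavailable only when all of them are down at once.

```python
def combined_availability(availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which suffices.

    Assumes failures are independent: the group is down only if every copy
    is down at the same time, which has probability (1 - availability) ** copies.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - availability) ** copies


HOURS_PER_YEAR = 24 * 365

# Illustrative figure only: each server is assumed to be up 99% of the time.
for copies in (1, 2, 3):
    a = combined_availability(0.99, copies)
    downtime_hours = (1.0 - a) * HOURS_PER_YEAR
    print(f"{copies} server(s): availability {a:.6f}, "
          f"about {downtime_hours:.2f} hours down per year")
```

With these assumed numbers, a single server is down for roughly 87.6 hours a year, while a redundant pair is down for less than one hour. Real components often share causes of failure (the same power feed, the same site), so the actual gain is usually smaller than this model suggests.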
The next topic is fault tolerance. What is fault tolerance? It is the ability of your technology to experience a fault yet continue functioning. Some technologies experience a fault and that's it: they shut down, they are disabled. Introducing fault tolerance might be expensive, but it is the price to pay for availability. Your technology experiences a fault yet continues functioning. The functioning might be limited, but at least the system is still available and not everything is shut down. That is what we mean by fault tolerance. You could have fault tolerance in the form of hardware: hardware that is very robust and can withstand a fault. If it experiences a fault, the entire machine does not shut down. Only the components that are damaged, or the services related to them, might suffer.
We could also do load balancing to ensure availability. With load balancing, we use a load balancer to spread the traffic across multiple systems or servers on the network. All the traffic coming from hundreds or thousands of computers goes to the load balancer. The load balancer then distributes the work across multiple systems to ensure that no one system is overwhelmed by too much work. With a single system, the loss of that system could mean the data is unavailable, but with a load balancer in place you have multiple servers. While server A is working, server B is also taking some work, and server C is also taking some work. The load balancer ensures that the load is evenly distributed among the servers to ensure availability.
The last topic for section 2.8 is disaster recovery concepts. Here we talk about the backup plan and the backup policy. First, there should be a policy that dictates: who is to do the backup? When is the backup to be carried out?
What is to be backed up, and what media do we back up to? The backup policy should dictate clearly who is responsible; otherwise everybody will think somebody else is doing it. Who is doing the backup? Joe should be doing it. Joe thinks Mary is doing it, and Mary thinks Ben is doing it. Everybody thinks somebody is doing the backup, and at the end of the day we realize that nobody is doing it. Backup is very important, so the policy should dictate clearly who is doing the backup, what is being backed up, when the backup is taken, what media it is written to, and where the backup will be stored. Otherwise we would have a poor backup plan.
Our policy should also require that we test the backup. Periodic testing of your backup is very important; otherwise, how do you know your backup will work when there is a failure? Testing should be included in our backup plans.
The plan should also cover backup frequency. When you do a backup, you are essentially just making a copy of your data, and some data does not change, so we have to be strategic about how we do our backups. We have the normal or full backup, the differential backup and the incremental backup, and the organization should decide which sort of backup it will do. When you do a full or normal backup, you back up everything. When you do a differential backup, you are interested in files that have changed since the last full backup. When you do an incremental backup, you are interested in files that have changed or been created since the previous backup, whether that was a full or an incremental one.
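The difference between the three backup types is easier to see in a small simulation. The Python sketch below is a toy model, not a real backup tool: the `FileSystem` class and the three backup functions are hypothetical, and each file simply carries a flag that is set whenever the file is created or modified.

```python
from dataclasses import dataclass, field


@dataclass
class FileSystem:
    """Toy file system: maps each file name to a 'changed since backup' flag."""
    changed: dict[str, bool] = field(default_factory=dict)

    def write(self, name: str) -> None:
        # Creating or modifying a file marks it as needing backup.
        self.changed[name] = True


def full_backup(fs: FileSystem) -> list[str]:
    """Copy every file, then clear all flags."""
    copied = sorted(fs.changed)
    for name in copied:
        fs.changed[name] = False
    return copied


def incremental_backup(fs: FileSystem) -> list[str]:
    """Copy files changed since the previous backup, then clear their flags."""
    copied = sorted(n for n, flag in fs.changed.items() if flag)
    for name in copied:
        fs.changed[name] = False
    return copied


def differential_backup(fs: FileSystem) -> list[str]:
    """Copy files changed since the last full backup; flags are left set.

    This holds as long as the schedule mixes only full and differential backups.
    """
    return sorted(n for n, flag in fs.changed.items() if flag)


FILES = ("a.txt", "b.txt", "c.txt")

# Schedule A: one full backup followed by daily differentials.
fs = FileSystem()
for name in FILES:
    fs.write(name)
full_a = full_backup(fs)               # all three files
fs.write("a.txt")
diff_day1 = differential_backup(fs)    # a.txt
fs.write("b.txt")
diff_day2 = differential_backup(fs)    # a.txt and b.txt: grows until the next full

# Schedule B: one full backup followed by daily incrementals.
fs = FileSystem()
for name in FILES:
    fs.write(name)
full_b = full_backup(fs)
fs.write("a.txt")
inc_day1 = incremental_backup(fs)      # a.txt
fs.write("b.txt")
inc_day2 = incremental_backup(fs)      # b.txt only
```

In this model, restoring under schedule A needs the full backup plus the latest differential, while schedule B needs the full backup plus every incremental taken since. That is the usual trade-off between backup size and restore effort.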
Usually when you do a backup, the archive bit comes into play. The archive bit is an attribute of a file: the operating system sets it when the file is created or changed, and the backup software looks at it to determine whether the file needs to be backed up. In some of these backup strategies (full and incremental) the archive bit is cleared after the file is backed up, and in others (differential) it is not cleared.
Alternative disaster recovery locations could be a hot site, a warm site or a cold site. These names have nothing to do with temperature. They are classifications of the state of readiness of the site.
A hot site is an alternative processing site where your personnel can report if you lose your primary site. The servers are already in place, the networks are in place, and everything is in place. All that is missing is your personnel and your latest backup, and some of your backup may even be there already. A hot site is the fastest to activate because everything is already there. However, it is the most expensive alternative, because it requires all the security solutions, electricity, servers and infrastructure, exactly as at your primary site.
Some organizations choose a less expensive alternative, which is a warm site. A warm site has partial infrastructure in place. When you get there, you have to finish setting up the computers and bring in your data, your backup and your people. It takes longer to activate a warm site because you still have to bring everything to a state of readiness.
A cold site is virtually empty, and it takes the longest to activate. In terms of running cost, a cold site might be the least expensive, but the cost still builds up over time: you still need to buy the computers, set them up, and put all the technologies in place.
To summarize, a hot site directly mimics your original site. It is the fastest to activate, but it is the most expensive alternative because that site is up and running exactly like your real site. Organizations that cannot afford any downtime will subscribe to hot sites, and those that can afford some downtime may decide to use a warm site or a cold site. With a hot site that is already up and running, if they lose operations at the primary site, all they do is ship their latest backup and their people to the hot site, and business continues. A warm site takes longer to activate than a hot site because it has less infrastructure in place. At a cold site there is almost no existing technology, so you have to build from scratch. It is the least expensive option, but even there the expense builds up with time, because you still have to build everything up so that it gradually comes to mimic your original site. This is it for section 2.8 of the Security+ syllabus. [/toggle_content]