IT Operations Management

Description
This lesson covers IT operations management. IT operations management consists of the following:
  • Management of the IT department
  • IT asset management
  • Systems life cycle
  • IT policies
This unit also covers IT functional objectives:
  • IT procedures
  • IT job descriptions and responsibilities
  • IT risk management process
  • IT service to the user
This unit also covers the IT Infrastructure Library (ITIL). The core functions of ITIL are:
  • Service support
  • Service delivery
  • Functional planning and management
Participants in this lesson also learn about the IT department, specifically the roles of the:
  • IT director
  • IT operations manager
  • Systems architect
  • Information security manager
  • Information systems security analyst
  • Change control manager
  • Applications manager
  • Systems programmer
  • Software quality assurance tester
  • Systems analyst
  • Data entry staff
  • Media librarian
  • Help desk
This lesson also covers using metrics to measure the performance of an IT system, service level agreements (SLAs), and outsourcing IT resources.
[toggle_content title="Transcript"]
So, thinking about the management of operations, we’re trying to grow the organization in a direction that promotes effective and efficient management of the available resources. Those resources are not just systems. That includes people as well. So in managing the IT department, we’re trying to provide the basic security requirements of confidentiality, integrity and availability. When you get down to it, that is the foundation of any good operation.

We also need to understand how to manage the assets that the IT department is responsible for. That could be servers. It could be workstations, laptops, routers, switches, software licenses. This could be a significant percentage of the total gross sales of the organization. So that can potentially be a lot of money. So we want to make sure that we’re managing that expenditure and the effort properly.

We need to pay respect to the systems development life cycle, the SDLC. Also, we talked in earlier sections about the capability maturity model, or the CMM. These are good frameworks and good measuring sticks to use to see that an organization is progressing along the desired path to maturity and to efficient operation.

Lastly we’re talking about our IT policies. Knowing that policies exist is one thing. Having them available to everybody and enforcing them correctly is a more desirable goal. Our procedures need to be detailed enough for someone who, as I was saying, might be new to the position to be able to pick up the documentation and do their job without having their hand held the whole time. That’s the mark of good documentation: showing step by step what’s required to get something done.
We should be paying some attention to how we deal with our software licenses and how we deal with requests from users or trouble tickets that get opened by users.

We also want to make sure that we have accurate and up-to-date descriptions of all of the roles and responsibilities. If those job descriptions are not accurate, then there could be a disconnect between what management expects and what the actual workers are doing. Workers need to understand what’s required and what’s expected, and they should be able to refer to the documentation for their job description to double-check themselves.

We need a mature risk management process. This means that risk is continually monitored, new risks are identified and studied, and adaptations and other modifications to the organization’s way of doing things are then undertaken in order to address that risk. We have to remember that risk is never completely eliminated. There’s always some level of risk in the various levels and operations of a given organization.

Ultimately we have to support our users. This could be a generic end user, but it could also be users that are running the business. They have similar requirements: the business user might have more authority to get things done quickly, but the same requirements are in place. We want our users to be happy and respected, and we want them to feel like their voice is being heard when there are problems.

One of the other ways an organization can raise its maturity level is to implement programs like ITIL, the Information Technology Infrastructure Library. This means that you’re trying to mature the organization’s delivery of its services, trying to get those services to the highest level possible. There are different audiences for ITIL. We have IT service providers, the directors who manage IT, our managers, and other executives, like the CIO. They’re interested in trying to get the most value for the budget that they expend on those IT services.
Trying to get the most bang for the buck, if you will.

We think about our core functions of IT. First on the list would be our service support. Whatever services IT provides to the organization, keeping those services running should be its highest priority, especially since they may be critical to the overall mission, critical to the profit cycle. So we want to know that we can refer to different standards for doing this; perhaps the ISO 27000 series, which covers the information security management system, or ISMS.

And then if we support those services, delivery follows naturally as an area where we can look for room for improvement and think about how the delivery could be more effective, maybe from a time perspective. So capacity management is a factor here. So is continuity of services, which ties in to the business continuity and disaster recovery ideas. We also have to remember that there is a financial component to all of these things we’re talking about. It’s one thing to have really good ideas, but if the budget isn’t there to implement them then we’ve got a gap.

Then we consider our functional planning and management. This means that we’re taking into account all of the resources: software, software licensing, our different hardware and facilities considerations, and staffing, making sure that we’ve got some understanding of what’s required for these different areas in the organization.

Moving on to the personnel roles and responsibilities: we’ve got a lot of roles to discuss here, and we’ll just touch on each one a little bit to see what’s involved.

So, starting with our IT director, this person has the day-to-day responsibility of managing IT. They use their authority to make decisions for the group based on the requirements of the business and the available resources.

Then we have operations managers. These are the people that are managing the help desk.
They may also be managing the network administration team, and perhaps the security team as well. So they’ve got their own day-to-day responsibilities, keeping everything running smoothly, while of course being overseen by the IT director.

Then we have the systems architect. Systems architects work with the systems analysts and they decide, at the big-picture level, what needs to be done within the environment to deal with current challenges as well as planning for future capacity requirements. So they create the layout that best fits the goals of the business and also enhances the organization’s ability to generate revenue.

Moving on to our information security manager, the ISM. Typically this person has some professional credentials such as a CISSP or a CISM, and they try to apply the standards for managing information that they learned by gaining those credentials.

Then we have a systems security analyst. This could be someone who’s a team lead or manager of your IT security team, trying to make sure that the team is expending its efforts and its budget appropriately to address the security goals of the organization.

We can’t forget about our change control manager. The change control manager is responsible for making sure that the process of change control is managed effectively, that all of the appropriate people are present at change control meetings, and that there is a feedback loop so that if a change is being rejected, it comes with a “Here’s why.” And then there should be mechanisms in place to revise and resubmit a change as needed in order to get things done.

We have applications programmers. These are the people who develop the software that the organization may run on. Maybe they integrate that with commercial software. They work with systems analysts to understand what the requirements are for processing certain kinds of information. The information gets processed, it gets stored, it gets transmitted.
This all comes together when the applications programmer and the systems programmer are working together, trying to get the best utilization of the available resources. A systems programmer is more concerned with the foundation that the applications run on, but there’s still some feedback between that group and the applications programming group.

We have a software quality assurance tester, or testing group. They of course work with programmers and developers to make sure that the products they develop are actually working as expected, and they provide some mechanism to address shortcomings, deficiencies, or security risks.

Then we have our network administrator, managing routers, switches, and so on, keeping the network running smoothly and dealing with the challenges of future capacity requirements.

Server administrators or system administrators deal with server resources: monitoring their performance and knowing whether they’ve got enough storage, processor, and memory resources.

Then we have database administrators, who are obviously maintaining the databases. Typically, database administrators might have some experience as system administrators, or even systems programmers. So they build on that foundation to properly manage all the data that the organization generates and uses to support its products and services.

Then we have computer operators. These are more junior positions. They help systems administrators and database administrators by doing lower-level tasks, through which they can gain the experience and knowledge to move up the ladder to a more advanced position.

Systems analysts: this is a group that might work with the business side of the house. They’re trying to understand how to best utilize applications, or how to best provide input to application design, so that those tools and software packages can be used most effectively to generate revenue for the organization.

Data entry staff play an important role.
Sometimes this is done by the end user on a small scale, or you might have dedicated staff who are involved in data entry on a large scale.

Media librarians: that’s pretty self-explanatory. This is someone, or a group, who takes care of all the back-up media or software media that are used, keeping track of where the information is stored and what it might contain, and dealing with data retention policies.

Then lastly we have our help desk. This is the first line of support when users or customers experience a problem. They call the help desk, get a trouble ticket opened, and then the process begins to move towards resolution.

Alright, so now let’s think about how we use metrics. This is kind of a broad topic. There are lots of different considerations here. What we want to think about is: what are the metrics that are most important to the organization? How do we generate them? How do we organize that information? How do we use it once it’s been gathered?

Ideally, we want to generate metrics automatically. The more automated metrics we can generate, the better. When we have to do this work manually, we’re going to be dealing with various human error problems. People get tired, they make mistakes, they’re having a bad day, or they’re not available, so reliance on a manual process is to be avoided.

So what kind of metrics are we talking about? We have things like implementation metrics. Maybe you’re rolling out some updates to a bunch of systems, or you’re upgrading the operating system on a bunch of your servers. So you can say, ‘Today we’ve got five out of 100 servers completed. Tomorrow it’s seven. The day after it’s twelve.’ These are the kinds of things that management’s going to want to look at to make sure that you’re making progress, you’re reaching your milestones, and so on.

There are also things like efficiency metrics. How long does it take to close the average trouble ticket?
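
To make the implementation and efficiency metrics just described a little more concrete, here is a minimal Python sketch. It is only an illustration: the function names and the sample tickets are hypothetical, and the only figures taken from the transcript are the rollout counts (5, 7, then 12 of 100 servers).

```python
from datetime import datetime, timedelta


def percent_complete(done: int, total: int) -> float:
    """Implementation metric: share of systems upgraded so far, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * done / total


def mean_time_to_close(tickets: list[tuple[datetime, datetime]]) -> timedelta:
    """Efficiency metric: average of (closed - opened) over closed tickets."""
    if not tickets:
        raise ValueError("need at least one closed ticket")
    total = sum((closed - opened for opened, closed in tickets), timedelta())
    return total / len(tickets)


# Rollout counts quoted in the transcript: 5, 7, then 12 of 100 servers.
daily_progress = [percent_complete(done, 100) for done in (5, 7, 12)]

# Hypothetical sample tickets as (opened, closed) pairs; not real data.
sample_tickets = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),  # open 2 hours
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 13, 0)),  # open 4 hours
]
average_close_time = mean_time_to_close(sample_tickets)
```

Because both figures are computed from recorded counts and timestamps, they can be produced automatically from a ticketing or deployment system rather than tallied by hand.
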
When someone calls the help desk, how long does it take before they get an email response saying that the ticket is being worked on? In what ways can you measure the performance of your problem resolution? How many tickets are aged five days or more, ten days or more, 30 days or more? These are things you’d want to look at as ways to find room for improvement.

We could then think about things like effectiveness: how many tickets are opened and closed in a given time period? That’s a good example. Maybe users provide feedback to say that the ticket resolution was satisfactory or unsatisfactory. You might want to measure things like this.

Then we have impact metrics. This means having some way to deal with different types of incidents. Maybe you decide to count them as an absolute number, or as a percentage: 10% of our incidents deal with malware, 15% deal with unavailable access to resources, and so on. So coming up with reasonable categories here might be another way to look at your numbers and decide if the organization’s doing a good job or not.

Then we have purpose metrics. So what is the overall function of some process or procedure or service within your organization?

We could have performance goals. These are, of course, generated by management or team leads, or possibly even by individuals. You might have your own personal goals. But performance goals at the higher level are going to be generated by management, since they are going to monitor all of these different factors and decide, ‘This is a reasonable goal to go for,’ and when it’s reached, they can decide to possibly raise that goal, or keep it where it is and congratulate everyone on a job well done.

We have performance objectives. These tie in with our performance goals to some extent. Maybe an objective says that if a ticket can’t be closed within two hours, then we need to escalate. If it can’t be closed within 12 hours, we need to escalate again, and so on.
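
The ticket-aging question and the escalation objective above can be sketched in a few lines of Python. This is a hedged illustration rather than a prescribed method: the thresholds (5, 10 and 30 days; 2 and 12 hours) are the transcript’s own examples, while the function names, bucket labels, and the choice to treat a ticket as overdue at exactly the threshold are assumptions.

```python
from datetime import timedelta

# Example thresholds from the transcript; a real team would set its own.
AGING_THRESHOLDS_DAYS = (30, 10, 5)       # checked from largest to smallest
ESCALATION_THRESHOLDS_HOURS = (2, 12)


def aging_bucket(age: timedelta) -> str:
    """Label an open ticket by the largest aging threshold it has reached."""
    for days in AGING_THRESHOLDS_DAYS:
        if age >= timedelta(days=days):
            return f"{days}+ days"
    return "under 5 days"


def escalation_level(open_for: timedelta) -> int:
    """Count how many escalation thresholds an open ticket has passed.

    Assumption: a ticket still open at exactly the threshold is escalated.
    """
    return sum(
        1
        for hours in ESCALATION_THRESHOLDS_HOURS
        if open_for >= timedelta(hours=hours)
    )
```

For example, a ticket that has been open for three hours is at escalation level 1, and one open for twelve days falls into the "10+ days" aging bucket.
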
Objectives like these tie the performance of the workers to some kind of a timeframe.

Then we think about how the measurement units are actually derived. Do we use percentages? Do we use dollars? Do we use minutes and hours? Days and weeks? These are all things that might be used interchangeably here and there, but picking the correct measurement type is important so that everyone knows what to expect.

We have to consider our data sources. Where has this data come from? Can we rely on this source of data? Is it credible? Is it reliable? What about available evidence or the frequency of the collection of that data? Certain kinds of information might be collected much more frequently than others. So that needs to be understood and applied appropriately to the different requirements.

An indicator is something that provides value to the reader of this data. So they can know that if this indicator is above a certain level, or below a certain level, this means it’s good or this means it’s bad. Maybe you even show some of the formulas involved in doing certain calculations. That helps the reader of the metrics to understand more fully what this metric really means.

Then we can move on to our service level agreements. One of the first things you typically define with a service level agreement is what is acceptable in terms of downtime. If we remember, we talked previously about the Six Sigma standard. Think of 99.999% uptime, the “five nines,” or Six Sigma’s own target of about 99.99966% defect-free. However you want to use that, it’s an extremely tough standard to meet. For maintenance of a typical system, maybe you have a four-hour window once a week, or a four-hour window twice a week. That’s pretty typical.

We have to define our services. Users need to understand, if they have a requirement to do their job or a requirement to get a problem fixed, what services are available from the catalogue or menu of services that the organization provides.

We have to think about the security requirements, as the organization is doing its job.
How do we know if the processes that are being followed are secure? Are they promoting our goals of confidentiality, integrity and availability? Speaking of integrity, how do we ensure that the data we store has integrity, that we know it’s correct and that it hasn’t been tampered with or corrupted?

How often do back-ups get performed? That may vary depending on what the system does. Very high criticality systems might be backed up continuously throughout the day. Other systems might be backed up every evening, or maybe once a week. It just depends on what kind of data is there, who needs it, and how available it needs to be.

So a little bit more on SLAs: we have to think about the performance of the SLA. So when the SLA kicks in, meaning that there’s some escalation required, or some activity that needs to happen because some event has triggered it, how well does the SLA perform? How well does the staff perform in its duties? If the goal is to resolve all problems of a certain category within a short time-frame, like three hours, or two days, whatever it might be, that needs to be measured so that we can judge whether the SLA is effective.

Another important thing, especially when dealing with third parties or partners, is trying to preserve a right to audit. This might be more difficult sometimes than others. It depends on what the service is that’s being provided. But it’s an important consideration, because the primary organization wants to be able to audit the services it gets from other parties when it needs to verify that a party is doing its job correctly.

What about if we want to change the SLA? How do we deal with that? Does it go through the normal change control process? Is there some kind of consensus reached with all the parties concerned? That’s probably more practical.

And then what does the service actually cost? Sometimes charges are direct from one organization to another.
Other times you’ve got different departments within an organization cross-billing each other for different services like server hosting or application development, and so on.

When we decide to outsource some IT functions, there are some important questions to ask. Why should we do this to begin with? It’s possible that you don’t have the expertise within the organization to address certain issues, so maybe bringing in an outside expert or a subject matter expert makes sense. It might be more expedient than bringing someone up to speed internally, and it might be a more effective use of financial resources.

Or maybe your current staff isn’t getting the job done to the satisfaction of the managers. So they might decide that they’re going to outsource certain things. We all know that help desk phone support is one of the more typical things that gets outsourced, because it’s possible to find competent people with the right kind of knowledge who will work for less because they’re in another country where the standard of pay is lower. Moving jobs overseas may be a sad reality for some organizations, but ultimately if it supports the bottom line that may be a wise decision.

Then we have to think about whether management has the proper procedures and processes in place but is just looking for a way to do things more cheaply. Again, moving some functions overseas, or to other workers that may not be as expensive, is always an option. It may not be a popular option, politically for instance, but the organization’s goal is to generate revenue, so sometimes these types of decisions need to be made.

Alright, so ISACA has some of its own recommendations for dealing with outsourcing contracts, as far as what should be done and how that should be managed. First of all we think about what needs to be done before negotiations begin. Who are all the parties that are going to be part of this contract?
How do we know what the legal jurisdiction will be? Especially when you’re dealing with people in multiple countries, there could be issues with different types of laws and different regulations that the organization is now subject to because of where its workers are.

As far as details go, we need to know what services are being provided. The more detail the better in this case. How long does the contract actually last? What are the deliverables? What are the costs involved? Is it a fixed-price contract? Is it time and materials? Are there provisions for dealing with unexpected expenses?

Some more things to think about. Once the contract is underway, how do we measure the performance? Do we know that the invoicing and payment are going according to schedule? How do we deal with the right-to-audit requirement? Some organizations that are providing services may not have the resources to provide timely answers to questions about audits. So that may be something that needs to be resolved during the contract phase, to make sure that some compensation is being considered there.

How do we deal with non-performance? What kind of penalties might be involved? How do we deal with dissolving the contract if the service is no longer needed, or the organization decides it wants to use a different vendor because the one it has chosen just isn’t getting the job done to its satisfaction?

We need to think about resolving problems. When we’ve got complaints from customers, complaints from users, how does that factor in to what the organization does in relation to an outsourced service provider? What we don’t want is too many layers of bureaucracy, so that when a user complains, they’ve got to go through their own help desk, and that gets transferred to another help desk, and that gets moved to somebody else, and finally they get to someone who can answer their question or solve their problem.
That creates a lot of frustration and reflects poorly on the organization.
[/toggle_content]
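
The downtime figures mentioned in the transcript's SLA discussion can be checked with some simple arithmetic. The Python sketch below is illustrative only and rests on two assumptions that are not in the lesson: a 365-day year, and an SLA under which every maintenance window counts as downtime (many SLAs exclude planned maintenance). The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year
HOURS_PER_WEEK = 7 * 24           # 168


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_with_window(window_hours: float, windows_per_week: int) -> float:
    """Availability in percent if every maintenance window counts as downtime."""
    down_hours = window_hours * windows_per_week
    return 100 * (1 - down_hours / HOURS_PER_WEEK)


five_nines = allowed_downtime_minutes(99.999)   # roughly 5.26 minutes per year
weekly_window = availability_with_window(4, 1)  # roughly 97.62 percent
twice_weekly = availability_with_window(4, 2)   # roughly 95.24 percent
```

Under these assumptions, a 99.999% target leaves only about five minutes of downtime a year, while a single four-hour weekly window already uses about 2.4% of the week. This is one reason an SLA usually needs to state whether planned maintenance counts against availability.
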