Lesson 2 Part 1 - Configuring High Availability in a Cluster



Time
14 hours 13 minutes
Difficulty
Intermediate
Video Description

Configuring High Availability in a Cluster This lesson focuses on configuring high availability in a cluster. A cluster is a collection of hosts and VMs in which high availability and Distributed Resource Scheduler (DRS) can be enabled. DRS allows you to determine the best placement for VMs based on the available resources of hosts in the cluster. It also allows for load balancing and power management.

Video Transcription
00:04
Hello, I'm Dean Pompilio, and welcome to the Cybrary Virtualization Configuration, Installation, and Management course. We're on Module 11, now Lesson number two. This lesson will be talking about configuring high availability. More specifically, how to configure high availability in a cluster.
00:23
So first of all, let's think about what a cluster is.
00:26
This is just a collection of hosts and VMs
00:30
on which we can enable high availability. We can also enable Distributed Resource Scheduler.
00:35
We'll get into what Distributed Resource Scheduler is a little bit later in the course,
00:39
but basically DRS allows you to
00:43
automatically determine the best places to put the VMs
00:47
based on the available resources of your different hosts in the cluster.
00:51
It also allows for automatic, or at least some level of automated, load balancing.
00:58
And for power management
01:00
you can even do distributed
01:03
resource scheduling for the power of your hosts themselves. It's kind of an interesting concept.
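To make that concrete, here's a minimal sketch using VMware's pyVmomi Python SDK that connects to vCenter and enables DRS along with Distributed Power Management (DPM) on a cluster. The vCenter address, credentials, and the cluster name 'Lab-Cluster' are placeholders, not values from this lesson:

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (placeholder address/credentials; lab setups often
# skip certificate validation, hence the unverified SSL context).
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.com',
                  user='administrator@vsphere.local',
                  pwd='password', sslContext=ctx)
content = si.RetrieveContent()

# Find the cluster by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'Lab-Cluster')
view.DestroyView()

# Enable DRS, plus DPM, which power-manages the hosts themselves.
spec = vim.cluster.ConfigSpecEx()
spec.drsConfig = vim.cluster.DrsConfigInfo(enabled=True)
spec.dpmConfig = vim.cluster.DpmConfigInfo(enabled=True,
                                           defaultDpmBehavior='automated')
cluster.ReconfigureComputeResource_Task(spec, modify=True)

Disconnect(si)
```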
01:11
So if we create a cluster, we get this interesting icon. It sort of looks a little bit like a VM icon, but the taller shapes are host shapes.
01:21
So in this case you've got two hosts and two VMs in this cluster.
01:26
Now I can
01:29
talk a little bit about what some of the options are here.
01:32
For instance,
01:34
if I want to turn on high availability, I right-click my cluster in the inventory and go to Edit Settings. There's a checkbox for enabling HA. There's also a checkbox for enabling DRS.
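For anyone scripting this instead of clicking through the client, those two checkboxes correspond to the HA and DRS flags in the cluster's configuration spec. A minimal pyVmomi sketch, assuming `cluster` was looked up as in the earlier connection example:

```python
from pyVmomi import vim

# Equivalent of ticking the HA and DRS checkboxes in Edit Settings.
# `cluster` is a vim.ClusterComputeResource from the earlier sketch.
spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo(enabled=True)  # vSphere HA ("DAS" in the API)
spec.drsConfig = vim.cluster.DrsConfigInfo(enabled=True)  # DRS
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```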
01:46
Once you turn on HA,
01:48
then you have to think about some of the HA settings.
01:51
For instance, we have a checkbox for host monitoring.
01:53
It's enabled by default, but it could be disabled if you're doing maintenance. For instance, if I was taking one of the hosts offline to do some work on it, maybe add more memory or something of that nature, I could disable monitoring. Once that monitoring is disabled, now, when I take that host and
02:13
shut that host down,
02:15
there will be no automatic movement of the VMs.
02:17
If you want to keep the VMs running, you would move them manually from one host to another with vMotion,
02:23
and that way you could take the second host down, and that wouldn't cause any problems. So this monitoring setting allows that
02:30
to either be automatic behavior or something that you manually control.
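Here's a sketch of that toggle in pyVmomi; the helper name `set_host_monitoring` is my own, and `cluster` is assumed from the earlier example:

```python
from pyVmomi import vim

def set_host_monitoring(cluster, enabled):
    # Flip HA host monitoring on or off, e.g. off before planned maintenance.
    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo(
        hostMonitoring='enabled' if enabled else 'disabled')
    return cluster.ReconfigureComputeResource_Task(spec, modify=True)

set_host_monitoring(cluster, False)  # before taking a host down
# ... vMotion the VMs off manually, do the maintenance ...
set_host_monitoring(cluster, True)   # afterwards
```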
02:37
Next, we have admission control.
02:38
So this, by default anyway, will disallow powering on a VM if it violates
02:46
the policies that you set up.
02:47
So if I have two hosts and a couple of VMs,
02:51
I basically can decide what kind of resources I need to have in reserve in order for the second VM, let's say, to power up on the other host in the cluster.
03:02
That will make more sense when you see the lab and see how the settings work.
03:07
But if admission control is enabled,
03:09
that means that
03:10
if,
03:12
for instance, if host two goes away
03:16
and VM one was already running on host one, if VM two running on host one would violate the allowed amount of resources I'd like to keep in reserve,
03:27
then when this is enabled, the VM won't be allowed to start.
03:30
If I disable admission control, now it can let this VM power up on the other host, even if it does violate the policy I've chosen.
03:39
You might need to do that in some circumstances because that's your only option to maintain uptime.
03:46
But in general you might want to carefully consider leaving this option enabled so that you can properly control resource allocation between the hosts in your cluster.
03:57
As far as the admission control policy,
04:00
what we can do is set the number of host failures.
04:02
If I had a cluster of only two hosts, I could only tolerate a failure of one host;
04:06
if I had a cluster with five hosts, I potentially could tolerate four host failures.
04:13
So that kind of gives you an idea. And the maximum
04:16
setting would be one less than the number of hosts in your cluster.
04:21
But you could set that other ways, too. You could say, even if you had a five-host cluster, you could only tolerate two hosts failing,
04:28
so it's very configurable.
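In API terms, the "host failures tolerated" choice maps to a failover-level admission control policy. A minimal pyVmomi sketch, with `cluster` assumed as before and a failover level of 1 chosen purely as an example:

```python
from pyVmomi import vim

# Admission control: reserve enough capacity to tolerate one host failure.
spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo(
    admissionControlEnabled=True,
    admissionControlPolicy=vim.cluster.FailoverLevelAdmissionControlPolicy(
        failoverLevel=1))  # up to N-1 for an N-host cluster
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```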
04:30
Then you have a percentage of resources that you'd like to keep in reserve.
04:33
This is dealing with CPU and memory,
04:36
so going back to this example, if
04:39
if VM one is running on host one and it's using 60% of its RAM,
04:45
and VM two wants to power up here and it wants to use 60%, that would violate the policy. So going back to admission control: if admission control is enabled, the second VM would not be allowed to power on.
04:57
If I disabled admission control, then both VMs could try to use half of the memory on the host, and because we can overcommit memory, that might be possible
05:08
in most cases.
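The percentage-based reservation is a different policy object in the API. A minimal pyVmomi sketch, with 25% for CPU and memory chosen purely as example values:

```python
from pyVmomi import vim

# Admission control: keep a percentage of CPU and memory in reserve instead.
spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo(
    admissionControlEnabled=True,
    admissionControlPolicy=vim.cluster.FailoverResourcesAdmissionControlPolicy(
        cpuFailoverResourcesPercent=25,
        memoryFailoverResourcesPercent=25))
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```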
05:10
I can also specify failover hosts specifically.
05:14
So let's say I've got a three-host cluster,
05:16
and the first two hosts are the same. They're relatively low-powered.
05:21
I might have a third host,
05:27
which actually would appear here, but there's really no room to draw it. The third host might have a lot more memory and processing capacity than hosts one and two
05:36
So if there's a problem with host one or two, I could specify to always fail over to host three because it's got the most capacity.
05:43
So you've got a lot of really flexible options here for deciding how to deal with failures.
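A pyVmomi sketch of that dedicated failover host policy; the host name 'esxi-03.example.com' is a placeholder for the bigger third host in the example:

```python
from pyVmomi import vim

# Admission control: always fail over to a designated high-capacity host.
big_host = next(h for h in cluster.host if h.name == 'esxi-03.example.com')

spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo(
    admissionControlEnabled=True,
    admissionControlPolicy=vim.cluster.FailoverHostAdmissionControlPolicy(
        failoverHosts=[big_host]))
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```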
05:50
Now
05:53
we have a little note here about slot size calculation. You'll see this in the lab as well.
05:58
But if I
05:59
look at a particular host and look at its resource allocation tab
06:03
(you can see this in other ways, too), you can see how many slots
06:08
your host can support
06:10
or how big the slots are,
06:13
and what this means is, a given VM
06:15
has a certain amount of CPU and memory overhead,
06:19
and a typical VM will use one slot or two slots or three slots, depending on what its resource allocation requirements are.
06:28
So that's the way that the resources on your host are subdivided,
06:32
so that admission control and other HA algorithms can figure out where to move things around and what's possible, as far as which hosts get which VMs when there's a problem.
06:45
So we'll look at that in the lab. That'll make a little more sense when you see how that works, and we'll change some settings and see how the slot size changes based on the resource allocation settings that we're modifying.
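The real calculation has more moving parts (per-VM overhead, defaults when no reservations are set), but the simplified slot math looks roughly like this, with all numbers invented for illustration:

```python
# Slot size is driven by the largest CPU and memory reservations among
# powered-on VMs; a host's slot count is capacity divided by slot size.
vm_cpu_res_mhz = [500, 1000, 250]   # CPU reservations of powered-on VMs
vm_mem_res_mb = [1024, 2048, 512]   # memory reservations

slot_cpu = max(vm_cpu_res_mhz)
slot_mem = max(vm_mem_res_mb)

host_cpu_mhz, host_mem_mb = 16000, 32768
slots = min(host_cpu_mhz // slot_cpu, host_mem_mb // slot_mem)
print(f'slot = {slot_cpu} MHz / {slot_mem} MB -> {slots} slots on this host')
```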
06:59
Now, regarding the virtual machine configuration options,
07:01
this could be done at the cluster level or the VM level.
07:05
For instance, I can start with the restart priority of a virtual machine.
07:11
This could be disabled, which means that VM won't be restarted automatically,
07:15
or I can pick low, medium, or high priority on a per-VM basis,
07:20
and that basically gives me a way to order the relative startup of the VMs. You might give high priority, for instance, to your most critical virtual machines, and then medium or low priority to those things that you don't need right away but still want to power up eventually.
07:36
If you give high priority to your critical VMs, then they get the most resources and should boot the quickest.
07:44
So that's another way to adjust the prioritization of your boot time when a VM has to be restarted.
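Setting that cluster-wide default restart priority is a one-field change in pyVmomi; a minimal sketch, with `cluster` assumed as before:

```python
from pyVmomi import vim

# Cluster-wide default restart priority for VMs that HA restarts.
spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo(
    defaultVmSettings=vim.cluster.DasVmSettings(restartPriority='high'))
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```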
07:50
In the last lesson, we talked a little bit about the host isolation response.
07:55
If I've got a cluster built
07:57
and one of the hosts in the cluster loses its connection to the management network,
08:01
now we decide what has to happen here.
08:05
I could leave the VMs powered on.
08:07
If the management network is the only thing that's failed,
08:11
then leaving the VMs powered on is a good idea because that means they will continue to run,
08:16
and we can maintain that status until the management network gets restored. Maybe a
08:20
cable got unplugged accidentally, or a network interface has
08:26
died on the switch, and maybe someone needs to move it to a different switch port. These are things that might happen, right?
08:31
We could also power off the VMs if the host gets isolated.
08:37
That's more of a drastic response, but it might make sense in certain circumstances to do that, or we can shut the VMs down.
08:45
It's more of a graceful shutdown,
08:46
and that might also make sense. If the VMs don't get automatically moved somewhere, maybe we want to shut the host down, try to troubleshoot the problem, and bring it back up,
08:56
so you get some flexibility there as well.
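Those three choices map to the `isolationResponse` values in the API ('none' leaves the VMs powered on, 'powerOff' is the hard stop, 'shutdown' is the graceful one). A minimal pyVmomi sketch of the cluster-wide default:

```python
from pyVmomi import vim

# Cluster-wide default host isolation response: leave VMs powered on.
spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo(
    defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse='none'))
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```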
09:01
Then, within the cluster configuration screen, we also have another link for VM monitoring.
09:07
So we have some options. If the heartbeat that VMware Tools provides stops,
09:13
then we can reset the VM.
09:16
Maybe VMware Tools has crashed or stopped for some other reason. Maybe someone accidentally shut it down,
09:22
so we can restart the VM, and that should bring up the Tools when it reboots.
09:28
You can also adjust the monitoring sensitivity.
09:31
There's a little slider here from low to high.
09:33
You might want to adjust this and play around with different settings in your environment to find a setting that gives you the response you want without having the
09:43
restarting of the VMs happen under circumstances where you don't wish it to happen.
09:48
Leaving it in the middle might be a good place to begin before you start doing your testing.
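Under the hood the sensitivity slider is a handful of numeric thresholds. A pyVmomi sketch whose numbers roughly mirror a middle-of-the-road preset; treat the exact values as assumptions to tune per environment:

```python
from pyVmomi import vim

# VM monitoring via VMware Tools heartbeats.
spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo(
    vmMonitoring='vmMonitoringOnly',
    defaultVmSettings=vim.cluster.DasVmSettings(
        vmToolsMonitoringSettings=vim.cluster.VmToolsMonitoringSettings(
            enabled=True,
            failureInterval=60,        # seconds without a heartbeat before reset
            minUpTime=240,             # uptime before monitoring kicks in
            maxFailures=3,             # resets allowed within the window
            maxFailureWindow=86400)))  # window in seconds (24 hours)
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```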
09:54
Then, on an individual VM basis, I can adjust
10:00
how I want this VM to restart. I can use the cluster settings,
10:03
which means that I've got a global setting for all the VMs in that cluster.
10:07
Or you can go high, medium or low
10:09
for the priority for an individual VM, or disable that.
10:15
So once you see the VMs running on your cluster in this configuration window, what makes sense is to right-click
10:20
and select from the drop-down menu what your VM's restart priority setting should be.
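A per-VM override of the cluster default might look like this in pyVmomi; the VM name 'critical-vm-01' is a placeholder:

```python
from pyVmomi import vim

# Override the cluster default restart priority for one VM.
vm = next(v for h in cluster.host for v in h.vm if v.name == 'critical-vm-01')

override = vim.cluster.DasVmConfigSpec(
    operation='add',  # use 'edit' if an override already exists for this VM
    info=vim.cluster.DasVmConfigInfo(
        key=vm,
        dasSettings=vim.cluster.DasVmSettings(restartPriority='high')))

spec = vim.cluster.ConfigSpecEx(dasVmConfigSpec=[override])
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```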
10:28
Then we have to think about some of the other best practices here.
10:31
One thing to consider is having a redundant heartbeat network,
10:35
so the heartbeat gets sent between the master and slave hosts.
10:37
That way, they can each keep track of the other hosts in the cluster to know if that host is healthy and operational.
10:46
If a host is determined to have a failure, this could be because no heartbeats are being received, the ping response is not working, and there's no heartbeat in the datastore, because you can use storage heartbeats as well.
11:01
In the lab, we use an NFS partition for that purpose.
11:05
So the heartbeat network takes care of these heartbeat signals and ping packets over the network.
11:13
You do have to have a VMkernel port configured for the management network in order for this to work,
11:18
and we'll see how that gets set up in the lab.
11:20
And if I create
11:22
a redundant heartbeat network, now I can have an even more reliable way to detect failures.
11:31
So there are a couple of ways you can do this. You can use NIC teaming
11:33
for the heartbeat network. Now I've got multiple physical interfaces, so in case I lose one of those, the other interfaces that remain can continue to send those heartbeat signals back and forth between all of the hosts,
11:45
or I can set up multiple heartbeat networks
11:48
and have NIC teaming and more redundant networks. I might have two NICs for this heartbeat network and another two NICs for heartbeat network two. Now I can tolerate three failures of those network cards and still maintain my heartbeat network.
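As a rough sketch of the NIC teaming side in pyVmomi, here's how two uplinks could be teamed on the vSwitch that carries the management VMkernel port; host, switch, and vmnic names are all placeholders:

```python
from pyVmomi import vim

# Team two physical NICs on the standard vSwitch backing the heartbeat/
# management network, so losing one uplink doesn't break heartbeats.
host = next(h for h in cluster.host if h.name == 'esxi-01.example.com')
net_sys = host.configManager.networkSystem

spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=['vmnic0', 'vmnic1']))
net_sys.UpdateVirtualSwitch(vswitchName='vSwitch0', spec=spec)
```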
12:03
All right, stay tuned for part two of lesson two, where we talk about configuring HA. Thank you.