Hi, guys. Welcome back. I'm Katherine McKeever. And today, in our Lean Six Sigma Green Belt lesson, we're going to go over basic descriptive statistics.
So in our last module, we talked about how there are two different types of statistics. There's descriptive and there's inferential, and they both need each other. Descriptive tends to come first because it sets the foundation. But they're both very valuable from a Lean Six Sigma perspective.
At this moment, I'm going to call out explicitly that I'm not going to teach you how to perform the formulas. For those of you who have had a statistics class in the past, congratulations on being able to do this by hand. For those of you who have not, no worries, our next module is actually going to be about how to use Excel to get this.
In this day and age, there are so many different ways that you can calculate these numbers. It's more important to me that you understand the concepts and the use of the numbers as they apply to Lean Six Sigma than being able to tally it all up on a piece of paper or a tablet.
So with that, there are two types of descriptive statistics.
There are measures of central tendency, which is where does my data fall on the spectrum, or measures of the middle. And then there are measures of dispersion, which is how wide or tall is my distribution, or more specifically, how close together or how far apart is my data.
So both of these relate back to the number line. If you remember, in middle school math we had zero in the middle and positive numbers and negative numbers. If we imagine our data as a number line or, spoiler alert, an x-axis, what we're going to see is where does it fall and how spread out is it. So you can look at these a couple of different ways, but the concepts are fundamentally the same. So for measures of central tendency, there are three major measures that you need to be really comfortable with.
The first one is your mean. This is your average; that should not surprise you. This is your get-down and your go-to. If you remember, our normal distribution curve is driven by our average, and our parameters for Cpk and Ppk are driven by our average. You're going to be really comfortable with this because, in addition to standard deviation, which is on the next slide, this is probably going to be the most common descriptive measure you're going to use. However,
take it with a grain of salt, because the next most common, so, like, you know, 98% of the time, is going to be your median. This is the midpoint in your data. So if you have 20 different measures, your median is the line between the 10th and 11th measures that cuts your data exactly in half. Your median is really valuable when we talk about fast analysis, or as I say, poor man's analysis. Because if you compare your mean and your median, just with those two numbers you can determine, in most cases, the shape of your distribution.
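To make the "line between the 10th and 11th" idea concrete, here is a minimal sketch in Python using the standard `statistics` module. The 20 measurements are made up purely for illustration, and the variable names are assumptions, not anything from the course materials.

```python
import statistics

# Hypothetical sample: 20 measurements (invented numbers for illustration).
measures = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13,
            17, 12, 14, 15, 13, 16, 14, 13, 15, 14]

ordered = sorted(measures)
mean_value = statistics.mean(ordered)      # the average
median_value = statistics.median(ordered)  # the midpoint of the sorted data

# With an even count of 20, the median sits halfway between
# the 10th and 11th sorted values (indexes 9 and 10).
tenth, eleventh = ordered[9], ordered[10]
midpoint = (tenth + eleventh) / 2

print("mean:", mean_value)
print("median:", median_value, "midpoint of 10th and 11th:", midpoint)
```

With an odd number of measures the median would simply be the single middle value, so the "line between two values" picture only applies to even counts.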
So, in an ideal, perfect normal distribution, your mean equals your median. If you compute these two numbers, they're exactly the same. However, if your mean does not equal your median, it gives you an indication of skewness, so left-sided or right-sided, or a non-normal distribution in your data.
We'll talk quite a bit more about this when we go into distributions in depth, but mean and median, these are going to be your poor man's analysis. We're going to look at it like this: if your median is greater than your mean, you have a left-sided skew. And if your median is less than your mean, you have a right-sided skew. You don't need to know this. It's on some table somewhere in this course.
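The mean-versus-median comparison can be sketched as a small helper. This is only a rough heuristic under stated assumptions: the function name `skew_hint`, the `tolerance` parameter, and the sample lists are all hypothetical, and equal mean and median suggests symmetry but does not by itself prove a normal distribution.

```python
import statistics

def skew_hint(data, tolerance=0.0):
    """Rough check: compare mean and median to guess the skew direction."""
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    if abs(mean_value - median_value) <= tolerance:
        return "roughly symmetric"
    if median_value > mean_value:
        return "left-skewed"   # a long tail of low values drags the mean down
    return "right-skewed"      # a long tail of high values drags the mean up

# Hypothetical data sets, invented for illustration.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
left_tail = [1, 17, 18, 18, 19, 19, 19, 20]
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right_tail", right_tail),
                   ("left_tail", left_tail),
                   ("symmetric", symmetric)]:
    print(name, "->", skew_hint(data))
```

In practice you would likely set `tolerance` to something larger than zero, since real data rarely gives a mean and median that match exactly.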
The last item up is your mode. So your mode is your most commonly occurring number.
This is not necessarily a measure of the middle. Your mode could be one of your outliers if you have a little cluster of outliers, special cause variation. But even though it's kind of looked down upon in the descriptive statistics world, mode is really great for categorical data.
So think about it: say you gave your client satisfaction scores on a Likert scale, so a 0 to 5 score from most satisfied to least satisfied. What mode could tell you is what your clients are most frequently answering, and that's really valuable because that gives you a sense of where you perform.
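As a quick illustration of mode on survey-style data, here is a sketch that assumes a made-up list of 0-to-5 satisfaction responses. It uses `statistics.mode` for the most frequent answer and `collections.Counter` for the per-score counts, which is the same information a bar chart would show.

```python
from collections import Counter
import statistics

# Hypothetical satisfaction responses on a 0-to-5 scale (invented values).
responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4, 1, 4, 5, 3, 4]

most_common_score = statistics.mode(responses)  # most frequently given answer
counts = Counter(responses)                     # how often each score appears

print("mode:", most_common_score)
for score in sorted(counts):
    # Text-only bar chart: one '#' per response with that score.
    print(score, "#" * counts[score])
```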
So even though it doesn't pop up in a lot of analyses, it is valuable when you're actually wanting to count categories. Mode is what we look at, in some form, when we look at our histograms or bar charts: we're looking at the count or the frequency of specific types of measures. Measures of central tendency are good for understanding your data positioning.
So on that number line, are you hanging out around 10 or are you hanging out around 10,000? It gives you where you fall in the absolute value of your measurements. Like I talked about, it's great for fast analysis. You should be able to look at mean and median and at least get a sense: if they are not equal, it means that there's something more for you to do as a Green Belt practitioner, or something to understand. So mean and median being equal, this is our ideal state.
Measures of central tendency are not great for outliers. So we talked a little bit about our mean and our median. Mean is very susceptible to outlier data. So if you imagine you have 20 data points, each one of those data points contributes 5% of your total number.
So we don't like it because of this sensitivity. It's very, very easy for the data to become skewed or for the results to present something that isn't necessarily reflective of the majority.
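Here is a small sketch of that sensitivity, assuming 20 invented data points where 19 are identical and one is an extreme outlier. The numbers and variable names are hypothetical; the point is only to show how one value carrying a 5% share can move the mean while the median stays put.

```python
import statistics

# Hypothetical data: 19 typical points plus one extreme outlier (20 in total).
typical = [10] * 19
with_outlier = typical + [1000]

weight_per_point = 1 / len(with_outlier)       # each point's share of the mean
mean_with = statistics.mean(with_outlier)      # pulled far away from 10
median_with = statistics.median(with_outlier)  # stays with the majority

print("share per point:", weight_per_point)
print("mean:", mean_with, "median:", median_with)
```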
Thankfully, that's why we have our measures of dispersion. So our measures of dispersion show us how far apart our numbers are. There are three major measures in here. You have range, variance, and standard deviation. In all honesty, standard deviation is going to be your get-down. This is the one that we use because it is formulaically derived to represent the aggregate of all of the variance. So if you had really, really tight numbers and then one crazy outlier, standard deviation is going to account for that and give you a sense of how wide your distribution really is.
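The three dispersion measures can be sketched with Python's `statistics` module. The data below is invented (tight numbers plus one outlier), and using the sample forms `variance` and `stdev`, which divide by n - 1, is an assumption; the lesson does not say whether sample or population formulas are intended.

```python
import statistics

# Hypothetical measurements: tight numbers plus one outlier (invented values).
data = [9, 10, 10, 11, 10, 9, 11, 10, 10, 30]

data_range = max(data) - min(data)           # widest gap between two points
sample_variance = statistics.variance(data)  # sample form, divides by n - 1
sample_sd = statistics.stdev(data)           # square root of the sample variance

# Same data with the outlier dropped, to see how much it widened the spread.
tight_only = data[:-1]
sd_without_outlier = statistics.stdev(tight_only)

print("range:", data_range)
print("variance:", sample_variance)
print("standard deviation:", sample_sd)
print("standard deviation without outlier:", sd_without_outlier)
```

Standard deviation is in the same units as the data, which is part of why it is easier to read than variance.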
So we like it. These are get-downs. Standard deviation is also called sigma, and if you remember, we are Six Sigma, so that gives you an idea that we're talking about processes that perform within six standard deviations. Virtually 100% of your data is within six standard deviations of your mean, which we like. That's a very, very, very tiny number. This is good.
So, measures of dispersion. The big Six Sigma ones that you're going to use are, of course, mean, because it tells you the middle of your distribution, and standard deviation, which tells you how tall or flat or wide your distribution is. Dispersion is really great for normalizing outliers. So if we have a couple of crazy numbers, it doesn't always get shifted all over the place. And it's really helpful for understanding variation, which, if you remember back to Yellow Belt, ideologically Six Sigma is about reducing variation. So this is where we start getting a really strong sense of it.
Dispersion is not great for understanding your distribution or where your distribution is hanging out on the number line. So none of these values can tell me if this is a normal distribution or a skewed distribution.
Nor can it give me an idea of whether we are talking about a mean of 10 with a standard deviation of 10, or a mean of 10,000 with a standard deviation of 10. Those are very different numbers to be working with.
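A minimal sketch of that last point, using two invented data sets that are identical except for a constant shift. The population form `pstdev` is used here only as an assumption so the illustrative numbers come out evenly; the takeaway is that the standard deviation alone cannot tell the two apart.

```python
import statistics

# Hypothetical data sets: same spread, very different location.
near_ten = [0, 20, 0, 20]
near_ten_thousand = [x + 9990 for x in near_ten]  # shift every point by 9990

# Population standard deviation (divides by n); an illustrative choice.
sd_small = statistics.pstdev(near_ten)
sd_large = statistics.pstdev(near_ten_thousand)

mean_small = statistics.mean(near_ten)
mean_large = statistics.mean(near_ten_thousand)

print("means:", mean_small, mean_large)
print("standard deviations:", sd_small, sd_large)
```

This is why a measure of central tendency and a measure of dispersion are reported together.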
So with that, today we went over an introduction to descriptive statistics. You know that there is central tendency and there's dispersion, and that you want to get really, really comfortable, from a Green Belt perspective, with understanding and using them. You know that mode, median, and mean each have an application, so we're not going to dismiss them in favor of mean.
In the next video, I'm going to make up for not teaching you how to do this by hand by teaching you guys how to do this in Excel. So I will see you guys there.